Achieving Semantic Interoperability

Background

Semantic interoperability exists when different systems can make effective use of the terms that are used in an interaction. For example, a meteorology model may use the general term “temperature” for air temperature and provide a more specific term for water temperature. This may work fine with another meteorological system, but it will cause confusion when interacting with an ocean model where the more generic “temperature” means water temperature and “air temperature” is more specifically named.

When two people communicate verbally in the same language, a lot of redundant information is available for confirming assumptions and refining understanding. Facial expressions, tone of voice, repetition using different words, gestures, actions, and physical objects all guide the participants' understanding.

However, computers typically do not transmit redundant information, and they have an extremely limited set of communication protocols to rely on when terms do not produce expected results. Therefore, they require precise correspondence of terms.

Just as related businesses must achieve semantic interoperability across computer systems, scientific software increasingly demands semantic interoperability. As data and metadata move from manufactured devices into observatories through local repositories, post-processing algorithms, and national archives and clearinghouses, confusion about the meaning of terms makes information increasingly difficult to find and use. The passage of time between the original experiment and the analyses that may take place years or decades later contributes to the loss of usability for data.

Semantic interoperability may never be perfectly seamless and automatic, but with proper data stewardship it can be nearly so for most systems. At a minimum, the originators of data can make sure their data will remain usable by scientists and educators for many decades and across all science disciplines.

Core Concepts of Semantic Interoperability

People who speak different languages rely on dictionaries, phrase books, and translation systems to achieve some minimal level of semantic interoperability (that is, communication). Similar concepts apply in computer science, where metadata dictionaries, controlled vocabularies, ontologies, and a standards-based semantic framework correspond to the tools of human language translation. The sections below describe how these tools fit together and what steps data managers need to take.

Standards, specifications, and vocabularies

First, a content standard or specification provides a format to describe data and metadata. Ideally, the specification requires the structure of the data to be fully and precisely described, so that computers can automatically parse the data into its original components. A good content standard will define all the fields and terms that it uses; for instance, the standard would make clear whether "Data Originator" means the principal investigator, the device operator, the institution paying for the experiment, or the device itself.

As a particular description is filled out according to the specification, individual elements must be filled in with data or words. When an element is filled in with specific text like keywords or codes, good specifications describe what terms can be used to fill in the element. The terms are defined in the controlled vocabulary or dictionary. (Alternatively, the specification may tell the preparer how to specify which vocabulary was used to fill in the element.) In either case, a computer should be able to automatically look up each term and its meaning if it knows how to interpret descriptions that follow the specification.
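As a minimal sketch of the lookup described above, the following Python assumes a controlled vocabulary is available as a simple term-to-definition table. The element name data_originator_role, the role terms, and both function names are hypothetical and chosen only for illustration; a real specification would define its own.

```python
# Hypothetical controlled vocabulary for one metadata element:
# each allowed term is paired with its definition.
DATA_ORIGINATOR_ROLES = {
    "principal_investigator": "Person responsible for the experiment",
    "device_operator": "Person who operated the device",
    "funding_institution": "Institution paying for the experiment",
    "device": "The instrument that produced the data",
}


def look_up(term, vocabulary):
    """Return the definition of a term, or raise if the term is not allowed."""
    if term not in vocabulary:
        raise ValueError(f"{term!r} is not a term in the controlled vocabulary")
    return vocabulary[term]


def validate_record(record, vocabularies):
    """Return (element, value) pairs whose value is missing from its vocabulary.

    `vocabularies` maps an element name to the vocabulary that governs it.
    """
    problems = []
    for element, vocabulary in vocabularies.items():
        value = record.get(element)
        if value not in vocabulary:
            problems.append((element, value))
    return problems


if __name__ == "__main__":
    rules = {"data_originator_role": DATA_ORIGINATOR_ROLES}
    print(validate_record({"data_originator_role": "device"}, rules))
    print(validate_record({"data_originator_role": "PI"}, rules))
```

A record that uses a term outside the vocabulary (here, the free-text "PI") is reported rather than silently accepted, which is the property that lets software rely on the element's meaning.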

Mappings

If a computer can find only a free-text definition for a term, it may be unable to recognize that a term in one data set has the same meaning as a term in another data set. For example, if looking up "tropo" gives the definition "tropospheric region of the earth's atmosphere" in one data set, the computer may not equate that term with "troposphere" in another data set. A mapping must therefore exist between the vocabulary terms used to describe the data set and the vocabulary terms used to describe other data sets. With such a mapping of terms, a computer can evaluate all the controlled vocabulary fields in the metadata, such as quality control flags, units, science domains, and topic keywords. This ability to connect concepts across data sets and data systems is what ultimately creates semantic interoperability.
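One way to picture such a mapping is as a set of term pairs that have been declared equivalent. The sketch below assumes each term is identified by a (vocabulary, term) pair; the vocabulary names vocab_a and vocab_b and the function names are hypothetical.

```python
# Hypothetical mapping: each entry is an unordered pair of
# (vocabulary, term) identifiers declared to mean the same thing.
SAME_AS = {
    frozenset({("vocab_a", "tropo"), ("vocab_b", "troposphere")}),
}


def same_concept(term_a, term_b, same_as=SAME_AS):
    """Return True if two terms are identical or explicitly mapped as equivalent."""
    if term_a == term_b:
        return True
    return frozenset({term_a, term_b}) in same_as


def shared_concepts(keywords_a, keywords_b, same_as=SAME_AS):
    """List the pairs of keywords from two data sets that refer to one concept."""
    return [
        (a, b)
        for a in keywords_a
        for b in keywords_b
        if same_concept(a, b, same_as)
    ]


if __name__ == "__main__":
    dataset_1 = [("vocab_a", "tropo")]
    dataset_2 = [("vocab_b", "troposphere"), ("vocab_b", "stratosphere")]
    print(shared_concepts(dataset_1, dataset_2))
```

Without the entry in SAME_AS, the two data sets would appear to share no keywords, even though both describe the troposphere.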

Semantic framework

A semantic framework provides an infrastructure that can use mappings, and the information associated with them, to solve real-world problems in science data management. A consistent semantic framework will include specifications for how to refer to a specific term from a specific vocabulary; how to create and understand mappings from one specific term to one or more other terms (possibly in another vocabulary); and how to build software that uses these services to give users what they really want. So, if a user types in "troposphere" as a search term, a developer who used a good semantic framework will have built functionality into the software that translates that term to all the other terms that have been mapped to it. The search will then return results that could not be found without the semantic framework.
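The search behavior described above can be sketched as query expansion over the mappings. This is an illustration under stated assumptions, not a description of any particular framework: the "vocabulary:term" identifier syntax, the vocabulary names, the data set names, and the functions expand and search are all hypothetical.

```python
from collections import deque

# Hypothetical mappings: each pair states that two qualified terms are equivalent.
MAPPINGS = [
    ("vocab_a:tropo", "vocab_b:troposphere"),
    ("vocab_b:troposphere", "vocab_c:lower_atmosphere"),
]

# Hypothetical catalog: data set name -> keywords from its metadata.
DATASETS = {
    "dataset_1": {"vocab_a:tropo"},
    "dataset_2": {"vocab_b:troposphere"},
    "dataset_3": {"vocab_b:stratosphere"},
}


def expand(term, mappings):
    """Return the term plus every term reachable from it through the mappings."""
    neighbors = {}
    for left, right in mappings:
        # Equivalence is treated as symmetric, so record both directions.
        neighbors.setdefault(left, set()).add(right)
        neighbors.setdefault(right, set()).add(left)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for other in neighbors.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


def search(term, datasets, mappings):
    """Return the names of data sets tagged with the term or any mapped term."""
    wanted = expand(term, mappings)
    return sorted(name for name, keywords in datasets.items() if keywords & wanted)


if __name__ == "__main__":
    print(search("vocab_b:troposphere", DATASETS, MAPPINGS))  # with mappings
    print(search("vocab_b:troposphere", DATASETS, []))        # without mappings
```

With the mappings, the search finds both dataset_1 and dataset_2; without them, only dataset_2 matches. Treating every mapping as a symmetric, transitive equivalence is a simplifying assumption; relationships such as "narrower than" would need to be followed in one direction only.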

Planning for Semantic Interoperability

Defining data and metadata structures

The first steps of semantic interoperability depend on a foundation of good data practices that define in a standard way how data and metadata are structured. The time to do this is when the data are created, because that is when it is easiest to understand how they are organized. Metadata are used to describe the structure of data, and typically, a project will select a content standard to organize its metadata. Other guides in this series describe this process.

Creating understandable variable names

Once data and metadata structures have been defined, the next steps focus on describing the data in a semantically interoperable way. Where the structural information might define three variables in ASCII format separated by tabs, semantic interoperability demands that the three variables be named and that some correspondence exists between the chosen variable names and the names that other people and computers recognize. Making names understandable can take several forms:

  • Single vocabulary: the easiest method in some situations is to choose a vocabulary that can describe all of your variables, and declare that all the element names in a project will be specified using that vocabulary. For example, the Standard Name table of the Climate and Forecast (CF) conventions, which extend the earlier COARDS conventions, is a common choice.
  • Multiple vocabularies: some projects may need to specify terms from multiple vocabularies, using a syntax that includes the vocabulary name.
  • Local names: project-specific names can be clarified by specifying a relationship, or vocabulary mapping, that people can use to relate the project’s element names to more common terms in another vocabulary. Similarly, mappings can connect terms to multiple vocabularies by using a standard framework that can reference each vocabulary and term. Even if no term from another vocabulary fits exactly, relationships between a local term and one in another vocabulary can be described (for example, Term A is "narrower than" Term B).
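The three approaches above can be sketched together in a few lines of Python. The "vocabulary:name" syntax, the vocabulary prefixes cf and local, the local variable names, and the relationship labels are assumptions made for illustration; air_temperature and sea_water_temperature are used here as example names from the CF standard name table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A variable name qualified by the vocabulary that defines it."""
    vocabulary: str
    name: str

    def qualified(self):
        return f"{self.vocabulary}:{self.name}"


def parse_term(text, default_vocabulary):
    """Parse 'vocabulary:name'; a bare name belongs to the project's default vocabulary."""
    vocabulary, separator, name = text.partition(":")
    if not separator:
        return Term(default_vocabulary, text)
    return Term(vocabulary, name)


# Hypothetical mapping of project-specific names to a more common vocabulary.
# The relationship records how closely the local term matches the target.
LOCAL_MAPPINGS = {
    Term("local", "temp_air"): ("same_as", Term("cf", "air_temperature")),
    Term("local", "temp_intake"): ("narrower_than", Term("cf", "sea_water_temperature")),
}


def describe(term, mappings=LOCAL_MAPPINGS):
    """Explain a term by its mapping, if one exists."""
    if term not in mappings:
        return term.qualified()
    relationship, target = mappings[term]
    return f"{term.qualified()} {relationship} {target.qualified()}"


if __name__ == "__main__":
    # Single vocabulary: bare names default to one declared vocabulary.
    print(parse_term("air_temperature", "cf").qualified())
    # Multiple vocabularies: the prefix names the vocabulary explicitly.
    print(parse_term("local:temp_intake", "cf").qualified())
    # Local names: a mapping relates the project's term to a common one.
    print(describe(Term("local", "temp_intake")))
```

Recording the relationship alongside the target term keeps an inexact match, such as "narrower than", from being mistaken for an exact equivalence.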

Not all of these solutions are appropriate for every scenario, but they will provide a starting strategy for most projects. In most cases, the initial cost of planning for semantic interoperability will be only a small part of the total metadata planning process, and the return on investment will be evident. As more projects include interoperability as part of their metadata plan, future modifications and improvements to individual projects will become simpler.

Suggested Citation

Graybeal, J. 2011. "Achieving Semantic Interoperability." In The MMI Guides: Navigating the World of Marine Metadata. http://marinemetadata.org/guides/vocabs/cvchooseimplement/cvsemint. Accessed December 11, 2019.