Achieving Semantic Interoperability

Introduction to semantic interoperability, including definitions, core concepts, and steps toward semantic interoperability

This guide is written at an Intermediate level, and assumes some familiarity with most metadata concepts.

Background

Semantic interoperability exists when different systems (in our case, computer systems interacting with each other or with people) can interact and make effective use of the terms that are used in the interaction. For example, a meteorology model may use 'temperature' for air temperature, and the more explicit 'water temperature' when it means the temperature of water. This may work fine with another meteorological system, but will cause confusion when interacting with an ocean model where 'temperature' means water temperature and 'air temperature' is the alternative.
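
One way to see the problem is to give each system its own small vocabulary that maps local terms to shared concepts, and then to translate through those concepts instead of matching strings. The sketch below illustrates that idea under stated assumptions: the `concept:` identifiers and the `translate` helper are hypothetical, not part of any standard or registry.

```python
# Hypothetical local vocabularies: each system's own term -> a shared concept ID.
# The concept IDs are illustrative placeholders, not entries in a real registry.
METEOROLOGY_MODEL = {
    "temperature": "concept:air_temperature",
    "water temperature": "concept:water_temperature",
}
OCEAN_MODEL = {
    "temperature": "concept:water_temperature",
    "air temperature": "concept:air_temperature",
}


def translate(term: str, source: dict[str, str], target: dict[str, str]) -> str:
    """Translate a term from one system's vocabulary to another's via the shared concept."""
    concept = source[term]  # KeyError if the source system does not define the term
    # Invert the target vocabulary so we can find its term for a given concept.
    by_concept = {c: t for t, c in target.items()}
    if concept not in by_concept:
        raise LookupError(f"target vocabulary has no term for {concept}")
    return by_concept[concept]


# Passing 'temperature' through unchanged would be wrong; translation fixes that.
print(translate("temperature", METEOROLOGY_MODEL, OCEAN_MODEL))  # air temperature
```

The same word maps to different concepts in the two systems, so the shared concept, not the spelling, carries the meaning.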

The same concept exists in human interactions, with the most obvious lack of semantic interoperability occurring when people speak different languages. The key difference in creating semantic interoperability for people and computers is the amount of precision that is needed.

When two people try to communicate, a lot of redundant information is available for confirming assumptions and refining understanding. Facial expressions, tone of voice, repetition using different words, gestures, actions, and physical objects themselves guide the participants' understanding. Also, many terms can be (and are) approximately translated - for many purposes, 'friend' and 'colleague' can substitute for each other.

However, computers typically do not transmit redundant information, do require precise correspondence of terms, and have an extremely limited set of communication protocols to fall back on when terms do not produce expected results. Thus, creating semantic interoperability among computer systems requires significantly more attention to detail than creating it among people.

Just as related businesses must achieve semantic interoperability across all their computer systems, scientific software increasingly demands it too. As data and metadata pass from device manufacturers into observatories, through local repositories and post-processing algorithms, and on to national archives and clearinghouses, confusion about the meaning of terms makes the information increasingly difficult to find or use. The time between the original experiment and the analysis, which increasingly takes place years or decades later, only adds to this loss of usability.

Core Concepts

Semantic interoperability may never be perfectly seamless and automatic, but with proper data stewardship it can be nearly so for most systems. At a minimum, the originators of data can make sure their data will remain usable by scientists and educators for many decades, and across all science disciplines.

For people speaking different languages, dictionaries, phrase books, and translation systems are used to provide some minimal level of semantic interoperability (i.e., communication). Similar concepts apply in computer science, with metadata dictionaries (controlled vocabularies), ontologies, and a standards-based semantic framework corresponding to the human-oriented tools. This guide describes how these pieces fit together, and what steps you will need to take to ensure your data fits into the puzzle successfully.

First, a content standard or specification provides a format for describing data and metadata. Ideally the specification requires the structure of the data to be fully and precisely described, so that computers can automatically parse the data into its original components. A good content standard will define all the fields and terms that it uses, making clear, for example, whether 'Data Originator' means the principal investigator, the device operator, the institution paying for the experiment, or the device itself.
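
As a rough sketch of what a fully described structure allows, the example below encodes a hypothetical record layout (three tab-separated fields, each with a parser and a precise definition) and uses it to split a line back into its components. The field names, the `FieldSpec` class, and `parse_record` are illustrative assumptions, not taken from any particular content standard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FieldSpec:
    name: str                       # field name defined by the (hypothetical) specification
    parse: Callable[[str], object]  # how to convert the raw text into a value
    definition: str                 # what the field means, stated precisely


# A hypothetical record layout: three tab-separated fields.
RECORD_SPEC = [
    FieldSpec("station_id", str, "Identifier assigned by the operating institution"),
    FieldSpec("time", str, "Observation time, ISO 8601, UTC"),
    FieldSpec("depth_m", float, "Sensor depth below the sea surface, in metres"),
]


def parse_record(line: str, spec: list[FieldSpec], sep: str = "\t") -> dict[str, object]:
    """Split one line into its components exactly as the specification describes."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) != len(spec):
        raise ValueError(f"expected {len(spec)} fields, got {len(parts)}")
    return {f.name: f.parse(raw) for f, raw in zip(spec, parts)}


record = parse_record("M1\t2009-06-01T12:00:00Z\t10.5", RECORD_SPEC)
print(record)  # {'station_id': 'M1', 'time': '2009-06-01T12:00:00Z', 'depth_m': 10.5}
```

Keeping the definition next to the parser means the same description serves both the computer (how to split and convert) and the human reader (what each field means).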

As a particular description is filled out according to the specification, individual elements must be filled in with data or words. Again, where an element is filled in with specific text like keywords or codes, good specifications describe what terms can be used to fill in the element - the 'controlled vocabulary' or dictionary to use when filling out that element. (Alternatively, the specification may tell the preparer how to specify which vocabulary was used to fill in the element.) In any case, a computer should be able to automatically look up each term and its meaning if it knows how to interpret descriptions that follow the specification.
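
A minimal sketch of that lookup, assuming each element records both its value and the name of the vocabulary it was drawn from. The vocabulary `qc_flags_v1`, its terms and definitions, and the `look_up` function are all hypothetical.

```python
# Hypothetical controlled vocabularies, keyed by vocabulary name.
# Names, terms and definitions are illustrative only.
VOCABULARIES = {
    "qc_flags_v1": {
        "good": "Value passed all quality-control tests",
        "suspect": "Value failed at least one non-critical test",
        "bad": "Value failed a critical test and should not be used",
    },
}


def look_up(element: dict[str, str]) -> str:
    """Return the definition of an element's value from the vocabulary it declares."""
    vocab_name = element["vocabulary"]  # the preparer states which vocabulary was used
    vocab = VOCABULARIES.get(vocab_name)
    if vocab is None:
        raise LookupError(f"unknown vocabulary: {vocab_name}")
    term = element["value"]
    if term not in vocab:
        raise ValueError(f"{term!r} is not a term in {vocab_name}")
    return vocab[term]


qc_element = {"name": "quality_flag", "vocabulary": "qc_flags_v1", "value": "suspect"}
print(look_up(qc_element))  # Value failed at least one non-critical test
```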

But if all the computer can find is a free-text definition for the term - for example, looking up "tropo" gives the definition "tropospheric region of the earth's atmosphere" - it may be impossible for the computer to recognize that 'tropo' in this data set is the same thing as 'troposphere' in another data set. A mapping must exist between the vocabulary terms that are used to describe the data set, and other vocabulary terms used to describe other data sets. With such a mapping of terms, the computer can make informed judgments about all the controlled vocabulary fields in the metadata, such as quality control flags, units, science domains, and topic keywords. Building on the first two components, this ability to connect concepts across data sets and data systems is what finally creates semantic interoperability.
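
The sketch below illustrates the point with a hypothetical set of explicit mappings. String comparison alone cannot connect 'tropo' and 'troposphere', but once someone asserts that the terms are equivalent, software can follow those assertions, including chains that pass through a third vocabulary. It assumes every mapping is an exact equivalence, so mappings can be followed in both directions and chained; the vocabulary names are made up.

```python
from collections import defaultdict, deque

# Terms are identified as (vocabulary, term); the vocabulary names are hypothetical.
TROPO = ("dataset_a_vocab", "tropo")
TROPOSPHERE = ("dataset_b_vocab", "troposphere")

# Explicit equivalence mappings, asserted by someone who understands both vocabularies.
EQUIVALENT = [
    (TROPO, TROPOSPHERE),
    (TROPOSPHERE, ("dataset_c_vocab", "Troposphere")),
]


def equivalents(term, mappings):
    """Return every term reachable from `term` by following equivalence mappings."""
    graph = defaultdict(set)
    for a, b in mappings:  # equivalence is symmetric: record both directions
        graph[a].add(b)
        graph[b].add(a)
    seen, queue = {term}, deque([term])
    while queue:  # breadth-first search follows chains of mappings
        for neighbour in graph[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {term}


print(TROPO[1] == TROPOSPHERE[1])  # False: the strings alone do not match
print(TROPOSPHERE in equivalents(TROPO, EQUIVALENT))  # True: the mapping connects them
```

Chaining is only safe for exact equivalences; looser relationships such as "narrower than" should not be followed transitively in the same way.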

A semantic framework provides an infrastructure that can use these mappings, and the information associated with them, to solve real-world science data management issues. A consistent semantic framework will include specifications for how to refer to a specific term from a specific vocabulary; how to create and understand mappings from one specific term to one or more other terms, possibly in another vocabulary; and how to build software that uses these services to give the user what he or she really wants. So if the user types in 'troposphere' as a search term to a web interface, the developer who uses a good semantic framework can build in the software tools that translate that term to all the other terms that have been mapped to it, and so the web interface will return resources to the user that it could never have found without the semantic framework.
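
A simplified sketch of the search scenario, assuming term references of the hypothetical form `vocabulary:term`, a small in-memory table of mappings, and a catalog whose resources are tagged in their own vocabularies. A real semantic framework would resolve these through shared services; here everything, including the default vocabulary `vocab_b`, is illustrative.

```python
# Hypothetical mappings from one term reference ("vocabulary:term") to one or more others.
MAPPINGS = {
    "vocab_b:troposphere": {"vocab_a:tropo", "vocab_c:Troposphere"},
}

# Hypothetical catalog: each resource is tagged with terms from its own vocabulary.
CATALOG = {
    "cruise_2007_radiosondes": {"vocab_a:tropo"},
    "mooring_m1_ctd": {"vocab_c:sea_water"},
    "aircraft_profiles": {"vocab_c:Troposphere"},
}


def expand(term_ref: str) -> set[str]:
    """Return the term itself plus every term mapped to or from it (one hop)."""
    expanded = {term_ref} | MAPPINGS.get(term_ref, set())
    expanded |= {src for src, targets in MAPPINGS.items() if term_ref in targets}
    return expanded


def search(term: str, vocabulary: str = "vocab_b", use_mappings: bool = True) -> list[str]:
    """Find catalog resources tagged with the user's term or, optionally, its mapped terms."""
    term_ref = f"{vocabulary}:{term}"
    wanted = expand(term_ref) if use_mappings else {term_ref}
    return sorted(name for name, tags in CATALOG.items() if tags & wanted)


print(search("troposphere", use_mappings=False))  # []
print(search("troposphere"))  # ['aircraft_profiles', 'cruise_2007_radiosondes']
```

Without the mappings the literal search finds nothing, because no resource was tagged with exactly the user's term.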

Steps Toward Semantic Interoperability

The first steps toward semantic interoperability depend on a foundation of good data practice: principally, defining in a standard way how your data and metadata are structured. The time to do this is when you create the data, because that is when you best understand how the data are organized. Other guides in this series describe this process.

Once your data and metadata structures have been defined - typically by using a content standard to organize your metadata, and metadata to describe your data's structure - then you can focus on describing the data in a semantically interoperable way. Where the structural information might say you have 3 variables in ASCII format separated by tabs, semantic interoperability demands that you name those variables, and that some correspondence exist between your variable names and the names that other people - and computers - recognize.
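
To make the distinction concrete, the sketch below reads three tab-separated ASCII columns (the structural part) and then attaches names from a declared vocabulary to them (the semantic part), refusing to proceed if any column lacks a declared name. The local column names, the sample values, and the mapping are hypothetical; the target names are written in the style of CF standard names such as `air_temperature`.

```python
import csv
import io

# Structural information: three tab-separated ASCII columns (no meaning yet).
RAW = "t\tsst\tat\n0\t14.2\t17.9\n1\t14.3\t18.4\n"

# Semantic information: local column names -> names from one declared vocabulary.
# The mapping is illustrative; the targets follow the CF standard-name style.
DECLARED_VOCABULARY = "CF Standard Names"
NAME_MAP = {
    "t": "time",
    "sst": "sea_surface_temperature",
    "at": "air_temperature",
}


def read_with_standard_names(text: str) -> list[dict[str, str]]:
    """Parse the tab-separated text and rename every column to its declared name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in reader.fieldnames if c not in NAME_MAP]
    if missing:
        raise KeyError(f"no {DECLARED_VOCABULARY} name declared for columns: {missing}")
    return [{NAME_MAP[k]: v for k, v in row.items()} for row in reader]


rows = read_with_standard_names(RAW)
print(rows[0])  # {'time': '0', 'sea_surface_temperature': '14.2', 'air_temperature': '17.9'}
```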

Making your names understandable can take several forms. The easiest in some situations is to choose a vocabulary that can describe all of your variables - for example, the COARDS Climate and Forecast Standard Variables is one with extensive coverage - and declare that all your names will be specified using that vocabulary.

Other options are available for other situations. Need to specify terms from multiple vocabularies? You can specify the names using a syntax that includes the vocabulary name. Need to use your own local names? You can specify a relationship, or vocabulary mapping, that people can use to relate your names to more common terms in another vocabulary. Want your terms to be understood by users of several vocabularies? Such mappings can connect your terms to many different vocabularies, by using a standard framework that can reference both the vocabulary and the term. Even if no term from another vocabulary fits exactly, you can describe the relationship between your term and one in another vocabulary (e.g., my term is "narrower than" the other term).
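
The sketch below combines these options under stated assumptions: a hypothetical `vocabulary:term` syntax for naming terms, and mapping records that relate a local term to terms in other vocabularies through an explicit relationship. The relation labels are illustrative; standards such as W3C SKOS define comparable mapping properties (for example `skos:exactMatch` and `skos:broadMatch`), but this code does not implement SKOS, and all vocabulary and term names are made up.

```python
from dataclasses import dataclass

# Illustrative relation labels (not SKOS itself): how a local term relates to another term.
RELATIONS = {"exactMatch", "narrowerThan", "broaderThan"}


@dataclass(frozen=True)
class TermRef:
    vocabulary: str
    term: str

    @classmethod
    def parse(cls, text: str) -> "TermRef":
        """Parse the hypothetical 'vocabulary:term' syntax (split at the first colon)."""
        vocabulary, sep, term = text.partition(":")
        if not sep or not vocabulary or not term:
            raise ValueError(f"expected 'vocabulary:term', got {text!r}")
        return cls(vocabulary, term)


@dataclass(frozen=True)
class TermMapping:
    source: TermRef  # your own (local) term
    relation: str    # how the source relates to the target
    target: TermRef  # a term in another vocabulary

    def __post_init__(self) -> None:
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation {self.relation!r}")


# One local term connected to two other (hypothetical) vocabularies.
LOCAL = TermRef.parse("mylab:ctd_temp")
MAPPINGS = [
    TermMapping(LOCAL, "exactMatch", TermRef.parse("vocab_x:sea_water_temperature")),
    TermMapping(LOCAL, "narrowerThan", TermRef.parse("vocab_y:water_temperature")),
]


def related_in(term: TermRef, vocabulary: str) -> list[tuple[str, str]]:
    """List (relation, term) pairs that link `term` to terms in `vocabulary`."""
    return [
        (m.relation, m.target.term)
        for m in MAPPINGS
        if m.source == term and m.target.vocabulary == vocabulary
    ]


print(related_in(LOCAL, "vocab_y"))  # [('narrowerThan', 'water_temperature')]
```

Recording the relation explicitly lets software decide how far to trust a match: an exact match can be substituted directly, while a narrower term can only stand in for the broader one with care.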

Not all of these solutions are fully in place for every scenario, but most of the pieces exist, and groups like MMI (Marine Metadata Interoperability) can help guide you in more advanced cases. As more projects need semantic interoperability and implement the approaches that have been developed, you can expect your investment either to produce fairly widespread interoperability or to make it trivial to achieve as the missing pieces of the infrastructure develop. In most cases, the initial cost will be only a small increment above the work you will be doing anyway, and the return on your investment will be evident early and often.

Suggested Citation

2009. "Achieving Semantic Interoperability." In The MMI Guides: Navigating the World of Marine Metadata. http://marinemetadata.org/guides/vocabs/cvchooseimplement/cvsemint. Accessed April 20, 2014.