Mapping Among Controlled Vocabularies

If you need to map the terms among different vocabularies -- or you think you might need to do so, but aren't sure -- this guide provides the key information to help you decide what you need to do and how to start doing it.

The Reasons for Mapping Controlled Vocabularies

Why would you want to relate the terms in different controlled vocabularies? (Henceforth in this document, we will use the term 'vocabularies' to refer to controlled vocabularies.) There are a surprising number of applications that are enabled by such mappings, and some longer-term features that will be part of the semantic web of the future. These motivations are outlined below.

If you are already convinced of the value of mapping controlled vocabularies, you may skip to the next section.

You need a simple term translator

If you have a set of terms in a particular field, for example marine habitats, and you come across another set of terms in the same field, there may be very little correspondence between the two sets of terms. Yet it can be very helpful, as you read materials based on the unfamiliar vocabulary, to know how those terms relate to the ones you are already familiar with.

This experience may be even more compelling when the two vocabularies cover the same domain but were created by different practitioners. For example, the relationships between the terms used by the physical oceanography community and those used by atmospheric scientists are not always apparent, and the differences can be confusing.

Just having a simple presentation of the relationship among the terms in the two vocabularies can bridge this gap significantly. It is like having a thesaurus or small dictionary for your vocabularies.

You need to translate metadata records

A more rigorous case of the previous requirement is the explicit translation of metadata records which use vocabulary terms. In some cases, one metadata standard like ISO 19115 may use one vocabulary, while another like FGDC's Content Standard for Digital Geospatial Metadata (CSDGM) uses different terms for the same field. Someone may want to have metadata in ISO 19115 for a set of data files that was documented in the CSDGM standard, and creating them by hand is time-consuming.

A carefully built vocabulary mapping provides a tool that can automatically translate between the two languages. Note that just like language translations, some information is likely to be lost in the process, because vocabularies rarely have perfect matches for each other's terms.
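As a minimal sketch of such a translator, assuming the mapping is available as a simple lookup table, the Python below translates a list of terms and reports the ones that have no match (the information that would be lost). The function name and the sample terms are hypothetical illustrations; they are not taken from ISO 19115 or CSDGM.

```python
def translate_terms(terms, mapping):
    """Translate terms with a lookup table; return (translated, unmapped)."""
    translated = []
    unmapped = []
    for term in terms:
        if term in mapping:
            translated.append(mapping[term])
        else:
            # No equivalent in the target vocabulary: report it, do not guess.
            unmapped.append(term)
    return translated, unmapped


# Hypothetical mapping from a source vocabulary to a target vocabulary.
SOURCE_TO_TARGET = {
    "sea_temp": "sea_water_temperature",
    "salt": "sea_water_salinity",
}

translated, unmapped = translate_terms(
    ["sea_temp", "salt", "murkiness"], SOURCE_TO_TARGET
)
print(translated)
print(unmapped)
```

Returning the unmapped terms separately keeps the loss visible, so a person can decide how to handle each term the mapping does not cover.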

You want users to work in their own vocabulary (but...)

You may be a data manager for users in a particular specialized or insulated community. You know that there are more terms and vocabularies than those users are aware of, but you despair over teaching them to use the additional resources. (Those resources may be useful for entering metadata, performing searches for content, or advertising your own data to other communities.)

Rather than teach your users about other vocabularies, you can build a set of explicit relationships to bridge from their terms to those in the rest of the science world. You can use these mapping relationships quietly, behind the scenes, or advertise them to users as a means of training them (and verifying the mappings).

You want to provide more natural (or more official) search terms

If you are searching on the web for stores that sell diamonds, you are probably also interested in stores that have 'gems' in their name. Similarly, you may want users to be able to enter 'SST', 'sea surface temperature', or 'water temperature' and still find similar data of interest.

Vocabulary mappings make it possible to add an arbitrary number of relationships between the terms your users use, and the ones that are used to label and discover the data.

In the other direction, you may want to provide an interface that uses formal terminology, like the Integrated Ocean Observing System's controlled vocabulary, but not incorporate the keywords from that vocabulary directly within your own data files. Mapping lets you move easily from one set of terms to the other.
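The search case can be sketched in Python as a lookup from user-entered terms to the term used to label the data. This is an illustration only: the table contents and the function name are assumptions. A real mapping would also record that 'water temperature' is a broader term than 'sea surface temperature', not an exact synonym.

```python
# Hypothetical table from user-entered terms to the label used on the data.
SYNONYMS = {
    "sst": "sea_surface_temperature",
    "sea surface temperature": "sea_surface_temperature",
    "water temperature": "sea_surface_temperature",  # broader term, treated as a match here
}


def normalize_query(user_term, synonyms=SYNONYMS):
    """Return the data label for a user term, or the cleaned term if unmapped."""
    # Lowercase and collapse runs of whitespace before looking up the term.
    key = " ".join(user_term.lower().split())
    return synonyms.get(key, key)


print(normalize_query("SST"))
print(normalize_query("Water  Temperature"))
```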

You want to document relationships within your own vocabularies

Mapping can even be used within a vocabulary. By indicating how terms are related—which ones overlap with or totally include others, or are synonyms or opposites—you provide important details for users working with your vocabulary, and enable their systems to work more effectively with your data.

You want to document a vocabulary for posterity

Mapping the terms of a vocabulary to other vocabularies provides critical context, explaining how the vocabulary fits into the larger domain. A good mapping can provide a permanent record of the meaning of many terms in the vocabulary, in terms other people and computers can understand.

You want to validate and improve your own vocabulary

By mapping your vocabulary to others, especially to vocabularies that are more rigorous or more standardized, you will come to understand the strengths and weaknesses of all the vocabularies involved. You can improve your own vocabulary with that knowledge, and if you share your results, you will help improve the other vocabulary as well. This will make both vocabularies part of a community knowledge-building process, benefitting all their users in the future.

The longer term: Supporting Semantic Interoperability on the web

Finally, the biggest value of vocabulary mapping may be visible not in a particular data set or search program, but in the operation of the semantic web in coming decades. The semantic web involves the automated interaction of large numbers of computers in ways that understand the meaning of the words that are used. Explicit descriptions of the relationships between terms allow the semantic web to provide far more effective and advanced services, as it deals with the data described by those vocabularies.

Methods and Madness

If you are familiar with the technologies listed here, you may want to skip to our recommended approach below.

There are many technologies that can be used to map vocabulary terms. From simplest to most advanced, these include:

  • tables, whether in ASCII files, Word tables, or Excel spreadsheets: a simple synonym table will just list terms that agree, while a relationship table will include the terms from each vocabulary, and the relationship between them
  • Resource Description Framework (RDF) files: this format supports simple relationship definitions, using 'triples' of subject, predicate or relationship, and object, to describe the connection between terms in different vocabularies; note that all 3 parts of the triple correspond to a well defined term. The relationships in the Simple Knowledge Organization System, or SKOS, are often used with RDF files.
  • Ontologies, often represented in the Web Ontology Language (OWL) format: Ontologies allow more complex relationships to be described, in particular supporting axioms about the relationships between classes of individuals. OWL supports a more formal language than SKOS.
  • Concept maps, for example in the CMAP format: Concept maps are typically less formally developed than OWL or RDF relationships, but also have the subject-predicate-object triple as a basis. In concept maps, it is usually the case that the relationships (predicates) are not part of a controlled vocabulary themselves. These are often used to quickly document knowledge about a domain, but not to create a formal description of that knowledge that can be used for computer reasoning or inference.
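To make the subject-predicate-object idea concrete, here is a small Python sketch that holds two mapping triples as tuples and writes them out in N-Triples syntax. The `example.org` vocabulary addresses and term names are placeholders invented for illustration; `exactMatch` and `broadMatch` are mapping properties defined by SKOS.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
VOCAB_A = "http://example.org/vocabA/"  # placeholder vocabulary address
VOCAB_B = "http://example.org/vocabB/"  # placeholder vocabulary address

# Each triple is (subject, predicate, object); all three are identified terms.
triples = [
    (VOCAB_A + "sst", SKOS + "exactMatch", VOCAB_B + "sea_surface_temperature"),
    (VOCAB_A + "sst", SKOS + "broadMatch", VOCAB_B + "water_temperature"),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
```

The second triple states that the term in vocabulary B is broader than the term in vocabulary A, which a plain synonym table cannot express.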

For a simple mapping project, for example between two relatively structured vocabularies, an Excel or similar table is manageable and, in the short term, the fastest approach (since virtually no time is lost understanding the tool).

As the mapping project becomes more complex, though, tables soon become insufficient, and the problem of data entry becomes less important than the problems of finding matches, matching multiple terms simultaneously, and presenting the results in a standard format that can help verify and further the work.

For example, imagine a scenario in which 4 vocabularies have overlapping and often similar terms, and the goal is to map all the terms in 3 of the vocabularies to a fourth, 'reference' vocabulary, then produce a set of all the inferred relationships that result. Searching for relationships among the terms, and quickly assigning them in the map, becomes a dominant time sink; and quickly seeing and using the results is critical.
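The inference step in this scenario can be sketched as follows, under the simplifying assumption that every mapping to the reference vocabulary is an exact match, so that two terms mapped to the same reference term can be treated as matching each other. The vocabulary prefixes and term names are invented for illustration. Broader or narrower relationships would need more careful rules than this.

```python
from collections import defaultdict
from itertools import combinations


def infer_matches(mappings):
    """Infer term-to-term matches from (term, reference_term) exact-match pairs."""
    by_reference = defaultdict(list)
    for term, reference_term in mappings:
        by_reference[reference_term].append(term)

    inferred = set()
    for terms in by_reference.values():
        # Every pair of terms sharing a reference term is an inferred match.
        for a, b in combinations(sorted(terms), 2):
            inferred.add((a, b))
    return inferred


# Hypothetical mappings from three vocabularies (v1, v2, v3) to a reference.
mappings = [
    ("v1:sst", "ref:sea_surface_temperature"),
    ("v2:surface_temp", "ref:sea_surface_temperature"),
    ("v3:SeaSurfTemp", "ref:sea_surface_temperature"),
    ("v1:sal", "ref:salinity"),
    ("v2:salt", "ref:salinity"),
]

for pair in sorted(infer_matches(mappings)):
    print(pair)
```

Even in this toy case, five hand-made mappings yield four inferred ones, which suggests how quickly the result set grows and why reviewing it by eye in a table becomes impractical.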

Concept maps focus not so much on relating vocabularies as on describing the context of a domain, so they are not considered further here.

Preferred Technical Approach: OWL and RDF-SKOS Files

Assuming your project requires mapping, and not the construction of a new vocabulary or ontology, certain approaches can be recommended.

If you know the mapping project will be complex or persistent (referenced for 12 months or longer), then for long-term maintainability and standardization, we recommend starting with either the OWL or RDF-SKOS technologies. Although each has some start-up costs to learn the technology, they provide more advanced and reusable results, and more useful tools can quickly recoup the up-front costs.

RDF-SKOS is being adopted for many vocabulary mapping projects, in which simple relationships and associations are the principal objective. The mapping relationships that have been developed in SKOS are useful for such projects, and quite complex sets of mappings among vocabularies can be developed.

On the other hand, OWL enables complex statements about many aspects of the relationships between vocabulary terms, and can be used in a more formal framework. It is the appropriate tool when the vocabularies themselves are already in OWL, and a rich semantic framework is likely to be of value in the future. OWL may also be the more appropriate tool if many different types of relationships are anticipated, or one or more vocabularies are planned to be the basis of an upper ontology.

A fairly straightforward discussion of the differences between OWL and SKOS can be found in section 1.3 of the W3C SKOS reference, with additional context elsewhere in that document.

Tool Support

There are many sophisticated tools for working with ontologies. Among free tools, Protégé is a full-featured and widely used choice, and is developing collaboration features that will be extremely helpful when developing domain ontologies for a community. You may wish to visit the comprehensive list of ontology tools on the MMI site for more options when working with ontologies.

For the relatively simple task of producing vocabulary mappings, however, a simpler approach may be desirable.

If you have a vocabulary list but it is just a text file, the first step will be to convert your list into an RDF or OWL file. The open source MMI tools Voc2OWL and Voc2RDF can perform this transformation fairly quickly. Procedures for using these tools are provided as part of their documentation, though we continue to improve on that information.
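To give a feel for what such a conversion involves, the Python sketch below turns a plain list of terms into RDF in Turtle syntax, declaring each term as a SKOS concept with a preferred label. It is not the output format or behavior of Voc2RDF or Voc2OWL; the function name, the `example.org` base address, and the choice of English labels are all assumptions made for illustration.

```python
from urllib.parse import quote


def terms_to_turtle(terms, base="http://example.org/vocab/"):
    """Render a list of term labels as SKOS concepts in Turtle syntax."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
    ]
    for term in terms:
        label = term.strip()
        if not label:
            continue  # skip blank lines in the source list
        # Build a safe identifier from the label (spaces become underscores).
        iri = base + quote(label.replace(" ", "_"), safe="")
        # Escape backslashes and double quotes for the Turtle string literal.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"<{iri}> a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{escaped}"@en .')
    return "\n".join(lines)


print(terms_to_turtle(["sea surface temperature", "salinity"]))
```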

Once your files are in RDF or OWL, the vocabulary mapping tool VINE—also an open source tool from MMI—is available to perform mappings. This tool has been optimized for the purpose of finding similar terms and easily mapping them, and is highly recommended for performing a lot of mappings quickly.

The Community Challenges

In addition to the technical challenges, there are challenges associated with the science community that must be addressed. These can be summarized as participant perspectives, roughly as follows:

  • Don't need it right now myself, so don't ask me to help;
  • This is too hard and I don't see the value of it;
  • I'm too busy to do this, but I can send a software developer to do it;
  • Why are you making this so complicated?; and
  • I know what I need to know, all those other terms and standards aren't important.

Some of these responses are very difficult to overcome. It is time-consuming and technically challenging to create effective vocabulary mappings. For that reason, we strongly encourage people who need to produce vocabulary mappings to obtain the buy-in of key members of their scientific community. At least several domain science experts will be needed, for a day or two on several occasions, to produce a robust mapping.

Preferred Community Approach: Vocabulary Mapping Workshops

To address the goal of producing effective mappings, we encourage community adoption of vocabulary mapping workshops as a practical approach. We have held such workshops and developed a comprehensive template for others to hold them. All are welcome to adopt this template for their own purposes, though we do suggest you contact us if you plan to use it, and we request appropriate credit during your workshop.

Conclusion

Hopefully we have given you a good idea of the context and possible approaches to use in your vocabulary mapping projects. We are eager to hear what you think of this material, and how we could improve it.


Suggested Citation

2009. "Mapping Among Controlled Vocabularies." In The MMI Guides: Navigating the World of Marine Metadata. http://marinemetadata.org/guides/vocabs/cvchooseimplement/cvmap. Accessed: 03/18/2010