Mapping Among Controlled Vocabularies

Reasons for Mapping Controlled Vocabularies

Mapping provides long-term advantages that help improve the utility and longevity of a project’s data.

Understanding the motivations behind mapping and the tools that are available to associate the metadata elements in one project with the terms in another vocabulary will assist a metadata manager in making good decisions for a project. Motivations for mapping may include the following, each of which is described below:

  • Creation of a simple term translator for terms in related communities.
  • Automated translation between vocabularies created in different general standards.
  • Education of users and expansion of vocabularies used in specialized communities.
  • Enhancement of searchability for a project’s data and metadata.
  • Clarification of relationships within a project’s own vocabulary.
  • Documentation of a project’s vocabulary for posterity.
  • Validation and improvement of a project’s vocabulary.
  • Long-term interoperability with the Semantic Web.

Creation of a simple term translator for terms in related communities

When communities use different vocabularies for related concepts, a vocabulary mapping can help the user understand how the terms relate to each other. This situation occurs with data and related publications from different practitioners in the same domain as well as with material from different scientific areas, such as physical oceanography and atmospheric science. A simple presentation of the relationships among the terms in the two vocabularies can do much to bridge the gap. The mapping serves as a thesaurus or small dictionary for the vocabularies.

Automated translation between vocabularies created in different general standards

A more rigorous case of the previous motivation for mapping is the need for an explicit translation of metadata records. In some cases, one metadata standard, such as ISO 19115, may use one vocabulary, while another, such as FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM), uses different terms for the same concept. Someone may want to have metadata in ISO 19115 for a set of data files that were documented in the CSDGM standard. Creating these records by hand is time-consuming. A carefully built vocabulary mapping provides a tool that can automatically translate between the two standards. Note that just as in language translation, some information is likely to be lost in the process, because vocabularies rarely have perfect matches for each other's terms.
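The idea of automated but lossy translation can be sketched in a few lines of Python. This is a minimal illustration, not an official crosswalk: the element names in CSDGM_TO_ISO, the sample record, and the function translate_record are all assumptions made for the example.

```python
# Hypothetical mapping from CSDGM-style element names to ISO 19115-style
# names. The entries are placeholders for illustration only.
CSDGM_TO_ISO = {
    "title": "title",
    "abstract": "abstract",
    "origin": "citedResponsibleParty",
    "pubdate": "date",
}


def translate_record(record, mapping):
    """Return (translated, untranslated) for a flat metadata record."""
    translated = {}
    untranslated = {}
    for key, value in record.items():
        if key in mapping:
            translated[mapping[key]] = value
        else:
            # No counterpart in the target vocabulary: set it aside
            # so the loss is visible rather than silent.
            untranslated[key] = value
    return translated, untranslated


# A made-up record with one element that has no mapped counterpart.
record = {
    "title": "Mooring M1 temperature",
    "origin": "Example Institute",
    "native": "netCDF on Linux",
}
translated, untranslated = translate_record(record, CSDGM_TO_ISO)
```

Returning the untranslated elements separately is one way to make the information loss explicit, so that a person can decide what to do with terms that have no match.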

Education of users and expansion of vocabularies used in specialized communities

For practitioners in a community, the data manager can build a set of explicit relationships as a bridge from their terms to those in the rest of the science world. The mappings can enhance advertisement of their data to other communities, serve as an educational tool that will improve the entry of metadata, and expose the community’s users to other vocabularies and standards. The community, in turn, can assist in verifying the mappings as they modify their own practices.

Enhancement of searchability for a project’s data and metadata

Vocabulary mappings make it possible to add an arbitrary number of relationships between the terms used for searches and the ones that are used to label and discover the data. This enables more comprehensive and accurate search results. For example, users should be able to find similar data of interest whether they search on "sea surface temperature," "SST," or "water temperature" in the same way that an online shopper might expect to find stores that sell diamonds, whether they searched on "diamonds" or "gems."

Conversely, a community may wish to provide an interface that uses formal terminology, like the Integrated Ocean Observing System’s controlled vocabulary, but not incorporate the keywords from that vocabulary directly within its own data files. Mapping allows users to move easily from one set of terms to the other.
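As a rough sketch of how a mapping can support search, the example below maps several search terms onto the single label used in the data. The terms, the label, the dataset ids, and the function name search are all hypothetical, and for simplicity the sketch treats every mapped term as equivalent to the label.

```python
# Hypothetical mapping from search terms to the label used in the data.
SEARCH_TO_LABEL = {
    "sea surface temperature": "sea_surface_temperature",
    "sst": "sea_surface_temperature",
    "water temperature": "sea_surface_temperature",
}

# Made-up dataset descriptions, each labeled with one controlled term.
DATASETS = [
    {"id": "buoy-01", "label": "sea_surface_temperature"},
    {"id": "buoy-02", "label": "wind_speed"},
]


def search(query, datasets=DATASETS, mapping=SEARCH_TO_LABEL):
    """Return ids of datasets whose label matches the mapped query."""
    normalized = query.strip().lower()
    # Fall back to the raw query when the mapping has no entry for it.
    label = mapping.get(normalized, normalized)
    return [d["id"] for d in datasets if d["label"] == label]
```

With this mapping, search("SST") and search("Water Temperature") both find the same dataset, while the data files themselves only ever carry the one controlled label.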

Clarification of relationships within a project’s own vocabulary

A data manager may wish to document relationships between terms within the project’s own vocabulary. Mapping provides a method to indicate which terms overlap with or totally include others, which are synonyms, and which are opposites. These important details assist users working with the project vocabulary and enable their systems to work more effectively with project data.

Documentation of a project’s vocabulary for posterity

Mapping the terms of a vocabulary to other vocabularies provides critical context, explaining how the vocabulary fits into the larger domain. A good mapping can provide a permanent record of the meaning of many terms in the vocabulary in words that other people and computers can understand.

Validation and improvement of a project’s vocabulary

By mapping a project’s vocabulary to others, especially to a vocabulary that is more rigorous or more standardized, data managers will come to understand the strengths and weaknesses of both vocabularies involved. This kind of evaluation, paired with sharing of results, can improve a project’s vocabulary and help make both vocabularies part of a community knowledge-building process.

Long-term interoperability with the Semantic Web

The most far-reaching advantage of vocabulary mapping may not produce immediate value for users of a particular data set or search program, but it may eventually benefit the operation of the Semantic Web. The Semantic Web is a model proposed by the W3C in which large numbers of computers interact automatically in ways that take into account the meaning of the words that are used. Explicit descriptions of the relationships between terms will allow the Semantic Web to provide far more effective and advanced services related to the data described by those vocabularies.

Methods for Mapping Vocabularies

There are many technologies that can be used to map vocabulary terms. From simplest to most advanced, these include the following techniques:

  • Tables: these can be in ASCII files, MS Word tables, or MS Excel spreadsheets. A simple synonym table will list terms that agree, while a relationship table will include the terms from each vocabulary and the relationship between them.
  • Resource Description Framework (RDF) files: this format supports simple relationship definitions, using triples of subject, predicate or relationship, and object, to describe the connection between terms in different vocabularies. Note that all three parts of the triple correspond to a well-defined term. The relationships in the Simple Knowledge Organization System, or SKOS, are often used with RDF files.
  • Ontologies: these are often represented in the Web Ontology Language (OWL) format. Ontologies allow description of more complex relationships, in particular, supporting axioms about the relationships between classes of individuals. Ontologies support a more formal language than SKOS.
  • Concept maps: these are typically less formally developed than OWL or RDF relationships but also have the subject-predicate-object triple as a basis. In concept maps, which can be in the CMAP format, relationships (predicates) are often not part of a controlled vocabulary themselves. Concept maps are often used to quickly document knowledge about a domain but not to create a formal description of that knowledge that can be used for computer reasoning or inference. That is, concept maps focus not so much on relating vocabularies, as on describing the context of a domain; thus, they are not directly helpful in the type of mapping that is the subject of this guide.
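To illustrate the link between the first two options above, a relationship table can be read directly into subject-predicate-object triples. The sketch below assumes a small CSV table with made-up column names, terms, and relationship names; none of them come from a real vocabulary.

```python
import csv
import io

# A small relationship table in CSV form. The terms and relationship
# names are invented for illustration.
TABLE = """source_term,relationship,target_term
SST,sameAs,sea_surface_temperature
water_temp,broaderThan,sea_surface_temperature
salinity,sameAs,sea_water_salinity
"""


def table_to_triples(text):
    """Parse a relationship table into (subject, predicate, object) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["source_term"], row["relationship"], row["target_term"])
        for row in reader
    ]


triples = table_to_triples(TABLE)
```

Each row of the table becomes one triple. In a real RDF file, the subject, predicate, and object would each be a well-defined term, typically identified by a URI, rather than a bare string.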

For a simple mapping project, for example between two relatively structured vocabularies, an MS Excel or similar table is a manageable approach and, in the short term, the fastest one (since many people are already familiar with Microsoft Office).

As a mapping project becomes more complex, tables quickly become insufficient. The problem of data entry becomes less important than the problems of finding matches, matching multiple terms simultaneously, and presenting the results in a standard format that can help verify and advance the mapping project goals.

For example, imagine a scenario in which four vocabularies have overlapping and often similar terms. The goal is to map all the terms in three of the vocabularies to a fourth, reference vocabulary, then produce a set of all the inferred relationships that result. Searching for relationships among the terms and assigning them in the map is extremely time-consuming, and obtaining fast results becomes difficult.
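The inference step in this scenario can be sketched as follows, assuming that every mapping to the reference vocabulary is an exact match (relationships such as "broader than" or "close to" cannot safely be chained this way). The vocabulary names A, B, and C, the terms, and the function infer_pairs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical exact-match mappings: (vocabulary, term) -> reference term.
TO_REFERENCE = {
    ("A", "SST"): "sea_surface_temperature",
    ("B", "surface_temp"): "sea_surface_temperature",
    ("C", "sea_temp_surface"): "sea_surface_temperature",
    ("A", "sal"): "sea_water_salinity",
    ("C", "salinity"): "sea_water_salinity",
}


def infer_pairs(to_reference):
    """Pair up terms from different vocabularies that share a reference term."""
    by_reference = defaultdict(list)
    for source, ref in to_reference.items():
        by_reference[ref].append(source)
    inferred = set()
    for sources in by_reference.values():
        for left, right in combinations(sorted(sources), 2):
            if left[0] != right[0]:  # skip pairs within one vocabulary
                inferred.add((left, right))
    return inferred


pairs = infer_pairs(TO_REFERENCE)
```

Five hand-made mappings yield four inferred pairs here, and the number of inferred pairs grows much faster than the number of mappings as vocabularies are added. This is part of why tables stop being practical for this kind of project.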

Preferred Technical Approach: OWL and RDF-SKOS Files

The following approaches assume that the project under consideration requires mapping and not the construction of a new vocabulary or ontology. If the mapping project will be complex or persistent (referenced for 12 months or longer), then for long-term maintainability and standardization, it should begin with either the OWL or RDF-SKOS technologies. Although each has some start-up costs to learn the technology, they both provide more advanced and reusable results and more useful tools that can quickly recoup the up-front costs.

RDF-SKOS is being adopted for many vocabulary mapping projects in which simple relationships and associations are the principal objective. The mapping relationships that have been developed in SKOS are well suited to such projects, and quite complex sets of mappings among vocabularies can still be developed with them.

On the other hand, OWL enables complex statements about many aspects of relationships between vocabulary terms and can be used in a more formal framework. It is the appropriate tool when the vocabularies themselves are already in OWL and a rich semantic framework is likely to be of value in the future. OWL may also be the more appropriate tool if many different types of relationships are anticipated or if one or more vocabularies are planned to be the basis of an upper ontology.

A fairly straightforward discussion of the differences between OWL and SKOS can be found in section 1.3 of the W3C SKOS reference, with additional context elsewhere in that document.
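To give a sense of what an RDF-SKOS mapping looks like, the sketch below writes a few mapping statements in Turtle syntax using plain string formatting. The five mapping properties and the namespace come from the W3C SKOS reference; the term URIs under example.org and the function to_turtle are assumptions made for illustration, and a real project would more likely use an RDF library or a dedicated mapping tool.

```python
# SKOS namespace as published by the W3C.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Mapping properties defined in the W3C SKOS reference.
MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def to_turtle(mappings):
    """Serialize (subject_uri, property, object_uri) mappings as Turtle."""
    lines = [f"@prefix skos: <{SKOS}> .", ""]
    for subject, prop, obj in mappings:
        if prop not in MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {prop}")
        lines.append(f"<{subject}> skos:{prop} <{obj}> .")
    return "\n".join(lines)


# Hypothetical term URIs under example.org.
mappings = [
    ("http://example.org/projA/SST", "exactMatch",
     "http://example.org/ref/sea_surface_temperature"),
    ("http://example.org/projA/water_temp", "narrowMatch",
     "http://example.org/ref/sea_surface_temperature"),
]
turtle = to_turtle(mappings)
```

Each output line is one triple: a term from the project vocabulary, a SKOS mapping property, and a term from the reference vocabulary. In the second statement, narrowMatch says that the reference term is the narrower of the two concepts.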

Tool Support

There are many simple and sophisticated tools for working with mappings and ontologies. For the relatively simple task of producing vocabulary mappings, a simple approach may be desirable. If you have a vocabulary list in the form of a text file, the first step will be to convert your list into an RDF or OWL file. The open source MMI tools Voc2OWL and Voc2RDF can perform this transformation fairly quickly.

Once your files are in RDF or OWL, the vocabulary mapping tool VINE—also an open source tool from MMI—is available to perform mappings. This tool finds similar terms and easily maps them.
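The general idea of finding similar terms can be sketched with Python's standard difflib module. This is not how VINE works internally, which the text above does not describe. It is only an illustration of string-similarity matching, and the function names, the cutoff value, and the sample terms are assumptions.

```python
import difflib


def normalize(term):
    """Lowercase and unify separators so near-identical spellings align."""
    return term.lower().replace("_", " ").replace("-", " ").strip()


def suggest_matches(source_terms, target_terms, cutoff=0.6):
    """Suggest candidate target terms for each source term by string similarity."""
    by_normalized = {normalize(t): t for t in target_terms}
    suggestions = {}
    for term in source_terms:
        close = difflib.get_close_matches(
            normalize(term), list(by_normalized), n=3, cutoff=cutoff
        )
        # Best candidates come first; map them back to the original spelling.
        suggestions[term] = [by_normalized[c] for c in close]
    return suggestions


suggestions = suggest_matches(
    ["Sea-Surface Temperature", "salinity"],
    ["sea_surface_temperature", "sea_water_salinity", "wind_speed"],
)
```

String similarity only proposes candidates. Terms that look alike may mean different things, so a domain expert still needs to confirm each suggested mapping and choose the relationship.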

Among more sophisticated tools, Protégé is a free, full-featured and widely used choice and has collaboration features that are helpful in developing domain ontologies for a community. The comprehensive list of ontology tools on the MMI site describes more options.

Community Challenges

In addition to technical challenges, there are cultural and attitudinal challenges associated with mapping vocabularies and caused by misconceptions in the science community:

  • Lack of immediate need or value leading to unwillingness to participate
  • Perception that mapping is too difficult or that processes make it needlessly complicated
  • Principal investigators (PIs) who are too busy to participate in mapping processes and think that the job should be done by developers
  • Belief that standards and terms from other projects are not relevant to an individual PI’s project

Creating effective vocabulary mappings is time-consuming and technically challenging. We strongly encourage people who need to produce vocabulary mappings to engage key members of their scientific community to participate in a focused effort.

Preferred Community Approach: Vocabulary Mapping Workshops

To produce a robust mapping, several domain science experts will be needed for a day or two, often on several occasions. To overcome the cultural challenges and to address the goal of producing effective mappings, community adoption of vocabulary mapping workshops provides a practical approach. The MMI project has held such workshops and developed a comprehensive template for others to hold them. All are welcome to adopt this template for their own communities, although we do suggest you contact us if you plan to use it, and we request appropriate credit during your workshop. The MMI project welcomes feedback regarding the use and improvement of the template and tools.

Suggested Citation

Graybeal, J. 2011. "Mapping Among Controlled Vocabularies." In The MMI Guides: Navigating the World of Marine Metadata. http://marinemetadata.org/guides/vocabs/cvchooseimplement/cvmap. Accessed December 7, 2019.