Template Guide for the Vocabulary Mapping Process

Introduction

When several vocabularies must be mapped, a few strategies can reduce the total time required and increase the amount of useful information produced. This document suggests strategies for carrying out the mapping process.

Overview

All the domains have access to multiple common vocabularies, and most have many domain-specific vocabularies identified. While it is theoretically possible to map every term in each vocabulary to terms in all the other vocabularies, the inference capabilities of these ontologies make simpler approaches equally effective.
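As a rough illustration of why inference makes exhaustive mapping unnecessary, the sketch below treats "same as" mappings as symmetric and transitive and groups terms accordingly. The term names, the pair-based mapping format, and the function `equivalence_classes` are all hypothetical; they are not taken from VINE or from any particular ontology tool.

```python
from collections import defaultdict


def equivalence_classes(mappings):
    """Group terms linked by 'same as' mappings (assumed symmetric and transitive)."""
    parent = {}

    def find(term):
        # Return the representative of the group that contains `term`.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # shorten the chain as we go
            term = parent[term]
        return term

    for left, right in mappings:
        parent[find(left)] = find(right)  # merge the two groups

    groups = defaultdict(set)
    for term in parent:
        groups[find(term)].add(term)
    return list(groups.values())


# Hypothetical terms: two vocabularies, each mapped only to a reference vocabulary.
mappings = [
    ("vocabA:sea_temp", "ref:sea_water_temperature"),
    ("vocabB:water_temperature", "ref:sea_water_temperature"),
]
classes = equivalence_classes(mappings)
```

In this example the two non-reference terms were never mapped to each other directly, yet they end up in the same group because both were mapped to the reference term.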

The main options to keep in mind are:

  • performing many-to-many mapping until a dominant vocabulary is identified;
  • identifying a single reference vocabulary in advance; and
  • identifying multiple reference vocabularies (or sections from each).

Most groups during the initial vocabulary mapping workshop preferred to use one or two primary vocabularies (on the "right" side of the VINE GUI) and to map other vocabularies to those (using similar levels). The primary vocabulary or vocabularies should include only disjoint concepts, and vocabularies with term definitions available are preferable. It may also be appropriate to map terms within a single primary vocabulary by choosing that vocabulary on both sides of the GUI.

Another factor to consider is the vocabulary type. Typically, a Parameter Discovery Vocabulary (PDV), also called a keyword vocabulary, is important for general searches and categorization of terms. A Parameter Usage Vocabulary (PUV), also known as a markup or item vocabulary, is important for finding specific terms or items.

Guidance

Each mapping team should discuss these approaches at least briefly before beginning the mapping process, and again as needed during the workshop. Different approaches will be appropriate for different teams, and possibly at different times in the workshop.

We recommend reviewing all the vocabularies at least briefly at the beginning of the mapping, and determining at least one likely reference vocabulary in each of the two vocabulary categories above, PDVs and PUVs. These can be a starting point for mapping terms with other vocabularies.

The process of mapping may then be used to refine the selection of reference vocabularies, if the participants recognize that certain vocabularies are more important than had been perceived. For this reason, it may be worth going through an initial mapping with some of the most significant vocabularies, to appreciate their value.

Once the key vocabularies have been mapped, at least in part, the group may choose to add other vocabularies. Because multiple vocabularies can be searched and displayed for mapping simultaneously, it can be very efficient to map many vocabularies at one time. Each group should decide the appropriate time to begin mapping more than a few vocabularies.

Multiple vocabularies may be used simultaneously as references by mutual agreement. Each vocabulary can provide a different contribution to the overall reference, so that as a group works in different sub-domains, they can easily select different vocabularies as references. A major advantage of an ontological framework is that it supports working with multiple vocabularies in this way.

Creating a Standard Vocabulary

Many of the groups would like to see a standard vocabulary emerge from the workshop. Although a standard vocabulary is not meant to be the primary goal of the mapping, one can be derived from the mapping work using several techniques. (Of course, it can also be negotiated directly without any mapping process, but that option is not considered here. :->)

One way to create a standard vocabulary is to identify the best name within each set of equivalent terms. By identifying this term as the ‘standard’ (for example, by mapping it to itself and adding the comment “Standard”), a list of terms can be built up.
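A minimal sketch of this first technique follows. It assumes, purely for illustration, that mappings are available as `(left, right, comment)` tuples and that sets of equivalent terms have already been assembled; the function `standard_lookup` and all term names are hypothetical. A term counts as a standard only if it is mapped to itself with the comment "Standard".

```python
def standard_lookup(equivalent_sets, mappings):
    """Map every term to the flagged standard name of its equivalence set."""
    # A standard is a term mapped to itself with the comment "Standard".
    standards = {
        left
        for left, right, comment in mappings
        if left == right and (comment or "").strip().lower() == "standard"
    }
    lookup = {}
    for terms in equivalent_sets:
        chosen = sorted(standards & set(terms))
        # Sets with no flagged term, or with more than one, are left
        # for the group to resolve by hand.
        if len(chosen) == 1:
            for term in terms:
                lookup[term] = chosen[0]
    return lookup


# Hypothetical data for illustration only.
equivalent_sets = [
    {"a:temp", "b:temperature", "ref:sea_water_temperature"},
    {"a:salt", "b:salinity"},
]
mappings = [
    ("ref:sea_water_temperature", "ref:sea_water_temperature", "Standard"),
    ("a:temp", "ref:sea_water_temperature", ""),
]
lookup = standard_lookup(equivalent_sets, mappings)
```

Here the second set has no flagged term, so none of its members appear in the lookup until the group picks a standard name for it.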

A second technique would be to identify the standard vocabulary or vocabularies, and use those as the de facto standard names for the terms that they contain. Besides relying on the judgment of the users, potential standards can be identified by finding which vocabularies have the most, or the greatest percentage, of their (domain-appropriate) terms mapped. Vocabularies with many useful terms may prove most valuable as the standard in that domain.
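The count and percentage of mapped terms could be tallied along the following lines. This is a sketch under assumed data structures: each vocabulary is a set of term identifiers, each mapping is a `(left, right)` pair, and the function `mapping_coverage` and the sample terms are hypothetical. Self-mappings are ignored so that they do not inflate the count.

```python
def mapping_coverage(vocabularies, mappings):
    """Return {vocabulary name: (mapped term count, fraction of terms mapped)}."""
    mapped = set()
    for left, right in mappings:
        if left != right:  # a term mapped only to itself is not counted
            mapped.update((left, right))

    report = {}
    for name, terms in vocabularies.items():
        count = len(set(terms) & mapped)
        fraction = count / len(terms) if terms else 0.0
        report[name] = (count, fraction)
    return report


# Hypothetical vocabularies and mappings for illustration only.
vocabularies = {
    "vocabA": {"a:temp", "a:salinity", "a:depth", "a:ph"},
    "vocabB": {"b:temperature", "b:salt"},
}
mappings = [
    ("a:temp", "b:temperature"),
    ("a:salinity", "b:salt"),
]
coverage = mapping_coverage(vocabularies, mappings)
```

Both vocabularies have two mapped terms in this example, but they differ in the fraction mapped, which is why the text distinguishes the two measures. Restricting each set to domain-appropriate terms beforehand is left to the group.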

Finally, some combination of the above may be used, together with the creation of new terms when none of the existing terms are sufficient. In this case, the new terms would presumably go into a new vocabulary, which can then be incorporated into the mapping tool with the others considered for the domain.

However a standard vocabulary is created, if it is identified in a consistent way in the ontology, it is then possible to create a tool which can find those standards and, if desired, create a new document containing all the standard terms (and, of course, pointing to the sources from which the terms were derived).
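One possible shape for such a tool is sketched below. It assumes the standards have already been extracted from the ontology into a simple dictionary from each term to its standard term; that dictionary format, the function `standards_document`, and the sample terms are assumptions made for illustration, not a description of an existing tool.

```python
def standards_document(term_to_standard):
    """Build a plain-text list of standard terms with the terms they derive from."""
    sources = {}
    for term, standard in term_to_standard.items():
        entry = sources.setdefault(standard, [])
        if term != standard:
            entry.append(term)  # record a source term for this standard

    lines = []
    for standard in sorted(sources):
        derived = ", ".join(sorted(sources[standard])) or "(no other sources)"
        lines.append(f"{standard}\tderived from: {derived}")
    return "\n".join(lines)


# Hypothetical input for illustration only.
term_to_standard = {
    "a:temp": "ref:swt",
    "b:temperature": "ref:swt",
    "ref:swt": "ref:swt",
}
document = standards_document(term_to_standard)
```

Each line of the resulting document names one standard term followed by the source terms that were mapped to it.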