Vocabularies: Dictionaries, Ontologies, and More

Introduction to metadata vocabularies, including definitions, basic examples, and links to additional guidance

Every discipline has its own terminology. Consider terms that are used to describe vertical distances. The word “altitude” refers to the distance of something, such as an airplane in flight, above a reference point like ground level. If we were examining a set of blueprints for a building, we would not use the word “altitude” to describe the level of the rooftop, even though it is also a vertical distance above ground level. Instead, we would use the word “height.” Similarly, if we were in a boat looking down into the water, we would use “depth” rather than “height” to describe the vertical distance.

A single term may also be used in multiple communities with different connotations. For example, an oceanographer operating a Remotely Operated Vehicle (ROV) may use the term “altitude” to mean the distance above the ocean floor.

In the context of metadata (data about data, which provides context for research findings, ideally in a machine-readable format), having multiple terms with the same meaning, and terms that can have different meanings in different contexts, can make it harder for people to find and understand data. Using controlled vocabularies (managed lists of terms) within metadata, instead of freely allowing any terminology to be used, can reduce confusion and improve data accessibility.

The Controlled Vocabularies section of the guides describes the importance of controlled vocabularies, the different kinds and their uses, how to implement existing controlled vocabularies, and some considerations for developing new ones.

What is a Controlled Vocabulary?

A vocabulary is a set of terms (words, codes, etc.) that are used in a specific community. In the example above, “altitude,” “depth,” and “height” are all part of the vocabularies that scientists and engineers use to talk about vertical distances. It is common for terms to have different connotations in different communities.

A controlled vocabulary is a managed set of terms. The management can take different forms, but in controlled vocabularies the allowed terminology is restricted in some way. Within a metadata standard (a set of documented rules for creating metadata that is approved, published, and governed by a formal body) or a specification (any description of how to store metadata, with no requirement for formal approval or governance), controlled vocabularies are often used to describe the allowed content within a metadata element (an individual label and value pair, such as "creator: John Doe"). This is in contrast to a free-text metadata element. As in the example above, in a free-text element, users may choose to use height, altitude, or depth to describe a dataset containing vertical distances. A controlled vocabulary might limit the user's choices, and ensure consistent use of terminology, by specifying that only the term “depth” be used to describe the distance from the ocean’s surface to the seafloor.
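The contrast between a controlled element and a free-text element can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element label `vertical_measure`, the function names, and the single-term vocabulary are hypothetical and do not come from any particular metadata standard.

```python
# Hypothetical controlled vocabulary: the only term allowed in the
# (assumed) "vertical_measure" element is "depth".
ALLOWED_VERTICAL_TERMS = {"depth"}


def is_allowed(value, allowed_terms):
    """Return True if the value is one of the allowed terms (case-insensitive)."""
    return value.strip().lower() in allowed_terms


def check_record(record, vocabularies):
    """Return the (label, value) pairs that violate a controlled vocabulary.

    Elements with no entry in `vocabularies` are treated as free text
    and are never flagged.
    """
    problems = []
    for label, value in record.items():
        allowed = vocabularies.get(label)
        if allowed is not None and not is_allowed(value, allowed):
            problems.append((label, value))
    return problems


vocabularies = {"vertical_measure": ALLOWED_VERTICAL_TERMS}

good = {"creator": "John Doe", "vertical_measure": "depth"}
bad = {"creator": "John Doe", "vertical_measure": "altitude"}

print(check_record(good, vocabularies))  # []
print(check_record(bad, vocabularies))   # [('vertical_measure', 'altitude')]
```

Here `creator` behaves as a free-text element because no vocabulary is registered for it, while `vertical_measure` accepts only the controlled term.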

For brevity throughout these guides, when we use the term “vocabulary,” we are usually referring to a controlled vocabulary.

Characteristics of a Good Controlled Vocabulary

At a minimum, a controlled vocabulary only needs to manage a set of terms in some way. However, a good controlled vocabulary (one that is easily understood and applied, is likely to be widely adopted, and improves the clarity of metadata) is one in which the controlled terms are:

  • Accepted: the terms adhere to community practices.
  • Defined: the terms are precisely characterized; typically, this means the terms have rigorous definitions.
  • Managed: experts create, store, and maintain the controlled vocabulary according to agreed-upon procedures. Maintenance involves periodic review, addition of new terms, modification of terms, and occasionally deprecation of terms.
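The "Defined" and "Managed" characteristics can be illustrated with a small data structure. The following is a minimal sketch, not a reference implementation: the class names, the methods, and the abbreviation "alt" are hypothetical, and real vocabularies are usually maintained in dedicated registries or services. The "Accepted" characteristic depends on community practice and cannot be enforced in code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Term:
    label: str
    definition: str                    # "Defined": every term carries a definition
    deprecated: bool = False
    replaced_by: Optional[str] = None  # suggested successor, if any


class Vocabulary:
    """A tiny managed vocabulary supporting add, modify, and deprecate."""

    def __init__(self) -> None:
        self._terms: Dict[str, Term] = {}

    def add(self, label: str, definition: str) -> None:
        if not definition.strip():
            raise ValueError(f"term {label!r} needs a definition")
        if label in self._terms:
            raise ValueError(f"term {label!r} already exists")
        self._terms[label] = Term(label, definition)

    def modify(self, label: str, definition: str) -> None:
        if not definition.strip():
            raise ValueError(f"term {label!r} needs a definition")
        self._terms[label].definition = definition

    def deprecate(self, label: str, replaced_by: Optional[str] = None) -> None:
        # Deprecated terms stay in the vocabulary so old metadata remains readable.
        if replaced_by is not None and replaced_by not in self._terms:
            raise KeyError(f"replacement {replaced_by!r} is not in the vocabulary")
        term = self._terms[label]
        term.deprecated = True
        term.replaced_by = replaced_by

    def active_terms(self) -> List[str]:
        return sorted(t.label for t in self._terms.values() if not t.deprecated)

    def resolve(self, label: str) -> str:
        """Return the recorded replacement for a deprecated term, else the term itself."""
        term = self._terms[label]
        if term.deprecated and term.replaced_by:
            return term.replaced_by
        return term.label


vocab = Vocabulary()
vocab.add("altitude", "Vertical distance above a reference point such as ground level.")
vocab.add("depth", "Vertical distance from the ocean's surface to the seafloor.")
vocab.add("alt", "Abbreviation formerly used for altitude.")  # hypothetical legacy term
vocab.deprecate("alt", replaced_by="altitude")

print(vocab.active_terms())  # ['altitude', 'depth']
print(vocab.resolve("alt"))  # altitude
```

Keeping deprecated terms in place, rather than deleting them, is one possible design choice: existing metadata that still uses an old term can then be mapped to its replacement.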

Note that this definition of a controlled vocabulary does not specify a particular scope of usage. Controlled vocabularies could be developed for a local project, for a broader community, or as part of a widely used standard or tool (for example, ISO 19115).

