What is a Controlled Vocabulary?

Definition of a Controlled Vocabulary

A vocabulary is a set of terms (words, codes, etc.) used in a specific community. Vocabularies provide a mechanism for communication, whether written, oral, or electronic, because the meanings of the terms are known and agreed upon by the community members. When a vocabulary is formally managed, it becomes a controlled vocabulary. Here, "managed" means that the terms are stored and maintained using agreed-upon procedures. Procedures should exist for adding terms, modifying terms and, more rarely, deprecating terms in a controlled vocabulary.

A controlled vocabulary is a collection of terms that are:

  • Accepted: The terms adhere to community practices.
  • Defined: The terms are precisely characterized. Typically, this means the terms have rigorous definitions.
  • Managed: In general, a body of experts creates and maintains the controlled vocabulary. Maintenance involves periodic review, addition of new terms, modification of terms, and occasionally deprecation of terms.
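
The three properties above can be sketched in code. The following is a minimal illustration, not a reference implementation: the class names, method names, and example terms are all hypothetical and are not taken from any MMI tool. It assumes that every term must carry a definition and that deprecated terms are flagged rather than deleted, so that older records using them remain interpretable.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    definition: str
    deprecated: bool = False


@dataclass
class ControlledVocabulary:
    """Hypothetical managed vocabulary with add/modify/deprecate procedures."""
    name: str
    terms: dict = field(default_factory=dict)

    def add_term(self, label, definition):
        # "Defined": refuse terms that arrive without a definition.
        if not definition.strip():
            raise ValueError("a controlled term needs a definition")
        if label in self.terms:
            raise ValueError(f"term already exists: {label}")
        self.terms[label] = Term(label, definition)

    def modify_term(self, label, definition):
        # Raises KeyError if the term was never added.
        self.terms[label].definition = definition

    def deprecate_term(self, label):
        # Flag the term instead of deleting it (an assumption of this sketch).
        self.terms[label].deprecated = True

    def active_terms(self):
        return sorted(l for l, t in self.terms.items() if not t.deprecated)


# Illustrative terms only.
vocab = ControlledVocabulary("example_parameters")
vocab.add_term("sea_water_temperature", "Temperature of sea water.")
vocab.add_term("salinity", "Salt content of sea water.")
vocab.deprecate_term("salinity")
print(vocab.active_terms())
```

In practice the "Accepted" and "Managed" properties depend on people and procedures (review by a body of experts) more than on code; the sketch only shows where such procedures would hook in.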

Note that this definition does not specify a particular scope of usage. Controlled vocabularies could be developed for a local project (like the Scripps Institution of Oceanography Geological Data Center), a broader community (e.g. OOSTethys), or as part of a widely used standard or tool (ISO 19115).

Controlled Vocabulary Categories and Types

To many people, the English language is a well-known vocabulary. We have many ways of representing the terms in the English language. For example, if we want to figure out what a specific word means we might consult a glossary; if we want to know the origin of a term we might consult a dictionary; and if we want to know how the term relates to other terms we might consult a thesaurus. We also need to recognize that the meaning of terms may change through time. Generations use terms in different ways ("cool" in one generation means a low temperature, while "cool" in another is a positive adjective).

To enable formal management, a controlled vocabulary can be organized in several ways. There are three broad categories of controlled vocabularies: flat, multi-level and relational.

  • Flat controlled vocabularies provide a set of terms in use. Some flat controlled vocabularies will provide additional information about each term.
  • Multi-level controlled vocabularies build upon a flat controlled vocabulary by assigning each term to a category.
  • Relational controlled vocabularies provide a set of terms, and capture how they are associated with each other.
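
One way to picture the difference between the three categories is as three data structures. The sketch below is only an illustration under stated assumptions: the platform terms, category names, and relationship names (`narrower_than`, `related_to`) are invented for the example and do not come from any published vocabulary.

```python
# Flat: just the set of terms in use.
flat = {"buoy", "glider", "mooring"}

# Multi-level: the same terms, each assigned to a category.
multi_level = {
    "buoy": "fixed platform",
    "mooring": "fixed platform",
    "glider": "mobile platform",
}

# Relational: (subject, relationship, object) triples that capture
# how terms are associated with each other.
relational = [
    ("glider", "narrower_than", "mobile platform"),
    ("buoy", "related_to", "mooring"),
]


def terms_in_category(vocabulary, category):
    """Return the terms of a multi-level vocabulary that fall in a category."""
    return sorted(t for t, c in vocabulary.items() if c == category)


def relationships_of(triples, term):
    """Return (relationship, object) pairs whose subject is the given term."""
    return sorted((rel, obj) for subj, rel, obj in triples if subj == term)


print(terms_in_category(multi_level, "fixed platform"))
print(relationships_of(relational, "buoy"))
```

Note that the multi-level structure builds on the flat one (its keys are exactly the flat terms), while the relational structure needs a different shape altogether because a relationship involves two terms.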

Within these three categories, there are additional controlled vocabulary types. The table below summarizes these categories and types. The table lists only the necessary conditions for each type. Some controlled vocabularies will appear as "hybrids" of two or more categories of controlled vocabularies. Please see the Types of Controlled Vocabularies guide for a more extensive explanation, or this article on Knowledge Organization Systems.

Broad Category | Controlled Vocabulary Type | Description
Flat Controlled Vocabulary | Authority File | List of terms
Flat Controlled Vocabulary | Glossary | List of terms and definitions within a specific domain
Flat Controlled Vocabulary | Dictionary | List of terms, definitions, and additional information
Flat Controlled Vocabulary | Gazetteer | List of place names
Flat Controlled Vocabulary | Code List | List of codes (e.g. abbreviations) and definitions
Multi-Level Controlled Vocabulary | Taxonomy | Terms classified into categories
Multi-Level Controlled Vocabulary | Subject Heading | Terms classified into categories, which may be broad classes
Relational Controlled Vocabulary | Thesaurus | Set of terms and relationships among individual values
Relational Controlled Vocabulary | Semantic Network | Set of terms/concepts and directed relationships
Relational Controlled Vocabulary | Ontology | Set of terms and relationships among terms, enhanced by additional information provided by rules and axioms

The Purpose of a Controlled Vocabulary

Controlled vocabularies can serve several different purposes. For example, a controlled vocabulary might help users find data (also known as a "discovery vocabulary"), or assist in the interpretation of data (also known as a "usage vocabulary"). The controlled vocabulary might provide human-understandable meaning (also known as a "semantic vocabulary") or machine-readable format information (also known as a "syntactic vocabulary"). Controlled vocabularies provide these abilities by:

  • establishing the permissible terms to be used;
  • maintaining the proper and agreed-upon spelling of the terms;
  • clarifying terms for those who are new to the community; and
  • eliminating the use of arbitrary terms that can cause inconsistencies and confusion.
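
As a rough illustration of the first, second, and fourth points, the sketch below checks values against a list of permitted terms and, for a rejected value, suggests the closest permitted spelling. The term list and the function name `check_term` are hypothetical; the suggestion step uses `difflib.get_close_matches` from the Python standard library, which is one possible choice and not something the guide prescribes.

```python
import difflib

# Hypothetical list of permitted terms for one metadata field.
PERMITTED_TERMS = ["temperature", "salinity", "pressure"]


def check_term(value, permitted=PERMITTED_TERMS):
    """Return (is_permitted, suggestion).

    A value is permitted only if it matches a term exactly, which enforces
    the agreed-upon spelling. For a rejected value, the suggestion is the
    closest permitted term, or None if nothing is reasonably close.
    """
    if value in permitted:
        return True, None
    matches = difflib.get_close_matches(value, permitted, n=1)
    return False, (matches[0] if matches else None)


for value in ["salinity", "temprature", "wetness"]:
    print(value, check_term(value))
```

A misspelling such as "temprature" is rejected but mapped back to the agreed term, while an arbitrary term such as "wetness" is rejected with no suggestion, leaving it to the vocabulary's maintainers to decide whether a new term is needed.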

Suggested Citation

2009. "What is a Controlled Vocabulary?." In The MMI Guides: Navigating the World of Marine Metadata. http://marinemetadata.org/guides/vocabs/vocdef. Accessed April 23, 2014.