What is a Controlled Vocabulary?

A vocabulary is a set of terms (words, codes, etc.) used within a specific community. Vocabularies provide a mechanism for communication, whether written, oral, or electronic, because the meaning of each term is known and agreed upon by the community's members. When a vocabulary is formally managed, it becomes a controlled vocabulary. Here, "managed" means that the terms are stored and maintained using agreed-upon procedures. Procedures should exist for adding terms, modifying terms and, more rarely, deprecating terms in a controlled vocabulary.

A controlled vocabulary is a collection of terms that are:

  • Accepted: The terms adhere to community practices.
  • Defined: The terms are precisely characterized. Typically, this means the terms have rigorous definitions.
  • Managed: In general, a body of experts creates and maintains the controlled vocabulary. Maintenance involves periodic review, addition of new terms, modification of existing terms, and occasional deprecation of terms.

Note that this definition does not specify a particular scope of usage. Controlled vocabularies can be developed for a local project (such as the Scripps Institution of Oceanography Geological Data Center), for a broader community (e.g., OOSTethys), or as part of a widely used standard or tool (e.g., ISO 19115).
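
As a minimal sketch of what "managed" can mean in practice, the Python below models a controlled vocabulary whose terms must carry a definition and can be added, modified, or deprecated. The class names, method names, and example terms are illustrative assumptions, not part of any MMI tool. Flagging deprecated terms rather than deleting them is likewise only one possible design choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    label: str
    definition: str
    deprecated: bool = False


@dataclass
class ControlledVocabulary:
    """Hypothetical container for a managed set of defined terms."""
    name: str
    terms: Dict[str, Term] = field(default_factory=dict)

    def add_term(self, label: str, definition: str) -> None:
        # "Defined": refuse terms that arrive without a definition.
        if label in self.terms:
            raise ValueError(f"term already exists: {label}")
        if not definition.strip():
            raise ValueError("every term needs a definition")
        self.terms[label] = Term(label, definition)

    def modify_term(self, label: str, definition: str) -> None:
        # Raises KeyError if the term was never added.
        self.terms[label].definition = definition

    def deprecate_term(self, label: str) -> None:
        # Assumption: deprecated terms are flagged, not removed,
        # so records that already use them stay interpretable.
        self.terms[label].deprecated = True

    def active_terms(self) -> List[str]:
        return sorted(t.label for t in self.terms.values() if not t.deprecated)


# Illustrative terms only; they are not taken from a real vocabulary.
vocab = ControlledVocabulary("example-parameters")
vocab.add_term("sea_water_temperature", "Temperature of sea water.")
vocab.add_term("sea_temp", "Temperature of sea water (informal label).")
vocab.deprecate_term("sea_temp")
print(vocab.active_terms())
```

In a real community, the checks inside these methods would be replaced or supplemented by the review procedures of the body that maintains the vocabulary.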

Controlled Vocabulary Categories and Types

To many people, the English language is a well-known vocabulary, and there are many ways of representing its terms. For example, to find out what a specific word means, we might consult a glossary; to learn the origin of a term, a dictionary; and to see how the term relates to other terms, a thesaurus. We also need to recognize that the meaning of terms may change over time. Different generations use the same term in different ways ("cool" means a low temperature to one generation and is a term of approval to another).

To enable formal management, a controlled vocabulary can be organized in several ways. There are three broad categories of controlled vocabularies: flat, multi-level and relational.

  • Flat controlled vocabularies provide a set of terms in use. Some flat controlled vocabularies also provide additional information about each term.
  • Multi-level controlled vocabularies build upon a flat controlled vocabulary by assigning each term to a category.
  • Relational controlled vocabularies provide a set of terms, and capture how they are associated with each other.
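
The three categories above can be sketched as progressively richer data structures. The Python below is only an illustration: the instrument terms, category names, relation names, and the `related` helper are all assumptions chosen for the example, not entries from an actual controlled vocabulary.

```python
# Flat: a set of terms, here with a definition attached to each term.
flat = {
    "CTD": "Conductivity, temperature, and depth instrument",
    "ADCP": "Acoustic Doppler current profiler",
}

# Multi-level: each term of the flat vocabulary is assigned to a category.
multi_level = {
    "CTD": "profiling instruments",
    "ADCP": "current meters",
}

# Relational: terms plus named associations between them,
# stored as (subject, relation, object) triples.
relational = [
    ("CTD", "measures", "conductivity"),
    ("CTD", "measures", "temperature"),
    ("ADCP", "measures", "current velocity"),
]


def related(term, relation):
    """Return every object linked to `term` by `relation`, sorted by name."""
    return sorted(o for s, r, o in relational if s == term and r == relation)


print(related("CTD", "measures"))
```

The flat and multi-level structures answer "is this term known?" and "where does it belong?", while only the relational structure can answer questions about how terms are associated with each other.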

Within these three categories, there are additional controlled vocabulary types. The table below summarizes these categories and types. The table lists only the necessary conditions for each type. Some controlled vocabularies will appear as "hybrids" of two or more categories of controlled vocabularies. Please see the Types of Controlled Vocabularies guide for a more extensive explanation, or this article on Knowledge Organization Systems.

| Broad Category | Controlled Vocabulary Type | Description |
|---|---|---|
| Flat Controlled Vocabulary | Authority File | List of terms |
| Flat Controlled Vocabulary | Glossary | List of terms and definitions within a specific domain |
| Flat Controlled Vocabulary | Dictionary | List of terms, definitions, and additional information |
| Flat Controlled Vocabulary | Gazetteer | List of place names |
| Flat Controlled Vocabulary | Code List | List of codes (e.g. abbreviations) and definitions |
| Multi-Level Controlled Vocabulary | Taxonomy | Terms classified into categories |
| Multi-Level Controlled Vocabulary | Subject Heading | Terms classified into categories, which may be broad classes |
| Relational Controlled Vocabulary | Thesaurus | Set of terms and relationships among individual values |
| Relational Controlled Vocabulary | Semantic Network | Set of terms/concepts and directed relationships |
| Relational Controlled Vocabulary | Ontology | Set of terms and relationships among terms, enhanced by additional information provided by rules and axioms |

The Purpose of a Controlled Vocabulary

Controlled vocabularies can serve several different purposes. For example, a controlled vocabulary might help users find data (also known as a "discovery vocabulary"), or assist in the interpretation of data (also known as a "usage vocabulary"). The controlled vocabulary might provide human-understandable meaning (also known as a "semantic vocabulary") or machine-readable format information (also known as a "syntactic vocabulary"). Controlled vocabularies provide these abilities by:

  • establishing the permissible terms to be used;
  • maintaining the proper and agreed-upon spelling of the terms;
  • clarifying terms for those who are new to the community; and
  • eliminating the use of arbitrary terms that can cause inconsistencies and confusion.
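
As a hedged illustration of the first two points in the list above, the sketch below checks a submitted term against a list of permitted terms and, when the term is rejected, suggests permitted terms with similar spelling. The term list and the `check_term` function are hypothetical; the similarity lookup uses `difflib.get_close_matches` from the Python standard library, and the 0.6 cutoff is simply that function's default.

```python
import difflib

# Illustrative list of permitted terms; a real vocabulary would be
# maintained by the community that uses it.
PERMITTED_TERMS = ["pressure", "salinity", "temperature"]


def check_term(value, permitted=PERMITTED_TERMS):
    """Return (accepted, suggestions) for a submitted term.

    accepted is True only for an exact match with a permitted term.
    suggestions lists permitted terms with similar spelling, best first.
    """
    if value in permitted:
        return True, []
    suggestions = difflib.get_close_matches(value, permitted, n=3, cutoff=0.6)
    return False, suggestions


print(check_term("salinity"))
print(check_term("temprature"))
```

Rejecting unknown terms outright while offering suggestions keeps arbitrary spellings out of the data without leaving new community members guessing which term to use.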

Suggested Citation

Neiswender, C. 2009. "What is a Controlled Vocabulary?" In The MMI Guides: Navigating the World of Marine Metadata. http://marinemetadata.org/guides/vocabs/vocdef. Accessed March 6, 2021.