What is a Controlled Vocabulary?
A vocabulary is a set of terms (words, codes, etc.) used in a specific community. Vocabularies provide a mechanism for communication, whether written, oral or electronic, because the meaning of each term is known and agreed upon by the community members. When a vocabulary is formally managed, it becomes a controlled vocabulary. In this case, "managed" means the terms are stored and maintained using agreed-upon procedures. Procedures should exist for adding terms, modifying terms and, more rarely, deprecating terms from a controlled vocabulary.
A controlled vocabulary is a collection of terms that are:
- Accepted: The terms adhere to community practices.
- Defined: The terms are precisely characterized. Typically, this means the terms have rigorous definitions.
- Managed: In general, a body of experts creates and maintains the controlled vocabulary. Maintenance involves periodic review, addition of new terms, modification of terms, and occasionally deprecation of terms.
Note that this definition does not specify a particular scope of usage. Controlled vocabularies could be developed for a local project (like the Scripps Institution of Oceanography Geological Data Center), a broader community (e.g. OOSTethys), or as part of a widely used standard or tool (e.g. ISO 19115).
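The add, modify and deprecate procedures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ControlledVocabulary`, its methods, and the sample terms are hypothetical and do not come from any particular standard or tool.

```python
from dataclasses import dataclass


@dataclass
class Term:
    label: str
    definition: str
    deprecated: bool = False


class ControlledVocabulary:
    """Hypothetical in-memory vocabulary with add, modify and deprecate procedures."""

    def __init__(self):
        self._terms = {}

    def add(self, label, definition):
        # "Defined": refuse terms that arrive without a definition.
        if not definition.strip():
            raise ValueError("every term needs a definition")
        if label in self._terms:
            raise ValueError(f"term already exists: {label}")
        self._terms[label] = Term(label, definition)

    def modify(self, label, definition):
        # Raises KeyError if the term was never added.
        self._terms[label].definition = definition

    def deprecate(self, label):
        # Deprecated terms are flagged rather than deleted,
        # so data that already uses them can still be interpreted.
        self._terms[label].deprecated = True

    def is_active(self, label):
        term = self._terms.get(label)
        return term is not None and not term.deprecated


vocab = ControlledVocabulary()
vocab.add("sea_water_temperature", "Temperature of sea water.")
vocab.add("sst", "Sea surface temperature.")
vocab.deprecate("sst")
```

Flagging a deprecated term instead of removing it is a design choice made for this sketch; a real vocabulary would follow whatever procedure its managing body has agreed upon.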
Controlled Vocabulary Categories and Types
To many people, the English language is a well-known vocabulary. We have many ways of representing the terms in the English language. For example, if we want to figure out what a specific word means we might consult a glossary; if we want to know the origin of a term we might consult a dictionary; and if we want to know how the term relates to other terms we might consult a thesaurus. We also need to recognize that the meaning of terms may change through time. Generations use terms in different ways ("cool" in one generation means a low temperature, while "cool" in another is a positive adjective).
To enable formal management, a controlled vocabulary can be organized in several ways. There are three broad categories of controlled vocabularies: flat, multi-level and relational.
- Flat controlled vocabularies provide a set of the terms in use. Some flat controlled vocabularies will provide additional information about each term.
- Multi-level controlled vocabularies build upon a flat controlled vocabulary by assigning each term to a category.
- Relational controlled vocabularies provide a set of terms, and capture how they are associated with each other.
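As a rough sketch, the three categories above map onto three familiar data structures. The term names, category names, and relation names below are invented for illustration and are not drawn from any published vocabulary.

```python
# Flat: just the set of permitted terms.
flat = {"temperature", "salinity", "wind_speed"}

# Multi-level: each term is assigned to a category.
multi_level = {
    "temperature": "physical oceanography",
    "salinity": "physical oceanography",
    "wind_speed": "meteorology",
}

# Relational: (subject, relation, object) triples capture
# how terms are associated with each other.
relational = [
    ("sea_surface_temperature", "narrower_than", "temperature"),
    ("wind_speed", "related_to", "wind_direction"),
]


def terms_in_category(category):
    """Return the multi-level terms assigned to one category, sorted."""
    return sorted(t for t, c in multi_level.items() if c == category)


def relations_of(term):
    """Return (relation, other_term) pairs where `term` is the subject."""
    return [(rel, obj) for subj, rel, obj in relational if subj == term]
```

Here `terms_in_category("physical oceanography")` returns `["salinity", "temperature"]`, and `relations_of("wind_speed")` returns `[("related_to", "wind_direction")]`.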
Within these three categories, there are additional controlled vocabulary types. The table below summarizes these categories and types. The table categorizes necessary conditions only. Some controlled vocabularies will appear as "hybrids" of two or more categories of controlled vocabularies. Please see the Types of Controlled Vocabularies guide for a more extensive explanation, or this article on Knowledge Organization Systems.
The Purpose of a Controlled Vocabulary
Controlled vocabularies can serve several different purposes. For example, a controlled vocabulary might help users find data (also known as a "discovery vocabulary"), or assist in the interpretation of data (also known as a "usage vocabulary"). The controlled vocabulary might provide human-understandable meaning (also known as a "semantic vocabulary") or machine-readable format information (also known as a "syntactic vocabulary"). Controlled vocabularies provide these abilities by:
- establishing the permissible terms to be used;
- maintaining the proper and agreed-upon spelling of the terms;
- clarifying terms for those who are new to the community; and
- eliminating the use of arbitrary terms that can cause inconsistencies and confusion.
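In practice, these four functions often show up as a validation step applied to incoming values. The sketch below assumes a small hypothetical list of permitted terms (`PERMITTED`) and treats differences in letter case and spacing as spelling variants of the agreed-upon form; both choices are assumptions made for this example.

```python
# Hypothetical permitted terms, written in their agreed-upon spelling.
PERMITTED = ["Temperature", "Salinity", "Dissolved Oxygen"]


def _key(text):
    # Ignore letter case and collapse runs of whitespace.
    return " ".join(text.casefold().split())


_CANONICAL = {_key(term): term for term in PERMITTED}


def normalize(raw):
    """Return the agreed-upon spelling of `raw`, or None if it is not permitted."""
    return _CANONICAL.get(_key(raw))


def check(values):
    """Split values into (accepted canonical terms, rejected raw values)."""
    accepted, rejected = [], []
    for value in values:
        canonical = normalize(value)
        if canonical is None:
            rejected.append(value)
        else:
            accepted.append(canonical)
    return accepted, rejected


accepted, rejected = check(["salinity", "  dissolved   oxygen", "saltiness"])
```

With this input, `accepted` is `["Salinity", "Dissolved Oxygen"]` and `rejected` is `["saltiness"]`: the arbitrary term is caught before it can introduce inconsistency.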