Vocabularies: Dictionaries, Ontologies, and More
In many professional communities, the terminology used to communicate is discipline-specific. Consider the word "altitude," which refers to the height of something, such as an airplane in flight, above a reference point like ground level. Although altitude is a common term, if we were examining a set of blueprints for a building, we probably would not use "altitude" to describe the level of the rooftop; most likely, we would use "height." Similarly, if we were in a boat looking down into the water, we probably would not use "altitude" or "height" to describe the position of the bottom; rather, we would use "depth."
The three words (altitude, height, and depth) are similar in that they all measure distance relative to a specified level, but each is used differently and each is associated with a different community. In this simple example, those communities are aviation, architecture, and oceanography. What's more, the same term might be used differently by a different community, as when oceanographers use "altitude" to mean the distance above the ocean floor (for example, when operating a Remotely Operated Vehicle, or ROV).
Various kinds of controlled vocabularies exist, but all can help improve metadata by restricting and better defining the content of metadata entries.
Definition of Controlled Vocabulary
A vocabulary is a set of terms (words, codes, etc.) that are used in a specific community. Vocabularies provide a mechanism for communication, be it written, oral, or electronic, because the meanings of the terms are known and agreed upon by the community members. When a vocabulary is formally managed, it becomes a controlled vocabulary. In this case, "managed" means the terms are stored and maintained using agreed-upon procedures. Procedures should exist for adding terms, modifying terms, and, more rarely, deprecating terms in a controlled vocabulary.
A controlled vocabulary is a collection of terms that are:
- Accepted: The terms must adhere to community practices.
- Defined: The terms are precisely characterized. Typically, this means the terms have rigorous definitions.
- Managed: In general, there will be a body of experts that create and maintain the controlled vocabulary. The controlled vocabulary maintenance will involve periodic review, addition of new terms, modification of terms, and occasionally deprecation of terms.
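To make the "managed" property concrete, here is a minimal Python sketch of a vocabulary in which every change goes through an add, modify, or deprecate procedure and increments a version number. The `Term` and `ControlledVocabulary` classes, the version counter, and the choice to flag deprecated terms rather than delete them are all assumptions made for illustration. They are not part of any particular standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One entry in the (hypothetical) vocabulary."""
    label: str
    definition: str
    deprecated: bool = False


@dataclass
class ControlledVocabulary:
    """Every change goes through a method and bumps the version number,
    standing in for the community's agreed-upon procedures."""
    name: str
    version: int = 1
    terms: dict[str, Term] = field(default_factory=dict)

    def add_term(self, label: str, definition: str) -> None:
        if label in self.terms:
            raise ValueError(f"term already exists: {label}")
        self.terms[label] = Term(label, definition)
        self.version += 1

    def modify_term(self, label: str, definition: str) -> None:
        self.terms[label].definition = definition  # KeyError if label is unknown
        self.version += 1

    def deprecate_term(self, label: str) -> None:
        # Flag instead of deleting, so older metadata using the term still resolves.
        self.terms[label].deprecated = True
        self.version += 1

    def active_terms(self) -> list[str]:
        return sorted(t.label for t in self.terms.values() if not t.deprecated)


cv = ControlledVocabulary("distance-measures")
cv.add_term("altitude", "Height above a reference level")
cv.add_term("depth", "Distance below a reference surface")
cv.deprecate_term("depth")
print(cv.active_terms(), cv.version)  # ['altitude'] 4
```

Keeping deprecated terms in place is one reasonable design choice. Metadata written before the deprecation can still be interpreted, while new entries are steered toward the active terms.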
Note that this definition does not specify a particular scope of usage. Controlled vocabularies could be developed for a local project (like the Scripps Institution of Oceanography Geological Data Center), a broader community (e.g., OOSTethys), or as part of a widely used standard or tool (e.g., ISO 19115).
Controlled Vocabulary Categories and Types
To many people, the English language is a well-known vocabulary. We have many ways of representing the terms in the English language. For example, if we want to figure out what a specific word means, we might consult a glossary; if we want to know the origin of a term, we might consult a dictionary; and if we want to know how the term relates to other terms, we might consult a thesaurus. We also need to recognize that the meaning of terms may change through time. Generations use terms in different ways ("cool" in one generation means a low temperature, while in another it is a positive adjective).
To enable formal management, a controlled vocabulary can be organized in several ways. There are three broad categories of controlled vocabularies: flat, multi-level and relational.
- Flat controlled vocabularies provide a list of accepted terms. Some flat controlled vocabularies also provide additional information about each term.
- Multi-level controlled vocabularies build upon a flat controlled vocabulary by assigning each term to a category.
- Relational controlled vocabularies provide a set of terms, and capture how they are associated with each other.
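The three categories can be pictured as three increasingly rich data structures. This Python sketch reuses the altitude, height, and depth terms from the opening example. The one-category-per-term assignment is a simplification, and the `relatedTo` relation name and the helper functions are hypothetical.

```python
# Flat: the accepted terms and nothing more (an authority file).
flat = {"altitude", "height", "depth"}

# Multi-level: each term is assigned to a category (simplified to one each).
multi_level = {
    "altitude": "aviation",
    "height": "architecture",
    "depth": "oceanography",
}

# Relational: terms plus named links between them (relation name is hypothetical).
relational = [
    ("altitude", "relatedTo", "height"),
    ("height", "relatedTo", "depth"),
]


def terms_in_category(vocab: dict[str, str], category: str) -> list[str]:
    return sorted(term for term, cat in vocab.items() if cat == category)


def related(triples: list[tuple[str, str, str]], term: str) -> list[str]:
    """Terms linked to `term` by any relation, in either direction."""
    found = set()
    for subject, _relation, obj in triples:
        if subject == term:
            found.add(obj)
        elif obj == term:
            found.add(subject)
    return sorted(found)


print("depth" in flat)                                 # True
print(terms_in_category(multi_level, "oceanography"))  # ['depth']
print(related(relational, "height"))                   # ['altitude', 'depth']
```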
Within these three categories, there are additional controlled vocabulary types. The table below summarizes these categories and types. The table lists necessary conditions only. Some controlled vocabularies will appear as "hybrids" of two or more categories of controlled vocabularies. Please see the Types of Controlled Vocabularies guide for a more extensive explanation, or this article on Knowledge Organization Systems.
| Broad Category | Controlled Vocabulary Type | Description |
|---|---|---|
| Flat Controlled Vocabulary | Authority File | List of terms |
| | Glossary | List of terms and definitions within a specific domain |
| | Dictionary | List of terms, definitions, and additional information |
| | Gazetteer | List of place names |
| | Code List | List of codes (e.g., abbreviations) and definitions |
| Multi-Level Controlled Vocabulary | Taxonomy | Terms classified into categories |
| | Subject Heading | Terms classified into categories, which may be broad classes |
| Relational Controlled Vocabulary | Thesaurus | Set of terms and relationships among individual values |
| | Semantic Network | Set of terms/concepts and directed relationships |
| | Ontology | Set of terms and relationships among terms, enhanced by additional information provided by rules and axioms |
The Purpose of a Controlled Vocabulary
Controlled vocabularies can serve several different purposes. For example, a controlled vocabulary might help users find data (also known as a "discovery vocabulary"), or assist in the interpretation of data (also known as a "usage vocabulary"). The controlled vocabulary might provide human-understandable meaning (also known as a "semantic vocabulary") or machine-readable format information (also known as a "syntactic vocabulary"). Controlled vocabularies provide these abilities by:
- establishing the permissible terms to be used;
- maintaining the proper and agreed-upon spelling of the terms;
- clarifying terms for those who are new to the community; and
- eliminating the use of arbitrary terms that can cause inconsistencies and confusion.
The Importance of Controlled Vocabularies to Metadata
Controlled vocabularies are very powerful when combined with formal metadata standards. This is because the terms within the controlled vocabulary can be used as the content for specific metadata elements that make up the standard. In many cases, the controlled vocabulary terms completely define the allowable content for a particular metadata element. This control helps avoid misspellings and inconsistencies in the metadata content. Moreover, in the world of computers, the controlled vocabulary offers enhanced capabilities because it can be incorporated into automated procedures. For example, in a data system a controlled vocabulary can simplify system input and contribute to quality control of that input. Input is simplified by providing users or other systems with a list of allowed entries for the specific metadata elements. Similarly, the controlled vocabulary can be used to check existing or imported metadata descriptions for consistency and correctness, including things like spelling and hyphenation.
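The following minimal Python sketch illustrates the quality-control idea. It checks one metadata entry against a list of allowed terms, accepts differences in case, hyphenation, and spacing by mapping them to the canonical term, and suggests a likely term for misspellings. The `platform_type` element and its three allowed values are hypothetical. The suggestion step uses the standard library's `difflib.get_close_matches`.

```python
import difflib
from typing import List, Optional, Tuple

# Hypothetical allowed values for a hypothetical "platform_type" metadata element.
PLATFORM_TYPES = ["research vessel", "submersible", "remotely operated vehicle"]


def normalize(value: str) -> str:
    """Ignore case, hyphens, and extra whitespace when comparing entries."""
    return " ".join(value.replace("-", " ").lower().split())


def check_entry(value: str, allowed: List[str]) -> Tuple[str, Optional[str]]:
    """Return (status, term): 'ok' for an exact match, 'normalized' when only
    case/hyphenation/spacing differ, 'invalid' otherwise (term is then a
    suggested correction, or None if nothing is close)."""
    if value in allowed:
        return "ok", value
    by_key = {normalize(term): term for term in allowed}
    key = normalize(value)
    if key in by_key:
        return "normalized", by_key[key]
    guesses = difflib.get_close_matches(key, list(by_key), n=1)
    return "invalid", (by_key[guesses[0]] if guesses else None)


print(check_entry("submersible", PLATFORM_TYPES))      # ('ok', 'submersible')
print(check_entry("Research-Vessel", PLATFORM_TYPES))  # ('normalized', 'research vessel')
print(check_entry("submersable", PLATFORM_TYPES))      # ('invalid', 'submersible')
```

The same function could back an input form (offering `PLATFORM_TYPES` as the list of allowed entries) or a batch check over imported metadata.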
Controlled Vocabularies as an Interoperability Aid
Translation and crosswalking can be thought of as the basis for metadata interoperability. When a metadata description created by one system can be interpreted by another system, the resource described by the metadata can be used more easily and precisely within both systems.
In spoken language, when we move from one language to another, we need to identify a word in our own language and relate it with a word in another language. We might need to take a closer look at the word in our language, to determine exactly what it means, or the proper usage. There will be times when a word won't translate directly into a single word or phrase in another language. We will also probably make use of grammar rules in the translation process.
In the context of metadata, a controlled vocabulary is analogous to a language in the above example. The terms in one controlled vocabulary can be translated into the terms used by a second controlled vocabulary. If the entire controlled vocabulary is translated, then all metadata descriptions that use the first controlled vocabulary can also be translated to use the second controlled vocabulary. In this way, controlled vocabularies facilitate metadata interoperability.
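The translation idea can be sketched as a crosswalk table applied to every record that uses the first vocabulary. In this Python example, the short codes used by one project and the spelled-out terms used by another are hypothetical. Values with no entry in the crosswalk are left unchanged and reported, which mirrors the case where a word has no direct translation.

```python
# Hypothetical crosswalk: project A's short codes -> project B's terms.
A_TO_B = {
    "RV": "research vessel",
    "SUB": "submersible",
    "ROV": "remotely operated vehicle",
}


def translate_records(records, element, crosswalk):
    """Rewrite one metadata element in every record using the crosswalk.

    Returns (translated_records, sorted list of values with no translation).
    Untranslatable values are kept as-is so nothing is silently lost.
    """
    translated, missing = [], set()
    for record in records:
        new_record = dict(record)  # copy; the source record is not modified
        value = record.get(element)
        if value in crosswalk:
            new_record[element] = crosswalk[value]
        elif value is not None:
            missing.add(value)
        translated.append(new_record)
    return translated, sorted(missing)


records = [
    {"id": 1, "platform_type": "RV"},
    {"id": 2, "platform_type": "ROV"},
    {"id": 3, "platform_type": "AUV"},  # no entry in the crosswalk
]
out, missing = translate_records(records, "platform_type", A_TO_B)
print([r["platform_type"] for r in out])  # ['research vessel', 'remotely operated vehicle', 'AUV']
print(missing)                            # ['AUV']
```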
The different types of controlled vocabularies provide different levels of interoperability. Often, when we move from one project to another, we need to identify the metadata descriptions that use one controlled vocabulary and relate these descriptions to another system. We might need to understand more about the terms in the initial controlled vocabulary: what they represent (glossary), how they came to be (dictionary), and which terms are similar (thesaurus, semantic network, or ontology). There will be times when a term doesn't fit neatly into the second controlled vocabulary. This is where hierarchies and classifications (subject headings, taxonomies, and ontologies) become very useful.
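A hierarchy can help with a term that does not fit: the translation falls back to the nearest broader term that the target vocabulary accepts. The following Python sketch assumes a hypothetical set of broader-term links and a hypothetical target vocabulary. Real taxonomies and thesauri record such links in their own formats.

```python
# Hypothetical "broader term" links, as a taxonomy or thesaurus might record them.
BROADER = {
    "remotely operated vehicle": "underwater vehicle",
    "submersible": "underwater vehicle",
    "underwater vehicle": "platform",
    "research vessel": "platform",
}

# Hypothetical target vocabulary that lacks some of the specific terms.
TARGET = {"research vessel", "underwater vehicle", "platform"}


def map_with_fallback(term, target_terms, broader):
    """Return the closest term the target vocabulary accepts, walking up
    the hierarchy; None if no ancestor is acceptable."""
    seen = set()  # guards against accidental cycles in the hierarchy
    while term is not None and term not in seen:
        if term in target_terms:
            return term
        seen.add(term)
        term = broader.get(term)
    return None


print(map_with_fallback("remotely operated vehicle", TARGET, BROADER))  # underwater vehicle
print(map_with_fallback("research vessel", TARGET, BROADER))            # research vessel
print(map_with_fallback("glider", TARGET, BROADER))                      # None
```

Falling back to a broader term loses some precision. A real crosswalk might therefore also record that the mapping is approximate rather than exact.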
Example of Controlled Vocabulary Usage
Suppose three different oceanographic research projects are using various vessels or submersibles. In the worst case, we could imagine that none of these projects had a controlled vocabulary. In this case, if someone were to query the data resource to accurately locate all data associated with a particular research vessel like the R/V Moana Wave, they would need to know all the ways "R/V Moana Wave" was represented within the resource, and construct a search query for all of the variations (including the misspelled, misrepresented, and "nicknamed"). Beyond daunting, this seems nearly impossible!
In a better case, we could suppose each project generated a controlled vocabulary, as shown below.
[Figure: Dictionary] Each term is articulated with an acronym (1st entry, blue). The acronyms are spelled out in the description (2nd entry, yellow). Additional information about how each term came to be is included in the etymology (3rd entry, green).
[Figure: Taxonomy] The actual terms (2nd entry, blue) are placed in a structure according to the decade in which they were commissioned (1st entry, green).
[Figure: Ontology] The actual terms (3rd entry, blue) are classified into two major classes (1st entry, green) and one subclass (2nd entry, yellow). Notice that the vessels are connected to submersibles based on the operating institution. This is a complex interrelation, which enhances the class hierarchy.
Notice that each of these controlled vocabularies represents the same set of real-world objects (i.e., vessels or submersibles). They are presented as different types of controlled vocabularies, with different terms for the real-world objects and slightly different accompanying information.
Suppose each project exposed its controlled vocabulary to a search engine, and that translations existed between the vocabularies. The search engine might provide a dropdown menu of platform names to expedite user searches. When a user needs to identify all data associated with the R/V Moana Wave, they could simply select that ship from the dropdown menu. Without the translation, which exists only because of the three controlled vocabularies, the user would need to know that each project represents this University of Hawaii SOEST vessel differently. The user would then need to search for "R/V Moana Wave," "Moana Wave," and "MW," all without typographical errors!
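The search scenario can be sketched in a few lines of Python. The dropdown offers one shared platform entry, and the translation table expands that choice into each project's own term before filtering records. The project names, records, and table layout are hypothetical; only the three name variants come from the example.

```python
# Hypothetical translation table: one shared platform entry mapped to the
# term each project's controlled vocabulary uses for that vessel.
PLATFORM_TERMS = {
    "R/V Moana Wave": {
        "project_a": "R/V Moana Wave",
        "project_b": "Moana Wave",
        "project_c": "MW",
    },
}


def dropdown_options() -> list:
    """Entries the search page would offer in its platform dropdown."""
    return sorted(PLATFORM_TERMS)


def search(records, platform):
    """Records from any project whose platform term matches the selection."""
    per_project = PLATFORM_TERMS[platform]
    return [r for r in records if per_project.get(r["project"]) == r["platform"]]


records = [
    {"id": 1, "project": "project_a", "platform": "R/V Moana Wave"},
    {"id": 2, "project": "project_b", "platform": "Moana Wave"},
    {"id": 3, "project": "project_c", "platform": "MW"},
    {"id": 4, "project": "project_b", "platform": "Other Vessel"},  # hypothetical
]
print(dropdown_options())                                    # ['R/V Moana Wave']
print([r["id"] for r in search(records, "R/V Moana Wave")])  # [1, 2, 3]
```

Because each project's term is looked up through the table, the user never types the variants. Adding another project only requires a new entry in the table.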
In addition, this example illustrates the value of adopting established controlled vocabularies, instead of developing a local vocabulary. Each of these three controlled vocabularies is a representation of the same set of real-world objects, but three different programs took the time to develop a unique controlled vocabulary. One or more of the locally developed controlled vocabularies might not be exhaustive, and not all three contain the same information. If the three programs collaborated and developed a single controlled vocabulary, this authoritative controlled vocabulary could be managed centrally. The controlled vocabulary would be more complete, and thus would be much stronger, possibly with less effort by any individual program.
Controlled Vocabulary Management
Regardless of the type of controlled vocabulary you implement, it is useful to understand the process of managing one. Of course, you can avoid management tasks altogether by using vocabularies managed by another organization, which is usually a good idea (see Choosing and Implementing a Controlled Vocabulary). If no existing controlled vocabulary meets a project's needs, a group will need to become a controlled vocabulary manager (see A Last Resort... Developing a Local Controlled Vocabulary). In either case, it is very useful to understand the essence of controlled vocabulary development. What follows is a very simplified synopsis of the development process.
Simplified Controlled Vocabulary Development and Management
- Clearly define the need for a controlled vocabulary. Individuals or groups that manage controlled vocabularies must, first and foremost, meet the needs of the appropriate scientific and technical community.
- Using community expertise, evaluate each candidate term. Is the term widely used? Does it have appropriate meaning to the community?
- After a thorough review, format the controlled vocabulary. Different types of controlled vocabularies can be implemented using different formats.
- Register the controlled vocabulary with an appropriate organization.
- Use the controlled vocabulary in community projects. Solicit input from implementing organizations.
- Incorporate user community input to improve future versions of the controlled vocabulary.
Throughout the management process, the controlled vocabulary may evolve. For example, if an organization begins with an authority file, it can add descriptions and etymology in future versions, enhancing the authority file and evolving it into a dictionary. Perhaps one of the implementing organizations will then enrich the dictionary by submitting classifications, relationships, and axioms to the organization that manages it. What started as an authority file has now become an ontology/dictionary combination. This illustrates an important benefit of a controlled vocabulary: it can become a living resource that is relatively easy to implement, update, enhance, and understand.


