Classification of Controlled Vocabularies
In understanding English, if we want to figure out what a word means, we might consult a dictionary or a glossary. Or we may use an etymology dictionary to track the history of a word. If we want to know how a term relates to other terms, we might consult a thesaurus.
Like the vocabulary sources for the English language, controlled vocabularies for describing metadata can be classified by their purpose, their form, or their functionalities.
Classification by Purpose
Vocabularies may be defined by their ability to accomplish specific goals:
- Discovery vocabulary: helps users find data
- Usage vocabulary: assists in the interpretation of data
- Semantic vocabulary: provides human-understandable meaning
- Syntactic vocabulary: translates information into a machine-readable format
Controlled vocabularies provide these abilities by
- establishing the permissible terms to be used;
- maintaining the proper and agreed-upon spelling of the terms;
- clarifying terms for those who are new to the community; and
- eliminating the use of arbitrary terms that can cause inconsistencies and confusion.
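As a rough illustration of the four points above, the following sketch checks incoming values against a small vocabulary. The terms, their definitions, and the helper names (`normalize`, `validate_term`, `define`) are hypothetical and not taken from any published vocabulary; a real vocabulary service would likely differ.

```python
# Hypothetical controlled vocabulary: permitted terms mapped to definitions.
VOCABULARY = {
    "sea_surface_temperature": "Temperature of the water at the ocean surface.",
    "salinity": "Concentration of dissolved salts in sea water.",
    "wave_height": "Vertical distance between a wave crest and trough.",
}


def normalize(raw: str) -> str:
    """Reduce spelling variation: trim, lowercase, join words with underscores."""
    return "_".join(raw.strip().lower().replace("-", " ").split())


def validate_term(raw: str) -> str:
    """Return the agreed-upon spelling of a term, or raise if it is not permitted."""
    term = normalize(raw)
    if term not in VOCABULARY:
        # Arbitrary terms are rejected rather than silently accepted.
        raise ValueError(f"{raw!r} is not a permitted term")
    return term


def define(raw: str) -> str:
    """Look up the definition, which helps newcomers interpret a term."""
    return VOCABULARY[validate_term(raw)]
```

Under these assumptions, `validate_term("Sea Surface Temperature")` returns `"sea_surface_temperature"`, while an arbitrary value such as `"temp"` raises a `ValueError`.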
Classification by Form
To enable formal management, a controlled vocabulary can be organized structurally so that it fits into one of these broad categories:
- Flat: provides a set of required terms that may be used. Some flat controlled vocabularies will provide additional information about each term.
- Multilevel: builds upon a flat controlled vocabulary by assigning each term to a category.
- Relational: provides a set of terms and captures how they are associated with each other.
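One way to picture the three forms is as progressively richer data structures. The sketch below is only illustrative: the platform terms, category names, and relationship names are hypothetical, and real vocabularies are usually published in dedicated formats rather than as Python literals.

```python
# Flat: just the set of permitted terms.
flat = {"moored_buoy", "glider", "research_vessel"}

# Multilevel: every term from the flat list is assigned to a category.
multilevel = {
    "moored_buoy": "fixed",
    "glider": "mobile",
    "research_vessel": "mobile",
}

# Relational: (term, relationship, term) triples record how terms are associated.
relational = [
    ("glider", "deployed_from", "research_vessel"),
    ("moored_buoy", "serviced_by", "research_vessel"),
]


def terms_in_category(category):
    """Return the terms assigned to a category, in sorted order."""
    return sorted(t for t, c in multilevel.items() if c == category)


def associations(term):
    """Return (relationship, other term) pairs in which `term` is the subject."""
    return [(rel, obj) for subj, rel, obj in relational if subj == term]
```

In this sketch the multilevel form reuses exactly the terms of the flat form and adds one piece of information per term, while the relational form records connections between pairs of terms.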
Classification by Functionality
Within the three broad categories that classify controlled vocabularies by form, there are sub-groupings that we will call “types.” The table below summarizes the relationships between the broad, form-based categories and their respective function-based types. The table defines the types and categories according to their minimum required characteristics.
| Broad, Form-based Category | Functionality-based Type | Description |
| --- | --- | --- |
| Flat Controlled Vocabulary | Authority File | List of terms |
| Flat Controlled Vocabulary | Glossary | List of terms and definitions within a specific domain |
| Flat Controlled Vocabulary | Dictionary | List of terms, definitions, and additional information |
| Flat Controlled Vocabulary | Gazetteer | List of place names |
| Flat Controlled Vocabulary | Code List | List of codes (e.g., abbreviations) and definitions |
| Multilevel Controlled Vocabulary | Taxonomy | Terms classified into categories |
| Multilevel Controlled Vocabulary | Subject Heading | Terms classified into categories, which may be broad classes |
| Relational Controlled Vocabulary | Thesaurus | Set of terms and relationships among individual values |
| Relational Controlled Vocabulary | Semantic Network | Set of terms/concepts and directed relationships |
| Relational Controlled Vocabulary | Ontology | Set of terms and relationships among terms, enhanced by additional information provided by rules and axioms |
Hybrid Classifications and the Real World
Not all controlled vocabularies fit neatly into one type; some appear as hybrids or crossovers. Vocabularies rarely exist in a vacuum, and they evolve over time, which blurs the distinctions between the classifications, whether intentionally or not.
Consequently, one controlled vocabulary might fit the definition of more than one type. For example, an ontology might also have many of the characteristics of a dictionary. Because of this ambiguity, the different types may be referred to generically as "vocabularies" or "controlled vocabularies," especially if they have hybrid characteristics.
Comparing and Understanding Classifications of Controlled Vocabularies
The guides in this section contain several articles to help you understand the distinction between classifications of controlled vocabularies and to examine some types side by side. Also, see the article Knowledge Organization Systems for more information.