Choosing and Implementing Established Controlled Vocabularies
Background
When you are creating systems that must provide metadata content, or you are creating metadata content directly, sooner or later you will provide a way to fill out fields with specific terms. To take a simple example, you may need to answer the question "What is this data item?", obtaining as a reply the explicit name of a measurement parameter.
If you do not constrain the alternatives, the reply may include any text. This is called a 'free text' metadata entity, and while it has important uses, there are many reasons to prefer a more prescribed result in most cases. To allow (and demand) a constrained result, you must specify a controlled vocabulary to enable and limit the user's choices in filling out the information.
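As a minimal sketch of the difference between a free-text field and a constrained one, the following Python checks a submitted value against a small controlled vocabulary. The vocabulary contents, the name `PARAMETER_VOCABULARY`, and the function `validate_term` are all assumptions made for illustration; they are not part of any particular system.

```python
# Hypothetical controlled vocabulary for a "parameter" metadata field.
# The term names are illustrative and not drawn from this guide.
PARAMETER_VOCABULARY = frozenset({
    "sea_water_temperature",
    "sea_water_salinity",
    "wind_speed",
})


def validate_term(value, vocabulary=PARAMETER_VOCABULARY):
    """Return value unchanged if it is a controlled term, else raise ValueError."""
    if value not in vocabulary:
        allowed = ", ".join(sorted(vocabulary))
        raise ValueError(
            f"{value!r} is not a controlled term; choose one of: {allowed}"
        )
    return value


# A free-text field would accept both of these values; the constrained
# field accepts only the first and rejects the second with a message
# listing the allowed terms.
print(validate_term("wind_speed"))
try:
    validate_term("Wind Spd (m/s)")
except ValueError as err:
    print(err)
```

In an interactive form the same set would typically populate a pick list, so that the user never gets the chance to type an unlisted value.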
There are two ways to obtain such a controlled vocabulary: start with an existing vocabulary, or a combination of several; or build your own controlled vocabulary. We strongly recommend the first approach for most situations. In this and subsidiary guides, we outline the methods you can use to choose a controlled vocabulary and integrate it with your system.
Should you find that no existing vocabulary even approximately meets your needs, you may choose to develop your own vocabulary. A later guide describes this process. Once you have implemented your own vocabulary, it becomes an existing vocabulary, and the steps outlined below can be applied to it.
Many of the more technical sections in this guide are still under development. Please contact us if you want further help with these topics.
Steps Toward Implementing an Existing Controlled Vocabulary
There are four steps you may undertake on the way to a system with integrated controlled vocabularies. These are described as follows:
Choosing a Controlled Vocabulary
Although just a few years ago there were very few mature controlled vocabularies for marine science concepts, today there are a large number of possible candidates to consider. The process of finding suitable candidates, evaluating them, and deciding on the best one is described in the guide Choosing a Controlled Vocabulary.
Implementing Controlled Vocabularies in Your System
The controlled vocabulary you've chosen must be integrated into your system. This is not just a matter of creating and populating a drop-down menu; most vocabularies will change over time. Keeping your system current will require some basic engineering, as well as some strategic decisions. For more information, see the [pending] guide Implementing Controlled Vocabularies in Your System.
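One way to picture the engineering involved is a lookup that tolerates vocabulary change. The sketch below assumes each term records whether it has been deprecated and, if so, which term replaces it; the `Term` structure, the sample entries, and `resolve_current` are hypothetical and only illustrate the idea, since real vocabularies publish this information in their own formats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """One vocabulary entry, with optional deprecation information."""
    label: str
    deprecated: bool = False
    replaced_by: Optional[str] = None


# Hypothetical snapshot of a vocabulary after several revisions.
VOCABULARY = {
    "water_temp": Term("water_temp", deprecated=True,
                       replaced_by="sea_water_temperature"),
    "sea_water_temperature": Term("sea_water_temperature"),
    "old_turbidity": Term("old_turbidity", deprecated=True),  # no successor
}


def resolve_current(label, vocabulary=VOCABULARY):
    """Follow replaced_by links to the term currently in use.

    Returns the current label, or None if the term was retired without a
    replacement. Raises KeyError for a label the vocabulary never
    contained, and ValueError if the replacement links form a cycle.
    """
    seen = set()
    while True:
        if label in seen:
            raise ValueError(f"replacement cycle involving {label!r}")
        seen.add(label)
        term = vocabulary[label]
        if not term.deprecated:
            return term.label
        if term.replaced_by is None:
            return None
        label = term.replaced_by
```

Records stored under an older term can then still be interpreted after the vocabulary is updated, while the interface offers only current terms for new entries.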
Mapping Among Controlled Vocabularies
It probably will not be enough to simply provide a 'standard vocabulary' that everyone can use. You will find that your system's users have their own custom vocabulary terms that they are more familiar with. They may need to understand the items in your controlled vocabulary better ("to which term of mine does this correspond?" or "what is the corresponding term in this other controlled vocabulary?"), or they may want to interface with your system using their own terms. In these cases, you may need to create a map between two or more different vocabularies. The guide Mapping Among Controlled Vocabularies describes this process, and the tools that are available to help.
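A mapping can be pictured as a table of term pairs, each with a stated relationship. The following sketch assumes a user's local terms on one side and a standard vocabulary on the other; the entries, the relation labels ("exact", "narrower"), and the function names are hypothetical choices for illustration rather than a prescribed format.

```python
# Hypothetical mapping entries: (local term, standard term, relation).
# The relation describes how the local term relates to the standard term;
# "narrower" means the local term covers only part of the standard concept.
MAPPINGS = [
    ("SST", "sea_water_temperature", "exact"),
    ("temp", "sea_water_temperature", "exact"),
    ("gust", "wind_speed", "narrower"),
]


def to_standard(local_term, mappings=MAPPINGS):
    """Answer "what is the corresponding term in the standard vocabulary?".

    Returns a (standard term, relation) pair, or None if unmapped.
    """
    for local, standard, relation in mappings:
        if local == local_term:
            return standard, relation
    return None


def to_local(standard_term, mappings=MAPPINGS):
    """Answer "to which terms of mine does this correspond?"."""
    return [local for local, standard, _ in mappings
            if standard == standard_term]
```

Recording the relation alongside each pair matters because not every correspondence is exact, and a system that treats a partial match as an exact one may return misleading results.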
Achieving Semantic Interoperability
You now have many of the components of a semantically interoperable system. How do they work together to give the user a seamless experience of semantic interoperability? We address this question in the guide Achieving Semantic Interoperability.
Summary: The Big Picture
Although all these technologies and solutions are fully usable in principle, and systems have been built to use them, this is still a young technology field, particularly in environmental science. Vocabularies are in their infancy; patterns for adopting, using, and maintaining vocabularies in data systems are not yet well established; mappings are relatively new and immature; and end-to-end solutions that achieve semantic interoperability are few.
We encourage you to help us reference the best practices and examples in this field, and to help us describe the processes that provide the best results. We will be pleased to have you participate in the project, and the community will appreciate your contributions!