Ontologies: Robust Controlled Vocabularies

Data managers, scientists, and others in the marine science community are used to working with controlled vocabularies, even if they know them by another name. Anyone who has worked with data described using an agreed-upon set of terms, such as a dictionary or thesaurus, has already used a controlled vocabulary. For many purposes, basic controlled vocabularies are the recommended tool: they are often simple to develop, can be shared among a small community of users, and are easy to store, visualize, and access.

However, as projects take on requirements for data interoperability and for more advanced data comparison and discovery, simple vocabularies become less and less adequate. Fortunately, there is a way to build on the concept of a simple controlled vocabulary and make it far more useful computationally. Semantic technology, including the use of controlled vocabularies known as ontologies, paves the way for data interoperability, advanced search and discovery, and machine learning and reasoning in ways that older technologies cannot support.

In this guide, we explore the nature of this semantic technology and its relationship to the work being done in marine science. First, we explain exactly what an ontology is, including how it differs from a standard controlled vocabulary. Then we discuss why ontologies matter and what strengths they offer. We also provide a brief overview of the technologies that form the foundation of ontology work, including the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Finally, we discuss methods of developing and providing ontologies, as well as working with existing ontologies.