Creating Ontologies Using Vocabularies

For flat or hierarchical vocabularies, it is often straightforward to create a corresponding ontology. Developing the ontology involves settling four things: its format, its classes and subclasses, its instances, and the relationships among them. Here are a few key things to understand before you begin creating the ontology.

Deciding on Term Identifiers: Opaque or Meaningful

In general, terms are represented in a vocabulary using a string of alphanumeric characters. These may be meaningful and describe the term (e.g., temperature), or they may be opaque and represent the term (e.g., 729402c).

A first approach to creating unique references to terms on the Web might be to give them an identifier that contains their name, for example http://mmisw.org/mmi/examples/sea_surface_temperature. Here 'sea_surface_temperature' is the name associated with a given concept, and we can define what it means, and life is good, right?

Unfortunately, it is in the nature of language, and of scientific terminology, that meanings of terms change over time. This is true even in such terminology-focused domains as species classification. Species names change, species identifications change, even the way species are classified into higher groups is subject to major change. So, we need to appreciate that the thing we called 'sea surface temperature' 50 years ago may have a different name next year, like 'sea surface foundation temperature.'

Most ontologists have determined that the way to avoid this problem is to create a unique identifier (specifically for the Web, a Uniform Resource Identifier, or URI) for the concept of interest, and to make that identifier 'opaque,' that is, without any meaningful terms embedded in it. The identifier may be a number, a code, or a random string, but it has an associated preferred label and a definition, which together clarify the concept the URI identifies. If the way people refer to the item changes, say from 'frisbee' to 'flying disc,' then the label can easily be changed, while the concept's URI stays the same.
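The separation between identifier and label can be sketched in a few lines of Python. This is only an illustration, not an MMI tool: the Term class and the definition text are hypothetical, the opaque code reuses the '729402c' example from earlier, and the base URI is borrowed from the example namespace shown above.

```python
from dataclasses import dataclass

# Base namespace borrowed from the example earlier in this guide.
BASE_URI = "http://mmisw.org/mmi/examples/"


@dataclass
class Term:
    """A vocabulary term whose identifier carries no meaning of its own."""
    code: str          # opaque identifier
    pref_label: str    # human-readable name; may change over time
    definition: str

    @property
    def uri(self) -> str:
        # The URI is built only from the opaque code, never from the label.
        return BASE_URI + self.code


term = Term(
    code="729402c",
    pref_label="frisbee",
    definition="A disc thrown and caught for sport (illustrative definition).",
)
uri_before = term.uri

# Usage changes: update the label; the identifier is untouched.
term.pref_label = "flying disc"
uri_after = term.uri

print(uri_before == uri_after)  # the URI is unaffected by the relabeling
```

Anything that stored the URI before the relabeling still points at the same concept afterward, which is the persistence the opaque approach is meant to provide.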

Ontology providers can use either method, and both have strengths and weaknesses. Opaque identifiers provide a powerful abstraction, but they require ontology users to know, or look up, the string of characters that represents each term. Meaningful identifiers are instantly clear to users, but they must accommodate changes in terminology and meaning.

For the creator of a vocabulary, determining an approach may not require deep understanding of the two options. It is easy enough to use numbers or codes for your vocabulary entities, and in fact many communities that negotiate shared vocabularies find that using numbers is the only way to avoid unending arguments about which term to use. On the other hand, if you are creating a local vocabulary that is unlikely to have extensive persistence or community use, the simplicity of meaningful identifiers can be a powerful incentive toward standardization.

There are two other reasons to consider opaque, or at least non-literal, identifiers for your vocabulary. The first is the freedom to write the label as accurately as you want. If you use the term as part of your identifier, it becomes very awkward to create a usable URL out of a term like 'Crutchfield Jacob's Syndrome.' Terms that include spaces, apostrophes, pound signs, slashes, accent marks, or non-Roman characters work poorly as part of a URL in a browser, because those characters must be encoded or have a special meaning in the URL itself.
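To see how awkward this gets, here is a small sketch using Python's standard urllib.parse.quote to percent-encode a label so it can sit in a URL. The base URI is the example namespace used earlier in this guide; the accented word 'température' is an extra illustrative label, not one taken from a real vocabulary.

```python
from urllib.parse import quote

BASE_URI = "http://mmisw.org/mmi/examples/"

label = "Crutchfield Jacob's Syndrome"

# Percent-encode every character that is not a letter, digit, or one of _ . - ~
encoded = quote(label, safe="")
print(BASE_URI + encoded)
# prints http://mmisw.org/mmi/examples/Crutchfield%20Jacob%27s%20Syndrome

# Accented characters are encoded as UTF-8 bytes, which is even less readable.
accented = quote("température", safe="")
print(accented)
# prints temp%C3%A9rature
```

The encoded forms are valid, but they are hard to read, hard to type, and easy to get wrong, which is the practical argument for keeping the label out of the identifier.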

The second reason to avoid using terms as identifiers arises when your terms have multiple meanings. Since identifiers must be unique, it may make a lot more sense to use numbers than to use successively longer discriminators, as in 'sea_surface_temperature_when_moving_measured_by_thermistor_uncalibrated.' You get the idea.

This guide suggests that you choose the approach you consider most appropriate to your community, considering its size, longevity, diversity, and visibility to the greater science community, and your ability to support codes in your data systems. If you decide to use terms as meaningful identifiers, be aware that eventually there may well be repercussions associated with that approach.

Other Information to Include

We know it is important to include the string used as the identifier, a preferred label (if the identifier is not itself the preferred label), and a definition. What other information should be included with your vocabulary and its terms?

At the vocabulary level, a number of metadata items are worth capturing: who created it, the date, its purpose, its principal topic, and maybe the terms under which others can use it. There are metadata standards like Dublin Core and the Ontology Metadata Vocabulary, as well as extensions like MMI's Metadata Vocabulary that can help you decide what to include.

At the term level, what other information is worth including? Information about the semantic content of the term may be useful. For example, a complete spelling if it is an acronym, or a URL for the home page if it is an organization. Alternate labels may be of value, as well as relationships to terms in another vocabulary. These are not recommendations, simply suggestions to consider.

The tools that work with controlled vocabularies and ontologies are very term oriented, so you should not include concepts unrelated to the meaning of the term itself. The format and units of a data item may seem useful, but including them prevents you from using that term and definition in other contexts where the format and units might be different, or where the word might not represent a data item at all. To think of it another way, the vocabulary is not intended to define all the metadata for a data variable, simply the language used to name the data variable.

It is simple to add information to your vocabulary, but if the list of information is more than a few items, or you want to express the relationship of your terms to each other, you may be better off working directly with ontology tools, as described in previous pages of the MMI Guides.

Translating to an Ontology

A good starting point for your vocabulary is to store each type of information about the terms (identifying name, definition, and other pieces of information) in its own column, with the columns separated by tabs or commas. A spreadsheet can easily generate data in either of these formats. Remember that the identifying name has to be unique, without spaces or unusual characters (keeping it to A-Z, a-z, 0-9, and _ is a good practice). Given this, how do we make an ontology?
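Before converting anything, it can help to check the identifier column against these two rules. The sketch below is one possible check, assuming a tab-delimited file with a header row and columns named id, label, and definition; those column names, the check_identifiers function, and the sample rows are all hypothetical.

```python
import csv
import io
import re

# Only A-Z, a-z, 0-9 and underscore, as suggested above.
VALID_ID = re.compile(r"[A-Za-z0-9_]+")

# Hypothetical three-column vocabulary: identifier, label, definition.
SAMPLE = (
    "id\tlabel\tdefinition\n"
    "sst\tsea surface temperature\tTemperature of the water at the sea surface\n"
    "sea salinity\tsalinity\tIdentifier contains a space\n"
    "sst\tduplicate\tIdentifier reused from the first row\n"
)


def check_identifiers(text: str, delimiter: str = "\t") -> list:
    """Return a list of problems found in the identifier column."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=2):  # header is line 1
        ident = row["id"]
        if not VALID_ID.fullmatch(ident):
            problems.append(f"line {line_no}: invalid identifier {ident!r}")
        if ident in seen:
            problems.append(f"line {line_no}: duplicate identifier {ident!r}")
        seen.add(ident)
    return problems


for problem in check_identifiers(SAMPLE):
    print(problem)
```

For a comma-delimited export, pass delimiter="," instead; the csv module handles quoted fields that themselves contain commas.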

A basic ontology has a simple format, so a simple pattern substitution could turn your vocabulary list into a credible term ontology. If you want to experiment with creating your own, examine a few existing term ontologies first; they will show you most of the basic patterns. An ontology editing tool like Protégé (free) can help you confirm the ontology is the way you want it, or can help you create the ontology from scratch.
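As a rough sketch of what such a pattern substitution might look like, the Python below fills one text template per row and emits RDF in Turtle syntax. Several choices here are assumptions rather than anything this guide prescribes: the terms are modeled as SKOS concepts (skos:prefLabel, skos:definition), the column names are id, label, and definition, and the identifiers and sample rows are invented. The namespace is the example one used earlier.

```python
import csv
import io

BASE_URI = "http://mmisw.org/mmi/examples/"

# Turtle header; the SKOS namespace is the standard W3C one.
HEADER = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    f"@prefix ex: <{BASE_URI}> .\n"
)

# One template per term: the "pattern" that is substituted for each row.
TERM_TEMPLATE = (
    "ex:{id} a skos:Concept ;\n"
    '    skos:prefLabel "{label}"@en ;\n'
    '    skos:definition "{definition}"@en .\n'
)


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def vocabulary_to_turtle(text: str, delimiter: str = ",") -> str:
    """Convert delimited vocabulary text (with header row) to Turtle."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    blocks = [
        TERM_TEMPLATE.format(
            id=row["id"],  # assumed to be limited to A-Z, a-z, 0-9 and _
            label=escape(row["label"]),
            definition=escape(row["definition"]),
        )
        for row in reader
    ]
    return HEADER + "\n" + "\n".join(blocks)


# Hypothetical two-term vocabulary with opaque identifiers.
SAMPLE = (
    "id,label,definition\n"
    "t0001,sea surface temperature,Temperature of the water at the sea surface\n"
    "t0002,salinity,Amount of dissolved salt in sea water\n"
)

turtle = vocabulary_to_turtle(SAMPLE)
print(turtle)
```

The result can be opened in an ontology editor to confirm it says what you intended. A template approach like this is fine for flat lists, but it does nothing to validate the output, so treat it as a starting point.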

With the voc2rdf tool introduced by MMI, there is another way to move from a vocabulary list to an ontology. With this service, it is possible to enter the comma- or tab-delimited text into a dialog box, and press a button to have it converted to an ontology. If you want to get off to a quick start, this is a reasonable way to proceed, and you have the option of working with the resulting ontology, or committing it directly to a repository.

If your vocabulary is at all complex, with hierarchies or internal relationships, you should probably start with an ontology tool and the basic instructions on ontologies found here and elsewhere. But for simple vocabularies, some simple approaches may work well.

Suggested Citation

Graybeal, J. 2011. "Creating Ontologies Using Vocabularies." In The MMI Guides: Navigating the World of Marine Metadata. http://marinemetadata.org/guides/vocabs/ont/provider/createvocabont. Accessed December 12, 2019.