MMI Ontology Creation Guidance

Motivation

To make marine science data easier to work with, the MMI project aims to standardize the way that data is handled. MMI strongly encourages the effective use of interoperable metadata, including well-defined vocabularies, which simplify data publishing, discovery, documentation, and access. Vocabularies are hard to work with because they come in many different organizational systems and formats. A common model or format for expressing controlled vocabularies facilitates interoperability among information systems (See more about harmonization here). The common format selected by MMI is OWL, the Web Ontology Language.

Ontology creation guidelines

As noted above, vocabularies come in many different formats. Sometimes their terms are categorized (grouped or organized, perhaps in a hierarchy) and sometimes they are not. Deciding what should be a class (a general group of terms) and what should be an instance (a specific term) is not always trivial. Here we present the technique we use for the MMI project.

An OWL ontology is composed primarily of classes, properties, and individuals. (Here is a classic tutorial about ontologies.) The following sections explain what should be a class, a property, or an individual when constructing marine ontologies from existing vocabularies.

Selection of classes

A class is a term that represents a category of individuals. For example, Marine-Variables is a class that can help categorize marine terms. The original terms come in various encodings, and the table below shows where to find the class name in each one. For example, in GCMD the term 'variable' is a category, while in BODC the term 'parameter' is a category. These group names are expressed as classes in an OWL ontology.

Encoding      Source for the class name
Plain list    Title (heading) of the list
Table         Title of the table
XML file      Element tag
RDBMS         Name of the table (if the term and its attributes are stored in a table)
UML           Name of the class (if the term and its attributes are stored in a class)
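As a minimal sketch of the first row of the table (plain list), the Python below treats the first non-empty line of a plain-text list as its title and turns that title into an OWL class declaration in Turtle syntax. The function names, the `x` prefix, and the CamelCase naming rule are illustrative assumptions, not part of the MMI guidance.

```python
import re

def class_name_from_heading(heading: str) -> str:
    # Hypothetical naming rule: split on anything that is not a letter or digit
    # and join the words in CamelCase, e.g. "Marine-Variables" -> "MarineVariables".
    words = [w for w in re.split(r"[^A-Za-z0-9]+", heading) if w]
    return "".join(w[0].upper() + w[1:] for w in words)

def declare_class(plain_list: str, prefix: str = "x") -> str:
    # Assumption: the first non-empty line of the list is its title (heading).
    lines = [line.strip() for line in plain_list.splitlines() if line.strip()]
    name = class_name_from_heading(lines[0])
    return f"{prefix}:{name} a owl:Class ."
```

For example, `declare_class("marine variables\nsea surface temperature\nsalinity")` returns `x:MarineVariables a owl:Class .`.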

Selection of individuals (instances)

Terms in a vocabulary that are not the main categories are individuals. To determine whether a term should be an individual, ask: is the proposed instance a member of the class? For example, we can say that "iron is a member of the class elements", so iron is an instance, while elements is a class.
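The membership test maps directly onto the ontology: every term that passes it becomes an individual whose rdf:type is the class. The sketch below writes such assertions in Turtle; the `x` prefix and the `to_local_name` rule for turning terms into valid names are hypothetical.

```python
import re

def to_local_name(term: str) -> str:
    # Hypothetical rule: replace runs of characters other than letters, digits,
    # "_" and "-" with "_", so the result is usable as an XML/Turtle local name.
    name = re.sub(r"[^A-Za-z0-9_-]+", "_", term.strip())
    # A name must not start with a digit or "-", so prefix such names with "_".
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    return name

def declare_individuals(class_name: str, terms: list[str], prefix: str = "x") -> list[str]:
    # Each term that is a member of the class becomes an individual of that class.
    return [f"{prefix}:{to_local_name(t)} a {prefix}:{class_name} ." for t in terms]
```

`declare_individuals("Elements", ["iron", "copper"])` yields `x:iron a x:Elements .` and `x:copper a x:Elements .`.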

Selection of individuals' properties

Each individual has its own collection of properties in the ontology, such as a short name, an ID, a definition, a creation date, an author, and other descriptive characteristics. These properties can be either data-type properties or object properties.

data-type property (owl:DatatypeProperty)
Used when the values the property can take (its range) are literals, such as numbers or strings.
object property (owl:ObjectProperty)
Used when the values the property can take are other resources (identified by URIs).
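The rule above can be sketched as a small Python function that inspects a sample value. Treating strings that parse as http, https, or urn URIs as references to other resources is a heuristic we assume here; in practice the choice is usually made once per attribute, not per value.

```python
from urllib.parse import urlparse

def property_type(value) -> str:
    # Heuristic (assumption): a string that parses as an http(s) or urn URI
    # refers to another resource, so it calls for an object property.
    if isinstance(value, str):
        parsed = urlparse(value)
        if parsed.scheme in ("http", "https", "urn") and (parsed.netloc or parsed.path):
            return "owl:ObjectProperty"
    # Numbers and plain strings are literals, so a data-type property fits.
    return "owl:DatatypeProperty"
```

With this rule, `12.5` and `"degrees Celsius"` map to owl:DatatypeProperty, while `"http://example.org/units#degC"` maps to owl:ObjectProperty.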

Attributes of vocabulary terms are mapped to properties in OWL. When selecting these properties, the following sources can be identified, in order of preference: 1) OWL built-in properties, 2) Dublin Core properties, 3) other properties.

OWL built-in properties (which also include RDF and RDF Schema properties) are the first choice when mapping an attribute of a term; rdfs:comment is one example. For a complete list, see Appendix C of the OWL Reference. Dublin Core properties are a good second choice because of their wide acceptance, tool support (e.g., in OWL tools and Jena), and compatibility with RDF. The third source is any other, arbitrarily named property; in OWL, such properties should be placed in their own unique namespace to avoid semantic conflicts.
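A minimal sketch of this order of preference: look an attribute up among built-in properties first, then among Dublin Core elements, and only then mint a property in the project's own namespace. The lookup tables are a small hypothetical sample (including the mapping of "author" to dc:creator), not a complete list.

```python
# Hypothetical sample tables; a real mapping would cover many more attributes.
BUILT_IN = {"label": "rdfs:label", "comment": "rdfs:comment"}
DUBLIN_CORE = {"description": "dc:description", "creator": "dc:creator",
               "author": "dc:creator", "date": "dc:date", "title": "dc:title"}

def select_property(attribute: str, own_prefix: str = "x") -> str:
    key = attribute.strip().lower()
    # 1) OWL/RDF(S) built-in properties come first.
    if key in BUILT_IN:
        return BUILT_IN[key]
    # 2) Dublin Core properties come next.
    if key in DUBLIN_CORE:
        return DUBLIN_CORE[key]
    # 3) Otherwise, use an arbitrary property in the project's own namespace.
    return f"{own_prefix}:{attribute.strip()}"
```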

A quick way to create an ontology from an existing vocabulary is to mirror each attribute of a term in the original source as a data-type property in OWL (the third source above). However, at MMI we want to map terms across ontologies and do something useful with these mappings, such as finding out which units the terms have and plotting the data. For this reason we have selected a minimum set of properties that every vocabulary should have.
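For comparison, the quick approach of mirroring every attribute can be sketched as follows: each attribute of a source record becomes an x: data-type property, and its value becomes a plain literal in Turtle. The functions and the prefix are hypothetical, and attribute names are assumed to already be valid names.

```python
def turtle_literal(value) -> str:
    # Escape backslashes, quotes and newlines so the value is a valid Turtle string.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'

def mirror_attributes(individual: str, class_name: str,
                      attributes: dict[str, object], prefix: str = "x") -> str:
    # Declare one data-type property per source attribute.
    lines = [f"{prefix}:{name} a owl:DatatypeProperty ." for name in attributes]
    # Describe the individual, copying each attribute value as a literal.
    body = [f"{prefix}:{individual} a {prefix}:{class_name}"]
    body += [f"    {prefix}:{name} {turtle_literal(value)}" for name, value in attributes.items()]
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)
```

This is fast, but every vocabulary ends up with its own ad hoc properties, which is what makes cross-ontology mapping hard.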

The properties, and the names used for them in the ontology, are listed below. In the list, Def stands for definition and Map stands for the property name in the ontology. An x before a colon means that any namespace can be used; in other words, the property is arbitrary. The suggested minimum set of properties is:

Unique identifier
Def: Unique identifier that conforms to the XML specification for names.
Map: rdf:ID
Original Unique identifier
Def: The term's unique identifier in the original vocabulary, that is, the original label that will be used to query the system. It does not need to be a valid XML name.
Map: rdfs:label
Definition
Def: Definition of the term.
Map: Two strategies:
  • dc:description (preferred)
  • rdfs:comment
Units
Def: Units of the term
Map: Two strategies:
  • Store the units only as a string value of an owl:DatatypeProperty
  • Create a class to store the units as individuals and link each term of the vocabulary to a unit using an owl:ObjectProperty, as follows:
    1. Create a class (owl:Class), named here x:unitTypes.
    2. Create individuals of this class based on the original units of the vocabulary. To do this, the units must be transformed into valid XML names.
    3. Store the original unit values in an owl:DatatypeProperty, named here x:originalUnits.
    4. Create an owl:ObjectProperty named x:hasUnit, whose domain is the class holding the vocabulary's terms and whose range is x:unitTypes.
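To tie the list together, the sketch below uses Python's standard xml.etree.ElementTree to write one term in RDF/XML with the minimum property set (rdf:ID, rdfs:label, dc:description) and the second units strategy (steps 1 to 4). The ontology URI http://example.org/mmi/vocab, the class name, and the `xml_name` conversion rule are assumptions made for illustration. x:originalUnits is left without a domain because the guidance does not specify one.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://example.org/mmi/vocab"  # hypothetical ontology URI
X = BASE + "#"                        # stands in for the arbitrary "x:" namespace
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

for prefix, uri in {"rdf": RDF, "rdfs": RDFS, "owl": OWL, "dc": DC, "x": X}.items():
    ET.register_namespace(prefix, uri)

def q(ns: str, local: str) -> str:
    # ElementTree's {namespace}local notation for qualified names.
    return f"{{{ns}}}{local}"

def xml_name(text: str) -> str:
    # Hypothetical conversion to a valid XML name: replace unsupported runs
    # with "_" and make sure the name starts with a letter or underscore.
    name = re.sub(r"[^A-Za-z0-9_-]+", "_", text.strip())
    return name if re.match(r"[A-Za-z_]", name) else "_" + name

def build_term(class_name: str, original_id: str, definition: str, unit: str) -> str:
    root = ET.Element(q(RDF, "RDF"), {XML_BASE: BASE})
    ET.SubElement(root, q(OWL, "Class"), {q(RDF, "ID"): class_name})
    # Step 1: the class that holds the units.
    ET.SubElement(root, q(OWL, "Class"), {q(RDF, "ID"): "unitTypes"})
    # Step 3: data-type property that keeps the original unit string.
    ET.SubElement(root, q(OWL, "DatatypeProperty"), {q(RDF, "ID"): "originalUnits"})
    # Step 4: object property linking vocabulary terms to units.
    has_unit = ET.SubElement(root, q(OWL, "ObjectProperty"), {q(RDF, "ID"): "hasUnit"})
    ET.SubElement(has_unit, q(RDFS, "domain"), {q(RDF, "resource"): "#" + class_name})
    ET.SubElement(has_unit, q(RDFS, "range"), {q(RDF, "resource"): "#unitTypes"})
    # Step 2: the unit individual, named with a valid XML name.
    unit_id = xml_name(unit)
    unit_el = ET.SubElement(root, q(X, "unitTypes"), {q(RDF, "ID"): unit_id})
    ET.SubElement(unit_el, q(X, "originalUnits")).text = unit
    # The term itself, carrying the minimum property set.
    term = ET.SubElement(root, q(X, class_name), {q(RDF, "ID"): xml_name(original_id)})
    ET.SubElement(term, q(RDFS, "label")).text = original_id
    ET.SubElement(term, q(DC, "description")).text = definition
    ET.SubElement(term, q(X, "hasUnit"), {q(RDF, "resource"): "#" + unit_id})
    return ET.tostring(root, encoding="unicode")
```

Calling `build_term("MarineVariables", "SEA SURFACE TEMP", "Temperature of the sea surface.", "deg C")` produces a term with rdf:ID `SEA_SURFACE_TEMP`, linked through x:hasUnit to the unit individual `deg_C`, whose x:originalUnits keeps the original string "deg C".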