Constructing URIs for Ontologies

As noted in the previous section, using a URL to identify a resource provides an intuitive, self-contained link to additional information about that resource, so the resource can be self-documenting to whatever degree is desired.

This section provides additional suggestions on how to create URIs that provide a good mix of utility, transparency, and maintainability. To some extent these traits work against each other; for example, the most maintainable URIs may be those that are semantically opaque (they appear meaningless), but these are relatively difficult for humans to work with (more on this below). We have tried to identify the best possible long-term benefit in our approaches to creating URIs; only time will tell if we have been successful.

URIs: Opaque vs Transparent, Label vs Concept

While many people recommend an 'opaque' format for the URI, in which the meaning of the URI is not apparent from its form (see for example [10]), we have chosen to recommend URIs constructed according to a semantically meaningful format, as described below. While there are definitely some portability and persistence costs to this approach, we believe that when representing semantic terms, usability and social benefits are more important at this stage of semantic web development. We also consider that the costs of this approach over the long term can be mitigated by other means (for example, ensuring the persistence of the URIs themselves).

Note that there is an important semantic model implicit in this approach, and explicitly embedded in its content. The concepts represented by resources on the web may change their names; for example, the company called "Apple Inc." may become "Apple", even though the company itself is unchanged. In that case, it is important that a unique persistent resource for the Apple company continue to represent the company itself, and that only the label corresponding to that resource change. However, in the case of the vocabularies MMI represents, we have concluded that it is in fact the label itself that we are documenting and creating a web resource for: the string for a given term, say "sst", is a significant object in and of itself. Therefore, we associate the resource for a term in a given vocabulary with the string object "sst" that is that term. Over time, the meaning associated with this object may change; "sst" may come to mean "saline solution temperature" rather than "sea surface temperature". Since the resource identifies the label itself, not the meaning, the resource (and corresponding URI) will then relate to the new meaning associated with "sst".

If you want your vocabulary to be constructed around codes or opaque strings, rather than meaningful strings, you can still do that while using our repository service. Simply make the desired code the instance of the term, and provide a corresponding label in addition to the definition. Then the label can change over time, according to usage.

If disambiguation is required, it is possible with our approach, using one of several conceptual facets that are incorporated in the URI. These are described below.

While we expect this process can support terms that embed http-unfriendly characters (e.g., accents, unicode, or '/'), we do not look forward to proving that. For now, we only support basic ASCII characters in our implementation of elements used to create URIs for ontologies and terms.

URL Construction for Ontology Files

The URL representing an ontology file should follow this basic scheme (some modifiers are described later):

http://{hostdomain}/{ontologiesRoot}/{authority}/{version}/{resourceType}.owl

For an RDF or SKOS file, the scheme may be modified to replace .owl with .rdf or .skos, depending on the goals of the provider. MMI intends and suggests using .owl for any ontology file that does not conflict with the OWL specification, as this distinguishes these files most clearly from more general-purpose RDF files.

An example of this URL scheme is:

http://mmisw.org/ont/mmi/20080701T022342/platform.owl
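
The scheme can be sketched in a few lines of Python. This is a minimal illustration, not part of any MMI service; the function name ontology_url and its parameter names are assumptions made for this example.

```python
def ontology_url(host_domain, ontologies_root, authority, version,
                 resource_type, extension="owl"):
    """Assemble an ontology file URL from the components of the scheme.

    Hypothetical helper: the name and signature are illustrative only.
    """
    parts = [host_domain, ontologies_root, authority, version, resource_type]
    # Strip stray slashes so the components join with exactly one "/".
    path = "/".join(part.strip("/") for part in parts)
    return f"http://{path}.{extension}"


print(ontology_url("mmisw.org", "ont", "mmi", "20080701T022342", "platform"))
```

Passing extension="rdf" would produce the .rdf variant mentioned above.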

The following section discusses these components.

URL Construction for Ontology Files: Additional Recommendations

These recommendations apply to the assignment of URLs to ontology files. (Although in principle URNs could also be assigned to ontology files, current practice rarely if ever does so.)

A) Include the file extension, since this helps search engines find the ontology files. For example, typing “buoy filetype:owl” in Google searches OWL files for the term ‘buoy’. So, if the ontology is expressed in OWL, use “.owl” (even though this is not an approved MIME type). RDF files may be expressed as .rdf, but if the file is a vocabulary file compatible with the OWL format, MMI suggests ending its name with .owl. So form 1 below is preferred over forms 2 and 3:

1) http://mmisw.org/ont/mmi/20080701T022342/platform.owl <-- PREFERRED
2) http://mmisw.org/ont/mmi/20080701T022342/platform
3) http://mmisw.org/ont/mmi/20080701T022342/platform.xml

B) Include a version number, for example using the timestamp pattern. Some suggest putting the most significant elements toward the beginning of the path, but we recommend placing the timestamp just before the ontology name. This puts the short name of the ontology at the end of the path, where it matches the file name of the ontology and can gracefully end in .owl, as recommended above. It also follows the W3C pattern. So the first case is preferred over the second case:

1) http://mmisw.org/ont/mmi/20080701T022342/platform.owl <-- RECOMMENDED
2) http://mmisw.org/ont/mmi/platforms/20080701T022342

We have chosen a full timestamp pattern including hours, minutes, and seconds; this simplifies disambiguation of versions when multiple versions are submitted in a short period of time. You may choose to use a shorter timestamp string.

C) The file name should encode information about the resource type (please see the Resource Types section below for more details) or another distinguishing characteristic, and can include the authority as a prefix for clarity. The name should not contain spaces and, if it contains an authority, should contain at least one underscore “_” that separates the authority from the object type. The use of “-” inside the name is not recommended, since it sometimes confuses search engines. (The character “-” means exclusion, and even if the string is searched as a quoted string (“word1-word2”), you are not guaranteed to get pages with that exact string. For example, searching for “moored-buoy” in Google returns not only pages containing “moored-buoy”, but also pages containing “moored buoy” and “moored.Buoy”. This does not occur with the underscore character “_”.)
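
As a rough sketch, the naming rules in recommendation C could be checked as follows. The function check_file_name and its messages are hypothetical, and the authority test is a simple prefix check that only approximates the rule.

```python
def check_file_name(name, authority=None):
    """Return a list of problems found in an ontology file name.

    `name` is the file name without its extension. Hypothetical helper
    written for illustration; it is not part of any MMI tool.
    """
    problems = []
    if " " in name:
        problems.append("contains a space")
    if "-" in name:
        problems.append("contains a hyphen")
    # If the name starts with the authority code, an underscore must follow.
    if authority and name.startswith(authority) and not name.startswith(authority + "_"):
        problems.append("authority prefix is not followed by an underscore")
    return problems


print(check_file_name("mmi_platforms", authority="mmi"))
print(check_file_name("moored-buoy"))
```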

D) If an ontology is replaced by another one, use the Dublin Core [6] element isReplacedBy (http://purl.org/dc/terms/) to inform the user or the program about the new ontology. However, if an OWL document is to be created, an ontology that contains the Dublin Core concept term should be imported. If this is not done, the ontology will not be valid. The property should go inside the ontology element. An RDF vocabulary or a SKOS vocabulary can use the Dublin Core property without any problem. If SKOS [9] is used, the property tag should go inside the ConceptScheme element.

E) The 'authority' may be tailored for circumstances as required. For example, if three ontologies from an organization must have the same name, the organization may create a separate 'authority' with the ontology repository for each ontology. Even if all three ontologies are in fact run by the same organization, the different authority fields can map to the same entity. This technique can also support organizational hierarchies or separate managers within a single organization. However, note that some organizations may prefer to assign managers on a per-ontology basis while keeping the overall 'authority' the same for all those ontologies, so the authority does not necessarily map directly to the person with privileges to change the ontology.

F) Techniques for resolving duplicate terms in vocabularies are beyond the scope of this proposal. In brief, each term in the ontology must be unique, and in many vocabularies this may be the case even for terms in many different hierarchical branches. However, if the vocabulary has the same term name appear in different hierarchies, it is necessary to disambiguate the different terms. Possible techniques for doing so include (a) specifying the complete path as part of the term name; (b) specifying path components in reverse order until disambiguation is achieved; (c) using a specific numbering or appending scheme that distinguishes specific terms, but can be easily disregarded by viewers. We prefer approach (b), as it is the most intuitive while being the least intrusive. Further, we suggest '__' as a component separator, as it is unlikely to be duplicated in the term itself. In any case, the definition of the term should include the original name of the term in an attribute (attribute name to be specified). Unfortunately, we are not aware of any technique that consistently and unambiguously translates vocabulary term names into graceful unique names, and then reverses the process.
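
Approach (b) can be sketched as follows. This is one possible reading of the technique, assuming each term is given as a root-to-leaf tuple of path components; the function name and the sample terms are invented for this illustration.

```python
from collections import Counter

SEPARATOR = "__"  # component separator suggested in recommendation F


def disambiguate(paths):
    """Map each term path (a root-to-leaf tuple) to a unique name.

    A name starts as the leaf alone; ancestors are appended in reverse
    order only for as long as the name still collides with another term.
    """
    depth = {path: 1 for path in paths}

    def name(path):
        # Take the last `depth` components and list them leaf-first.
        return SEPARATOR.join(reversed(path[-depth[path]:]))

    while True:
        counts = Counter(name(path) for path in paths)
        clashing = [p for p in paths
                    if counts[name(p)] > 1 and depth[p] < len(p)]
        if not clashing:
            break
        for path in clashing:
            depth[path] += 1
    return {path: name(path) for path in paths}


# Hypothetical vocabulary with a repeated leaf term.
terms = [("water", "temperature"), ("air", "temperature"), ("water", "salinity")]
for path, unique_name in disambiguate(terms).items():
    print("/".join(path), "->", unique_name)
```

Terms whose leaf is already unique keep their plain name, which is what makes this approach less intrusive than always spelling out the full path.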

G) Creating consistently formed term names from unique vocabulary strings is also not explicitly addressed by this document. Briefly, we recommend terms be formed as camel case, without underscores or dashes, or other characters outside [A-Za-z0-9], and that they start with a letter. (Spaces and slashes are particularly unfortunate characters to embed in term names, or any other component of the URI, as these must be escaped in URLs.) These forms are easily translated into URL components and other concepts on the web. If our recommended forms are not followed, substitution is usually necessary, and as noted above, the original name of the term should be preserved in an attribute.
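
A minimal sketch of such a conversion is shown below. It assumes lower camel case (the text above does not say whether the first letter is capitalized) and lower-cases the remainder of each word; the function name to_camel_case is hypothetical.

```python
import re


def to_camel_case(label):
    """Turn a vocabulary string into a camel-case name limited to [A-Za-z0-9].

    Illustrative only: lower camel case is an assumption of this sketch.
    """
    # Split on every run of characters outside [A-Za-z0-9].
    words = [word for word in re.split(r"[^A-Za-z0-9]+", label) if word]
    if not words:
        raise ValueError("label contains no usable characters")
    # Lower-case the first word; capitalize the first letter of the others.
    name = words[0].lower() + "".join(
        word[0].upper() + word[1:].lower() for word in words[1:]
    )
    if not name[0].isalpha():
        raise ValueError("term name must start with a letter")
    return name


print(to_camel_case("sea surface temperature"))
print(to_camel_case("moored/buoy"))
```

Because the conversion discards separators and case, it cannot be reversed, which is why the original name should be preserved in an attribute.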

URL Construction for Ontology Files: Backus-Naur Specification

In Backus-Naur form, the ontology resource specification is represented as follows:

<MMI-URI> ::= "http://" <hostdomain> "/" <ontologiesRoot> "/"
              <authority> "/" <version> "/" <resourceType> ".owl"

<version> ::= <ShortISO8601> | <NumberVersion>
<ShortISO8601> ::= <YYYYMM> | <YYYYMMDD> | <YYYYMMDDThh> | <YYYYMMDDThhmm> | <YYYYMMDDThhmmss>
<NumberVersion> ::= <MajorVersionNumber> "." <RevisionNumber>

In these schemes (and the ones in the following section), the following explanations hold:

  • hostdomain is the domain name owned by the administering authority of the repository, and typically resolves to the server hosting the ontology. For example: mmisw.org
  • ontologiesRoot is the node of the host that serves as the base of the ontologies; typically it is a directory where the ontologies live. For example: ont
  • authority is a code representing an administrative authority responsible for the particular set of terms. This may be the repository authority, or may be a different authority on whose behalf the repository is acting. For example: 'mmi' or 'cf' may be appropriate authority values. The authority is meant to provide a reference to the actual authority, not be a one-to-one mapping with the authority. There may be multiple authorities in a given organization: noaa, noaa1, noaa_dif, and so on.
  • version represents the publication timestamp (or part of it) of the resource or a version control number; the MMI default will be a publication date/time in ISO8601 form.
  • resourceType is the type of objects that are being represented by the vocabulary (as categorized by the vocabulary authority). Some examples are parameter, ucum_units, agu_index_terms and mmi_platforms. The authority may or may not be reflected in this term, which often is effectively the name of the ontology file.
    Additional disambiguation may be necessary for a resourceType, for example for different types of files dealing with the same resource type, or multiple organizations dealing with that resource:

URL and URN Construction for Terms

URL for a Term

The URL representing a term should follow this scheme:
  http://{hostdomain}/{ontologiesRoot}/{authority}/{version}/{resourceType}/{shortName}

An example term using this URL scheme is:
  http://mmisw.org/ont/mmi/20080701T022342/platform/moored_buoy
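
Going the other way, a versioned term URL can be split back into the components named in the scheme. The sketch below assumes that ontologiesRoot is a single path segment, as in the example; the function name parse_term_url is hypothetical.

```python
from urllib.parse import urlsplit


def parse_term_url(url):
    """Split a versioned term URL into the components named in the scheme.

    Assumes a single-segment ontologiesRoot (an assumption of this sketch).
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 5:
        raise ValueError(
            "expected ontologiesRoot/authority/version/resourceType/shortName"
        )
    root, authority, version, resource_type, short_name = segments
    return {
        "hostdomain": parts.netloc,
        "ontologiesRoot": root,
        "authority": authority,
        "version": version,
        "resourceType": resource_type,
        "shortName": short_name,
    }


print(parse_term_url("http://mmisw.org/ont/mmi/20080701T022342/platform/moored_buoy"))
```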

More information about resolving this URL scheme is available on the next page.

Backus-Naur Form of Term URL

In Backus-Naur form, this is represented as follows:

<MMI-URI> ::= "http://" <hostdomain> "/" <ontologiesRoot> "/"
              <authority> "/" <version> "/" <resourceType> "/" <shortName>

<version> ::= <ShortISO8601> | <NumberVersion>
<ShortISO8601> ::= <YYYYMM> | <YYYYMMDD> | <YYYYMMDDThh> | <YYYYMMDDThhmm> | <YYYYMMDDThhmmss>
<NumberVersion> ::= <MajorVersionNumber> "." <RevisionNumber>
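
The version rule above can be approximated with two regular expressions. This sketch assumes that MajorVersionNumber and RevisionNumber are plain digit strings, which the grammar does not state, and it checks only the number of digits, not whether the date and time are valid.

```python
import re

# YYYYMM, optionally extended with DD, then Thh, then mm, then ss.
SHORT_ISO8601 = re.compile(r"\d{6}(\d{2}(T\d{2}(\d{2}(\d{2})?)?)?)?")
# Assumption: both parts of a number version are digit strings.
NUMBER_VERSION = re.compile(r"\d+\.\d+")


def is_valid_version(version):
    """Return True if `version` has the shape of either alternative."""
    return bool(
        SHORT_ISO8601.fullmatch(version) or NUMBER_VERSION.fullmatch(version)
    )


for candidate in ["20080701T022342", "200807", "1.2", "2008"]:
    print(candidate, is_valid_version(candidate))
```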

URN for a Term

If expressed as a URN, this scheme becomes:

urn:urn_auth:authority:version:resourceType:shortName

The urn_auth is the URN namespace, and it must be owned by a namespace authority, who is responsible for generating the URNs according to a declared pattern. The urn_auth could be an organization generating URNs, or a generic namespace (e.g., 'term') dedicated to terms, and run by some organization for that purpose. (This latter proposition is particularly notional, as it depends on the existence of a dedicated URN namespace, which has not been proposed to IETF.)

There are multiple existing namespaces which could potentially be used today for expressing URNs for terms, but the process must follow the steps outlined by the authority. For example, the Open Geospatial Consortium (OGC) has obtained the 'ogc' urn namespace to enable this functionality. However, protocols for easily obtaining custom names in this namespace have not been fully established, and it is not appropriate for anyone other than OGC to generate a URN in this namespace.  For this reason, using URNs for terms remains relatively uncommon.
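
For illustration, the URN form can be assembled from the same components as the term URL. The namespace 'term' used below is the notional namespace mentioned above, not a registered one, and the function name term_urn is hypothetical.

```python
def term_urn(urn_auth, authority, version, resource_type, short_name):
    """Build a term URN of the form urn:urn_auth:authority:version:resourceType:shortName."""
    parts = [urn_auth, authority, version, resource_type, short_name]
    for part in parts:
        # A colon inside a component would make the URN ambiguous.
        if not part or ":" in part:
            raise ValueError(f"invalid URN component: {part!r}")
    return "urn:" + ":".join(parts)


# 'term' is a notional namespace; no such URN namespace is registered.
print(term_urn("term", "mmi", "20080701T022342", "platform", "moored_buoy"))
```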

Versioned and Unversioned URLs

URLs for the Unversioned Resource (Ontology or Term)

The versioned (timestamped) URLs above point to a specific version of an ontology or term. In many applications, including most mapping and semantic inferencing activities, the desired resource should be constant, even as specific aspects of that resource (definition, preferred label, or even its semantics) may change.

Here is a non-semantic example of this behavior: the web site http://nytimes.com always contains the most recent New York Times content, even as the content itself changes over time. If you want to find yesterday's version of the New York Times, this web site URL will not help you. (Note: We are not talking about the URI representing this resource in the semantic web; the URI of the resource for the concept described as "the latest New York Times content" could be entirely different, like urn:news:publications:papers:newyorktimes:daily. Yes, that can be confusing.)

The equivalent concept in the world of vocabularies is "what does this term mean today?" This is what the dictionary will tell you—the latest definition of the term. But every time we change the meaning for a term in our repository, we create a new version, with a new URL. So in the URL examples we showed above for our term, we do not have any way to refer to the concept of "the current meaning of the term moored_buoy".  Since the URI for the term changes whenever its version changes, how do we accommodate the need for this 'current meaning' URI?

In MMI we call this the unversioned form of the resource. To create an unversioned form of an ontology or term URI, simply delete the version string (and slash) from the original URL. So, for example, the unversioned form of http://mmisw.org/ont/mmi/20080701T022342/platform/moored_buoy is
http://mmisw.org/ont/mmi/platform/moored_buoy
(Note: For this to be unambiguous, the version can never begin with an alphabetic character, and the resourceType can never begin with a digit. We are enforcing this by convention at this time.)
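
The deletion step can be sketched as follows. The sketch relies on the convention just stated (a version always begins with a digit) and assumes the version, when present, is the third path segment, after a single-segment ontologiesRoot and the authority. The function name to_unversioned is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_unversioned(url):
    """Remove the version segment from an ontology or term URL, if present."""
    parts = urlsplit(url)
    # For "/ont/mmi/<version>/..." this is ["", "ont", "mmi", "<version>", ...].
    segments = parts.path.split("/")
    # By convention, only a version segment begins with a digit here.
    if len(segments) > 3 and segments[3][:1].isdigit():
        del segments[3]
    return urlunsplit(parts._replace(path="/".join(segments)))


print(to_unversioned("http://mmisw.org/ont/mmi/20080701T022342/platform/moored_buoy"))
print(to_unversioned("http://mmisw.org/ont/mmi/20080701T022342/platform.owl"))
```

A URL that is already unversioned passes through unchanged.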

To be clear: The resource that is identified by this URI is as follows: the term that is spelled 'moored_buoy' in the MMI platform ontology. Even if the definition for this term changes, the unversioned URI will represent the same old spelling, but now with the new definition.  So the concept of that resource is the term whose spelling is 'moored_buoy'. If the spelling of the term changes someday—say, people start calling these things moord_buoy, or mooring_buoy—there would be a new URI to match. If this spelling is no longer needed and no longer has a meaning, it will be deprecated, but still available as a (historical) term in the ontology and a corresponding resource.

When mappings are made to these unversioned resources, it is understood that the mapping is intended to persist through all versions of the ontology. This can be thought of as mapping the two labels together. It does not map the two meanings corresponding to those labels as of the time of the mappings, since those meanings may change over time.  While it is tempting to do mappings to unversioned terms (because of the simplicity of inferencing), this will eventually lead to unpleasant consequences as vocabularies evolve. There may be technical reasons to perform unversioned mappings—for example, to identify related text—but expediency is not the best rationale.

Special note about unversioned ontologies: If an unversioned ontology is requested, the user will receive the ontology containing all its terms (including deprecated terms), with the terms presented in unversioned form. This allows convenient mappings to be created to all the unversioned terms of an ontology. If the desired result is the most recent version of the ontology, see the next paragraph.

URLs to Specify the Most Recent Version of an Ontology or Term

It is often the case that a user wants to obtain the most recent version of a given ontology or term. In this case, the version string is replaced by '$'. (The resourceType can never begin with a dollar sign, so confusion is unlikely.) Thus, these requests should obtain the most recent version of the corresponding ontology or term:

The ontology or term returned by this URL should contain metadata necessary to establish which version it represents.

Note that while this could also be considered a unique concept (resource) represented by a web URI, we are not declaring this is a resource. In particular, it should not be mapped in ontologies.
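
As a sketch of the '$' convention described in this section, the helper below rewrites a versioned or unversioned URL into its most-recent-version form. The URLs it prints are derived mechanically from that rule and are not taken from MMI documentation. The function name to_latest is hypothetical, and the version is assumed to be the third path segment.

```python
from urllib.parse import urlsplit, urlunsplit


def to_latest(url):
    """Return the URL that requests the most recent version of a resource."""
    parts = urlsplit(url)
    # For "/ont/mmi/<version>/..." this is ["", "ont", "mmi", "<version>", ...].
    segments = parts.path.split("/")
    if len(segments) < 4:
        raise ValueError("URL has no resourceType segment")
    if segments[3][:1].isdigit() or segments[3] == "$":
        segments[3] = "$"          # replace an existing version string
    else:
        segments.insert(3, "$")    # unversioned form: add the marker
    return urlunsplit(parts._replace(path="/".join(segments)))


print(to_latest("http://mmisw.org/ont/mmi/20080701T022342/platform.owl"))
print(to_latest("http://mmisw.org/ont/mmi/platform/moored_buoy"))
```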

URLs for Summary Ontologies

Note: This section represents ongoing discussion. It has not been extensively reviewed and has not been implemented in MMI services. We would appreciate feedback on its content.

An ontology provider should make individual ontologies available via URLs as discussed above. Additionally, the following summary ontologies can be useful: (1) an ontology defining the top concepts contained in the resources that are made available, and (2) an overarching ontology, which imports all the latest versions of the ontologies available in that host.

Recommended URLs:
For the top ontology:
http://host/ont/version/top.owl
Example: http://mmisw.org/ont/200708/top.owl

For the overarching ontology:
http://host/ont/version/hostShortName_onts.owl
Example: http://mmisw.org/ont/200708/mmi_onts.owl

For each example, individual terms can be referred to by appending '/' and the term identifier to the above.

The top ontology will be used to map different concepts to a central concept known by the data provider. It acts as the internal glue. More details about the use of this ontology are discussed in the next section.

The overarching ontology helps “customers” get access to all the available ontologies by requesting only one file. It should also help ontology implementers debug ontologies, for example by finding inconsistencies.

If an individual authority only provides a single ontology covering a range of topics, a suggested name for that resource is allTerms.owl.