URIs: What, Why, and How?

Definition of URIs

A URI, or Uniform Resource Identifier, is a Web naming and addressing technology that uses short strings to refer to resources on the Web. These resources can be Web documents, services, electronic mailboxes, images, downloadable files, or other electronic entities. The term URI encompasses both URLs (Uniform Resource Locators) and URNs (Uniform Resource Names). For more details, see the W3C guidance on URIs, the 2001 W3C recommendations on URIs, URLs, and URNs, the W3C Web Architecture, or MMI's collection of URI references, which points to many of the key URI technologies.

Why do we need URIs?

  • Some specifications (e.g. OGC SOS) expect URIs to be used to identify properties, features of interest, time types, coordinate systems, process types, units, contact information roles, etc.
  • Encoding controlled vocabularies as URIs allows them to be presented in Semantic Web tools, e.g. as a Resource Description Framework (RDF) graph. This makes it easier to map between distributed vocabularies, to make inferences across them, and to share concepts among communities. Many tools, written in different languages, make this possible, for example Protege, SWOOP, and VINE.
  • A URI is unique and can be linked to other features in a system, for example to a thesaurus used in a web portal, to a preferred icon, or to a process type that knows how best to present the data referenced by the URI.

What is an ontology and what is RDF?

An ontology is a description of concepts. RDF stands for "Resource Description Framework" and is a way of describing resources, for example words that refer to concepts. An ontology can therefore be expressed as a graph of RDF resources.

In RDF, every resource has a URI associated with it. Two nodes in the graph, together with the link between them, compose an RDF triple. An RDF triple is like a sentence, composed of a subject, a predicate, and an object. RDF triples are sometimes also known as resource, property, value triples.

So, informally, a triple can be written as follows, corresponding to the statement "temperature is of type parameter":

Example 1: An Informal Triple

Triple Part   Our Example
Subject       temperature
Predicate     is of type
Object        parameter

In this case, "temperature" is the subject, "is of type" is the predicate, and "parameter" is the object.

But the story does not end there. Formally, the subject and the predicate are each identified by a URI (the object may be a URI or a literal value such as a string or a number). So a more correct version of the previous triple is:

Example 2: A Formal Triple

Triple Part   Our Example
Subject       http://marinemetadata.org/2005/03/voc#temperature
Predicate     http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Object        http://marinemetadata.org/2005/03/voc#parameter
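
As a rough illustration of the formal triple above, the following Python sketch stores the three URIs in a small tuple type and writes them out as one line of N-Triples, a plain-text RDF syntax in which each URI is wrapped in angle brackets and the statement ends with a period. The names Triple, to_ntriples, RDF_TYPE, and VOC are hypothetical helpers introduced only for this example, and the sketch assumes all three parts are URIs (literal objects are not handled).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; in this sketch every part is a URI string."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a URI-only triple as a single N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# URIs taken from Example 2; the constant names are illustrative only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOC = "http://marinemetadata.org/2005/03/voc#"

example = Triple(VOC + "temperature", RDF_TYPE, VOC + "parameter")
print(to_ntriples(example))
```

A dedicated RDF library would normally be used for real work; the tuple is only meant to make the subject, predicate, object structure visible.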

What are URI bases and fragments?

"The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information" (W3C Web Architecture). In URLs, the fragment is usually the string after the "#" character. In this guide the term is used more broadly: the fragment is the string after the last non-alphanumeric character in a URI (that is, the last character that is not a letter A-Z or a-z or a digit 0-9). So, in the URI http://purl.org/dc/elements/1.1/description, "description" is the fragment identifier. The base URI is then the part of the URI that is not the fragment. In the previous example, the URI base is "http://purl.org/dc/elements/1.1/".

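The splitting rule described in this section can be sketched in a few lines of Python. The function name split_uri is hypothetical, and the code simply applies the rule as stated here (split after the last character that is not A-Z, a-z, or 0-9); it is not a general URI parser.

```python
from typing import Tuple


def split_uri(uri: str) -> Tuple[str, str]:
    """Split a URI into (base, fragment) at the last non-alphanumeric character."""
    # Walk backwards until a character outside A-Z, a-z, 0-9 is found.
    for i in range(len(uri) - 1, -1, -1):
        ch = uri[i]
        if not (ch.isascii() and ch.isalnum()):
            # The separator itself stays with the base.
            return uri[: i + 1], uri[i + 1:]
    # No separator at all: treat the whole string as the fragment.
    return "", uri


base, fragment = split_uri("http://purl.org/dc/elements/1.1/description")
print(base)      # http://purl.org/dc/elements/1.1/
print(fragment)  # description
```

Note that under this rule a term containing an underscore or a hyphen would be split at that character, so a system that uses such terms may need a different convention.
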
How do I know which URI to use?

If you are using an ontology or other web vocabulary to annotate metadata, each term that lives in a semantically enabled vocabulary will generally have its own URI, so you use the URI corresponding to the vocabulary and term you need to identify.

A system like OOSTethys is designed to understand most types of URIs, though some systems may require URIs of a particular type, or from a particular vocabulary.

For example, the CF ontology at MMI, as well as the OGC URIs for time, latitude, and longitude, are all used in the OOSTethys system. Units from multiple unit vocabularies may be specified, each using the URI from its own vocabulary.

Ideally you can find a vocabulary that contains all the URIs you need, but if not, most applications work with a mix of URIs from different vocabularies.

If I want to add a term (URI), how do I do it?

You can add terms for parameters and source types. For parameters, you need to use Unidata units or UCUM units. For source types (e.g. sensor types), you need to map your terms to the MMI ontology.

  1. Create an ontology of your terms. You can use Protege or, if you can export the terms to a delimited ASCII format, Voc2OWL.
  2. Map your terms to CF terms. To do this, you can use VINE.
  3. Submit the mappings to the MMI server.

How do I encode URIs in OGC?

One way to encode a URI is as follows:

...

<gml:identifier codeSpace="{URI_base}">{URI_fragment}</gml:identifier>

...

The OGC is discussing recommendations for consistent use of URIs in its standards.
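
As a minimal sketch of the encoding shown above, the following Python function builds a gml:identifier element with the URI base in the codeSpace attribute and the URI fragment as the element text. The function name gml_identifier is hypothetical, and the namespace URI assumes GML 3.2; adjust it to the GML version your service actually uses.

```python
import xml.etree.ElementTree as ET

# Assumption: GML 3.2 namespace; other GML versions use a different URI.
GML_NS = "http://www.opengis.net/gml/3.2"


def gml_identifier(uri_base: str, uri_fragment: str) -> str:
    """Return a gml:identifier element as text, with the base in codeSpace."""
    # Ask ElementTree to emit the "gml" prefix instead of a generated one.
    ET.register_namespace("gml", GML_NS)
    elem = ET.Element(f"{{{GML_NS}}}identifier", {"codeSpace": uri_base})
    elem.text = uri_fragment
    return ET.tostring(elem, encoding="unicode")


xml_text = gml_identifier("http://purl.org/dc/elements/1.1/", "description")
print(xml_text)
```

Building the element with an XML library rather than string concatenation means that any special characters in the URI (such as "&") are escaped automatically.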

How do I encode URIs in ISO 19139 (ISO 19115 Application Schema)?

Details are described in "Encoding URIs in ISO 19139".