Usage vs Discovery Vocabularies

In the guide Vocabularies: Dictionaries, Ontologies, and More, we used the term “altitude” to describe part of the spatial position of something. We may complete the spatial description by including the terms “latitude” and “longitude.” “Latitude” typically refers to a value that describes the north-south placement (or y-coordinate) of something on the earth (more generally, on a rotational ellipsoid). Together with “longitude,” which describes the east-west placement (or x-coordinate), these terms fully specify the position of something on the earth.

Consider a data asset that contains altitude, latitude, and longitude values. The asset may be a database table, a spreadsheet, or a text file. The asset could use plain English names for the columns of numbers, that is, it could use altitude, latitude and longitude. Alternatively, the names could be cryptic codes or abbreviations such as ALT, LAT, and LONG. The names used within the asset represent what we refer to as a usage vocabulary. A usage vocabulary is important when clients—or software applications—want to effectively access the data.
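As a minimal sketch of how a software application relies on a usage vocabulary, the Python example below reads a small text-file asset and accesses its values by column name. The asset contents, the numeric values, and the helper name read_asset are hypothetical and chosen only for illustration; the column names ALT, LAT, and LONG are the abbreviations mentioned above.

```python
import csv
import io

# Hypothetical text-file asset. Its column headers (ALT, LAT, LONG)
# form the usage vocabulary; the numeric values are made up.
ASSET_TEXT = """ALT,LAT,LONG
12.0,44.65,-63.57
15.5,44.70,-63.60
"""

def read_asset(text):
    """Return the asset's usage vocabulary and its rows keyed by those names."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [{name: float(value) for name, value in row.items()} for row in reader]
    return list(reader.fieldnames), rows

usage_vocabulary, rows = read_asset(ASSET_TEXT)
print(usage_vocabulary)   # ['ALT', 'LAT', 'LONG']
print(rows[0]["LAT"])     # 44.65
```

The application must know the exact names used in the asset. Asking for "latitude" instead of "LAT" would fail here, which is why the usage vocabulary matters for access.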

However, when discovering the content of an asset, the usage vocabulary may or may not be useful depending on how cryptically the data columns are named.

Therefore, to facilitate discovery, we use a discovery vocabulary, which identifies the data using terminology that is common to the subject community. Discovery vocabulary terms can take a variety of forms:

  • They may be identical to terms in the usage vocabulary. This is the situation when the data asset uses common language terminology to identify the data, for example, data values identified as temperature or salinity.
  • They may represent groups of terms in the usage vocabulary. This is a common situation for legacy assets, where cryptic codes have been used to identify similar data from multiple sources. For example, consider a legacy data asset that contains temperature values from sensors A, B, and C. These data are identified within the asset as ATEMP, BTEMP, CTEMP. The discovery vocabulary term that encapsulates all three usage terms would be temperature, as illustrated in the image below.
    [Image: Data Asset]
  • They may represent groups of data values. In this case, the discovery vocabulary terms identify particular subgroups of the data, rather than all of the data. For example, if the data asset contains geology data, then certain geological time periods (e.g., Mesozoic Era) may be identified in the discovery vocabulary. In physical oceanography, a discovery term may identify a particular water mass (e.g., Labrador Sea Water) that has particular characteristics (e.g., physical or chemical).
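The first two forms above can be sketched as a simple lookup from discovery terms to usage terms. This is an illustrative Python sketch, not a prescribed implementation: the dictionary DISCOVERY_TO_USAGE, the function find_usage_terms, and the sample column list are hypothetical, while ATEMP, BTEMP, and CTEMP come from the legacy-asset example.

```python
# Hypothetical mapping from discovery terms to the usage terms they cover.
DISCOVERY_TO_USAGE = {
    "temperature": ["ATEMP", "BTEMP", "CTEMP"],  # one term grouping legacy codes
    "salinity": ["salinity"],                    # identical to the usage term
}

def find_usage_terms(discovery_term, asset_columns):
    """Return the asset's columns that a discovery term points to."""
    candidates = DISCOVERY_TO_USAGE.get(discovery_term.lower(), [])
    return [name for name in candidates if name in asset_columns]

# Assumed column names of a legacy asset, for illustration only.
asset_columns = ["ATEMP", "BTEMP", "CTEMP", "salinity"]

print(find_usage_terms("temperature", asset_columns))  # ['ATEMP', 'BTEMP', 'CTEMP']
print(find_usage_terms("Salinity", asset_columns))     # ['salinity']
print(find_usage_terms("pressure", asset_columns))     # []
```

A person searching for "temperature" finds the asset without knowing its cryptic codes, and the returned usage terms are then what an application needs to read the values.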

Discovery vocabularies aid a person in finding the data asset, while usage vocabularies aid in the use of the asset. Both can pertain to data-related topics such as parameters, platforms, sensors, and geographic areas, and both are specialized forms of controlled vocabularies.

Suggested Citation

Isenor, A. 2011. "Usage vs Discovery Vocabularies." In The MMI Guides: Navigating the World of Marine Metadata. Accessed July 9, 2020.