Metadata Classifications

In developing data systems, a number of categories have been used for metadata. This guide explains a few of these and some of their strengths and weaknesses.

Many of them are not particularly well defined and represent messy and partially overlapping ways to organize metadata. They are presented here for two reasons. First, so that you will know their meaning if you run across these terms in other metadata-related reading. Second, because in classifying metadata in different ways they highlight the different uses of metadata, which can be valuable to consider when developing your own metadata approach.   

Metadata Classification Techniques

  • Syntactic vs Semantic
    The structure of the data (syntax) as opposed to their meaning (semantics).
  • Use vs Search
    Metadata required for someone to make appropriate use of the data, as opposed to metadata required for finding the data.
  • Static vs Dynamic
    Whether the metadata change through time.
  • By Functional Category
    Includes six different functions performed by metadata.

Syntactic vs Semantic

Syntactic metadata describe what the data look like and how they are organized. Semantic metadata describe what they mean. Semantic metadata are often considered to be human-oriented rather than machine-usable, but that is an assumption rather than something the term itself requires.

Syntactic fields often include the unique variable name, data type (integer, float, etc., including sizes), file format, and units of measurement. Note that the first and last of these certainly have semantic meaning, even if their primary use is for labeling or identification.

Semantic fields are often more descriptive, such as long name, definition, comments, and copyright. For example, the information that a field labeled “SST” holds sea surface temperature measurements is semantic metadata. Most semantic fields would be more widely useful if they followed agreed-upon conventions and terminology. The increasing use of ontologies will likely push semantic content much more into a machine-readable realm.
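
As a rough illustration of the split described above, the following Python sketch keeps the syntactic and semantic fields of the “SST” variable in separate records. The class names, field names, and example values (data type, file format, units string, definition text) are assumptions chosen for illustration and are not taken from any particular metadata standard.

```python
from dataclasses import dataclass


@dataclass
class SyntacticMetadata:
    """How the variable is stored and labeled (hypothetical field names)."""
    name: str         # unique variable name
    data_type: str    # storage type, including size
    file_format: str  # container format of the file
    units: str        # units of measurement


@dataclass
class SemanticMetadata:
    """What the variable means (hypothetical field names)."""
    long_name: str
    definition: str
    comments: str = ""
    copyright: str = ""


@dataclass
class VariableMetadata:
    syntactic: SyntacticMetadata
    semantic: SemanticMetadata


# Illustrative values only; a real record would follow agreed conventions.
sst = VariableMetadata(
    syntactic=SyntacticMetadata(
        name="SST",
        data_type="float32",
        file_format="NetCDF",
        units="degree_Celsius",
    ),
    semantic=SemanticMetadata(
        long_name="sea surface temperature",
        definition="Temperature of sea water at the ocean surface.",
    ),
)

print(sst.syntactic.name, "means", sst.semantic.long_name)
```

Keeping the two groups apart is only one possible design. As noted above, fields such as the variable name and units carry semantic meaning too, so a real system may prefer a single record with each field tagged by its role.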

Use vs Search

Search metadata, also known as discovery metadata, include information that helps a person decide whether a data set contains anything of interest, or which keywords to use when searching a data portal. An observation type such as multibeam bathymetry is an example of helpful search metadata, especially when managed by a system of controlled vocabularies. Search metadata might also include latitude and longitude bounds, so that a computer or a person can tell whether the data fall within an area of interest.
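
To make the idea concrete, here is a minimal Python sketch of a search over discovery metadata that filters by observation type and by latitude and longitude bounds. The record layout, the data set identifiers, and the function names are hypothetical, and the overlap test assumes that no bounding box crosses the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    south: float
    north: float
    west: float
    east: float

    def intersects(self, other: "BoundingBox") -> bool:
        """Return True if the two boxes overlap.

        Simplifying assumption: neither box crosses the antimeridian.
        """
        return (
            self.south <= other.north
            and other.south <= self.north
            and self.west <= other.east
            and other.west <= self.east
        )


@dataclass(frozen=True)
class SearchRecord:
    dataset_id: str               # hypothetical identifier
    observation_types: frozenset  # terms from a controlled vocabulary
    bounds: BoundingBox


def find_datasets(records, observation_type, area):
    """Return ids of data sets matching the observation type and area."""
    return [
        r.dataset_id
        for r in records
        if observation_type in r.observation_types and r.bounds.intersects(area)
    ]


catalog = [
    SearchRecord("cruise-A", frozenset({"multibeam bathymetry"}),
                 BoundingBox(south=30.0, north=35.0, west=-125.0, east=-120.0)),
    SearchRecord("cruise-B", frozenset({"multibeam bathymetry"}),
                 BoundingBox(south=-10.0, north=-5.0, west=100.0, east=110.0)),
    SearchRecord("mooring-C", frozenset({"sea surface temperature"}),
                 BoundingBox(south=31.0, north=32.0, west=-122.0, east=-121.0)),
]

area_of_interest = BoundingBox(south=32.0, north=34.0, west=-123.0, east=-121.0)
matches = find_datasets(catalog, "multibeam bathymetry", area_of_interest)
print(matches)  # ['cruise-A']
```

The exact string match on observation type only works if every record draws its terms from the same controlled vocabulary, which is why such vocabularies matter for search.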

Use (usage) metadata help a computer or a person to understand or process the data. Typical use metadata would be calibration parameters, units, and precision information. Use metadata often overlap with syntactic metadata, though the two are not synonymous. Use metadata labels must be unique to be of value in processing the data, whereas syntactic metadata labels need not be.
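
The following Python sketch suggests how software might consume such metadata when processing data. It assumes a simple linear calibration (gain and offset) and a precision expressed as a number of decimal places. Both are illustrative assumptions, as are the class and function names; real instruments may need quite different calibration models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseMetadata:
    """Hypothetical use metadata for one measured variable."""
    units: str       # units of the calibrated values
    gain: float      # assumed linear calibration: value = gain * raw + offset
    offset: float
    precision: int   # number of decimal places that are meaningful


def calibrate(raw_counts, meta):
    """Convert raw instrument counts to calibrated values."""
    return [round(meta.gain * c + meta.offset, meta.precision) for c in raw_counts]


# Illustrative numbers only.
temperature_meta = UseMetadata(units="degree_Celsius", gain=0.5, offset=-2.0, precision=1)
calibrated = calibrate([20, 25], temperature_meta)
print(calibrated, temperature_meta.units)  # [8.0, 10.5] degree_Celsius
```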

Based on typical definitions of the terms, the distinction between use and search metadata can be unclear. Some or all search metadata may be automatable, that is, represented in ways that are meaningful to, and used by, the applications that process them. Indeed, this will be necessary to facilitate widespread data mining. Furthermore, some use metadata will be of interest to people searching for data, even though they are more oriented toward computer applications.

When designing a metadata approach, it is important to consider both the terms and characteristics that people or systems will search on to find your data, and the details that people and systems will need to know to use them.

Static vs Dynamic

Static metadata are not expected to change much over the life of the data they describe, even as the data evolve. Conversely, dynamic metadata are a function of the contents of the data, so as a data set evolves, dynamic metadata change. In reality, even static metadata may have to be changed if, for example, incorrect information was captured and the error discovered later.  

An interesting special case involves the seeding of metadata prior to the arrival of the data themselves. Metadata captured before data arrival are implicitly static and can be associated with those data permanently, possibly as part of an automated process embedded in the data stream. Metadata captured after data arrival imply some other process for entering that information.

When planning your metadata process, it is worth considering which metadata you expect to persist through time, and which will need to change. You will also need to determine the processes for updating dynamic metadata, and for handling an unexpected need to adjust or correct static metadata.
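
One way to picture the difference is the Python sketch below, in which static metadata are seeded once before any data arrive, while dynamic metadata are recomputed from the contents whenever the data set grows. The field names, the platform and instrument values, and the choice of record count and time range as the dynamic fields are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class StaticMetadata:
    """Seeded before data arrival; not expected to change (hypothetical fields)."""
    platform: str
    instrument: str


@dataclass(frozen=True)
class DynamicMetadata:
    """Derived from the data contents; recomputed as the data set evolves."""
    record_count: int
    time_start: Optional[datetime]
    time_end: Optional[datetime]


def derive_dynamic(timestamps):
    """Recompute dynamic metadata from the current list of record timestamps."""
    if not timestamps:
        return DynamicMetadata(0, None, None)
    return DynamicMetadata(len(timestamps), min(timestamps), max(timestamps))


# Static metadata are captured once, before any records exist.
static = StaticMetadata(platform="R/V Example", instrument="thermosalinograph")

timestamps = [datetime(2020, 1, 1), datetime(2020, 1, 2)]
before = derive_dynamic(timestamps)

# A new record arrives: the dynamic metadata change, the static ones do not.
timestamps.append(datetime(2020, 1, 3))
after = derive_dynamic(timestamps)

print(static.platform, before.record_count, after.record_count, after.time_end)
```

Correcting a static field after the fact, for instance a wrongly recorded instrument name, would still need its own process. Here that would mean creating a replacement StaticMetadata record, since the sketch makes the records immutable.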

By Functional Category

In their 2006 paper (see reference below), Ganesan Shankaranarayanan and Adir Even propose six types of metadata based on their function:

  • Infrastructure metadata describe the components of the computer systems, such as hardware, operating systems, networking, and database servers. They are primarily used for system maintenance.
  • Model metadata (the data dictionary) describe the modeling of data into entities and their relationships (e.g., tables and column headers). They include conceptual, logical, and physical descriptions as well as semantic information, such as terms used and how they relate to other terminology in the system.
  • Process metadata provide information on how data are generated and the changes they undergo from source to target.
  • Quality metadata include both a description of the physical size (number of records, bytes) and quality measurements, such as the accuracy and completeness of the data.
  • Interface (delivery and reporting) metadata capture how the data are used, such as where and how much data are delivered (e.g., downloaded from an online system) and in what formats. They can also include how the data are used in derivative products like reports.
  • Administration metadata include information on users, security, and access privileges to data and applications.
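
The six functions listed above could be used to tag the fields in your own metadata inventory. The Python sketch below shows one possible way to do that. The enumeration mirrors the six categories, but the field names and their assignments are hypothetical examples rather than anything prescribed by the paper.

```python
from enum import Enum


class MetadataFunction(Enum):
    """The six functional categories listed above."""
    INFRASTRUCTURE = "infrastructure"
    MODEL = "model"
    PROCESS = "process"
    QUALITY = "quality"
    INTERFACE = "interface"
    ADMINISTRATION = "administration"


# Hypothetical field names, each tagged with the function it serves.
FIELD_FUNCTIONS = {
    "database_server": MetadataFunction.INFRASTRUCTURE,
    "column_definitions": MetadataFunction.MODEL,
    "processing_steps": MetadataFunction.PROCESS,
    "record_count": MetadataFunction.QUALITY,
    "completeness": MetadataFunction.QUALITY,
    "download_formats": MetadataFunction.INTERFACE,
    "access_privileges": MetadataFunction.ADMINISTRATION,
}


def group_by_function(field_functions):
    """Group field names under each of the six functions."""
    groups = {function: [] for function in MetadataFunction}
    for field_name, function in field_functions.items():
        groups[function].append(field_name)
    return groups


groups = group_by_function(FIELD_FUNCTIONS)
for function, fields in groups.items():
    print(f"{function.value}: {', '.join(fields)}")
```

An inventory grouped this way makes it easy to spot a function that has no fields at all, which may indicate a gap in the metadata design.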

Classification Summary

The classifications above represent frameworks to consider in designing a metadata system. When designing your system, it is important to understand what kind of data you are dealing with, what kinds of questions you need to answer with your metadata, and which distinctions above are likely to be relevant to your system and which are not.

Finally, the categories above illustrate the importance of precise terminology when collaborating on a design. Be sure that the data system's developers are using a term such as “search metadata” to mean the same thing you are. Lists of metadata fields, and the user queries they will enable, are helpful tools to ensure agreement and understanding.

References

Shankaranarayanan, G. and A. Even. The Metadata Enigma, Communications of the ACM, Vol. 49, No. 2, pp 88–94, February 2006.

Suggested Citation

Graybeal, J., Stocks, K., Miller, S.P. 2016. "Metadata Classifications." In The MMI Guides: Navigating the World of Marine Metadata. http://marinemetadata.org/guides/mdataintro/mdatadefined/howclassified. Accessed December 14, 2019.