Metadata for All
Why Metadata?
The scientific data collected by a computer system is that system's most important product. For a brief time, data can even stand on its own, without descriptions or context, and provide the desired results.
Eventually, though, the data will be used in other ways, by other people, or at other times. This is where metadata plays an essential role. What a person forgets about the data, or someone from another project never knew in the first place, metadata can remember and explain. Good metadata can even explain the data well enough for another computer to understand it and make use of it.
Not all metadata travels with the data. You may watch a TV show on your television. Let's call this "the data". But the description of the TV show (its "metadata") in the TV Guide arrives in the mailbox. And the TV Guide might not tell you everything you wanted to know about the TV show you're watching, so you might go online to learn more. You can even use the metadata to tell a computer, like a digital video recorder, what to do ("record any show if it is Sports/Soccer and is a Live Event").
Just as metadata contributes to your enjoyment of television, metadata is central to the use of scientific and engineering data from marine observing systems. And the Marine Metadata Interoperability (MMI) project provides key benefits, both to the science and engineering activities, and to the many users of those systems.
So, what is MMI doing? The following sections give some examples.
Helping users interpret and combine their data
When a scientist (or their technical support team) wants to use data sets from other scientists, those data sets are often labeled unclearly. Sure, they may have names on each item, like "Chlorophyll" and "Temperature". But just seeing the name is not enough to use that data if you don't know the precise measurement that produced the chlorophyll value, or whether the temperature is air temperature, water temperature, or instrument temperature. And sometimes, there might just be a name that the science users haven't seen before.
Until recently, the only way a user could hope to know the actual meaning of a measurement or calculation was to ask the person who created it. But if that person isn't available, or has forgotten, then the value of the data is lost to science. If the original users wrote down a definition for the item, that might be the solution. But as it turns out, data creators almost never do that. (It's hard, and they don't need it when they first use the data, because it is fresh in their minds.)
Today, we can make connections and translations using computers. Services like those written at MMI can make it simple for a scientist or other data collector to describe their data using standard names that already have detailed definitions. And the computer can even help do that, looking for common names and suggesting possible relationships. When we make it easy for the data provider to "do the right thing" in describing their data, the data becomes useful to everyone, for a long time.
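As a rough illustration of how a computer might suggest standard names for a data provider's own labels, here is a minimal Python sketch. It is not MMI's actual service: the vocabulary, its definitions, and the function names (normalize, suggest_terms) are all hypothetical, and the matching is plain spelling similarity from Python's standard difflib module.

```python
from difflib import get_close_matches

# Hypothetical standard vocabulary (term -> definition). Illustrative only;
# these are not actual MMI terms or definitions.
STANDARD_TERMS = {
    "chlorophyll": "Concentration of chlorophyll pigment in sea water.",
    "air_temperature": "Temperature of the air above the sea surface.",
    "water_temperature": "Temperature of the sea water itself.",
    "instrument_temperature": "Internal temperature of the measuring instrument.",
    "salinity": "Salt content of sea water.",
}


def normalize(name):
    """Lower-case a label and replace spaces and hyphens with underscores."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def suggest_terms(local_name, max_suggestions=3, cutoff=0.6):
    """Return standard terms spelled similarly to local_name, best match first."""
    return get_close_matches(
        normalize(local_name),
        list(STANDARD_TERMS),
        n=max_suggestions,
        cutoff=cutoff,
    )


if __name__ == "__main__":
    # Labels as they might appear in someone else's data set.
    for label in ["Chorophyll", "Temperature", "water temp"]:
        print(label, "->", suggest_terms(label))
```

A misspelled label such as "Chorophyll" still lands on a single standard term, while a bare "Temperature" comes back with several candidates. That ambiguity is the point: the computer can narrow the choices, but the data provider still has to say which temperature was measured.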
Capturing knowledge and understanding
Recently, MMI held a workshop. Experts from 6 different domains (5 science and 1 technical) gathered and tried to find common names and relationships between different data sets and vocabularies. The groups found a few common problems:
- Nobody defined their terminology, even in the standard vocabularies.
- It is impossible, or at least very hard, to understand someone else's names, if you aren't a part of the project.
- It takes time and expertise to pull that information together after the fact.
MMI produces tools that make it easier to compare, search, organize, and relate vocabularies from these different domains. Even more important, we are creating concepts and processes that enable these connections to be made while systems and vocabularies are being built, when everyone still knows what they meant by a particular term. We are also encouraging people to follow good practices when building their systems.
We believe that these steps will make it possible for people to make much better use of data generated today than is possible for most data generated 20, 10, or even 5 years ago.
Making it possible for computers to work together
The systems MMI is creating and advocating don't just make it possible for humans to understand these data sets. They make it possible to define the data in ways that computers can understand too.
The concept of the "semantic web" is that computers will be able to make meaningful connections, on their own, between different terms and declarations on different parts of the web. In a very simple way, the translations that MMI supports will help make that possible. Today, when a scientist searches for "salinity" in a marine data set, they will not find those data sets that contain "water conductivity", "water temperature", and "depth" or "pressure" -- even though these three variables can be used to calculate salinity. By defining those relationships, as well as simpler ones like "water temp" = "water temperature", we hope to make it possible for computers to be much smarter about the data they are working with.
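To make the salinity example concrete, here is a small Python sketch of a search that uses declared relationships. Everything in it is an assumption made for illustration: the synonym table, the derivation rule, the example data sets, and the function names are hypothetical and do not represent MMI's actual vocabularies or software. The sketch only decides whether salinity could be calculated from a data set's variables; it does not perform the calculation.

```python
# Hypothetical synonym table: local name -> preferred name.
SYNONYMS = {
    "water temp": "water temperature",
    "conductivity": "water conductivity",
}

# Hypothetical derivation rules: a term can be calculated from any one of
# the listed sets of variables.
DERIVATIONS = {
    "salinity": [
        {"water conductivity", "water temperature", "pressure"},
        {"water conductivity", "water temperature", "depth"},
    ],
}

# Made-up data sets, each listing the variable names it contains.
EXAMPLE_DATASETS = {
    "mooring_A": ["water temp", "water conductivity", "pressure"],
    "cruise_B": ["salinity", "depth"],
    "buoy_C": ["air temperature", "wind speed"],
}


def canonical(name):
    """Map a variable name to its preferred form using the synonym table."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)


def can_provide(variables, term):
    """True if the term is present directly or can be derived from variables."""
    have = {canonical(v) for v in variables}
    target = canonical(term)
    if target in have:
        return True
    return any(required <= have for required in DERIVATIONS.get(target, []))


def find_datasets(datasets, term):
    """Return the names of the data sets that can supply the given term."""
    return [name for name, variables in datasets.items()
            if can_provide(variables, term)]


if __name__ == "__main__":
    print(find_datasets(EXAMPLE_DATASETS, "salinity"))
```

With these made-up inputs, a search for "salinity" returns both the data set that stores salinity directly and the mooring that stores only "water temp", conductivity, and pressure. A plain name match would have found only the first.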
Making information about marine terminology accessible
You may find a term sometime that you don't understand, like "PCO exchange". If it's a common term, Wikipedia or Google may be able to find and define it, but if it's specialized, those resources may not help either -- and they won't know how a specific data set used it.
Even if MMI can't give you a sure definition for that term, we'll often be able to tell you where we've come across that term (and others like it), and give you a good idea of what it might really mean in your context. The more these systems are used and shared, the better they will work to provide everyone with the answers they are looking for.
For More Information
For more information on metadata and the MMI project, see the links below, or contact us using the links at the bottom of this page.