4. Submit Metadata to Clearinghouse
Actor(s)
A data producer with (one or more) data sets who wants to submit relevant metadata to (one or more) clearinghouses.
Background
When a user like Don creates a single data set and wants to submit it to a clearinghouse like GCMD, one of the potentially new and possibly painful discoveries is that there are vocabulary terms that must be selected to indicate what is in the data set. If Don just has a single data set, and it only has a few variables, and he is just submitting it to one clearinghouse, this doesn't seem to be an issue -- Don can go through the provided keywords and select a few that seem appropriate. (He may not do a very good job of it, not being familiar with the keywords for that clearinghouse and generally being eager to get through the process, but the customs have at least been observed.)
But if we extend this simple use case in any direction, Don may throw up his hands. Assuming he wants to save time and maximize the utility of his work, the manual translation approach will grow old very fast.
If Don has not one but 10 data sets, this process is tedious at best.
If there are many data items in each data set, with many different parameter names, the mapping becomes too complex to do by hand.
If Don wants to submit metadata to multiple clearinghouses, and they each use different (or even multiple!) vocabularies, Don would seem to be out of luck. His grant certainly did not pay for that kind of data translation tedium.
If Don is dealing with data sets that are automatically generated by observing systems, which may even produce variables with different names depending on the particular calculations and data available that week, a manual approach is untenable.
And most of all, if Don wants to do a really good job of managing his data and metadata, and exposing it to the people who are most interested in it, he will need a more robust translation than any individual will do manually using a point-and-click web site.
User Scenario
So Don wants to do the following:
- One time for each variable he creates, map that variable, or term, to some detailed community vocabulary or code.
- Have a widget at hand that can automatically accept a term in his own vocabulary, and give back the community term that corresponds.
- Have another widget at hand that can automatically accept the community term and a target vocabulary, and translate that term into the target vocabulary.
- One time only, write or have access to a widget that can determine each input data set's variables; accept one or more target clearinghouse systems; and transform each metadata description through the widgets in the steps above to produce the unique terms required for each target system.
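The steps above can be sketched as a small chain of lookups. This is only an illustration under stated assumptions: the mapping tables, the term strings, the clearinghouse names, and the function names (to_community_term, to_target_term, describe_for_targets) are all hypothetical placeholders, not terms from any real vocabulary or calls from any existing service. In practice the tables would presumably be served by a vocabulary or mapping service rather than hard-coded.

```python
# Hypothetical one-time mapping from Don's local variable names to a
# community vocabulary. All terms are made-up placeholders.
LOCAL_TO_COMMUNITY = {
    "sst_avg": "sea_surface_temperature",
    "wspd": "wind_speed",
}

# Hypothetical mappings from the community vocabulary to the vocabulary
# each target clearinghouse expects.
COMMUNITY_TO_TARGET = {
    "clearinghouse_a": {
        "sea_surface_temperature": "Ocean Temperature > Sea Surface Temperature",
        "wind_speed": "Atmospheric Winds > Wind Speed",
    },
    "clearinghouse_b": {
        "sea_surface_temperature": "SST",
    },
}


def to_community_term(local_term):
    """First widget: local term in, community term out (None if unmapped)."""
    return LOCAL_TO_COMMUNITY.get(local_term)


def to_target_term(community_term, target):
    """Second widget: community term plus target vocabulary in, target term out."""
    return COMMUNITY_TO_TARGET.get(target, {}).get(community_term)


def describe_for_targets(variables, targets):
    """Third widget: translate every variable of a data set for every target.

    Variables that cannot be translated are reported, not silently dropped,
    so Don can see which mappings are still missing.
    """
    result = {}
    for target in targets:
        terms, unmapped = [], []
        for variable in variables:
            community = to_community_term(variable)
            term = to_target_term(community, target) if community else None
            if term is None:
                unmapped.append(variable)
            else:
                terms.append(term)
        result[target] = {"terms": terms, "unmapped": unmapped}
    return result


if __name__ == "__main__":
    report = describe_for_targets(
        ["sst_avg", "wspd", "mystery"],
        ["clearinghouse_a", "clearinghouse_b"],
    )
    for target, entry in report.items():
        print(target, entry)
```

Routing every translation through the community term means Don maps each variable once, and each new clearinghouse needs only one additional community-to-target table.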
Solution Space
A number of data system developers have programmed their systems to produce, or accept, multiple metadata description formats (FGDC, ISO 19115, DIF, and so on). These applications can map the content format used in the source data repository to the different content formats required by each of the target systems. For an example, see the GI-go link below.
For the most part, these systems do not make any attempt to translate terms, and so the next frontier is providing the technologies, and data resources, to do so.
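The difference between format mapping and term translation can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the neutral record, the to_format function, and the field maps are hypothetical, and the target field names are simplified stand-ins loosely modeled on DIF and FGDC elements, not complete or exact schema paths.

```python
# Hypothetical neutral record as held in a source repository.
RECORD = {
    "title": "Weekly buoy observations",
    "abstract": "Sea surface temperature and wind speed from a moored buoy.",
    "keywords": ["sst_avg", "wspd"],
}

# Simplified, illustrative field-name maps per target format.
FIELD_MAPS = {
    "DIF": {"title": "Entry_Title", "abstract": "Summary", "keywords": "Parameters"},
    "FGDC": {"title": "title", "abstract": "abstract", "keywords": "themekey"},
}


def to_format(record, fmt):
    """Rename the record's fields for the given format.

    Only the field names change. The keyword values are copied through
    untouched, which is the gap that term translation would have to fill.
    """
    field_map = FIELD_MAPS[fmt]
    return {field_map[key]: value for key, value in record.items() if key in field_map}


if __name__ == "__main__":
    print(to_format(RECORD, "DIF"))
    print(to_format(RECORD, "FGDC"))
```

In both outputs the keywords are still Don's local names; a format mapper of this kind restructures the description but leaves the vocabulary problem unsolved.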