Meeting Notes 2004.05.18

Standard Naming Exercise

MeetingNotes20040518 (meeting 2)

What do we want to do today?

  • fill in blanks

  • standardize domain column items

    • this is useful regardless of how the results will be used (e.g., ocean/river/lake vs water)

    • could construct a class hierarchy (but how useful is that?)

    • will get gnarly when we get to current

  • discuss how units column should be used

  • standardize most basic variable names (large percentage of file contents)

    • how to indicate quality flag, e.g., _QF?

  • identify references
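One way to picture the domain standardization item above (ocean/river/lake vs water) is a small parent map that can be walked up to the most generic term. This is only a sketch: the `DOMAIN_PARENT` table and the `most_generic` helper are hypothetical names, and apart from ocean/river/lake/water the entries are assumptions, not values from the spreadsheet.

```python
# Hypothetical domain hierarchy: each domain maps to its parent term;
# None marks a term that is already the most generic one.
DOMAIN_PARENT = {
    "ocean": "water",
    "river": "water",
    "lake": "water",
    "water": None,
    "air": None,  # assumed entry, for illustration only
}


def most_generic(domain):
    """Walk up the hierarchy and return the most generic term for a domain."""
    current = domain.strip().lower()
    if current not in DOMAIN_PARENT:
        raise KeyError(f"unknown domain: {domain!r}")
    while DOMAIN_PARENT[current] is not None:
        current = DOMAIN_PARENT[current]
    return current


if __name__ == "__main__":
    for name in ("ocean", "river", "lake", "air"):
        print(name, "->", most_generic(name))
```

A flat parent map like this is about the simplest form a class hierarchy could take; whether anything richer is worth the effort is the open question noted above.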

Review of Element Issues

  • how is domain applied?

    • use the most generic term for it

  • should Domain always be a part of name?

    • appears to be a user- and use-specific issue

    • keeping domain and parameter names separate is very useful (sayeth the DB pros)

  • distinction between bottles and in-situ?

    • not at this level, one may be treated as the other

  • columns 'instrument' and 'sensor' confusing (which is which?)

    • could improve definitions in the spreadsheet

    • how primary is this information?

  • units has been subject to a lot of work

    • don't store speeds in knots, since the unit isn't metric (or at least be sure to convert for science use)

    • have precise terminology, so we know degC = degrees C = centigrade
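The two units points above could be sketched as an alias table plus a conversion step. The names `UNIT_ALIASES`, `canonical_unit`, and `to_metric` are hypothetical, and the choice of "degC" and "m/s" as canonical spellings is an assumption made for illustration. The knot factor uses the international definition of one nautical mile (1852 m) per hour.

```python
# Hypothetical alias table: every spelling we accept maps to one canonical label.
UNIT_ALIASES = {
    "degc": "degC",
    "degrees c": "degC",
    "centigrade": "degC",
    "knot": "knot",
    "knots": "knot",
    "m/s": "m/s",
}

# International knot: one nautical mile (1852 m) per hour, expressed in m/s.
KNOT_TO_M_PER_S = 1852.0 / 3600.0


def canonical_unit(label):
    """Return the canonical label for a unit spelling (KeyError if unknown)."""
    key = " ".join(label.strip().lower().split())
    return UNIT_ALIASES[key]


def to_metric(value, unit):
    """Convert a value to a metric unit where we know how; otherwise pass it through."""
    canon = canonical_unit(unit)
    if canon == "knot":
        return value * KNOT_TO_M_PER_S, "m/s"
    return value, canon


if __name__ == "__main__":
    print(canonical_unit("Degrees C"))
    print(to_metric(10.0, "knots"))
```

Raising on an unknown spelling rather than guessing keeps the terminology precise: a new alias has to be added to the table deliberately.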

Discussion of Preferred External Standards

McCann: [http://sweet.jpl.nasa.gov/sweet/esmf.xls esmf.xls-ESMF/CF Taxonomy] [http://sweet.jpl.nasa.gov/sweet/gcmd.xls gcmd.xls-GCMD Keywords Taxonomy]
  • extended presentation of variables vs Realm, Phenomena, Property, Substance, Space, Numerics, Time, Biosphere, Services, Data

  • lots of measurements exist, across a wide range of domains

  • includes domain in some of the names

  • this ontology is used in the registration of data sets (via form) at GCMD

  • a form exists for data providers to enter new terms for describing their data

    • indication is NASA is quick to respond to new requests

    • Gene Major is implementing the GCMD; the process for adding names is not rigid

    • we could use these for comparison as we're filling out our variables

    • how mature is this? can we use it as a sole basis?

      • interesting that Earth Realm doesn't include atmosphere in several places that it might

      • using this model, "first submission wins", so it may not be totally systematic

    • what distinguishes between CTD profile and measurement at given depth?

      • this may be a different kind of information, not captured in this information space

  • we're looking at automating processes, rather than providing human-searchable data

Kolber: [http:// ARGO Data Management User Manual] (.pdf)
  • contains a very well described structure with detailed info on lots of variables

  • variable names are short and do not include domain, which is expressed in a long name

  • how to handle different domains in short name?

    • short name tends to be meaningless, since long name is what appears on plots and contains domains

    • interoperability affected by repeated use of same names

  • ARGO is dealing with limited data types -- this simplifies the issues (e.g., they are not dealing with water temperature and air temperature)

  • might be interesting to take some of these common names (lat, long, jday, depth) and map the information they provide to our spreadsheet

    • ARGO vs GCMD? neither one necessarily what *we* wanted to do
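The short-name issue above can be illustrated with a few spreadsheet-style rows that keep domain and parameter in separate fields. The rows, the field names, and the helpers `long_name` and `ambiguous_short_names` are all hypothetical; they are not taken from the ARGO manual or from our spreadsheet.

```python
# Hypothetical rows: short name as used in a file, with domain and parameter kept separate.
ROWS = [
    {"short_name": "TEMP", "domain": "water", "parameter": "temperature"},
    {"short_name": "TEMP", "domain": "air", "parameter": "temperature"},
    {"short_name": "depth", "domain": "water", "parameter": "depth"},
]


def long_name(row):
    """Build a plot-style long name that carries the domain."""
    return f"{row['domain']} {row['parameter']}"


def ambiguous_short_names(rows):
    """Return short names that are used with more than one domain."""
    domains = {}
    for row in rows:
        domains.setdefault(row["short_name"], set()).add(row["domain"])
    return {name: sorted(found) for name, found in domains.items() if len(found) > 1}


if __name__ == "__main__":
    print([long_name(row) for row in ROWS])
    print(ambiguous_short_names(ROWS))
```

With the domain held in its own column, a repeated short name such as TEMP can be detected mechanically instead of relying on a reader to check the long name.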

McCann: [http://sweet.jpl.nasa.gov SWEET Ontology (with Protege)]
  • download Protege from the Stanford site and install the Web Ontology Language (OWL) plugin

  • can also view on web (but not with Safari)

  • tool can be used to create new ontologies or additions to existing ontologies

  • large user and developer community for this tool

  • new ontologies can be submitted to a web library

  • could take SWEET ontologies (which are based on GCMD) and add our own information

  • putting cart before horse?

    • take the spreadsheet and group things together

Others For Future Discussion
  • Schramm: EPIC Key File Variables

    • contains a long list of variables and associated information

    • variables are organized by numeric indices

  • Graybeal: SensorML

    • an ontology of the data that describes sensors (their location, etc.)

    • a new release just came out

  • MarineXML

    • complicated histories here: European and Australian versions

    • BODC reconciled European version with their own? Australian based on BODC originally?

    • BODC has old data sets that have to be compatible

    • very hard to keep up with this work or know where it's going

  • as we start to identify discrepancies with other references, we could feed these back to those references

Completing the Parameter Column of the Compiled Spreadsheets

  • lots of classes (for an ontology) becoming apparent

  • starting to normalize some names

  • time could be challenging to represent

  • almost through this column

The Plan

Get together next Tuesday at 8:30 AM to continue this process.

  • Anyone who wants to fill out the easy Parameter entries for the rest of their data should send the results to John

  • Next steps include:

    • Finish the Parameter column

    • Do the same for Domain as we did for Parameter

    • Go down the Name In Use column and propose better names, given what we now know