Climate Science Modelling Language
Description
The emerging standards of ISO TC211 for earth-related information provide a general framework for broad data interoperability:
- Important data concepts are codified through formal data models (ISO 19103)
- Individual abstractions are known as "feature types" and may be catalogued for re-use in repositories ("feature type catalogues", ISO 19110)
- The logical structure and semantic content of datasets are described through an application schema in terms of feature instances and other objects (ISO 19109)
- Finally, canonical encoding rules may be specified for serialisation of datasets (ISO 19118). Alternatively, mappings may be defined from an application schema to the internal data models of third-party file formats or data delivery services.
The Climate Science Modelling Language (CSML) has been developed as an attempt to define structured semantic data models for the climate sciences within this standards framework. It defines a number of climate-science 'feature types': point, profile, and gridded data, series of these in time and space, and a geometric trajectory.
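As a rough illustration of what a 'feature type' means here, the sketch below models a point, a profile, and a point series as plain Python data classes. The class and field names (PointFeature, ProfileFeature, PointSeriesFeature, location, levels, and so on) are assumptions chosen for this example and are not taken from the CSML schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical names for illustration only; not the CSML schema itself.


@dataclass
class PointFeature:
    """One value of one parameter at a single location and time."""
    location: Tuple[float, float]  # (latitude, longitude) in degrees
    time: str                      # ISO 8601 timestamp
    parameter: str
    value: float


@dataclass
class ProfileFeature:
    """Values of one parameter along a vertical axis at one location and time."""
    location: Tuple[float, float]
    time: str
    parameter: str
    levels: List[float]
    values: List[float]

    def __post_init__(self) -> None:
        if len(self.levels) != len(self.values):
            raise ValueError("levels and values must have the same length")


@dataclass
class PointSeriesFeature:
    """A time series of one parameter at a fixed location."""
    location: Tuple[float, float]
    parameter: str
    times: List[str]
    values: List[float]

    def __post_init__(self) -> None:
        if len(self.times) != len(self.values):
            raise ValueError("times and values must have the same length")

    def at(self, index: int) -> PointFeature:
        """Return the series member at the given position as a PointFeature."""
        return PointFeature(self.location, self.times[index],
                            self.parameter, self.values[index])


series = PointSeriesFeature(
    location=(51.5, -1.3),
    parameter="air_temperature",
    times=["2005-01-01T00:00:00Z", "2005-01-01T06:00:00Z"],
    values=[276.2, 277.9],
)
second = series.at(1)
```

The point of the sketch is only that a series is expressed in terms of the simpler feature it is a series of, which mirrors the "series of these in time and space" wording above.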
CSML has been designed explicitly with a dual purpose. In addition to modelling various climate science data types, it provides a mechanism for wrapping and aggregating file-based data storage (e.g. netCDF, GRIB, NASA Ames formats) to provide a uniform semantic interface to climate science data.
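The wrapping and aggregation idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: FileExtract, InMemoryExtract, and AggregatedArray are hypothetical names, and the in-memory dictionary stands in for real netCDF, GRIB, or NASA Ames readers, which are deliberately left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Tuple


class FileExtract(ABC):
    """Points at one variable held in one file (hypothetical interface)."""

    @abstractmethod
    def read(self) -> List[float]:
        """Return the variable's values as a flat list."""


class InMemoryExtract(FileExtract):
    """Stand-in for a real format reader: looks values up in a dictionary."""

    def __init__(self, store: Dict[Tuple[str, str], List[float]],
                 file_name: str, variable: str) -> None:
        self.store = store
        self.file_name = file_name
        self.variable = variable

    def read(self) -> List[float]:
        return list(self.store[(self.file_name, self.variable)])


class AggregatedArray:
    """Presents several file extracts as one logical array, in the order given."""

    def __init__(self, parts: Sequence[FileExtract]) -> None:
        self.parts = list(parts)

    def read(self) -> List[float]:
        values: List[float] = []
        for part in self.parts:
            values.extend(part.read())
        return values


# Two "files" in different formats holding consecutive parts of one record.
store = {
    ("jan.nc", "temp"): [271.0, 272.5],
    ("feb.grb", "t2m"): [273.1],
}
aggregated = AggregatedArray([
    InMemoryExtract(store, "jan.nc", "temp"),
    InMemoryExtract(store, "feb.grb", "t2m"),
])
combined = aggregated.read()
```

A caller sees one array through `aggregated.read()` and does not need to know how many files, or which formats, lie behind it. Supporting a real format would mean adding another FileExtract subclass rather than changing the caller.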
CSML is initially being prototyped against a variety of data types across the curated holdings of the British Atmospheric Data Centre (BADC) and the British Oceanographic Data Centre (BODC), through the NERC DataGrid project. A parsing and processing software suite is being written concurrently, and software releases will be made available through the CSML website.
