NERC DataGrid

UK-based project developing Grid middleware and metadata systems for discovery and access.

NERC DataGrid (NDG) is a UK-based "e-Science" project developing Grid middleware and metadata systems to support uniform discovery and access to a range of environmental data across the UK.

The prototype is focussed on the curated archives of the British Atmospheric Data Centre (BADC) and the British Oceanographic Data Centre (BODC). The middleware developed, however, will enable other managed archives and individual research groups to federate their data resources through NDG services.

Key developments of NDG include:

  • An overall metadata taxonomy ([1,2,3,4]) providing a framework for decomposition of the NERC metadata domain. Key classes include usage metadata, domain metadata, and discovery metadata.
  • A rich domain metadata model, or ontology ([1]), for representing and linking concepts around environmental datasets. This provides a means in NDG to navigate meaningfully between related datasets in a way hitherto impossible.
  • A standards-based data model and markup, the Climate Science Modelling Language ([5]). This has been explicitly designed with a dual purpose: both as a semantically rich, format-independent data representation language (when exposed through a service), and as a wrapper mechanism to encapsulate legacy file-based storage.
  • A robust and scalable discovery federation mechanism based on the harvesting protocols of the Open Archives Initiative.
  • A formal software architecture ([6]) based on the Reference Model for Open Distributed Processing (RM-ODP) that should capture requirements of the wider environmental community.
  • The scoping of a scalable, federated, role-based authorisation framework ([3]) that meets the requirements of the environmental community and will interoperate with existing access control mechanisms. This is currently being implemented jointly with a partner project, the NERC Eco-Grid, extending the authorisation framework of the CCLRC DataPortal ([7]).
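
As an illustration of the discovery federation item above, the harvesting protocol of the Open Archives Initiative (OAI-PMH) works by a harvester issuing HTTP requests such as ListRecords against each provider and following a resumption token until the list is complete. The Python sketch below shows only that request/parse cycle. The base URL, the record identifiers, the sample response and the function names are hypothetical, and the metadata format NDG actually harvests is not stated here, so the generic oai_dc prefix is assumed.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a provider (base_url is hypothetical)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or absent token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response for illustration; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:dataset-1</identifier></header></record>
    <record><header><identifier>oai:example.org:dataset-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = build_list_records_url("http://example.org/oai", resumption_token=token)
```

A federated discovery service would repeat this cycle for each participating archive and merge the harvested records into one searchable index.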

A prototype harvesting and discovery portal has been implemented, with preliminary XML-based data extraction and plotting tools. The remainder of the project will focus on continued implementation and rollout of NDG middleware for data delivery and security, and systematic metadata population across the bulk of the BADC and BODC archives.

References

  1. O'Neill, K., et al. (2003), "The Metadata Model of the NERC DataGrid", Proceedings of the UK e-Science All Hands Meeting, 2003. S.J. Cox (Ed) ISBN 1-904425-11-9
  2. Lawrence, B.N., et al. (2003), "The NERC DataGrid Prototype", Proceedings of the UK e-Science All Hands Meeting, 2003. S.J. Cox (Ed) ISBN 1-904425-11-9
  3. Lawrence, B.N., et al. (2004), "The NERC DataGrid: 'Googling' secure data", Proceedings of the UK e-Science All Hands Meeting, 2004. S.J. Cox (Ed) ISBN 1-904425-21-6
  4. O'Neill, K., et al. (2004), "A specialised metadata approach to discovery and use of data in the NERC DataGrid", Proceedings of the UK e-Science All Hands Meeting, 2004. S.J. Cox (Ed) ISBN 1-904425-21-6
  5. Woolf, A., et al. (2005), "Climate Science Modelling Language: Standards-based markup for metocean data", 85th American Meteorological Society Annual Meeting, San Diego, January 2005.
  6. Woolf, A., et al. (2004), "Enterprise specification of the NERC DataGrid", Proceedings of the UK e-Science All Hands Meeting, 2004. S.J. Cox (Ed) ISBN 1-904425-21-6
  7. Manandhar, A., et al. (2003), "Grid Authorisation Framework for the CCLRC DataPortal", Proceedings of the UK e-Science All Hands Meeting, 2003. S.J. Cox (Ed) ISBN 1-904425-11-9
Maturity Estimate: Pre-Operational