WDB Metadata Guide

WARNING: This guide is incomplete

WDB is a data storage solution for weather and water data based on the PostgreSQL object-relational database system. The system utilizes PostGIS for GIS (Geographic Information Systems) support and handles regular grids (e.g., forecast fields) and point (e.g., observation) data.

This guide details some of the principles and methods used to deal with metadata in the system.

Introduction

The WDB system contains a lot of metadata that is used to identify and describe the values stored in the database. The WDB system administrator must have a good understanding of this metadata in order to maintain the system effectively.

About this Manual

This manual is intended for the system administrator of the WDB system. It describes how to maintain the metadata in the WDB system. It is assumed that the reader has some familiarity with SQL and weather data.

For a more elaborate description of the vision and system architecture of the WDB system, see the WDB Developer's Manual.

Intended Usage

The metadata in WDB is used to describe the values stored in the system. By carefully and correctly maintaining this metadata, it is possible to view stored data from several different points of view, while maintaining the consistency of the stored data.

Outline

This document is structured as follows:

  • Key Concepts: introduction to the concepts and terms used in the documentation
  • Data Provider
  • Place Definition
  • On Time: Reference and Valid Times
  • Parameters: Value and Level Parameters
  • System Metadata
  • FAQ: Frequently asked questions
  • Known bugs and limitations

Key Concepts

This section defines and explains some of the key concepts of the WDB metadata system.

WDB Metadata Dimensions

WDB is a database system that stores weather data items. Each data item in WDB could be an observation, a forecast, an analysis, etc. Each item consists of a value and a number of dimensions that describe the value.

Each data item is described by seven dimensions; crucially, the combination of these seven dimensions uniquely identifies the item. No two data items in the database can ever have identical combinations of these dimensions. The metadata dimensions in WDB are:

  • Data Provider: the provider of the data item (e.g., the data source)
  • Place Definition: the spatial location for which the data item is valid
  • Reference Time: the time of reference at which the data item was created
  • Valid Time: the time interval for which the data item is valid
  • Value Parameter: the parameter which describes the value of the data item (e.g., air temperature)
  • Level Parameter: the parameter which describes the value of the level information (e.g., height)
  • Data Version: the version number of the data item

Of these dimensions, data version is simply a number sequence and will not be discussed further here. The other dimensions will be discussed in greater detail in the following chapters.
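The uniqueness rule can be pictured with a small stand-alone PostgreSQL sketch. The table and column names below are hypothetical and far simpler than WDB's actual schema; the only point is that the seven dimensions together act as a key, so a second value for otherwise identical dimensions has to be stored under a new data version.

```sql
-- Hypothetical, simplified model of a WDB data item (not the real WDB schema).
CREATE TEMPORARY TABLE demo_dataitem (
    dataprovider   text        NOT NULL,
    place          text        NOT NULL,
    referencetime  timestamptz NOT NULL,
    validfrom      timestamptz NOT NULL,  -- valid time is an interval,
    validto        timestamptz NOT NULL,  -- so it takes two columns here
    valueparameter text        NOT NULL,
    levelparameter text        NOT NULL,
    dataversion    integer     NOT NULL,
    value          real,
    -- The seven dimensions together identify exactly one item.
    UNIQUE (dataprovider, place, referencetime, validfrom, validto,
            valueparameter, levelparameter, dataversion)
);

INSERT INTO demo_dataitem VALUES
    ('Hirlam 8', 'oslo', '2010-01-01 00:00+00',
     '2010-01-01 06:00+00', '2010-01-01 06:00+00',
     'air temperature', 'height', 0, 271.5);

-- Same dimensions except for the data version: accepted.
INSERT INTO demo_dataitem VALUES
    ('Hirlam 8', 'oslo', '2010-01-01 00:00+00',
     '2010-01-01 06:00+00', '2010-01-01 06:00+00',
     'air temperature', 'height', 1, 271.9);

-- Inserting data version 1 a second time with identical dimensions would be
-- rejected with a unique-constraint violation.
```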

WDB Namespaces

Many different naming conventions exist in the meteorological world. Each of these naming conventions may be valid in its own context, but there is also typically a large overlap (for instance, a meteorological station may be identified by several different indicators or index numbers). WDB provides namespaces as a mechanism to handle this kind of issue.

A namespace is simply a mapping from a set of names to a set of metadata. Within that mapping, each of the metadata items is unique; thus, one particular data item may have many different identifications in different namespaces, but only one specific identification within any particular namespace. For example, three countries within one namespace may be named Denmark, Norway, and Sweden. In another namespace, they could be named Danmark, Norge, and Sverige. Namespaces can be used to support naming schemes based on different code tables, translations, and other

WDB defines namespaces for three types of metadata: a data provider namespace, a place namespace, and a parameter namespace. For each type of metadata, there always exists a default namespace. The naming in the default namespace is automatically generated from the base metadata in the database.

A namespace in WDB is usually identified by a numerical ID. The default namespace uses ID 0; the test namespace (utilized for the WDB testing framework) is defined by the ID 999. The namespaces from 1 to 254 are reserved for usage by WMO centers (we use the same ID code as the WMO numbers).
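As a rough illustration of the mapping idea, the country example can be modelled in plain PostgreSQL. The table below is hypothetical and is not how WDB stores namespaces; the namespace ID 1000 is an arbitrary value chosen to fall outside the default (0), test (999), and WMO-reserved (1 to 254) IDs.

```sql
-- Hypothetical, simplified model of a namespace (not the real WDB schema).
CREATE TEMPORARY TABLE demo_name (
    namespaceid integer NOT NULL,
    itemid      integer NOT NULL,   -- the metadata item being named
    name        text    NOT NULL,
    UNIQUE (namespaceid, name),     -- a name is unambiguous within a namespace
    UNIQUE (namespaceid, itemid)    -- one identification per item per namespace
);

INSERT INTO demo_name VALUES
    (0,    1, 'Denmark'), (0,    2, 'Norway'), (0,    3, 'Sweden'),
    (1000, 1, 'Danmark'), (1000, 2, 'Norge'),  (1000, 3, 'Sverige');

-- All names of item 2, one row per namespace.
SELECT namespaceid, name
FROM demo_name
WHERE itemid = 2
ORDER BY namespaceid;
```

The query returns 'Norway' for namespace 0 and 'Norge' for namespace 1000: the same item, identified differently depending on the namespace in use.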

Data Provider

This section defines the metadata for data providers.

Definition

The Data Provider identifies the source of the data; literally, the entity that provides the data. Where multiple sources could be identified as the source of the data, the entity that can be identified closest to the creation of the data item at the time of loading is usually used.

A data provider can be a software process (e.g., Hirlam), a meteorological or climate station, an aircraft, or a person. A data provider is identified by a DataProviderName. DataProviderNames are used to search for the data. For convenience, Data Providers may be collected into groups; e.g., the DataProviderName "Hirlam" might be used to search for any of the various Hirlam processes: "Hirlam 4", "Hirlam 8", etc.

When searching for data, the user may specify a single DataProviderName, a list of DataProviderNames (specified using an ARRAY constructor), or NULL. NULL indicates that the user wants all data items, regardless of the data source.

DataProviderNames exist within a data provider namespace. A namespace can be defined by the WDB administrator, in order to permit the user or an application to retrieve data in an accustomed language or code set. The default namespace of WDB is the DataProviderNameSpaceId 0, and is always based on English language names and international standard codes. The data provider namespace being used in a querying session can be defined by the user when starting up the session.
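The following sketch shows how these options might look in a query session. The wci.begin, wci.read, and wci.end calls are not described in this guide; their signatures here are assumptions based on the general WCI documentation and should be checked against your installation. The user name 'wdb', the location, and the reference time are placeholders.

```sql
-- Assumed call: wci.begin( user, data provider namespace,
--                          place namespace, parameter namespace ).
SELECT wci.begin( 'wdb', 0, 0, 0 );

-- Data from a list of named data providers.
SELECT * FROM wci.read( ARRAY['Hirlam 4', 'Hirlam 8'],
                        'nearest POINT(10.7464 59.9111)',
                        '2010-01-01 00:00:00+00',
                        NULL,
                        ARRAY['air temperature'],
                        NULL,
                        NULL,
                        NULL::wci.returnfloat );

-- The same search for all data providers: NULL in place of the list.
SELECT * FROM wci.read( NULL,
                        'nearest POINT(10.7464 59.9111)',
                        '2010-01-01 00:00:00+00',
                        NULL,
                        ARRAY['air temperature'],
                        NULL,
                        NULL,
                        NULL::wci.returnfloat );

SELECT wci.end();
```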

Browse Data Providers

To retrieve all of the DataProviderNames (excluding data provider groups) that are currently stored in the database for the currently specified namespace, the following wci.browse function call could be used:

SELECT * FROM wci.browse( NULL::wci.browsedataprovider );

To retrieve all of the DataProviderNames (including data provider groups) that the database currently has the capacity to store and display in the currently specified namespace, the following wci function call could be used:

SELECT * FROM wci.getDataProvider( NULL );

Add Data Providers

To add a new data provider to the database, the wci.adddataprovider function is used. The following function call adds the data provider gribload to the database.

select wci.adddataprovider( 'gribload',
                            'computer system',
                            'grid',
                            '1 day',
                            'Data received from WDB gribLoad (unidentified)' );

The first parameter in the function is the data provider name itself. The second parameter gives the type of data provider. The data provider types currently available are:

  • aeroplane
  • computer system
  • data provider group
  • named observation site
  • organization
  • person
  • ship
  • wci user

The third parameter describes what kind of data can be written by the data provider. Currently two kinds of data are possible: 'point' and 'grid'. Specifying 'any' here indicates that the type of data provided can be either of the above options.

The fourth parameter describes the lifetime of the data in the WDB system as an interval. Depending on when (and how) the cleaning programs are run, the database will be expected to clean out data once its age exceeds the interval given here.

The fifth parameter is a comment field. Ideally, it contains a few brief lines of text that describe and explain the data provider entity.
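As a further illustration of the five parameters, the call below registers a provider of point data with a 30-day lifetime. It uses the same function as the gribload example; the provider name and comment are made up for this sketch.

```sql
-- Hypothetical example: an observation site that delivers point data and
-- whose data may be cleaned out after 30 days.
select wci.adddataprovider( 'demo station',
                            'named observation site',
                            'point',
                            '30 days',
                            'Example observation site (illustration only)' );

-- Check that the new provider is now listed.
SELECT * FROM wci.getDataProvider( NULL );
```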

Data Provider Groups

Data provider groups permit the user a convenient way to retrieve a set of data providers without having to specify each data provider individually.

TODO: Add examples

Place (Geographic Location)

This section defines the metadata for place definitions.

Definition

The place (geographic location) of a data item is the position of the item on the earth in a 2D space. In WDB, the geographic location is by default specified using longitude and latitude in a WGS84 coordinate system (though this can be changed when the database is set up; consult your system administrator). The geographic dimension is specified using a geometry object and can be either a point or a polygon.

In addition to using geometry objects to retrieve data, the user can also use a PlaceName - a pre-specified name that defines a geometry object in the database - to specify location. PlaceNames exist within a place namespace that can be defined by the WDB administrator. The default namespace of WDB is the PlaceNameSpaceId 0, defined in international English.

PlaceNames are essentially a shorthand for the user; each PlaceName uniquely identifies one geographical object (referred to as a Place Definition) in the database. Multiple PlaceNames may identify the same object, which allows the use of "aliases".

Place Definitions are always defined using a spatial reference ID (SRID), which is in turn defined by a PROJ.4 definition string. WDB is set up with the default SRIDs that come with the PostGIS package; additional SRIDs required by new data types are added as needed in WDB metadata packages.

Browse Place Definitions

To retrieve all of the Place Definition Names that are currently stored in the database for the currently specified namespace, the following wci.browse function call could be used:

SELECT * FROM wci.browse( NULL::wci.browseplace );

To retrieve all of the Place Definitions that the database currently has the capacity to store and display in the currently specified namespace, the following wci function call could be used:

SELECT * FROM wci.getPlaceDefinition( NULL );

To retrieve the full list of SRIDs registered in the database, the following wci function call could be used:

SELECT * FROM wci.getsrid( NULL );

Add Place Definitions

There are several functions for adding a place definition to the database, depending on the type of place definition to be added. The function wci.addplacepoint is used for adding point data, while wci.addplaceregulargrid is used to define regular grids. A point is added as follows:

select wci.addplacepoint('oslo',
                         st_geomfromtext('POINT(10.7464 59.9111)',4030));

The first parameter in the function is the place name, while the second is the geographical position of the place definition in the default coordinate system of the WDB instance (as a binary geometry). Note that the SRID provided must be the same as the default PostGIS SRID used to build WDB (by default, this tends to be PostGIS ID 4030).
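Before adding a point, it can be worth checking the geometry on its own. The query below uses only standard PostGIS functions and the values from the example above; it makes explicit that WKT points are written as longitude first, latitude second.

```sql
-- WKT points are written as POINT(x y), i.e. POINT(longitude latitude).
SELECT st_x( geom )      AS longitude,   -- 10.7464
       st_y( geom )      AS latitude,    -- 59.9111
       st_srid( geom )   AS srid,        -- 4030
       st_astext( geom ) AS wkt          -- POINT(10.7464 59.9111)
FROM ( SELECT st_geomfromtext( 'POINT(10.7464 59.9111)', 4030 ) AS geom ) AS p;
```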

A regular grid is added as follows:

select wci.addplaceregulargrid( 'ecmwf 0.5 grid',
                                187,
                                109,
                                0.500,
                                0.500,
                                -21.0000,
                                30.0000,
                                '+proj=longlat +a=6367470.0 +towgs84=0,0,0 +no_defs' );

The first parameter is the place name. The remaining parameters are not yet documented in this guide; judging from the example, they appear to give the number of grid points in the x and y directions (187 and 109), the grid increments in x and y (0.5 degrees each), the starting coordinates of the grid (-21.0, 30.0), and the PROJ.4 definition of the projection in which the grid is defined.
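A quick sanity check on a regular grid definition is to compute where its last point falls. The arithmetic below assumes that arguments two to seven of the wci.addplaceregulargrid example are the point counts, the increments, and the start coordinates, each in x (longitude), y (latitude) order; that reading is inferred from the example values, not confirmed by this guide.

```sql
-- last coordinate = start + (number of points - 1) * increment
SELECT -21.0 + (187 - 1) * 0.5 AS last_x,   -- 72.0
        30.0 + (109 - 1) * 0.5 AS last_y;   -- 84.0
```

Under that assumption, the 'ecmwf 0.5 grid' would span longitudes -21.0 to 72.0 and latitudes 30.0 to 84.0.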

Spatial Reference ID

All SRIDs must be defined with "+no_defs". Omitting the "+no_defs" parameter from the PROJ.4 definition string makes the SRID dependent on the file /usr/share/proj/proj_def.dat, essentially breaking the integrity of the metadata in the database.
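One way to audit this rule, assuming the SRIDs used by WDB are registered in the standard PostGIS spatial_ref_sys table, is to list every definition that lacks the parameter:

```sql
-- spatial_ref_sys is the standard PostGIS table of spatial reference systems;
-- its proj4text column holds the PROJ.4 definition string.
SELECT srid, proj4text
FROM spatial_ref_sys
WHERE proj4text IS NOT NULL
  AND proj4text NOT LIKE '%+no_defs%'
ORDER BY srid;
```

An empty result means that every registered definition string carries "+no_defs".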

Parameters

This section defines the metadata for parameters.

Definition

The Parameter in WDB identifies the characteristic or measurable factor of the value being parameterized. Parameters provide a definitive description of what the data represents, including spatial and temporal properties. The parameter names are based around the NetCDF climate and forecast (CF) metadata conventions. Unfortunately, CF standard names are not well suited for the purposes of WDB, as they lack uniqueness; consequently, the WDB parameter system is only based on, rather than a direct adaptation of, the CF metadata convention. It should be possible to map every CF standard name directly to a WDB parameter, but the converse may not always be the case.

Parameter names exist within a parameter namespace. A namespace can be defined by the WDB administrator, in order to permit the user or an application to retrieve data in an accustomed language or code set. The default namespace of WDB is the ParameterNameSpaceId 0, and is always based on English language names and maps to CF standard names. The parameter namespace being used in a querying session can be defined by the user when starting up the session.

Browse Parameters

To retrieve all of the value parameters that are currently stored in the database for the currently specified namespace, the following wci.browse function call could be used:

SELECT * FROM wci.browse( NULL::wci.valueparameter );

For the level parameters, use

SELECT * FROM wci.browse( NULL::wci.levelparameter );

To retrieve all of the ValueParameterNames and LevelParameterNames that the database currently has the capacity to store and display in the currently specified namespace, the following wci function call could be used:

SELECT * FROM wci.getparameter( NULL );

Add Parameters

To add a parameter to the database, the wci.addparameter function is used. The default parameter name structure is constructed based on the Guidelines for Constructing CF Standard Names. The function call for adding a parameter is as follows:

select wci.addparameter( 'standard-name',
                         'surface',
                         'component',
                         'medium',
                         'process',
                         'condition',
                         'methods',
                         'unit of measure' );

This gives a default parameter in the default namespace of:

single-word-surface component standard-name //at// multi-word-surface //in// medium //due to// process //assuming// condition [methods]

Standard name is one of the CF standard names.

Surface is defined as a function of a horizontal position. A new surface is added to the database using the wci.addcfsurface function, for example:

select wci.addcfsurface( 'sea level', 'MSL - mean sea level' );

Component defines the spatial component of the parameter. A new component is added to the database using the wci.addcfcomponent function, for example:

select wci.addcfcomponent( 'upward', 'Upward component' );

Medium indicates the local medium or layer of the parameter. A new medium is added to the database using the wci.addcfmedium function. Example:

select wci.addcfmedium( 'atmosphere layer', 'Atmosphere layer medium' );

Process specifies a physical process. A new process is added to the database using the wci.addcfprocess function. Example:

select wci.addcfprocess( 'large scale precipitation', 'Due to large scale precipitation' );

Condition indicates special circumstances of the parameter. A new condition is added to the database using the wci.addcfcondition function. Example:

select wci.addcfcondition( 'deep snow', 'Assumption of deep snow' );

Methods indicate the calculations used for the parameter. A new method is added to the database using the wci.addcfmethods function. Example:

select wci.addcfmethods('maximum within days', 'Maximum value', 'max' );

Unit of measure is the standard unit of measure used by the parameter. WDB uses SI units defined using case-sensitive UCUM (the Unified Code for Units of Measure).

The default (canonical) parameter name is constructed using the various components described above.
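The naming pattern can be mimicked in plain PostgreSQL to see how optional components drop out. This is an illustration only and is not the code WDB itself uses; the column names are hypothetical, and the surface component is left out for brevity because its position depends on whether it is a single word or several.

```sql
-- Missing components are NULL. 'in ' || NULL yields NULL, and concat_ws()
-- skips NULL arguments, so absent parts leave no stray words or spaces.
SELECT concat_ws( ' ',
                  cf_component,
                  cf_standard_name,
                  'in ' || cf_medium,
                  'due to ' || cf_process,
                  'assuming ' || cf_condition,
                  '[' || cf_methods || ']' ) AS canonical_name
FROM ( VALUES ( NULL::text, 'air temperature', NULL::text,
                NULL::text, NULL::text, 'maximum over days' ) )
     AS p( cf_component, cf_standard_name, cf_medium,
           cf_process, cf_condition, cf_methods );
-- canonical_name: air temperature [maximum over days]
```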

For examples of adding new parameters to WDB, see the wdb_parameters install files.

Set Parameter Names in Namespace

The canonical parameter name is valid for the default (0) parameter namespace only. To set the parameter name in other namespaces, the wci.setparametername function should be used. Example:

select wci.setparametername( 'air temperature', 'TEMP' );

This sets a parameter name 'TEMP' in the currently defined namespace, which is equivalent to the canonical parameter 'air temperature'.
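Put together, a session that names a parameter in a private namespace might look like the sketch below. The wci.begin and wci.end calls and their signatures are assumptions (they are not described in this guide), and both the user name 'wdb' and the parameter namespace ID 1000 are placeholders.

```sql
-- Assumed call: wci.begin( user, data provider namespace,
--                          place namespace, parameter namespace ).
SELECT wci.begin( 'wdb', 0, 0, 1000 );

-- Name the canonical parameter 'air temperature' as 'TEMP' in namespace 1000.
select wci.setparametername( 'air temperature', 'TEMP' );

-- List the parameter names visible in the current namespace.
SELECT * FROM wci.getparameter( NULL );

SELECT wci.end();
```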

If you are satisfied with using CF-like parameters, then the function wci.copyParameterNameSpace( 0 ) can be used to copy all of the parameters in the default namespace into the currently defined parameter namespace.

select wci.copyParameterNameSpace( 0 );

Parameters created using this method are similar to the CF standard name, except that the short form of the CF methods precedes the CF standard name in order to generate a more natural language parameter description. Thus 'air temperature [maximum over days]' becomes 'max air temperature'.

For an example of adapting and adding parameters to a private parameter namespace, see the Met.no WDB metadata files.

Last Author
michaeloa