Submit data as NetCDF/CF

If you want to publish data through the Arctic Data Centre, your data have to be submitted as NetCDF following the Climate and Forecast convention (CF). The main reason for this is to standardise the use metadata. Because the files are self-describing, users can understand the content of a data file from the file alone, and can use standardised applications for analysis, since these applications support input and output in self-describing formats. Furthermore, data centres supporting this format often serve data through the OPeNDAP protocol, which enables users to stream data.

When encoding data as CF-NetCDF it is good practice to include discovery metadata in the file using the Attribute Convention for Data Discovery (ACDD). Discovery metadata will then be directly connected to the data themselves. The ACDD elements needed for proper extraction of discovery metadata are listed below. In addition, it is recommended to add the CF global attribute featureType for datasets following the discrete sampling geometry setup according to CF. This is required for the dataset to be visualised in the data portal.

 

ACDD global attributes required to generate discovery metadata (extract from ACDD documentation) in ADC related catalogues.
Attribute Description Comment
id An identifier for the data set, provided by and unique within its naming authority. The combination of the "naming authority" and the "id" should be globally unique, but the id can be globally unique by itself also. IDs can be URLs, URNs, DOIs, meaningful text strings, a local key, or any other unique string of characters. The id should not include white space characters. Required if not hosted by MET. If hosted by MET, please do not add this.
naming_authority The organization that provides the initial id (see above) for the dataset. The naming authority should be uniquely specified by this attribute. We recommend using reverse-DNS naming for the naming authority; URIs are also acceptable. Example: 'edu.ucar.unidata'. Required if not hosted by MET.
title A short phrase or sentence describing the dataset. In many discovery systems, the title will be displayed in the results list from a search, and therefore should be human readable and reasonable to display in a list of such names. This attribute is also recommended by the NetCDF Users Guide and the CF conventions. Required. Please use an informative title that guides potential users on the content of the dataset.
summary A paragraph describing the dataset, analogous to an abstract for a paper. Required. Please use informative text that guides the potential user on the content of this dataset, as well as additional information that might be relevant.
keywords A comma-separated list of keywords and/or phrases. Keywords may be common words or phrases, terms from a controlled vocabulary (GCMD is required), or URIs for terms from a controlled vocabulary (see also "keywords_vocabulary" attribute). If keywords are extracted from e.g. GCMD Science Keywords, add keywords_vocabulary='GCMDSK' and always prefix each keyword with the appropriate vocabulary prefix. Required, GCMD Science Keywords. Additional vocabularies may be used. See here for details on how to use. The GCMD Keyword Viewer may come in handy.
keywords_vocabulary If you are using a controlled vocabulary for the words/phrases in your "keywords" attribute, this is the unique name or identifier of the vocabulary from which keywords are taken. If more than one keyword vocabulary is used, each may be presented with a prefix and a following comma, so that keywords may optionally be prefixed with the controlled vocabulary key. Example: 'GCMD:GCMD Keywords, CF:NetCDF COARDS Climate and Forecast Standard Names'. Required, GCMD Science Keywords. Additional vocabularies may be used. See here for details on how to use.
geospatial_lat_min Describes a simple lower latitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lat_min specifies the southernmost latitude covered by the dataset. Must be decimal degrees north. Required
geospatial_lat_max Describes a simple upper latitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lat_max specifies the northernmost latitude covered by the dataset. Must be decimal degrees north. Required
geospatial_lon_min Describes a simple longitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lon_min specifies the westernmost longitude covered by the dataset. See also geospatial_lon_max. Must be decimal degrees east (negative westwards). Required
geospatial_lon_max Describes a simple longitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lon_max specifies the easternmost longitude covered by the dataset. Cases where geospatial_lon_min is greater than geospatial_lon_max indicate the bounding box extends from geospatial_lon_max, through the longitude range discontinuity meridian (either the antimeridian for -180:180 values, or Prime Meridian for 0:360 values), to geospatial_lon_min; for example, geospatial_lon_min=170 and geospatial_lon_max=-175 incorporates 15 degrees of longitude (ranges 170 to 180 and -180 to -175). Must be decimal degrees east (negative westwards). Required
time_coverage_start Describes the time of the first data point in the data set. Use the ISO 8601:2004 date format, preferably the extended format as recommended in the Attribute Content Guidance section. I.e. YYYY-MM-DDTHH:MM:SSZ (always use UTC). Required
time_coverage_end Describes the time of the last data point in the data set. Use ISO 8601:2004 date format, preferably the extended format as recommended in the Attribute Content Guidance section. I.e. YYYY-MM-DDTHH:MM:SSZ (always use UTC). Required. If new data are still being appended to the file, this can be left out and the dataset will then be considered an ongoing effort.
Conventions A comma-separated string of the conventions that are followed by the dataset. For files that follow this version of ACDD, include the string 'ACDD-1.3'. (This attribute is described in the NetCDF Users Guide.) Required
history Provides an audit trail for modifications to the original data. This attribute is also in the NetCDF Users Guide: 'This is a character array with a line for each invocation of a program that has modified the dataset. Well-behaved generic netCDF applications should append a line containing: date, time of day, user name, program name and command arguments.' To include a more complete description you can append a reference to an ISO Lineage entity; see NOAA EDM ISO Lineage guidance. Required
source The method of production of the original data. If it was model-generated, source should name the model and its version. If it is observational, source should characterize it. This attribute is defined in the CF Conventions. Examples: 'temperature from CTD #1234'; 'world model v.0.1'. Optional
processing_level A textual description of the processing (or quality control) level of the data. Optional
date_created The date on which this version of the data was created. (Modification of values implies a new version, hence this would be assigned the date of the most recent values modification.) Metadata changes are not considered when assigning the date_created. The ISO 8601:2004 extended date format is recommended, as described in the Attribute Content Guidance section. E.g. 2020-10-20T12:35:00Z. Required
creator_type

Specifies type of creator with one of the following: 'person', 'group', 'institution', or 'position'. If this attribute is not specified, the creator is assumed to be a person.

If multiple persons are involved, please list these as a comma-separated string. In that situation, please remember to add a comma-separated string for creator_institution and creator_email as well. Entries in these fields are matched by position, from left to right.

Required. Consistency across comma separated lists for all creator_* attributes is required. Do not use ',' except for separating elements. Use this for principal investigator.
creator_institution The institution of the creator; should uniquely identify the creator's institution. This attribute's value should be specified even if it matches the value of publisher_institution, or if creator_type is institution. See last paragraph under creator_type. Required. Consistency across comma separated lists for all creator_* attributes is required. Do not use ',' except for separating elements. Use this for principal investigator.
creator_name The name of the person (or other creator type specified by the creator_type attribute) principally responsible for creating this data. See last paragraph under creator_type. Required. Consistency across comma separated lists for all creator_* attributes is required. Do not use ',' except for separating elements. Use this for principal investigator.
creator_email The email address of the person (or other creator type specified by the creator_type attribute) principally responsible for creating this data. See last paragraph under creator_type. Required. Consistency across comma separated lists for all creator_* attributes is required. Do not use ',' except for separating elements. Use this for principal investigator.
creator_url The URL of the person (or other creator type specified by the creator_type attribute) principally responsible for creating this data. See last paragraph under creator_type. Required. Consistency across comma separated lists for all creator_* attributes is required. Do not use ',' except for separating elements. Use this for principal investigator.
institution The name of the institution principally responsible for originating this data. This attribute is recommended by the CF convention. If provided as a string ending with a keyword in parentheses (), the main text will be interpreted as the long name and the keyword in the parentheses as the short name. E.g. 'Norwegian Meteorological Institute (MET)'

Optional, not extracted to discovery metadata records.

publisher_name The name of the person (or other entity specified by the publisher_type attribute) responsible for publishing the data file or product to users, with its current metadata and format. Required if not hosted by MET. If the publisher is not an organisation, add publisher_institution, which is used to identify the data centre hosting the dataset. If multiple are listed, use a comma-separated list and keep it consistent across fields.
publisher_email The email address of the person (or other entity specified by the publisher_type attribute) responsible for publishing the data file or product to users, with its current metadata and format. Required if not hosted by MET. If multiple are listed, use a comma-separated list and keep it consistent across fields.
publisher_url The URL of the person (or other entity specified by the publisher_type attribute) responsible for publishing the data file or product to users, with its current metadata and format. Required if not hosted by MET. If multiple are listed, use a comma-separated list and keep it consistent across fields.
project The name of the project(s) principally responsible for originating this data. Multiple projects can be separated by commas, as described under Attribute Content Guidelines. Examples: 'PATMOS-X', 'Extended Continental Shelf Project'. If each substring includes a keyword in parentheses, the content within the parentheses is interpreted as the short name for the project while the rest is the long name. E.g. 'Nansen Legacy (NLEG)'. Required
platform Name of the platform(s) that supported the sensor data used to create this data set or product. Platforms can be of any type, including satellite, ship, station, aircraft or other. Indicate controlled vocabulary used in platform_vocabulary. Comma-separated list. Recommended. Usage of MMD keywords is encouraged where applicable.
platform_vocabulary Controlled vocabulary for the names used in the "platform" attribute. Comma-separated list. Remember to use prefixes, as for keywords. Recommended. Usage of MMD keywords is encouraged.
instrument Name of the contributing instrument(s) or sensor(s) used to create this data set or product. Indicate controlled vocabulary used in instrument_vocabulary. Comma separated list. Optional
instrument_vocabulary Controlled vocabulary for the names used in the "instrument" attribute. Comma-separated list. Remember to use prefixes, as for keywords. Optional
references Published or web-based references that describe the data or methods used to produce it. Recommend URIs (such as a URL or DOI) for papers or other references. This attribute is defined in the CF conventions. Optional
license Provide the URL to a standard or specific license, enter "Freely Distributed" or "None", or describe any restrictions to data access and distribution in free text. It is strongly recommended to use identifiers and URLs from https://spdx.org/licenses/ and to use a form similar to <URL>(<Identifier>) using elements from the SPDX source listed above. Required
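
Several of the attributes above can be derived from the data or checked before submission. The following is a minimal sketch in Python using only the standard library; the function names, the REQUIRED list (which leaves out the conditional id, naming_authority and publisher_* attributes as well as time_coverage_end) and all example values are assumptions made for illustration and are not part of ACDD or of any ADC tool.

```python
from datetime import datetime, timezone

# Attributes marked "Required" in the table above, excluding the conditional
# ones (id, naming_authority, publisher_*) and time_coverage_end, which may
# be left out for datasets that are still being appended to.
REQUIRED = (
    "title", "summary", "keywords", "keywords_vocabulary",
    "geospatial_lat_min", "geospatial_lat_max",
    "geospatial_lon_min", "geospatial_lon_max",
    "time_coverage_start", "Conventions", "history", "date_created",
    "creator_type", "creator_institution", "creator_name",
    "creator_email", "creator_url", "project", "license",
)


def iso_utc(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def coverage_attributes(latitudes, longitudes, times):
    """Derive bounding-box and time-coverage attributes from coordinates.

    Assumes the data do not cross the longitude discontinuity; if they do,
    geospatial_lon_min and geospatial_lon_max have to be set by hand.
    """
    return {
        "geospatial_lat_min": min(latitudes),
        "geospatial_lat_max": max(latitudes),
        "geospatial_lon_min": min(longitudes),
        "geospatial_lon_max": max(longitudes),
        "time_coverage_start": iso_utc(min(times)),
        "time_coverage_end": iso_utc(max(times)),
    }


def missing_required(attrs):
    """Return the names in REQUIRED that are absent from attrs."""
    return [name for name in REQUIRED if name not in attrs]


def creator_lists_consistent(attrs):
    """Check that all creator_* attributes present have the same number of
    comma-separated entries."""
    lengths = {
        len(value.split(","))
        for name, value in attrs.items()
        if name.startswith("creator_")
    }
    return len(lengths) <= 1


# Hypothetical example values, for illustration only.
attrs = coverage_attributes(
    latitudes=[78.2, 78.9],
    longitudes=[15.5, 16.1],
    times=[
        datetime(2020, 10, 20, 12, 35, tzinfo=timezone.utc),
        datetime(2020, 10, 21, 6, 0, tzinfo=timezone.utc),
    ],
)
attrs.update({
    "creator_name": "First Person,Second Person",
    "creator_email": "first@example.org,second@example.org",
    "creator_institution": "Example Institute,Example Institute",
})
print(missing_required(attrs))          # names still to be filled in
print(creator_lists_consistent(attrs))  # whether the creator_* lists line up
```

With the netCDF4 Python package, a dictionary like this can be written to an open dataset with Dataset.setncatts(attrs).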

 

Global attributes that are not part of ACDD but are parsed.

Attribute Description Comment
iso_topic_category ISO topic category fetched from a controlled vocabulary. Accepted elements are listed in the MMD specification. Not part of ACDD, MET extension. Recommended for filtering purposes.
activity_type Activity types are used to identify the origin of the dataset. This is not an identification of the observation platform (e.g. specific vessel, SYNOP station or satellite), but more the nature of the generation process (e.g. simulation, in situ observation, remote sensing etc). It is useful in the context of filtering data when searching for relevant datasets. Only elements from the controlled vocabulary of the MMD specification are allowed. Not part of ACDD, MET extension. Recommended for filtering purposes.
operational_status The current operational status of the product. Only elements from the controlled vocabulary of the MMD specification are allowed. Not part of ACDD, MET extension. Recommended for filtering purposes.
featureType This is part of the CF conventions and is required when submitting data according to the discrete sampling geometries section of the CF conventions.

The keywords used must be written exactly as in the CF conventions.

Valid keywords are listed in http://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#_features_and_feature_types

wigos_id WMO WIGOS identifier if available to describe the platform generating data.  
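
Since this page asks for the featureType keyword to be written exactly as in the CF conventions, a small check before submission can catch spelling and capitalisation slips. The sketch below is an illustration only: the function name check_feature_type is hypothetical, and the list holds the six feature types defined in the discrete sampling geometries chapter of CF.

```python
# The six feature types defined in the discrete sampling geometries
# chapter of the CF conventions.
CF_FEATURE_TYPES = (
    "point",
    "timeSeries",
    "trajectory",
    "profile",
    "timeSeriesProfile",
    "trajectoryProfile",
)


def check_feature_type(value):
    """Return value if it matches a CF feature type exactly, otherwise raise
    ValueError, suggesting the correct spelling when only case or surrounding
    whitespace differs."""
    if value in CF_FEATURE_TYPES:
        return value
    by_lower = {name.lower(): name for name in CF_FEATURE_TYPES}
    suggestion = by_lower.get(value.strip().lower())
    hint = f"; did you mean {suggestion!r}?" if suggestion else ""
    raise ValueError(f"{value!r} is not an accepted featureType{hint}")


print(check_feature_type("timeSeries"))
```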

The short story on ACDD and CF is:

  • ACDD is discovery metadata, used to search for useful datasets.
  • CF is use metadata, used to understand datasets found.

It is recommended to use ACDD in version 1.3 or higher and CF in version 1.6 or higher.
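
The version recommendation above can be checked against the Conventions global attribute. The following is a minimal sketch, assuming the attribute lists conventions in the form 'NAME-MAJOR.MINOR' separated by commas or spaces; the function names and the MINIMUM table are hypothetical helpers, not part of any official tool.

```python
# Minimum versions recommended on this page.
MINIMUM = {"ACDD": (1, 3), "CF": (1, 6)}


def convention_versions(conventions):
    """Parse a Conventions string such as 'CF-1.8, ACDD-1.3' into a mapping
    from convention name to a version tuple, e.g. {'CF': (1, 8)}.

    Entries that do not look like NAME-VERSION are skipped.
    """
    versions = {}
    for entry in conventions.replace(",", " ").split():
        name, separator, version = entry.rpartition("-")
        if not separator:
            continue
        try:
            versions[name] = tuple(int(part) for part in version.split("."))
        except ValueError:
            continue
    return versions


def meets_recommendation(conventions):
    """True if both ACDD and CF are listed at or above the minimum version."""
    found = convention_versions(conventions)
    return all(
        found.get(name, (0,)) >= minimum for name, minimum in MINIMUM.items()
    )


print(meets_recommendation("CF-1.8, ACDD-1.3"))
```

Comparing version tuples rather than strings keeps 'CF-1.10' ordered after 'CF-1.6'.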

For further information on types of metadata see here.

NorDataNet, together with NMDC, has supported the development of a web service that enables users to transform data from ASCII to NetCDF/CF. This is available at NERSC.

Convert ASCII files to NetCDF/CF

If the datasets provided are time series at stations or similar, please use the appropriate featureType in the NetCDF files. Relevant featureTypes are listed in http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#_features_and_feature_types. Granularity is important for efficient reuse of data. Data consumers retain more flexibility if stations are submitted as standalone files instead of bundled in one file.
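
One way to prepare station data for standalone submission is to group the observations by station before writing one file per station. The sketch below only does the grouping; the record layout (station identifier, ISO 8601 time string, value) and the station identifiers are assumptions for illustration, and writing each group to its own NetCDF file with featureType timeSeries is left to the NetCDF library of your choice.

```python
from collections import defaultdict


def split_by_station(records):
    """Group (station_id, time, value) records by station.

    Returns a dict mapping each station_id to its (time, value) pairs,
    sorted by time. Times are assumed to be ISO 8601 UTC strings, which
    sort chronologically as plain text.
    """
    per_station = defaultdict(list)
    for station_id, time, value in records:
        per_station[station_id].append((time, value))
    return {station: sorted(series) for station, series in per_station.items()}


# Hypothetical records from two stations bundled in one table.
records = [
    ("station_a", "2020-10-20T13:00:00Z", -3.1),
    ("station_b", "2020-10-20T12:00:00Z", -5.4),
    ("station_a", "2020-10-20T12:00:00Z", -2.7),
]
for station, series in split_by_station(records).items():
    print(station, len(series))
```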