This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

ERDDAP Configuration

John Kerfoot edited this page Jun 6, 2023 · 6 revisions

Documentation on the DAC ERDDAP server configuration.

Background

ERDDAP serves data sets, each an aggregation of individual files, that are described in the datasets.xml catalog file. The datasets.xml file is an XML file that describes the contents and metadata of each glider data set in a format that ERDDAP can understand. Besides adding descriptions for new data sets, the XML descriptions of existing data sets may need to be modified to add missing metadata or to correct erroneous metadata.

Configuration

The datasets.xml file is located at:

/data/catalog/priv_erddap/datasets.xml

and symlinked to the tomcat instance location:

/var/tomcats/tomcat-erddap-private/content/erddap/datasets.xml

All modifications to the datasets.xml file should therefore be made to the file:

/data/catalog/priv_erddap/datasets.xml
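
Before editing, it can be worth confirming that the Tomcat path still resolves to the catalog file. A minimal sketch follows; the function name check_catalog_link is hypothetical, and readlink -f assumes GNU coreutils (or a compatible readlink).

```shell
# Hypothetical helper: succeed only if $1 is a symlink that resolves to $2.
#   $1: path expected to be the symlink (the Tomcat content path)
#   $2: path expected to be the real file (the catalog path)
check_catalog_link() {
    # -L is true only for symbolic links
    [ -L "$1" ] || { echo "not a symlink: $1" >&2; return 1; }
    # Compare the fully resolved paths of link and target
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}
```

For the paths given above, the call would be: check_catalog_link /var/tomcats/tomcat-erddap-private/content/erddap/datasets.xml /data/catalog/priv_erddap/datasets.xml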

Build

The datasets.xml file is built at 15 minutes past every hour.

Description

The datasets.xml file consists of 3 parts:

  1. Header
    • <erddapDatasets> opening tag
    • Tags specifying IP blacklists, the maximum number of simultaneous connections allowed from a single IP, etc.
  2. Body
    • Individual data set descriptions enclosed in <dataset /> tags
      • <reloadEveryNMinutes />
        • high reload frequency for real-time incomplete (active) data sets
        • low reload frequency for real-time completed data sets
        • low reload frequency for delayed-mode data sets regardless of complete/incomplete status
      • METADATA
        • use of extra_atts.json file to correct metadata and add missing metadata
      • <metadataFrom>first|last</metadataFrom>
        • Tells ERDDAP to read any data set metadata not explicitly defined in the <dataset /> tag from either the earliest (first) or the latest (last) NetCDF file submitted by the data provider.
  3. Footer
    • Closing </erddapDatasets> tag
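
The reload-frequency rules in the body description can be sketched as a small shell helper that also emits a skeleton fragment. Everything named here is an assumption for illustration: the function names, the two interval values, and the EDDTableFromNcFiles dataset type. The fragment is deliberately incomplete and omits elements that ERDDAP requires before it will load a data set (file locations, variable definitions, and so on).

```shell
# Hypothetical reload intervals in minutes; the actual DAC values are not
# stated in this page.
FAST_RELOAD=60
SLOW_RELOAD=10080

# Print the reload interval for a data set.
#   $1: mode, "rt" (real-time) or "delayed"
#   $2: status, "complete" or "incomplete"
reload_minutes() {
    if [ "$1" = "rt" ] && [ "$2" = "incomplete" ]; then
        echo "$FAST_RELOAD"   # active real-time deployment: reload often
    else
        echo "$SLOW_RELOAD"   # completed or delayed-mode: reload rarely
    fi
}

# Print a skeleton <dataset> fragment (not a loadable ERDDAP definition).
#   $1: datasetID, $2: mode, $3: status
# The dataset type below is an assumption, not taken from the DAC catalog.
dataset_fragment() {
    cat <<EOF
<dataset type="EDDTableFromNcFiles" datasetID="$1" active="true">
    <reloadEveryNMinutes>$(reload_minutes "$2" "$3")</reloadEveryNMinutes>
    <metadataFrom>last</metadataFrom>
</dataset>
EOF
}
```

Keeping the interval decision in one function means that marking a deployment as completed only changes the arguments passed in, not the template.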

Events Requiring a Rewrite of the datasets.xml file

  1. Registration of a new data set and uploading of one or more individual NetCDF files
    • creation of initial fragment
    • <reloadEveryNMinutes /> set to faster reload frequency
  2. Modification of the metadata of a single data set via the extra_atts.json file
    • update global and variable attributes
  3. Real-time deployment marked as 'Completed'
    • <reloadEveryNMinutes /> changed to decrease frequency of data set reload
  4. Deletion of an existing data set
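
If per-data-set fragments are kept in separate files, events 1 to 3 above all show up as a fragment file that is newer than the current catalog. A minimal sketch of that check follows; the function name needs_rebuild and the '*-dataset.xml' naming are assumptions, and a deletion (event 4) leaves no newer file behind, so it would need its own trigger.

```shell
# Hypothetical check: succeed if the catalog should be rebuilt.
#   $1: directory holding the '*-dataset.xml' fragments
#   $2: path to the current datasets.xml
needs_rebuild() {
    # No catalog yet: always rebuild
    [ -f "$2" ] || return 0
    # find -newer lists fragments modified after the catalog; any output
    # means at least one fragment has changed since the last build
    [ -n "$(find "$1" -name '*-dataset.xml' -newer "$2" -print)" ]
}
```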

Issues and Questions

  1. Which user builds the ERDDAP datasets.xml?
  2. Which script builds the ERDDAP datasets.xml?
  3. What is the schedule of the datasets.xml catalog build?
  4. Is the entire file rewritten each time or only new <dataset /> fragments added?
  5. What triggers rewriting of the datasets.xml file?

Fixes and Suggestions

  1. All write operations on the datasets.xml file should be done on a copy of the current datasets.xml file.

  2. Individual <dataset /> XML fragments should be written to separate files and rewritten in specific cases.

  3. The header and footer XML should be contained in separate files.

  4. Following the writing or rewriting of data set XML fragments, the following process should be performed to write the new datasets.xml file:

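     # Start from a fresh temporary file so a stale one is never appended to
     cat header.xml > datasets.xml.tmp
     # Append every data set fragment found under $PARENT_DIRECTORY
     find "$PARENT_DIRECTORY" -name '*-dataset.xml' -exec cat '{}' \; >> datasets.xml.tmp
     cat footer.xml >> datasets.xml.tmp
     # Replace the live catalog only after the new file is complete
     mv datasets.xml.tmp datasets.xml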
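
The steps above could be wrapped in a function that stops at the first failure and leaves the existing datasets.xml untouched in that case. This is only a sketch under stated assumptions: the function name build_datasets_xml is hypothetical, and header.xml, footer.xml, and the fragments are assumed to live under one directory.

```shell
# Hypothetical wrapper around the build steps.
#   $1: directory containing header.xml, footer.xml and '*-dataset.xml' files
#   $2: path of the datasets.xml to replace
# The body runs in a subshell (parentheses) so its variables stay local.
build_datasets_xml() (
    src="$1"
    out="$2"
    # Temporary file next to the target, so mv is a rename on one filesystem
    tmp="$out.tmp.$$"
    if cat "$src/header.xml" > "$tmp" &&
       find "$src" -name '*-dataset.xml' -exec cat '{}' + >> "$tmp" &&
       cat "$src/footer.xml" >> "$tmp"
    then
        # Swap in the new catalog only after every part was written
        mv "$tmp" "$out"
    else
        # Any failure: discard the partial file and keep the old catalog
        rm -f "$tmp"
        exit 1
    fi
)
```

With the paths used on this page, a call might look like: build_datasets_xml "$PARENT_DIRECTORY" /data/catalog/priv_erddap/datasets.xml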