Commit d897c1b: Update data-flow.md
MathewBiddle authored Sep 13, 2024 (1 parent: cdf0447)
Showing 1 changed file with 30 additions and 35 deletions: _docs/data-flow.md

```mermaid
I[("IOOS Data Catalog
(data.ioos.us)
(metadata only)")]
A -.-> B
B ----> I
B -.-> C
B -..-> D
C -.-> E
E --> G
E --> H
E -- OBIS-USA --> D
```

For data collected/managed by the US IOOS community, the project should ensure data and information are readily available to resource managers, scientists, educators, and the public in an easily digestible way.
To that end, making these data available via [ERDDAP](https://erddap.github.io/) services meets these goals and facilitates automated integration into the [IOOS Data Catalog](https://data.ioos.us/) and subsequent Federal catalogs (e.g., [Data.gov](https://data.gov/) and [NOAA OneStop](https://data.noaa.gov/onestop/)).
Using the services that ERDDAP provides, a data manager can develop a reproducible workflow for aligning the data (which record observations of an animal at a place and time) to the [Darwin Core standard](https://dwc.tdwg.org/), to be shared with an [OBIS node](https://obis.org/contact/), subsequently passed on to [OBIS](https://obis.org) and [GBIF](https://www.gbif.org/), and (for OBIS-USA) archived at [NOAA's NCEI](https://www.ncei.noaa.gov/) through automated processes (solid arrows).
Finally, submission of the raw data to NCEI ensures that no observations are lost, that there is long-term stewardship of the source data, and that our PARR requirements are met.
The sections below provide more context as well as tips and tricks for each of the elements in the diagram above.
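
To make the entry point of this flow concrete, the minimal sketch below pulls a table from an RA ERDDAP using its tabledap CSV endpoint; the server URL, dataset ID, and variable names are hypothetical placeholders, so substitute the values for your own dataset.

```python
# A minimal sketch of reading a dataset from an ERDDAP tabledap endpoint.
# The server, dataset ID, and variable names are hypothetical placeholders.
import pandas as pd

server = "https://erddap.example.org/erddap"           # an RA ERDDAP (placeholder)
dataset_id = "example_fish_survey"                     # placeholder dataset ID
variables = "time,latitude,longitude,scientific_name"  # columns to request

# ERDDAP tabledap serves CSV directly; skiprows=[1] drops the units row
# that ERDDAP inserts below the header.
url = f"{server}/tabledap/{dataset_id}.csv?{variables}"
df = pd.read_csv(url, skiprows=[1])

print(df.head())
```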

{% include note.html content="For MBON projects, datasets should be registered in the [MBON dataset registration form](https://docs.google.com/forms/d/e/1FAIpQLSfguACbLmcLiFxHKsR5W5Mv9nEfd0E8oX2rY78gdwAYTrq_zA/viewform?usp=sf_link).
This will ensure that we (the IOOS Marine Life DMAC team) are aware of the dataset and can track its progress through the data management and sharing workflows." %}

## RA ERDDAP
For IOOS DMAC, ERDDAP is used as a mechanism for quickly and efficiently sharing biological observations with the broader community.
While ERDDAP can provide data access following the FAIR principles, further alignment to Darwin Core and submission to OBIS is necessary to make these observations more useful to a broader audience.
Essentially, serving data through an ERDDAP is one part of a larger process and should be treated as such.

### Key principles for data
When preparing a dataset to be served via ERDDAP, it is recommended to follow a few key principles for data management.
* Latitude and Longitude in decimal degrees (WGS84 preferred)
* Identify units of measure
* Check species names against [WoRMS](https://www.marinespecies.org/) (a scripted check is sketched after this list).
* See the sections on [Data and File Formatting](data.html) and [Metadata and Documentation](metadata.html) for more recommendations and best practices.
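
As referenced in the list above, the WoRMS check can be scripted against the WoRMS REST API. The sketch below matches a couple of placeholder names and reports the accepted name and AphiaID; error handling is kept minimal, so adapt it before using it on a full species list.

```python
# A sketch of checking species names against WoRMS via its REST API.
# The names below are examples; swap in the names from your dataset.
import requests

names = ["Mola mola", "Carcinus maenas"]
params = [("scientificnames[]", n) for n in names] + [("marine_only", "true")]
resp = requests.get(
    "https://www.marinespecies.org/rest/AphiaRecordsByMatchNames",
    params=params,
    timeout=30,
)
resp.raise_for_status()

# The API returns one list of candidate records per submitted name.
for name, matches in zip(names, resp.json()):
    if not matches:
        print(f"{name}: no match found")
        continue
    best = matches[0]
    print(f"{name} -> {best['valid_name']} "
          f"(AphiaID {best['valid_AphiaID']}, status {best['status']})")
```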

**Additional Resources**
* [Configuring datasets.xml](https://erddap.github.io/setupDatasetsXml.html) - The ERDDAP manual for configuring a dataset.
* [ERDDAP Quick Start Guide](https://ioos.github.io/erddap-gold-standard/) - Quick Start Guide for deploying ERDDAP in a Docker container.
* [ERDDAP Google Group](https://groups.google.com/g/erddap) - A great place to search past questions and ask your own.
* [ERDDAP GitHub Organization](https://github.com/erddap) - Where ERDDAP's source code can be found and a place to contribute feature requests.

### ERDDAP Requirements

Below is a list of the absolute bare minimum pieces of metadata required by ERDDAP. Some dataset types might have other requirements specific to the data file formats. A quick way to check these attributes programmatically is sketched after the list.
* Global attributes
* [datasetID](https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#datasetID)
* [sourceUrl](https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#sourceUrl) - however, this depends on the dataset type.
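
A quick, scriptable way to confirm that a dataset exposes the expected global attributes is to read ERDDAP's `info` service for that dataset. In the sketch below, the server, dataset ID, and the particular list of attributes to check are assumptions for illustration; adjust them to the requirements that apply to your dataset type.

```python
# A sketch that checks a dataset's global attributes via ERDDAP's info endpoint.
# The server, dataset ID, and attribute list are placeholders for illustration.
import pandas as pd

server = "https://erddap.example.org/erddap"   # placeholder RA ERDDAP
dataset_id = "example_fish_survey"             # placeholder dataset ID
expected = ["title", "summary", "institution", "infoUrl", "sourceUrl"]  # assumed list

# info/<datasetID>/index.csv lists every attribute and variable for the dataset.
info = pd.read_csv(f"{server}/info/{dataset_id}/index.csv")
global_attrs = info[
    (info["Row Type"] == "attribute") & (info["Variable Name"] == "NC_GLOBAL")
]["Attribute Name"].tolist()

for attr in expected:
    status = "ok" if attr in global_attrs else "MISSING"
    print(f"{attr}: {status}")
```
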
## Darwin Core alignment
When aligning a dataset to Darwin Core, it is recommended that a data manager start by serving the data via ERDDAP or some comparable online system that has an [API](https://en.wikipedia.org/wiki/API) (or a way to programmatically
grab the data) and preferably follows a set of [OGC standards](https://www.ogc.org/standards/). When working through
the Darwin Core alignment, using a scripting language (e.g., R or Python) that reads the data served via ERDDAP (or
a comparable service) is highly recommended. A scripting language provides provenance, transparency, and reproducibility
for the translation. This helps reduce the number of errors and the back-and-forth between data managers and OBIS. It is
highly recommended that, if a scripting language is used, the scripts are shared via a distributed version control system
like [GitHub](https://www.github.com).
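
As a minimal illustration of such a scripted alignment, the sketch below reads a hypothetical ERDDAP table and maps a few of its columns to Darwin Core terms. The server, dataset ID, and source column names are placeholders, and a real alignment will typically involve many more terms, controlled vocabularies, and quality checks.

```python
# A minimal sketch of aligning an ERDDAP table to a few Darwin Core terms.
# Server, dataset ID, and source column names are hypothetical placeholders.
import pandas as pd

server = "https://erddap.example.org/erddap"
dataset_id = "example_fish_survey"
url = f"{server}/tabledap/{dataset_id}.csv?time,latitude,longitude,scientific_name"
df = pd.read_csv(url, skiprows=[1])  # skip ERDDAP's units row

# Rename the source columns to their Darwin Core equivalents.
occurrence = df.rename(
    columns={
        "time": "eventDate",
        "latitude": "decimalLatitude",
        "longitude": "decimalLongitude",
        "scientific_name": "scientificName",
    }
)

# Commonly required Darwin Core terms not present in the source table.
occurrence["basisOfRecord"] = "HumanObservation"
occurrence["occurrenceStatus"] = "present"
occurrence["occurrenceID"] = [f"{dataset_id}-{i}" for i in range(len(occurrence))]

occurrence.to_csv("occurrence.csv", index=False)
```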

**Recommendations TL;DR**
* Follow the guidance at [TDWG's Darwin Core quick reference guide](https://dwc.tdwg.org/terms/).
* [OBIS Manual](https://manual.obis.org/) - This manual provides an overview of how to contribute data to OBIS and how to access data from OBIS.

## Sending to OBIS-USA
OBIS-USA is part of an international data sharing network, the Ocean Biodiversity Information System (OBIS), coordinated by the Intergovernmental Oceanographic Commission (IOC) of UNESCO (the United Nations Educational, Scientific and Cultural Organization) through its International Oceanographic Data and Information Exchange (IODE) programme.
OBIS-USA is the US node of OBIS and uses the [Integrated Publishing Toolkit (IPT)](https://manual.obis.org/ipt) as a platform to publish data, which can then be registered and automatically harvested by OBIS and GBIF.
For more details on the publishing process, please review the Marine Biological Data Mobilization Workshop lesson on [Metadata and Publishing](https://ioos.github.io/bio_mobilization_workshop/07-validation-and-publishing/index.html).
Below are the various options for sending your data to OBIS-USA; a quick pre-submission check is sketched after the list.

* **Recommended**: use the [Dataset Review Request](https://github.com/ioos/bio_data_guide/issues/new/choose) issue to initiate the request.
* Attend the monthly [Standardizing Marine Biological Data Working Group](https://github.com/ioos/bio_data_guide#monthly-meetings) meeting and discuss transfer options.
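
Whichever option is used, a quick sanity check of the aligned occurrence table before it is handed to OBIS-USA can save a round of back-and-forth. The sketch below assumes an `occurrence.csv` like the one written in the alignment sketch above; the checks shown are illustrative and not the full set applied during OBIS quality control.

```python
# A sketch of simple pre-submission checks on an aligned occurrence file.
# Assumes an occurrence.csv like the one written in the alignment sketch above.
import pandas as pd

occ = pd.read_csv("occurrence.csv")

required = ["occurrenceID", "eventDate", "decimalLatitude",
            "decimalLongitude", "scientificName", "basisOfRecord"]

problems = []
for col in required:
    if col not in occ.columns:
        problems.append(f"missing column: {col}")

if "occurrenceID" in occ.columns and occ["occurrenceID"].duplicated().any():
    problems.append("occurrenceID values are not unique")

# Coordinate range checks; rows with missing coordinates also count as problems.
if "decimalLatitude" in occ.columns:
    bad_lat = ~occ["decimalLatitude"].between(-90, 90)
    if bad_lat.any():
        problems.append(f"{bad_lat.sum()} latitude values missing or outside [-90, 90]")
if "decimalLongitude" in occ.columns:
    bad_lon = ~occ["decimalLongitude"].between(-180, 180)
    if bad_lon.any():
        problems.append(f"{bad_lon.sum()} longitude values missing or outside [-180, 180]")

print("\n".join(problems) if problems else "No obvious problems found")
```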

## Sending to NCEI
When planning to submit data to NCEI, the data provider should coordinate submissions through the IOOS Office to
identify which submission system should be used.
This will ensure that the dataset is appropriately identified, tracked,
and stewarded through the submission process.

Ideally, the raw data should be archived at NCEI. Typically, this will be the dataset served through the RA ERDDAP.

Briefly, the submission package sent to NCEI should indicate that the observations are from an IOOS MBON project (or
have some affiliation with IOOS). Below is a short summary of the two submission systems at NCEI and their intended uses; a small sketch of the decision rule follows the list.
* [ATRAC](https://www.ncei.noaa.gov/archive/atrac/index.html) - Use the Advanced Tracking and Resource Tool for Archive
Collections (ATRAC) to submit repeating or multiple delivery data, or data that exceeds 20 GB.
* [S2N](https://www.ncei.noaa.gov/archive/send2ncei/) - Use Send2NCEI to submit non-repeating or single delivery data less than 20 GB.
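
The choice between the two systems comes down to delivery frequency and size; the tiny sketch below, using hypothetical inputs, simply restates that decision rule in code.

```python
# A tiny sketch of the ATRAC vs. Send2NCEI decision rule described above.
def ncei_submission_route(size_gb: float, repeating: bool) -> str:
    """Return the NCEI submission system suggested for a dataset."""
    if repeating or size_gb > 20:
        return "ATRAC"      # repeating/multiple deliveries, or more than 20 GB
    return "Send2NCEI"      # single delivery under 20 GB

# Hypothetical examples
print(ncei_submission_route(size_gb=2, repeating=False))   # Send2NCEI
print(ncei_submission_route(size_gb=50, repeating=False))  # ATRAC
print(ncei_submission_route(size_gb=1, repeating=True))    # ATRAC
```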
