This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

AOML Glider Data delivery to DAC #6

Closed
BeckyBaltes opened this issue Oct 6, 2014 · 27 comments

@BeckyBaltes

Derrick sent out an email to the data group to figure out what needs to happen to get the data from the AOML gliders operating in the Caribbean into the DAC. My understanding was that John needed to review the file format. @kerfoot, can you report on the status? If done, what are the next steps?

@BeckyBaltes BeckyBaltes added this to the AOML gliders to the DAC milestone Oct 6, 2014
@kerfoot
Contributor

kerfoot commented Oct 6, 2014

@BeckyBaltes @dpsnowden These files contain significantly more information/data than the current DAC standard requires. Here are some first thoughts:

  1. The file uses the trajectory data type. DAC 2.0 uses a single profile data type with a time dimension.
  2. The file contains a single dive, which is 2 profiles, so 2 files would need to be written for each AOML file.
  3. The QC flag values are not those required by the DAC.
  4. AOML files have 30 dimensions. DAC requires only 2: time and trajectory string length.
  5. Variable names are different.
  6. The AOML file is missing the instrument_ctd container variable.
  7. AOML files have neither a global:wmo_id nor a platform:wmo_id attribute.

There are likely other differences; these are just the big ones. These differences alone would require significant rewriting of existing code or likely a decent time investment to write a converter.
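Item 2 above (one dive equals two profiles) can be illustrated with a minimal sketch. It assumes a dive is available as a list of `(time, depth)` samples and that the deepest sample marks the turn from descent to ascent. The `Sample` layout and the `split_dive` name are hypothetical and do not come from the AOML or DAC code.

```python
from typing import List, Sequence, Tuple

# Hypothetical sample layout: (time in seconds, depth in metres).
Sample = Tuple[float, float]


def split_dive(samples: Sequence[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Split one dive into a descent profile and an ascent profile.

    The deepest sample is treated as the turning point and is kept as the
    last point of the descent (an assumption, not a DAC rule).
    """
    if not samples:
        return [], []
    # Index of the deepest sample; the first one wins if there is a tie.
    deepest = max(range(len(samples)), key=lambda i: samples[i][1])
    descent = list(samples[: deepest + 1])
    ascent = list(samples[deepest + 1:])
    return descent, ascent


dive = [(0.0, 1.0), (10.0, 50.0), (20.0, 100.0), (30.0, 55.0), (40.0, 2.0)]
down, up = split_dive(dive)
```

Each of the two returned lists would then be written to its own file, which is why one AOML file maps to two DAC files.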

@kknee
Contributor

kknee commented Oct 7, 2014

@kerfoot are you referring to a converter from the sample netCDF format to the expected DAC 2.0 format?
What about having the AOML team follow the submission process documentation from the beginning and create a netCDF file in the format we expect? Might be a good test of the system.

@kerfoot
Contributor

kerfoot commented Oct 7, 2014

@kknee : I'd definitely prefer that they follow the process, but I'm guessing they have already spent considerable time and resources to get it to the NODC format they are currently using. Not sure how excited they'll be to start over.

@dpsnowden : any feeling for this? The current DAC workplan and SOW don't have anything in there about writing converters for various groups.

@BeckyBaltes
Author

Ideally, this becomes a repeatable process, and it probably doesn't make sense to start writing converters for everyone, so I think it's fine to start them with the process and see what they can do. @dpsnowden pointed them to the wiki again this morning and to this thread, so if they are on GitHub they can weigh in and track progress.

@dpsnowden

Agreed, we started this whole process with the assumption that the formatting would be left to the data providers to the extent possible. Let's see how far we can get with this. But, if the process proves impossible for various reasons then we will need to revisit our assumption and budget for it. If we can't get this data integrated inside of a finite window (1 month?) then we need to think about converters or other technical assistance. The pretty maps and tools in the DAC aren't useful if it isn't full of data.

The more help we can provide in terms of "change x to y in your netcdf file" the better.

Finally, @kerfoot mentioned that they have more metadata in their files than we currently require. I think we should consider adopting the policy that this situation is OK. If we all agree that more metadata from the provider is better, then we don't want to discourage them from writing it. How would we address this? Can we have rigid standardization of some things and flexibility elsewhere?

@kerfoot
Contributor

kerfoot commented Oct 7, 2014

@dpsnowden I think that, as long as they have the variables and attributes that we require, additional data would not be prohibitive and the DAC would be able to serve it. The trick would be setting up ERDDAP datasets in which the underlying .nc file contents are different, depending upon who submitted the data, assuming they wanted all of their data to be accessible.
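The "required names must be present, extras are fine" policy described here can be sketched as a simple set comparison. The required set below is only an illustrative subset built from names mentioned in this thread; the authoritative list is in the DAC 2.0 file format documentation. The `compare_variables` helper is hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Illustrative subset only (names mentioned in this thread), not the full
# DAC 2.0 requirement list.
REQUIRED_VARIABLES: Set[str] = {"instrument_ctd", "platform"}


def compare_variables(
    found: Iterable[str], required: Set[str] = REQUIRED_VARIABLES
) -> Dict[str, List[str]]:
    """Report which required variables are missing and which are extra.

    Extra variables are reported for information only; under the policy
    discussed above they would not make a file unacceptable.
    """
    found_set = set(found)
    return {
        "missing": sorted(required - found_set),
        "extra": sorted(found_set - required),
    }


report = compare_variables(["platform", "temperature", "temperature_qc", "oxygen"])
acceptable = not report["missing"]
```

Only the `missing` list decides acceptance; the `extra` list is what a per-provider ERDDAP dataset definition would have to account for.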

@kknee
Contributor

kknee commented Oct 8, 2014

So as long as the DAC 2.0 documentation is ready (@kerfoot please confirm), the ball is in AOML's court on this issue?

@kerfoot
Contributor

kerfoot commented Oct 8, 2014

@kknee The doco on the file format is ready. It's been reviewed by myself and Bob Simons. Since DAC 2.0 is not officially up, the doco on the file submission process is not completely up to date. But they'll need some time to get the files written before they need to worry about submission.

@dpsnowden

Good news. I agree that AOML has a role to play here. But, I still would like to identify a technical POC from our team that will interact with them. This interaction would hopefully generate answers to a few questions.

  1. Are they willing/able to create a second version of their data files to comply with the DAC needs?
  2. Is our documentation clear enough that they can do that easily and without much hand holding?
  3. Is it possible to do the "trick" that @kerfoot mentioned above? I see that it might be theoretically possible but who is going to test to determine if it is possible?
  4. How much metadata is lost in migrating from the AOML format into the DAC format, and do we care? Is there a way to recover it?

@kerfoot
Contributor

kerfoot commented Oct 8, 2014

I'm probably the one to handle this. @dpsnowden: can you make the appropriate introductions?

@fbringas

Hello @dpsnowden, @kerfoot

I'm writing code to convert our files into the IOOS_Glider_NetCDF_v2.0 format. The documentation provided is very good, and at this point I would like to run some tests to verify that my conversion is accurate and working as expected. Could you send me an example of a real glider nc file in the IOOS format? The example on this site is very useful, but the variables are empty; a real file with data would be good for tests.

@kerfoot
Contributor

kerfoot commented Oct 17, 2014

@fbringas : There are a couple of examples here:

https://github.com/ioos/ioosngdac/tree/master/nc/examples/profile

Would you like me to provide more?

@fbringas

@kerfoot : Thank you for the examples.
The issue I'm trying to test is related to the "_qc" variables (e.g. temperature_qc, conductivity_qc, ...). In my original nc format these variables are declared as char, while in the IOOS 2.0 format they are declared as byte.
Is it acceptable to declare these variables as char instead of byte?
If not, would you have one more example where these variables contain actual values? In the 2 examples above they were all empty.
By the way, it was my understanding that instead of leaving these "_qc" variables empty they should be set to '0'.

@daf
Member

daf commented Oct 17, 2014

@fbringas according to cf-convention/CF-2#3, char shouldn't be used. Most QC fields I've seen are done as flags, which I'm pretty sure is best represented via the byte type, but I'm no expert here.

@kerfoot
Contributor

kerfoot commented Oct 17, 2014

@daf: I agree. Char data types are used for strings and bytes are used for numbers. We're using numbers, so we're using bytes.

As for the contents of the _qc variables, they are empty as I haven't yet implemented the flagging system in the files I'm creating.
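For a converter that starts from char flags, the move to byte values discussed above could look roughly like the sketch below. It assumes each flag is a single digit character and that anything else (blank, padding) should map to a fill value. The `char_flags_to_bytes` name and the default fill of -127 are assumptions for illustration, not values taken from the DAC specification.

```python
from typing import List

# Range of an 8-bit signed integer (the netCDF byte type).
INT8_MIN, INT8_MAX = -128, 127


def char_flags_to_bytes(flags: str, fill: int = -127) -> List[int]:
    """Convert one-character QC flags to small integers.

    ASCII digits become their numeric value; any other character is
    replaced by `fill` (a hypothetical fill value).
    """
    if not INT8_MIN <= fill <= INT8_MAX:
        raise ValueError("fill value does not fit in a signed byte")
    return [int(ch) if ch in "0123456789" else fill for ch in flags]


converted = char_flags_to_bytes("1143 9")
```

Every value produced this way fits in a signed byte, so the result can be written to a variable declared as byte.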

@lukecampbell
Member

It needs to be a signed integer (QARTOD). Most published manuals on marine QA/QC have a very small set of flags and an 8-bit signed integer (Byte in netCDF) is sufficient. Whenever a QC flag is used there needs to be a metadata field that describes the flag values. Example

byte temperature_qc(time) ;
    temperature_qc:qc_flags = "0=fail, 1=good, 2=suspect, 3=fill_value" ;

@kerfoot
Contributor

kerfoot commented Oct 17, 2014

The DAC 2.0 spec provides a set of flags for these, as an attribute. For example, line 291 here:

https://github.com/ioos/ioosngdac/blob/master/nc/template/IOOS_Glider_NetCDF_v2.0.ncml

I believe we took these from the IMOS specification, though I'm not particularly happy with them as they're very ambiguous and don't relate specifically to the QC check performed. If QARTOD has defined a set of standard QC flags, I'm all for using those.

@lukecampbell
Member

| Flag | Description |
| --- | --- |
| Pass=1 | Data have passed critical real-time quality control tests and are deemed adequate for use as preliminary data. |
| Not evaluated=2 | Data have not been QC-tested, or the information on quality is not available. |
| Suspect or Of High Interest=3 | Data are considered to be either suspect or of high interest to data providers and users. They are flagged suspect to draw further attention to them by operators. |
| Fail=4 | Data are considered to have failed one or more critical real-time QC checks. If they are disseminated at all, it should be readily apparent that they are not of acceptable quality. |
| Missing=9 | Data are missing; used as a placeholder. |

From QARTOD Temperature Salinity Manual
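The flag values listed above can be kept in one place in a converter, for example as an enumeration that also produces the metadata describing the flags. This is a sketch only: the `QartodFlag` class and `flag_attributes` helper are hypothetical, and the attribute names `flag_values` and `flag_meanings` follow the CF convention; whether the DAC template uses exactly these names is not confirmed in this thread.

```python
from enum import IntEnum
from typing import Dict, List, Union


class QartodFlag(IntEnum):
    """QC flag values as listed in the comment above."""

    PASS = 1
    NOT_EVALUATED = 2
    SUSPECT = 3  # "Suspect or Of High Interest"
    FAIL = 4
    MISSING = 9


def flag_attributes() -> Dict[str, Union[List[int], str]]:
    """Build CF-style attributes describing the flag values."""
    return {
        "flag_values": [int(flag) for flag in QartodFlag],
        "flag_meanings": " ".join(flag.name.lower() for flag in QartodFlag),
    }


attrs = flag_attributes()
```

Keeping values and meanings in the same definition avoids the two attributes drifting apart when a flag is added or renamed.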

@BeckyBaltes
Author

UPDATE: On our call this morning, we thought the AOML data link was complete, but @robragsdale is still not able to register it without a link to the data. @lukecampbell please provide the access point/link for the data.
@kknee, for awareness.

@kknee
Contributor

kknee commented Nov 10, 2014

@robragsdale the link (http://50.17.63.70/erddap/tabledap/SG61020140715T1400.html) was passed around on the IOOS Glider email list, but wanted to document it here too.

Does it make sense to register with this temporarily until we either (1) have a domain for the IP or (2) have completed the WAF?

@robragsdale

@kknee EMMA cannot harvest from a .html URL. I got a 500 error back when I tried to change the extension to xml. Could I use this URL from the ERDDAP catalog instead: http://50.17.63.70/erddap/metadata/iso19115/xml/SG61020140715T1400_iso19115.xml ? Thoughts?
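The relationship between the two URLs quoted in this thread (the tabledap `.html` page and the ISO 19115 XML record) can be sketched as a small helper. The path pattern is inferred only from those two URLs, so it is an assumption about this server's layout rather than a documented rule, and the `iso19115_url` function is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def iso19115_url(tabledap_url: str) -> str:
    """Derive the ISO 19115 metadata URL from a tabledap dataset URL.

    Assumes the layout seen in this thread:
    <root>/tabledap/<dataset_id>.html ->
    <root>/metadata/iso19115/xml/<dataset_id>_iso19115.xml
    """
    parts = urlsplit(tabledap_url)
    head, filename = posixpath.split(parts.path)      # e.g. /erddap/tabledap
    dataset_id, _extension = posixpath.splitext(filename)
    root = posixpath.dirname(head)                    # e.g. /erddap
    path = f"{root}/metadata/iso19115/xml/{dataset_id}_iso19115.xml"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


harvest_url = iso19115_url(
    "http://50.17.63.70/erddap/tabledap/SG61020140715T1400.html"
)
```

A helper like this would let a registration script start from the dataset link that was circulated by email and still hand the harvester an XML endpoint.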

@kknee
Contributor

kknee commented Nov 10, 2014

@dpsnowden

What is keeping us from deciding on a domain name?

@kknee
Contributor

kknee commented Nov 11, 2014

@dpsnowden I don't think anything is. On yesterday's call we discussed using the following URLs - @BeckyBaltes was going to confirm with you that these were okay and next steps for getting Luke access for assigning those domains to the DAC IP.

data.ioos.us/thredds/gliders
data.ioos.us/erddap/gliders

@BeckyBaltes
Author

@dpsnowden Just need you to provide Luke whatever logins/accesses he needs to build the two domains.

@dpsnowden

Sure. Let's talk Thursday or Friday.


@robragsdale

The AOML glider files submitted for registration (SG61020140715T1400 and SG60920140719T1700) are now in the IOOS Catalog and the Glider DAC v2.0 ERDDAP service.
