Use .ncCFHeader ERDDAP format for Platform dimension tests and other IOOS 1.2 checks #805
Comments
I've spent time examining this issue and how it relates to #804. I am close to finding a solution which would enable the checking of an ERDDAP dataset through only its metadata by parsing the
Concept
Create a
Challenges
Many variables require numeric attributes, such as
Humans reading this understand it to be an array of floating point values, but a machine only reads it as characters encoded into a text file. Assuming that these numeric attributes are numeric arrays could be a large misstep. If the attribute is actually encoded in the file as a character array, it should be marked as an incorrect datatype. Thus, the question: "Is it safe to assume comma-separated strings of numeric-only (or suffixed by Because ERDDAP is so strict with certain aspects of its typing, this may be the case. Tagging @ocefpaf in here just in case he wants to drop some more knowledge on us ;)
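To make the question above concrete, here is a minimal sketch of how a checker might classify the right-hand side of an attribute as it appears in a header text dump. Everything in it is an assumption for illustration: the function name `classify_attribute_text` is hypothetical, the set of one-letter type suffixes is a guess rather than anything ERDDAP documents here, and treating a double-quoted value as a character array is exactly the assumption this comment is asking about.

```python
import re

# Assumption: a numeric token is an int/float literal, optionally followed by
# a single-letter type suffix. The suffix set below is hypothetical.
_NUMERIC_TOKEN = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?[fdLSbB]?$")


def classify_attribute_text(raw):
    """Return 'string', 'numeric', or 'unknown' for an attribute's text value."""
    text = raw.strip()
    # A value wrapped in double quotes is treated as a character array.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return "string"
    # Otherwise every comma-separated token must look like a number.
    tokens = [token.strip() for token in text.split(",")]
    if all(_NUMERIC_TOKEN.match(token) for token in tokens):
        return "numeric"
    return "unknown"
```

With this approach a quoted value such as `"0.0, 10.5"` would be flagged as a string even though its contents look numeric, which is the case that should be reported as an incorrect datatype.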
@daltonkell Another consideration is that requesting the generic I talked about looking for workarounds like using a generic For the WQB-04 dataset, the https://pae-paha.pacioos.hawaii.edu/erddap/tabledap/WQB-04.ncCFHeader?&time%3E=2020-07-06T16:00:00Z For the IOOS 1.2 Platform check, we need to get this dimension information, but I don't think we need to read any of the array/table data itself, at least, so maybe there's a solution here. Just wanted to note these issues though. Not sure on your data type questions, sorry!
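The time-constrained request shown above could be assembled programmatically along these lines. This is only a sketch: the helper name `nccf_header_url` and its parameters are hypothetical, and it simply reproduces the URL pattern from the WQB-04 example without making any request.

```python
from urllib.parse import quote


def nccf_header_url(server, dataset_id, time_min):
    """Build a tabledap .ncCFHeader URL limited to times >= time_min."""
    # Percent-encode the ">" in the constraint, leaving "=" as-is,
    # to match the form used in the example URL (time%3E=...).
    constraint = quote("time>=", safe="=") + time_min
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.ncCFHeader?&{constraint}"


url = nccf_header_url(
    "https://pae-paha.pacioos.hawaii.edu/erddap",
    "WQB-04",
    "2020-07-06T16:00:00Z",
)
```

Choosing a `time_min` close to the end of the dataset keeps the slice small, although whether that meaningfully reduces the server's work is the open question in this thread.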
You are right. I never thought about that, but creating that info for a file slice request can be demanding on the server side! Maybe we could use the dataset_id info response? It probably has a smaller server-side footprint (Bob can probably say more about that). There are no I've been playing with constructing a nc-like object from that response: https://nbviewer.jupyter.org/gist/ocefpaf/ae0d650af68c0670e5f09d35c887129c It is probably a long way from what compliance-checker needs though. And again, no data tests would be run; only metadata tests would work.
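As a rough illustration of working from the info response, the sketch below groups attributes by variable from a JSON table. It is not the notebook's code. The function name `attributes_by_variable` is hypothetical, and `SAMPLE_INFO` is a hand-written payload that assumes the info table has the columns "Row Type", "Variable Name", "Attribute Name", "Data Type" and "Value"; check a real response before relying on that layout.

```python
import json
from collections import defaultdict

# Hand-written sample (assumed layout, not real server output).
SAMPLE_INFO = """
{"table": {
  "columnNames": ["Row Type", "Variable Name", "Attribute Name", "Data Type", "Value"],
  "rows": [
    ["attribute", "NC_GLOBAL", "cdm_data_type", "String", "TimeSeries"],
    ["variable", "temperature", "", "float", ""],
    ["attribute", "temperature", "actual_range", "float", "2.5, 31.0"]
  ]
}}
"""


def attributes_by_variable(info_json):
    """Map variable name -> {attribute name: (declared data type, raw value)}."""
    table = json.loads(info_json)["table"]
    columns = table["columnNames"]
    grouped = defaultdict(dict)
    for row in table["rows"]:
        record = dict(zip(columns, row))
        # Only attribute rows carry metadata; variable rows are skipped.
        if record["Row Type"] == "attribute":
            grouped[record["Variable Name"]][record["Attribute Name"]] = (
                record["Data Type"],
                record["Value"],
            )
    return dict(grouped)


attrs = attributes_by_variable(SAMPLE_INFO)
```

Keeping the declared data type next to the raw value means a metadata-only check would not have to guess the type from the text alone. Dimension information is still absent from this structure.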
@daltonkell @benjwadams Does the attribute dictionary response in @ocefpaf's notebook look useful for the IOOS 1.2 checker for ERDDAP datasets? We may still need dimension info in order to test the platform concept check, potentially. Also, @benjwadams would this meet our needs in ioos/ckanext-ioos-theme#208 if it were built into erddapy directly?
@mwengren @ocefpaf's use of the https://geoport.usgs.esipfed.org/erddap/info/1051-A/index.html is very resourceful. Without the dimension information, though, we are unable to test whether the dataset is CF-DSG-compliant, which, if I recall correctly, was a pretty critical step in the IOOS-1.2 spec. It also doesn't really answer the problem of assuming attribute encoding that I mentioned earlier.
I wonder if we should work upstream with ERDDAP developers to augment the info response with this metadata instead of working around it. What do you think?
@ocefpaf I'm 100% for collaboration. Perhaps we could reach out via the Google Group?
The ERDDAP Google Group would be a good place to begin. I haven't seen much issue traffic in the ERDDAP GitHub repo, probably partly because Bob is the primary/solo developer. He does typically respond on either, however.
Hi all, checking in on this issue as we're working on a data ingestion pipeline that will involve frequently running compliance-checker IOOS 1.2 checks against ERDDAP endpoints. Any thoughts on reducing the burden on remote ERDDAPs? Maybe just a non-ideal cchecker flag to skip any checks requiring a lot of data loading until the situation can be improved on the ERDDAP side?
@daltonkell Can you look into options for resolving this dimension checking issue while you're also investigating the CF FeatureType issues in PR #858? They may not be related exactly, but this one has been lingering for a while, and a flag option as in @shane-axiom's suggestion would help make automated scans of ERDDAP servers for Metadata Profile 1.2 compliance a lot more performant. If there isn't an ERDDAP-based fix for this in the works and we don't have a good workaround, like time dimension filtering in CC to reduce the ERDDAP server CPU time needed to determine dataset DSG dimensionality, providing the option to skip those tests might be the best way to go.
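One way such a skip option could be structured is sketched below. This is purely hypothetical and is not compliance-checker's API: the `data_heavy` marker, the `run_checks` runner, and the `skip_data_heavy` flag are all invented names for illustration.

```python
def data_heavy(func):
    """Mark a check as needing array/table data (hypothetical marker)."""
    func.data_heavy = True
    return func


def run_checks(checks, dataset, skip_data_heavy=False):
    """Run each check, optionally skipping those marked as data-heavy."""
    results = {}
    for check in checks:
        if skip_data_heavy and getattr(check, "data_heavy", False):
            results[check.__name__] = "skipped"
            continue
        results[check.__name__] = check(dataset)
    return results


def check_title(dataset):
    # Metadata-only check: needs nothing beyond the global attributes.
    return "title" in dataset


@data_heavy
def check_platform_dimension(dataset):
    # Stand-in for a check that would require dimension information.
    return "dimensions" in dataset


results = run_checks(
    [check_title, check_platform_dimension],
    {"title": "WQB-04"},
    skip_data_heavy=True,
)
```

Reporting skipped checks explicitly, rather than dropping them silently, keeps it visible in the results that the dataset was not fully verified.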
This issue is to capture discussion in #799 (comment) so it doesn't get lost.
From those comments:
@mwengren said:
@daltonkell said:
This would be a more permanent solution to #804 in that it would hopefully reduce the file sizes requested from ERDDAP .ncCF output formats.