Marking error datasets and warnings/caveats #575
See terraref/reference-data#218 for data that a human has recognized as being in error (e.g. blurry FLIR data, point clouds clipped at some height).
For example (consider this a draft; we should probably use a consistent/standard way of encoding this information), every FLIR dataset following May 2017 could have a file named "ERROR.yml" that contains:

```yaml
quality:
  status: ERROR
  description: Sand and water contaminated FLIR camera lens so temperature values are invalid
  url: https://github.com/terraref/reference-data/issues/182
```
Duplicate of #557.
There are different severities of error: the 2017 FLIR contamination is a prominent example, but the stair-stepping in the laser3D data is not so cut and dried and may still contain valuable data.
The proposed script will add a new metadata entry from the Maricopa Field user with a body like the draft above. Other statuses could be WARNING, ADVISORY, etc. We would also write a corresponding YAML file to the Globus directory with those contents as suggested, perhaps at the day level rather than repeated for the entire dataset? Or do we want it repeated at the dataset (timestamp) level?
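The dataset-level flagging described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the directory layout, `flag_dataset` helper, and template fields are assumptions based on the draft YAML in this thread.

```python
import os
import tempfile

# Hypothetical template mirroring the draft ERROR.yml above;
# field names and values are taken from this thread, not from real code.
ERROR_TEMPLATE = """\
quality:
  status: {status}
  description: {description}
  url: {url}
"""

def flag_dataset(dataset_dir, status, description, url):
    """Write a flag file into one dataset (timestamp) directory."""
    path = os.path.join(dataset_dir, "ERROR.yml")
    with open(path, "w") as f:
        f.write(ERROR_TEMPLATE.format(status=status,
                                      description=description,
                                      url=url))
    return path

# Usage with a stand-in directory (a real run would walk the
# raw_data day directories for the affected sensor and date range).
day_dir = tempfile.mkdtemp()
ts_dir = os.path.join(day_dir, "2017-06-01__12-00-00-000")
os.makedirs(ts_dir)
flag_path = flag_dataset(
    ts_dir, "ERROR",
    "Sand and water contaminated FLIR camera lens",
    "https://github.com/terraref/reference-data/issues/182")
```

Writing the same file into every timestamp directory keeps the flag colocated with the data it describes, which matches the dataset-level repetition discussed above.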
We should also add a file named "ERROR" that contains the description and URL to the affected dataset. I think repeating this at the dataset level will be good. There may be a use for a tag at a higher level, but that would be in addition to the dataset-level flag.
My script is prepared to generate the YAML files and metadata; however, because the raw_data directories are owned by dlebauer, I am unable to write into them. We can discuss how to handle this during the meeting... probably one of:
I am glad that the raw data folder is locked down. I am not sure it makes sense for me to be the folder owner (as opposed to a user or group like 'terraref'), but the idea is that we don't touch the raw_data folder. In the end, the key requirement is that any data with known errors (or other issues) are clearly labeled as such. It makes sense (at this point) to have to use sudo to touch the raw_data folder, if we should ever touch it at all, but maybe there is a better way to handle this. Certainly none of the existing files should be touched, but allowing the same user that transfers the files to create a new file would also seem reasonable. For the FLIR, we did process the data to Level 1. Is the plan to also add an error file to the Level 1 data?
Script is running now; I will close this when it completes. In the FLIR case I would argue we don't add an error file to the Level_1 data: the goal was to flag the raw data so that in the future we don't even process these erroneous datasets. To be consistent with that, I would argue for deleting the Level 1+ data from this time period for the FLIR.
Standardized taxonomy for different error cases: ERROR (should not be sent through processing) vs. WARNING vs. other classes.
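One way to pin down such a taxonomy is a small enum that processing code can branch on. This is a sketch under the assumption that only ERROR excludes a dataset from processing; the class and function names are hypothetical, and ADVISORY is taken from the statuses floated earlier in this thread.

```python
from enum import Enum

class QualityStatus(Enum):
    """Hypothetical severity taxonomy for dataset quality flags."""
    ERROR = "ERROR"        # known-bad (e.g. FLIR 2017); skip processing
    WARNING = "WARNING"    # suspect (e.g. laser3D stair-stepping); process, keep flag
    ADVISORY = "ADVISORY"  # informational caveat only

def should_process(status: QualityStatus) -> bool:
    """Only ERROR datasets are excluded from the pipeline."""
    return status is not QualityStatus.ERROR
```

Encoding the rule in one place means extractors agree on which severities halt processing and which merely annotate the outputs.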
Max will write up some documentation/wiki once this is done to propose a standard approach to handling this. |
Support the ability to explicitly flag individual files (vs. the entire dataset); 'all' could be the default value.
Created #589 to follow this. |
Have an ERROR.txt or similar in the dataset and on disk to indicate that the dataset should be skipped during processing.
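The skip check itself can be a small guard that each extractor runs before touching a dataset. A minimal sketch, assuming the flag-file names proposed in this thread (`ERROR`, `ERROR.txt`, `ERROR.yml`); the actual extractor code is not shown here.

```python
import os
import tempfile

# Flag-file names follow the convention discussed in this thread.
FLAG_NAMES = ("ERROR", "ERROR.txt", "ERROR.yml")

def dataset_is_flagged(dataset_dir):
    """Return True if any known flag file exists in the dataset directory."""
    return any(os.path.exists(os.path.join(dataset_dir, name))
               for name in FLAG_NAMES)

# Usage: a flagged dataset is skipped, a clean one proceeds.
clean_dir = tempfile.mkdtemp()
flagged_dir = tempfile.mkdtemp()
open(os.path.join(flagged_dir, "ERROR.txt"), "w").close()
```

Checking for the file's mere presence keeps the guard cheap; the YAML contents only need to be parsed when reporting why a dataset was skipped.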