
Marking error datasets and warnings/caveats #575

Closed
max-zilla opened this issue May 9, 2019 · 13 comments

@max-zilla
Contributor

Have an ERROR.txt or something in the dataset & on disk to indicate dataset should be skipped for processing.

@max-zilla max-zilla added this to the TERRA Sprint - April 2019 milestone May 9, 2019
@dlebauer
Member

dlebauer commented May 9, 2019

See terraref/reference-data#218

For data that a human has recognized as being in error (e.g. blurry FLIR data, point clouds clipped at some height):

  1. Add a text file named "ERROR" with optional content: an explanation, a pointer to a GitHub issue, or a set of key: value pairs, perhaps in YAML, that get parsed directly into JSON metadata.
  2. Have an extractor that finds these files and adds a tag such as "quality": { "ERROR": "TRUE", "description": "", "key2": "value2" }.
  3. Add a general rule, perhaps at the level of extractors or at the level of RabbitMQ, that says any time an error (file or flag) is found, skip processing the dataset.
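The skip rule in step 3 could be as simple as scanning a dataset's file list for a marker before processing. A minimal sketch, assuming the marker filenames ("ERROR", "ERROR.txt", "ERROR.yml") that have been floated in this thread rather than any agreed convention:

```python
# Filenames that mark a dataset as known-bad (assumed names, not a standard).
ERROR_MARKERS = {"ERROR", "ERROR.txt", "ERROR.yml"}

def should_skip(filenames):
    """Return True if the dataset's file list contains an error marker file."""
    return any(name in ERROR_MARKERS for name in filenames)
```

An extractor (or a RabbitMQ-level filter) would call this on the dataset's file listing and decline the message when it returns True.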

@dlebauer
Member

dlebauer commented May 9, 2019

For example (consider this a draft; should probably use a consistent / standard way of encoding this information), every FLIR dataset following May 2017 could have a file named "ERROR.yml" that contains:

quality:
  status: ERROR
  description: Sand and water contaminated FLIR Camera lens so temperature values are invalid  
  url: https://github.com/terraref/reference-data/issues/182
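A sketch of how an extractor might turn a flat ERROR.yml like the one above into the JSON metadata tag proposed in step 2. This hand-parses only the simple one-level "quality:" layout shown; a real implementation would likely use a YAML library such as PyYAML instead:

```python
def parse_error_yml(text):
    """Parse a flat 'quality:' block of key: value lines into a metadata dict."""
    quality = {}
    for line in text.splitlines()[1:]:  # skip the "quality:" header line
        if ":" in line:
            # maxsplit=1 keeps colons inside values (e.g. in URLs) intact
            key, value = line.split(":", 1)
            quality[key.strip()] = value.strip()
    return {"quality": quality}
```

The returned dict can be attached directly as dataset metadata, giving e.g. `{"quality": {"status": "ERROR", "description": "...", "url": "..."}}`.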

@max-zilla
Contributor Author

duplicate of #557

@max-zilla
Contributor Author

There are different severities of error: FLIR 2017 is the prominent example, but the stair-stepping in the laser3D data is not so cut-and-dried and may still contain valuable data.

@max-zilla max-zilla changed the title Marking error datasets Marking error datasets and warnings/caveats May 20, 2019
@max-zilla
Contributor Author

Proposed script will add a new metadata entry from the Maricopa Field user with a body like:

{
  "quality": "ERROR",
  "description": "Sand and water contaminated FLIR Camera lens so temperature values are invalid",
  "url": "https://github.com/terraref/reference-data/issues/182"
}

Other statuses could be WARNING, ADVISORY, etc. We would also write a corresponding YAML file to the Globus directory with those contents as suggested. Perhaps at the day level rather than repeated for the entire dataset? Or do we want it repeated at the dataset (timestamp) level?

@dlebauer
Member

We should also add a file named "ERROR" that contains the description and url to the affected dataset. I think repeating this at the dataset level will be good. There may be use for having a tag at a higher level, but that would be in addition to the dataset level flag.

@max-zilla
Contributor Author

My script is prepared to generate the YAML files & metadata; however, because the raw_data directories are owned by dlebauer, I am unable to write into them. We can discuss how to handle this during the meeting. Probably one of:

  • run as sudo and chmod the YAML files when created so they are owned by dlebauer, consistent with the others
  • have dlebauer run the script

@dlebauer
Member

I am glad that the raw data folder is locked down. I am not sure it makes sense for me to be the folder owner (as opposed to a user or group like ‘terraref’), but ... the idea is that we don’t touch the raw_data folder.

In the end, the key requirement is that any data that have known errors (or other issues) are clearly labeled as such.

It makes sense (at this point) to have to use sudo to touch the raw_data folder, if we should ever touch it at all. But maybe there is a ‘better’ way to handle this. Certainly none of the existing files should be touched, but allowing the same user that transfers the files to be able to create a new file would also seem reasonable.

For the FLIR, we did process the data to Level 1. Is the plan to also add an error file to the level 1 data?

@max-zilla
Contributor Author

Script is running now. Will close this when completed.

I would argue that in the FLIR case we don't add an error to the Level_1 data; the goal was to flag the raw data so that in the future we don't even process these erroneous datasets. I would also argue for deleting Level 1+ data from this time period for FLIR, to be consistent with that.

@max-zilla
Contributor Author

Standardized taxonomy for different error cases: ERROR that should not be sent through processing vs. WARNING vs. other classes.
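One possible shape for that taxonomy, where only ERROR blocks processing while lesser classes are recorded but let the pipeline proceed. The class names beyond ERROR and WARNING are illustrative assumptions, not an agreed standard:

```python
from enum import Enum

class Quality(Enum):
    ERROR = "ERROR"        # known-bad data; skip processing entirely
    WARNING = "WARNING"    # suspect data; process but flag the outputs
    ADVISORY = "ADVISORY"  # informational caveat only

def blocks_processing(status):
    """Only ERROR prevents a dataset from being sent through extractors."""
    return status is Quality.ERROR
```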

@max-zilla
Contributor Author

Max will write up some documentation/wiki once this is done to propose a standard approach to handling this.

@max-zilla
Contributor Author

Support the ability to explicitly define affected files (vs. the entire dataset); 'all' could be the default value.
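A sketch of that per-file scoping, assuming the flag carries a hypothetical "files" field whose default "all" marks the whole dataset:

```python
def affected_files(flag, dataset_files):
    """Return the subset of dataset_files covered by a quality flag.

    A missing "files" field (or the literal value "all") means the
    whole dataset is affected; otherwise only the listed files are.
    """
    scope = flag.get("files", "all")
    if scope == "all":
        return list(dataset_files)
    return [f for f in dataset_files if f in scope]
```

The schema here is a proposal to be pinned down in the follow-up issue, not a final format.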

@max-zilla
Contributor Author

Created #589 to follow this.
