
Provide user-friendly default behavior when processing invalid samples through Brain methods #30

Open
brimoor opened this issue Jul 7, 2020 · 3 comments
Labels
backlog Issues related to the roadmap and feature backlog feature Work on a feature request

Comments

brimoor commented Jul 7, 2020

Background

All brain methods currently have a validate flag, set to False by default, that controls whether validation is performed on samples to decide whether valid data was provided.

For example, validation might complain if a user requests an operation that requires logits for a prediction, but none are found. Or, validation might enforce certain constraints on input images such as grayscale vs color.
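The kinds of per-sample checks described above could be sketched as follows. This is a hypothetical illustration, not the actual Brain API; the sample dict, field names, and check parameters are all made up for the example:

```python
def validate_sample(sample, require_logits=False, require_rgb=False):
    """Return a list of human-readable problems found with ``sample``.

    ``sample`` is modeled here as a plain dict; the real samples and the
    real checks would of course be richer than this sketch.
    """
    problems = []
    if require_logits and sample.get("logits") is None:
        problems.append("prediction has no logits")
    if require_rgb and sample.get("num_channels", 3) != 3:
        problems.append("image is not 3-channel RGB")
    return problems
```

Returning a list of problems (rather than raising on the first one) lets the caller decide whether to skip the sample, log a warning, or abort.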

Objective

Provide the most user-friendly experience possible when users invoke brain methods, allowing them to:

  1. not worry unnecessarily about the format of their data
  2. rest assured that egregiously unnecessary computation is not performed without their knowledge
  3. trust that small errors in their data do not result in catastrophic failures
@brimoor brimoor added feature Work on a feature request backlog Issues related to the roadmap and feature backlog labels Jul 7, 2020
brimoor commented Jul 7, 2020

Jason comments

The brain functions all have a validate flag that is set to False by default. The validate flag currently just checks basics, like whether the data exists and whether the dataset fields exist. As a result, when a user sends "bad" data to the brain, it can go undetected. We need to improve the default behavior. My proposal is to:
(1) add more robust format checking to validate, e.g., if the function requires an RGB image, then check that it is one
(2) make validate True by default
(3) add an auto_format flag, defaulting to False; when True, auto-convert/format the data when possible, e.g., convert grayscale to color
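The grayscale-to-color conversion mentioned in (3) could look something like the sketch below. The helper name and the array conventions (HWC NumPy images) are assumptions for illustration, not the actual Brain implementation:

```python
import numpy as np

def auto_format_image(img):
    """Best-effort conversion of an image array to 3-channel RGB.

    Hypothetical helper: grayscale (H, W) or (H, W, 1) arrays are tiled
    to three channels, and four-channel RGBA arrays have their alpha
    channel dropped. Anything already 3-channel passes through untouched.
    """
    if img.ndim == 2:  # grayscale (H, W)
        return np.stack([img] * 3, axis=-1)
    if img.shape[-1] == 1:  # grayscale (H, W, 1)
        return np.repeat(img, 3, axis=-1)
    if img.shape[-1] == 4:  # RGBA -> drop alpha
        return img[..., :3]
    return img  # already RGB; pass through
```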

Agree? Disagree?

(Note that even always-on validation is non-trivial from a cost perspective, as it may need to touch every image in a dataset --> slow)


brimoor commented Jul 7, 2020

Brian comments

In the particular case of image formats, I think we can best achieve Objective 1 by automatically reformatting input data into the required format whenever possible. So automatic grayscale/four-channel -> RGB conversion would be applied, for example.

Regarding validation, I propose that we consider adjusting the validation logic to support two modes of operation:

  • skip_invalid=True (the default): skip invalid samples on a per-sample basis. All valid samples are successfully processed, and descriptive but not overwhelming logging is displayed when one or more samples are skipped for various reasons. This achieves Objective 3.
  • skip_invalid=False: all samples are validated before being processed. If any sample is invalid, an error is raised and no computation is performed.
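The two proposed modes could be sketched as a single processing loop. The function and parameter names here are illustrative only; the validation and processing callables are placeholders for whatever a given brain method actually does:

```python
import logging

logger = logging.getLogger(__name__)

def process_samples(samples, process_fn, validate_fn, skip_invalid=True):
    """Sketch of the two proposed validation modes.

    With ``skip_invalid=True`` (the proposed default), invalid samples
    are skipped per-sample and logged; with ``skip_invalid=False``, all
    samples are validated up front and an error is raised before any
    computation is performed.
    """
    if not skip_invalid:
        # Fail-fast mode: validate everything before doing any work
        num_bad = sum(1 for s in samples if not validate_fn(s))
        if num_bad:
            raise ValueError("%d invalid samples; aborting" % num_bad)
        return [process_fn(s) for s in samples]

    # Default mode: process valid samples, skip and log the rest
    results, skipped = [], 0
    for s in samples:
        if validate_fn(s):
            results.append(process_fn(s))
        else:
            skipped += 1
    if skipped:
        logger.warning("Skipped %d invalid samples", skipped)
    return results
```

Validating up front in the fail-fast mode is what makes it safe: no partial results are ever produced, at the cost of touching every sample before computation begins.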


brimoor commented Jul 7, 2020

A corollary to my comments is that I would advocate for auto-formatting by default whenever possible.

I can't yet think of any auto-formatting step so expensive that applying it (to achieve Objective 1) would violate Objective 2.
