
Provide user-friendly default behavior when processing invalid samples through Brain methods #30

Open
brimoor opened this issue Jul 7, 2020 · 3 comments
Labels
backlog Issues related to the roadmap and feature backlog feature Work on a feature request

Comments

brimoor commented Jul 7, 2020

Background

All brain methods currently have a validate flag, set to False by default, that controls whether validation is performed on samples to decide whether valid data was provided.

For example, validation might complain if a user requests an operation that requires logits for a prediction, but none are found. Or, validation might enforce certain constraints on input images such as grayscale vs color.
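The kinds of per-sample checks described above could be sketched as follows. This is a hypothetical illustration, not the actual Brain API; the sample dict, field names, and check parameters are all made up for the example:

```python
def validate_sample(sample, require_logits=False, require_rgb=False):
    """Return a list of human-readable problems found with ``sample``.

    ``sample`` is modeled here as a plain dict; the real samples and the
    real checks would of course be richer than this sketch.
    """
    problems = []
    if require_logits and sample.get("logits") is None:
        problems.append("prediction has no logits")
    if require_rgb and sample.get("num_channels", 3) != 3:
        problems.append("image is not 3-channel RGB")
    return problems
```

Returning a list of problems (rather than raising on the first one) lets the caller decide whether to skip the sample, log a warning, or abort.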

Objective

Provide the most user-friendly experience possible when users invoke brain methods, allowing them to:

  1. not worry unnecessarily about the format of their data
  2. rest assured that egregiously unnecessary computation is not performed without their knowledge
  3. trust that small errors in their data do not result in catastrophic failures
@brimoor brimoor added feature Work on a feature request backlog Issues related to the roadmap and feature backlog labels Jul 7, 2020
brimoor commented Jul 7, 2020

Jason comments

The brain functions all have a validate flag that is set to False by default. The validate flag currently just checks basics, like whether the data exists and whether the dataset fields exist. As a result, when a user sends "bad" data to the brain, it can go undetected. We need to improve the default behavior. My proposal is to:
(1) add more robust format checking to validate, e.g., if the function requires an RGB image, then check that it is one
(2) make validate True by default
(3) add an auto_format flag, defaulting to False; when True, auto-convert/format the data when possible, e.g., convert grayscale to color
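The grayscale-to-color conversion mentioned in (3) could look something like the sketch below. The helper name and the array conventions (HWC NumPy images) are assumptions for illustration, not the actual Brain implementation:

```python
import numpy as np

def auto_format_image(img):
    """Best-effort conversion of an image array to 3-channel RGB.

    Hypothetical helper: grayscale (H, W) or (H, W, 1) arrays are tiled
    to three channels, and four-channel RGBA arrays have their alpha
    channel dropped. Anything already 3-channel passes through untouched.
    """
    if img.ndim == 2:  # grayscale (H, W)
        return np.stack([img] * 3, axis=-1)
    if img.shape[-1] == 1:  # grayscale (H, W, 1)
        return np.repeat(img, 3, axis=-1)
    if img.shape[-1] == 4:  # RGBA -> drop alpha
        return img[..., :3]
    return img  # already RGB; pass through
```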

Agree? Disagree?

(Note that even always-on validation is non-trivial from a cost perspective, as it may need to touch every image in a dataset --> slow)


brimoor commented Jul 7, 2020

Brian comments

In the particular case of image formats, I think we can best achieve Objective 1 by automatically reformatting input data into the required format whenever possible. So automatic grayscale/four-channel -> RGB conversion would be applied, for example.

Regarding validation, I propose that we consider adjusting the validation logic to support two modes of operation:

  • skip_invalid=True (the default): skip invalid samples on a per-sample basis. All valid samples are successfully processed, and descriptive but not overwhelming logging is displayed when one or more samples are skipped for various reasons. This achieves Objective 3.
  • skip_invalid=False: all samples are validated before being processed. If any sample is invalid, an error is raised and no computation is performed.
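The two proposed modes could be sketched as a single processing loop. The function and parameter names here are illustrative only; the validation and processing callables are placeholders for whatever a given brain method actually does:

```python
import logging

logger = logging.getLogger(__name__)

def process_samples(samples, process_fn, validate_fn, skip_invalid=True):
    """Sketch of the two proposed validation modes.

    With ``skip_invalid=True`` (the proposed default), invalid samples
    are skipped per-sample and logged; with ``skip_invalid=False``, all
    samples are validated up front and an error is raised before any
    computation is performed.
    """
    if not skip_invalid:
        # Fail-fast mode: validate everything before doing any work
        num_bad = sum(1 for s in samples if not validate_fn(s))
        if num_bad:
            raise ValueError("%d invalid samples; aborting" % num_bad)
        return [process_fn(s) for s in samples]

    # Default mode: process valid samples, skip and log the rest
    results, skipped = [], 0
    for s in samples:
        if validate_fn(s):
            results.append(process_fn(s))
        else:
            skipped += 1
    if skipped:
        logger.warning("Skipped %d invalid samples", skipped)
    return results
```

Validating up front in the fail-fast mode is what makes it safe: no partial results are ever produced, at the cost of touching every sample before computation begins.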


brimoor commented Jul 7, 2020

A corollary to my comments is that I would advocate for auto-formatting by default whenever possible.

I can't yet think of any auto-formatting step so expensive that applying it (to achieve Objective 1) would violate Objective 2.
