raise error instead of warning when a user has missing data and add c… #2143
…heck for train data in addition to test
Codecov Report
@@ Coverage Diff @@
## main #2143 +/- ##
==========================================
+ Coverage 88.14% 92.22% +4.07%
==========================================
Files 129 103 -26
Lines 7270 5143 -2127
==========================================
- Hits 6408 4743 -1665
+ Misses 862 400 -462
Not sure why this was ever a warning but this looks great, and thanks for adding tests 🙂👍🏻
Why do we need the check on train data if it has missing values? We compute insights on test data only.
Looking at the error analysis manager constructor, we actually use the test data there. But this highlights @imatiach-msft's point: if we fail because the test data has missing values but the user doesn't use error analysis, then we're failing their run for no reason. Right now, I kept the checks that test data and train data have no null values, to keep the validation logic all in the same place. But to Ilya's point, do you think we should be failing at all, and/or should this logic live in the respective manager code?
My thought was just that we shouldn't fail at all if the user's pipeline can already handle missing values...
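To make the trade-off in this thread concrete, here is a minimal sketch of a centralized check that raises instead of warning, with a switch for skipping the train data. It is not the code in this PR: `MissingDataError`, `columns_with_missing`, `validate_no_missing`, and the `check_train` flag are hypothetical names, and plain lists of row dicts stand in for the real datasets so the example runs with the standard library only.

```python
import math


class MissingDataError(ValueError):
    """Hypothetical error type raised when a dataset has missing values."""


def is_missing(value):
    # Treat None and float NaN as missing values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def columns_with_missing(rows):
    # Return the column names that contain at least one missing value,
    # in the order they are first seen.
    found = []
    for row in rows:
        for name, value in row.items():
            if is_missing(value) and name not in found:
                found.append(name)
    return found


def validate_no_missing(train, test, check_train=True):
    # Assumption: test data is always checked; train data only on request.
    datasets = [("test", test)]
    if check_train:
        datasets.insert(0, ("train", train))
    for label, rows in datasets:
        missing = columns_with_missing(rows)
        if missing:
            # Raise an error rather than emitting a warning.
            raise MissingDataError(
                f"{label} data has missing values in columns: {missing}"
            )


clean = [{"age": 31.0, "income": 52000.0}]
dirty = [{"age": float("nan"), "income": 48000.0}]

validate_no_missing(clean, clean)                     # passes silently
validate_no_missing(dirty, clean, check_train=False)  # train check skipped
```

Moving the check into each manager, as suggested above, would amount to calling something like `validate_no_missing` only from the component that actually cannot handle missing values, so a run that never uses that component would not fail.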
…olbox into hawestra/missingTrainDataValidation
To address this bug: Bug 2477311: [Live site bug] Add length validation for test / train datasets
Description
Checklist