Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raise error instead of warning when a user has missing data and add c… #2143

Merged
merged 7 commits into from
Jul 6, 2023

Conversation

hawestra
Copy link
Contributor

  1. Raise user validation error instead of warning when a user's test data has missing data
  2. Add this same validation check for a user's train data
  3. Add tests

*To address this bug: Bug 2477311: [Live site bug]Add length validation for test / train datasets

Description

Checklist

  • I have added screenshots above for all UI changes.
  • I have added e2e tests for all UI changes.
  • Documentation was updated if it was needed.

@codecov-commenter
Copy link

codecov-commenter commented Jun 26, 2023

Codecov Report

Merging #2143 (9069723) into main (a7cb6ec) will increase coverage by 4.07%.
The diff coverage is 100.00%.

❗ Current head 9069723 differs from pull request most recent head 9b0c01c. Consider uploading reports for the commit 9b0c01c to get more accurate results

@@            Coverage Diff             @@
##             main    #2143      +/-   ##
==========================================
+ Coverage   88.14%   92.22%   +4.07%     
==========================================
  Files         129      103      -26     
  Lines        7270     5143    -2127     
==========================================
- Hits         6408     4743    -1665     
+ Misses        862      400     -462     
Flag Coverage Δ
unittests 92.22% <100.00%> (+4.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...sibleai/responsibleai/rai_insights/rai_insights.py 87.74% <100.00%> (+0.06%) ⬆️

... and 26 files with indirect coverage changes

Copy link
Contributor

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this was ever a warning but this looks great, and thanks for adding tests 🙂👍🏻

Copy link
Contributor

@gaugup gaugup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the check on train data if it has missing values? We compute insights on test data only.

@hawestra
Copy link
Contributor Author

hawestra commented Jun 27, 2023

Why do we need the check on train data if it has missing values? We compute insights on test data only.

looking at the error analysis manager constructor, we actually use the test data there :
image
image

but this highlights @imatiach-msft point, that if we fail because test data is missing but the user doesn't use error analysis, then we're failing their run for no reason. Right now, I kept the check for test data and train data to not have null values to keep the validation logic all in the same place but to Ilya's point, do you think we should even be failing and/or should this logic be in the respective manager code ?

@imatiach-msft
Copy link
Contributor

my thoughts were just that we shouldn't fail at all if the user's pipeline can handle missing values already...

@hawestra hawestra requested a review from gaugup June 27, 2023 21:16
@hawestra hawestra enabled auto-merge (squash) July 5, 2023 22:17
@hawestra hawestra disabled auto-merge July 5, 2023 22:18
@hawestra hawestra merged commit 0d5e41a into main Jul 6, 2023
131 checks passed
@hawestra hawestra deleted the hawestra/missingTrainDataValidation branch July 6, 2023 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants