-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
856 profile null column #1339
856 profile null column #1339
Conversation
…sed and produce a warning message as opposed to an error, and permitting the remainder of the charts generating.
Test: test_2_rounds_1k_duckdbPercentage change: -29.0%
Test: test_2_rounds_1k_sqlitePercentage change: -15.8%
Click here for vega lite time series charts |
…g_rules to provide a dataframe." This reverts commit 13a2925. Reverting changes that were intended for another branch
Is this ready for review? |
tests/test_profile_data.py
Outdated
captured_logs = caplog.text | ||
|
||
assert ( | ||
"Warning: No charts produced for test_2 as the column only contains null values." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a multi-line string to satisfy the linter (or add a #noqa mark if you're feeling particularly lazy.
splink/profile_data.py
Outdated
outer_spec = deepcopy(_outer_chart_spec_freq) | ||
|
||
outer_spec["vconcat"] = inner_charts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I agree with this being an odd error message- this needs to return something so the user knows it's their error, and it's that or something more general - i.e. "No charts produced due to missing data", but I think the specificity aids in de-bugging this if you've done it by accident.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message isn't what's odd about it. In the special case that there's only one null column (or multiple columns that are all null) then the function still attempts to produce an empty chart after printing the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, apologies, I clearly reviewed this fairly quickly last time.
My comment was more directed towards the blank chart that is being generated, as Sam L mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks!
Some minor comments and then I'll approve.
@sama-ds it's just the blank chart creation that needs a minor adjustment and then I will approve. |
tests/test_comparison_lib.py
Outdated
@@ -59,6 +59,7 @@ def test_distance_function_comparison(): | |||
assert sum(df_pred[f"gamma_{col}"] == gamma_val) == expected_count | |||
|
|||
|
|||
@mark_with_dialects_excluding() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to make this backend agnostic, you need to add some extra goodies.
See the construction of the following test -
splink/tests/test_comparison_level_lib.py
Line 52 in c60e253
helper = test_helpers[dialect] |
For the backend agnostic logic to work, you need to use pick out a specific backend helper, grab the relevant comparison library (cl) and also supplies **helper.extra_linker_args()
to the linker object:
linker = helper.Linker(df, settings, **helper.extra_linker_args())
If you want to do this, I'd recommend doing it in a separate PR. Then you won't need to worry about rebasing from master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thorough breakdown can be found here - https://moj-analytical-services.github.io/splink/dev_guides/changing_splink/testing.html#writing-tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I changed this in error- thanks for spotting. This is not relevant to this PR.
…ervices/splink into 856_profile_null_column
Could you please the linter? |
tests/test_profile_data.py
Outdated
assert ( | ||
"Warning: No charts produced for test_2 as the column only contains null values." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work and satisfies the linter.
assert ( | |
"Warning: No charts produced for test_2 as the column only contains null values." | |
assert ( | |
"Warning: No charts produced for test_2 as the column only contains " | |
"null values." | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
name: '856_profile_null_column'
about: '856_profile_null_column'
title: '856_profile_null_column'
assignees: 'sama-ds'
Type of PR
Is your Pull Request linked to an existing Issue or Pull Request?
#856
Give a brief description for the solution you have provided
Previously, if a null column was passed to profile columns it would cause an error. This PR checks whether the column is null, and if it is instead returns a warning message to say that the specific graph could not be made and returns the remaining graphs.
PR Checklist