-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MeanEncoderTransform
generates wrong values
#492
Conversation
🚀 Deployed on https://deploy-preview-492--etna-docs.netlify.app |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #492 +/- ##
==========================================
- Coverage 90.61% 90.51% -0.10%
==========================================
Files 247 247
Lines 16486 16516 +30
==========================================
+ Hits 14938 14950 +12
- Misses 1548 1566 +18 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 3 out of 3 changed files in this pull request and generated no suggestions.
Comments skipped due to low confidence (2)
etna/transforms/encoders/mean_encoder.py:173
- Ensure that the initial values of
ans_cumsum
andans_cumcount
are correctly handled to avoid potential issues with NaN values.
ans_cumsum = np.full_like(target, np.nan)
tests/test_transforms/test_encoders/test_mean_encoder_transform.py:34
- Ensure that the new test cases cover all edge cases, including scenarios with multiple NaN values and different categories.
df["mean_encoded_regressor"] = [np.NaN, np.NaN, np.NaN, 1.5, 2.75, 2.25] + [np.NaN, np.NaN, 6.25, 7, 7.625, np.NaN]
|
tests/test_transforms/test_encoders/test_mean_encoder_transform.py
Outdated
Show resolved
Hide resolved
tests/test_transforms/test_encoders/test_mean_encoder_transform.py
Outdated
Show resolved
Hide resolved
|
||
ts = TSDataset(df, freq="D") | ||
return ts | ||
|
||
|
||
@pytest.fixture | ||
def multiple_nan_target_new_category_ts() -> TSDataset: | ||
"""Fixture with several timestamp with NaN target for new category where there were no notna targets yet.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to write:
Fixture with segment having multiple NaN targets before first non-NaN target value.
Look at the segment ``A``.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need segment B
here?
|
||
@pytest.fixture | ||
def multiple_nan_target_old_category_ts() -> TSDataset: | ||
"""Fixture with several timestamp with NaN target for category where there was already a notna target.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to write:
Fixture with segment having multiple NaN targets after first non-NaN target value.
Look at the segment ``B``.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need segment A
here? It almost repeats multiple_nan_target_new_category_ts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My motivation was to check each case when there is more that one unique category.
Okey, let't combine these tests. It can be done by adding one more timestamp with "A" category
Before submitting (must do checklist)
Proposed Changes
Closing issues
closes #490