`MeanEncoderTransform` generates wrong values #492

egoriyaa · 2024-10-16T04:24:25Z

Before submitting (must do checklist)

Did you read the contribution guide?
Did you update the docs? We use Numpy format for all the methods and classes.
Did you write any new necessary tests?
Did you update the CHANGELOG?

Proposed Changes

Closing issues

closes #490

github-actions · 2024-10-16T04:31:18Z

🚀 Deployed on https://deploy-preview-492--etna-docs.netlify.app

codecov · 2024-10-16T05:11:51Z

Codecov Report

Attention: Patch coverage is 45.45455% with 18 lines in your changes missing coverage. Please review.

Project coverage is 90.51%. Comparing base (6cd66ae) to head (c4ca475).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
etna/transforms/encoders/mean_encoder.py	45.45%	18 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #492      +/-   ##
==========================================
- Coverage   90.61%   90.51%   -0.10%     
==========================================
  Files         247      247              
  Lines       16486    16516      +30     
==========================================
+ Hits        14938    14950      +12     
- Misses       1548     1566      +18


etna/transforms/encoders/mean_encoder.py

Copilot reviewed 3 out of 3 changed files in this pull request and generated no suggestions.

Comments skipped due to low confidence (2)

etna/transforms/encoders/mean_encoder.py:173

Ensure that the initial values of ans_cumsum and ans_cumcount are correctly handled to avoid potential issues with NaN values.

ans_cumsum = np.full_like(target, np.nan)

tests/test_transforms/test_encoders/test_mean_encoder_transform.py:34

Ensure that the new test cases cover all edge cases, including scenarios with multiple NaN values and different categories.

df["mean_encoded_regressor"] = [np.NaN, np.NaN, np.NaN, 1.5, 2.75, 2.25] + [np.NaN, np.NaN, 6.25, 7, 7.625, np.NaN]
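Copilot's two notes are about how NaN targets interact with the running sum and count of each category. A minimal sketch of the underlying idea follows, assuming a simplified encoding: each row gets the mean of the *earlier* non-NaN targets of its own category, NaN targets add nothing to the sum or count, and a row with no earlier observed target gets NaN. The function `expanding_mean_encode` is hypothetical. It applies no smoothing and is not etna's `MeanEncoderTransform` implementation, so its numbers will not match the expected values in the test above.

```python
import numpy as np


def expanding_mean_encode(categories, target) -> np.ndarray:
    """Hypothetical sketch: encode each row with the mean of earlier targets of its category.

    Assumptions (not etna's formula): no smoothing, the current row is excluded,
    NaN targets are skipped, and rows with no earlier observed target get NaN.
    """
    running_sum = {}    # category -> sum of observed (non-NaN) targets so far
    running_count = {}  # category -> number of observed (non-NaN) targets so far
    result = np.full(len(target), np.nan)
    for i, (cat, y) in enumerate(zip(categories, target)):
        count = running_count.get(cat, 0)
        if count > 0:
            # Only targets seen strictly before row i contribute.
            result[i] = running_sum[cat] / count
        if not np.isnan(y):
            # A NaN target leaves both the sum and the count untouched.
            running_sum[cat] = running_sum.get(cat, 0.0) + y
            running_count[cat] = count + 1
    return result
```

For example, with categories `["A", "A", "A", "A", "B", "B"]` and targets `[nan, nan, 1.0, 3.0, 5.0, nan]`, the two leading NaN targets of `A` leave its count at zero, so the first three rows stay NaN and the fourth gets `1.0`. The trailing NaN target of `B` still receives `5.0`, the mean of the earlier observation.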

egoriyaa · 2024-11-05T07:15:36Z

`MeanEncoderTransform(..., mode="per-segment")`, segments=2000, periods=1000:

2000 * 1000 unique categories: no_numba 75 sec, numba 25 sec
200 unique categories: no_numba 30 sec, numba 19 sec
1 unique category: no_numba 18 sec, numba 18 sec

`MeanTransform(..., window=-1)` (for comparison), segments=2000, periods=1000: 7 sec
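For comparison with the loop-based approach, the same simplified encoding (no smoothing, NaN targets skipped, current row excluded) can be written without an explicit Python loop using pandas `groupby(...).cumsum()`. This is a hypothetical sketch. The name `expanding_mean_encode_vectorized` is illustrative, it is not etna's `no_numba` code path, and it was not part of the benchmark above.

```python
import numpy as np
import pandas as pd


def expanding_mean_encode_vectorized(categories, target) -> np.ndarray:
    """Hypothetical loop-free sketch of a simplified expanding mean encoding (no smoothing)."""
    cat = pd.Series(categories)
    y = pd.Series(target, dtype=float)
    is_known = y.notna().astype(float)  # 1.0 where the target is observed, else 0.0
    filled = y.fillna(0.0)              # NaN targets contribute 0 to the running sum
    # Per-category running totals, minus the current row so only earlier rows count.
    prev_sum = filled.groupby(cat).cumsum() - filled
    prev_count = is_known.groupby(cat).cumsum() - is_known
    # A zero count becomes NaN, so rows with no earlier observed target yield NaN, not 0/0.
    return (prev_sum / prev_count.where(prev_count > 0)).to_numpy()
```

The two cumulative sums play the roles of a running sum and a running count. Masking a zero count with NaN gives NaN for rows that have no observed history in their category, without special-casing them in a loop.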

etna/transforms/encoders/mean_encoder.py

tests/test_transforms/test_encoders/test_mean_encoder_transform.py

d-a-bunin · 2024-11-06T08:33:31Z

tests/test_transforms/test_encoders/test_mean_encoder_transform.py


    ts = TSDataset(df, freq="D")
    return ts


+@pytest.fixture
+def multiple_nan_target_new_category_ts() -> TSDataset:
+    """Fixture with several timestamp with NaN target for new category where there were no notna targets yet."""


I think it is better to write:

Fixture with segment having multiple NaN targets before first non-NaN target value. Look at the segment ``A``.

Do we really need segment B here?

d-a-bunin · 2024-11-06T08:34:45Z

tests/test_transforms/test_encoders/test_mean_encoder_transform.py

+
+@pytest.fixture
+def multiple_nan_target_old_category_ts() -> TSDataset:
+    """Fixture with several timestamp with NaN target for category where there was already a notna target."""


I think it is better to write:

Fixture with segment having multiple NaN targets after first non-NaN target value. Look at the segment ``B``.

Do we really need segment A here? It almost repeats multiple_nan_target_new_category_ts.

My motivation was to check each case when there is more than one unique category.
Okay, let's combine these tests. It can be done by adding one more timestamp with the "A" category.

fix MeanEncoder

132d07f

egoriyaa self-assigned this Oct 16, 2024

update changelog

a50a638

github-actions bot temporarily deployed to pull request October 16, 2024 04:31 Inactive

fix

7bf1165

github-actions bot temporarily deployed to pull request October 16, 2024 09:06 Inactive

Egor Baturin added 3 commits October 21, 2024 10:29

fix

cdfbb61

fix

5a1e9d1

restore rnn file

4bace07

github-actions bot temporarily deployed to pull request October 21, 2024 07:37 Inactive

egoriyaa commented Oct 21, 2024

etna/transforms/encoders/mean_encoder.py (outdated, resolved)

fix mean encoder

c8cfd04

github-actions bot temporarily deployed to pull request November 1, 2024 08:13 Inactive

fix numba method

64dab7b

github-actions bot temporarily deployed to pull request November 2, 2024 07:08 Inactive

martins0n requested a review from Copilot November 5, 2024 04:49

Copilot AI reviewed Nov 5, 2024

egoriyaa requested a review from d-a-bunin November 5, 2024 07:21

d-a-bunin reviewed Nov 5, 2024

add comments

269cf7c

github-actions bot temporarily deployed to pull request November 5, 2024 13:49 Inactive

add comment for fixtures

2f72b59

egoriyaa requested a review from d-a-bunin November 5, 2024 13:53

add blank lines for more readability

3a48d1a

github-actions bot temporarily deployed to pull request November 5, 2024 13:56 Inactive

fix spelling

17b2199

github-actions bot temporarily deployed to pull request November 5, 2024 14:04 Inactive

d-a-bunin reviewed Nov 6, 2024

combine two tests in one

1cac8f5

github-actions bot temporarily deployed to pull request November 6, 2024 10:15 Inactive

egoriyaa requested a review from d-a-bunin November 6, 2024 10:18

add test for 2 segments

c4ca475

github-actions bot temporarily deployed to pull request November 6, 2024 12:12 Inactive

d-a-bunin approved these changes Nov 6, 2024

egoriyaa merged commit 4a6e975 into master Nov 6, 2024
17 checks passed

egoriyaa deleted the issue-490 branch November 6, 2024 13:49
