Rework missing values validation in etna metrics #514

d-a-bunin · 2024-11-28T12:49:22Z

Before submitting (must do checklist)

Did you read the contribution guide?
Did you update the docs? We use Numpy format for all the methods and classes.
Did you write any new necessary tests?
Did you update the CHANGELOG?

Proposed Changes

See #513.

Closing issues

Closes #513.

github-actions · 2024-11-28T12:54:30Z

🚀 Deployed on https://deploy-preview-514--etna-docs.netlify.app

codecov · 2024-11-28T13:29:13Z

Codecov Report

Attention: Patch coverage is 69.56522% with 21 lines in your changes missing coverage. Please review.

Project coverage is 90.42%. Comparing base (a1647bb) to head (2fab53a).
Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| etna/metrics/base.py | 66.12% | 21 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #514      +/-   ##
==========================================
- Coverage   90.52%   90.42%   -0.10%     
==========================================
  Files         247      247              
  Lines       16511    16559      +48     
==========================================
+ Hits        14946    14973      +27     
- Misses       1565     1586      +21


d-a-bunin · 2024-11-29T07:41:34Z

etna/metrics/base.py

+        :
+            aggregated value of metric
+        """
+        return np.nanmean(list(metrics_per_segments.values())).item()  # type: ignore


As I understand it, this won't work correctly; the error will be:

TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'

The None value isn't equal to NaN and it causes some problems.

I suggest the following to resolve this issue:

Suggested change

-        return np.nanmean(list(metrics_per_segments.values())).item()  # type: ignore
+        return np.nanmean(np.fromiter(metrics_per_segments.values(), dtype=float)).item()  # type: ignore

It may throw an error if all segment metrics are None / NaN. We should handle this case properly.
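The aggregation concern raised in this thread can be reproduced outside etna. The sketch below is illustrative only: `macro_average` is a hypothetical standalone helper, not the method in `etna/metrics/base.py`, and returning `None` when every segment is missing is an assumption about the desired behavior. It relies on `dtype=float` storing `None` as NaN.

```python
import warnings
from typing import Dict, Optional

import numpy as np


def macro_average(metrics_per_segment: Dict[str, Optional[float]]) -> Optional[float]:
    """Average per-segment values, skipping None and NaN (hypothetical helper)."""
    # dtype=float stores None as NaN, so nanmean can skip it.
    values = np.fromiter(metrics_per_segment.values(), dtype=float)
    with warnings.catch_warnings():
        # With no valid values nanmean emits "RuntimeWarning: Mean of empty slice".
        warnings.simplefilter("ignore", category=RuntimeWarning)
        result = np.nanmean(values).item()
    # Assumption: report "no valid segments" as None instead of NaN.
    return None if np.isnan(result) else result


mixed = {"segment_a": 1.0, "segment_b": None, "segment_c": 3.0}

# A plain list becomes an object array, and summing it fails on None.
try:
    np.nanmean(list(mixed.values()))
    list_version_failed = False
except TypeError:
    list_version_failed = True

average = macro_average(mixed)
all_missing = macro_average({"segment_a": None, "segment_b": float("nan")})
```

Whether the all-missing case should yield `None`, NaN, or an exception is a design choice for the PR; the sketch only shows one way to make the case explicit.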

d-a-bunin · 2024-11-29T08:02:29Z

etna/metrics/base.py

@@ -322,7 +345,7 @@ def __call__(self, y_true: TSDataset, y_pred: TSDataset) -> Union[float, Dict[st

        segments = df_true.columns.get_level_values("segment").unique()

-        metrics_per_segment: Dict[str, float]
+        metrics_per_segment: Dict[str, Optional[float]]


We should probably also do something with NaNs returned from metric functions:

- We should cast them into None inside __call__
- We should cast NaNs into None inside the functional metrics

This should be handled inside the functional metric.

Probably yes, but:

- It makes the implementation of a functional metric more difficult.
- It doesn't save us from getting NaNs from functional metrics; in that case we have both None and NaN.
- If we work in matrix_to_array mode, the result from the functional metric is an np.array. By default it has dtype float; to return None we would have to change the dtype or return a list.

There will be more logic there. For example, if you use nanmean, an error may occur, and it should be handled as well.

And when can NaN occur there? I assumed that None is returned if everything is empty.

This can also be handled: if you convert the array to the object dtype, you can put None there, but we will most likely lose a little performance in that case.

I saw only RuntimeWarning: Mean of empty slice. What error are you talking about?

There are two scenarios: we allow NaNs to occur, or we have some error that causes them to occur...

We could probably just return a list. I'm worried that returning a list or ndarray[object] isn't what people usually expect from functions like mean_squared_error, etc.

If we are going to move this logic into the functional metric, we should probably update ArrayLike or make a separate type hint for the return value.

Yes, you're right. Sorry for the confusion, I was referring to this

If NaN is the result of some kind of error, it would still be converted to None at the last step in the functional metric, wouldn't it?

This is true. Should we consider not replacing NaN with None?
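To make the dtype point from this thread concrete, here is a minimal sketch of the two options mentioned (returning a list, or switching to the object dtype). `nan_to_none` is a hypothetical helper introduced only for illustration, and the sample array stands in for whatever a functional metric returns in matrix_to_array mode.

```python
from typing import List, Optional

import numpy as np


def nan_to_none(values: np.ndarray) -> List[Optional[float]]:
    """Return a list in which NaN entries are replaced by None (hypothetical helper)."""
    return [None if np.isnan(value) else float(value) for value in values]


# float64 array, standing in for a per-segment result of a functional metric
per_segment = np.array([0.5, np.nan, 1.5])

# A float array cannot hold None: the assignment silently stores NaN instead.
float_copy = per_segment.copy()
float_copy[0] = None

# Option 1: return a plain list of Optional[float].
as_list = nan_to_none(per_segment)

# Option 2: switch to dtype=object, which can hold None at some performance cost.
as_object = per_segment.astype(object)
as_object[np.isnan(per_segment)] = None
```

Both options change the return type seen by callers, which is the concern about functions like mean_squared_error voiced above.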

brsnw250 · 2024-12-02T08:20:26Z

docs/source/api_reference/metrics.rst

@@ -27,6 +27,7 @@ Enums:
   :template: class.rst

   MetricAggregationMode
+   MetricMissingMode


We should add MetricWithMissingHandling to the documentation as well.

Do we really want to add this? It seems more like a developer hack than a public API.

Why is it a hack? Users themselves can inherit from this class to make their custom metrics handle missing values.

brsnw250 · 2024-12-02T08:21:29Z

CHANGELOG.md

@@ -9,7 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - Add `load_dataset` to public API ([#484](https://github.com/etna-team/etna/pull/484))
 - Add example of using custom pipeline pools in `Auto` ([#504](https://github.com/etna-team/etna/pull/504))
- 
+- Change signature of `etna.metrics.Metric` to return `None` values ([#514](https://github.com/etna-team/etna/pull/514))


Should we add the new class MetricWithMissingHandling here?

etna/metrics/__init__.py

brsnw250 · 2024-12-02T08:50:05Z

etna/metrics/base.py

@@ -322,7 +345,7 @@ def __call__(self, y_true: TSDataset, y_pred: TSDataset) -> Union[float, Dict[st

        segments = df_true.columns.get_level_values("segment").unique()

-        metrics_per_segment: Dict[str, float]
+        metrics_per_segment: Dict[str, Optional[float]]


This should be handled inside the functional metric.

brsnw250 · 2024-12-02T09:04:29Z

etna/metrics/base.py

+        df_true = y_true.df.loc[:, pd.IndexSlice[:, "target"]]
+        df_pred = y_pred.df.loc[:, pd.IndexSlice[:, "target"]]
+
+        df_true_isna = df_true.isna().any().any()


Instead of calling any twice, we can do:

Suggested change

-        df_true_isna = df_true.isna().any().any()
+        df_true_isna = np.sum(df_true.isna().values)
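For comparison, a small sketch of the two expressions on a toy frame. The column level name "feature", the segment names, and the data are assumptions made for illustration; `isna.values.any()` and the `segments_with_nan` lookup are additions that are not part of the suggestion. Note that `np.sum` yields a count of missing cells rather than a boolean, which only matters if the result is used as something other than a truth value.

```python
import numpy as np
import pandas as pd

# Toy frame with (segment, feature) columns; level names are assumed.
columns = pd.MultiIndex.from_product(
    [["segment_a", "segment_b"], ["target"]], names=["segment", "feature"]
)
df = pd.DataFrame([[1.0, 2.0], [np.nan, 3.0]], columns=columns)
df_true = df.loc[:, pd.IndexSlice[:, "target"]]

isna = df_true.isna()
flag_two_any = isna.any().any()      # boolean: any per column, then any over columns
nan_count = np.sum(isna.values)      # integer count of missing cells, truthy if > 0
flag_single_any = isna.values.any()  # boolean from a single reduction over the array

# Segments that contain at least one missing target value.
segments_with_nan = (
    df_true.columns[isna.any().to_numpy()].get_level_values("segment").tolist()
)
```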

brsnw250 · 2024-12-02T09:19:32Z

etna/metrics/base.py

+        :
+            aggregated value of metric
+        """
+        return np.nanmean(list(metrics_per_segments.values())).item()  # type: ignore


I suggest the following to resolve this issue:

Suggested change

-        return np.nanmean(list(metrics_per_segments.values())).item()  # type: ignore
+        return np.nanmean(np.fromiter(metrics_per_segments.values(), dtype=float)).item()  # type: ignore

brsnw250 · 2024-12-02T09:22:05Z

etna/metrics/base.py

+        :
+            aggregated value of metric
+        """
+        return np.nanmean(list(metrics_per_segments.values())).item()  # type: ignore


It may throw an error if all segment metrics are None / NaN. We should handle this case properly.

etna/metrics/intervals_metrics.py

etna/pipeline/base.py

feature: add task files

39309f0

d-a-bunin self-assigned this Nov 28, 2024

chore: update changelog

2762a12

github-actions bot temporarily deployed to pull request November 28, 2024 12:54 Inactive

d-a-bunin commented Nov 29, 2024

d-a-bunin requested a review from brsnw250 December 2, 2024 07:38

brsnw250 requested changes Dec 2, 2024

fix: update macro_average, add erroneous segments in validation of nans

31b8d8e

github-actions bot temporarily deployed to pull request December 2, 2024 12:29 Inactive

fix: update macro_average

bde7a0e

github-actions bot temporarily deployed to pull request December 2, 2024 12:37 Inactive

fix: add type ignore

09fd761

github-actions bot temporarily deployed to pull request December 2, 2024 14:53 Inactive

d-a-bunin requested a review from brsnw250 December 3, 2024 08:40

fix: update handling missing values in Metric.__call__, add MetricWithMissingHandling into public API

2fab53a

github-actions bot temporarily deployed to pull request December 3, 2024 12:21 Inactive

brsnw250 approved these changes Dec 4, 2024

d-a-bunin merged commit 0b1b29d into master Dec 4, 2024
17 checks passed

d-a-bunin deleted the issue-513 branch December 4, 2024 07:40
