Support for Wrappers (e.g., BootStrapper) in MetricCollection #2046

Closed
pietrolesci opened this issue Sep 3, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@pietrolesci

🚀 Feature

Allow passing two BootStrapper instances to MetricCollection as a dict.

Motivation

Currently the behaviour is the following.

import torch

from torchmetrics.wrappers import BootStrapper
from torchmetrics.classification import MulticlassAccuracy, MulticlassF1Score
from torchmetrics import MetricCollection

_ = torch.manual_seed(42)
preds = torch.randn(10, 3).softmax(dim=-1)
target = torch.randint(3, (10,))

bootstrap_acc = BootStrapper(MulticlassAccuracy(num_classes=3, average=None), num_bootstraps=20)
bootstrap_f1 = BootStrapper(MulticlassF1Score(num_classes=3, average=None), num_bootstraps=20)
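For reference, calling a single wrapper on its own already returns a dict of statistics (with the default settings, a mean and a std over the bootstrap resamples), which is why the two wrappers clash once they are collected together:

# Calling one wrapper directly yields a dict of bootstrap statistics
# (exact values depend on the seed).
out = bootstrap_acc(preds, target)
print(out.keys())  # dict_keys(['mean', 'std'])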

If MetricCollection is built from a list of metrics, a ValueError is thrown:

metrics_list = MetricCollection([bootstrap_acc, bootstrap_f1]) 
# same as MetricCollection(bootstrap_acc, bootstrap_f1)

# ValueError: Encountered two metrics both named BootStrapper

If MetricCollection is built from a dict of metrics, the results are conflated:

metrics_dict = MetricCollection({"acc": bootstrap_acc, "f1": bootstrap_f1})
metrics_dict(preds, target)

# {'mean': tensor([0.5259, 0.7331, 0.0000]),
#  'std': tensor([0.1216, 0.1767, 0.0000])}

The expected behaviour could be a flattened dict where the keys are prepended to the statistics:

{
    'acc_mean': tensor(...),
    'acc_std': tensor(...),
    'f1_mean': tensor(...),
    'f1_std': tensor(...),
}

or a nested dict:

{
    'acc': {
        'mean': tensor(...),
        'std': tensor(...),
    },
    'f1': {
        'mean': tensor(...),
        'std': tensor(...),
    },
}
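
In the meantime, a possible manual workaround (just a sketch, calling each wrapper separately and prefixing the resulting keys by hand) could be:

# Sketch of a workaround: call each wrapper on its own and prefix the
# statistic keys with the intended metric name.
named_wrappers = {"acc": bootstrap_acc, "f1": bootstrap_f1}
results = {}
for name, wrapper in named_wrappers.items():
    out = wrapper(preds, target)  # {'mean': ..., 'std': ...}
    results.update({f"{name}_{k}": v for k, v in out.items()})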
pietrolesci added the enhancement label on Sep 3, 2023
@SkafteNicki
Member

Hi @pietrolesci, thanks for reporting this issue.
Happy to report that this is already fixed by PR #2027, and the fix is available in v1.1.1 of torchmetrics.
It will still only work when the collection is built as a dict, because it is a general restriction of the MetricCollection object that two metrics cannot have the same name. That said, if you evaluate:

metrics_dict = MetricCollection({"acc": bootstrap_acc, "f1": bootstrap_f1})
print(metrics_dict(preds, target))

using v1.1.1 of torchmetrics you should get something like:

{
 'acc_mean': tensor([0.4767, 0.7992, 0.0000]), 
 'acc_std': tensor([0.2124, 0.1736, 0.0000]), 
 'f1_mean': tensor([0.5259, 0.7331, 0.0000]), 
 'f1_std': tensor([0.1216, 0.1767, 0.0000])
}

Closing issue.

@pietrolesci
Author

Thanks a lot for your swift reply @SkafteNicki!

@pietrolesci
Author

pietrolesci commented Sep 8, 2023

Hi @SkafteNicki,

The behaviour of the BootStrapper is a bit confusing now. If I pass a dict with only one BootStrapper instance to MetricCollection, I still get {"mean": ..., "std": ...}, which is counterintuitive.

For a case like

metrics_dict = MetricCollection({"acc": bootstrap_acc})

I think the expected output should be

{
 'acc_mean': tensor([0.4767, 0.7992, 0.0000]), 
 'acc_std': tensor([0.2124, 0.1736, 0.0000]), 
}

because the user passed a key that likely makes sense for them. I run into this problem when I have multiple metrics in the dict and want to differentiate between them, e.g.,

for the collection

{
    "bootf1_micro": BootStrapper(F1Score(task, num_classes=num_classes, average=average)),
    "accuracy_micro": Accuracy(task, num_classes=num_classes, average=average),
    "f1_micro": F1Score(task, num_classes=num_classes, average=average),
    "precision_micro": Precision(task, num_classes=num_classes, average=average),
    "recall_micro": Recall(task, num_classes=num_classes, average=average),
}

the output is

{
    'mean': tensor([0.4939, 0.7287, 0.0000]),  # this should have been bootf1_micro_mean
    'std': tensor([0.1622, 0.1454, 0.0000]),  # this should have been bootf1_micro_std
    'accuracy_micro': tensor(0.5000),
    'f1_micro': tensor(0.5000),
    'precision_micro': tensor(0.5000),
    'recall_micro': tensor(0.5000),
}
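
Until the per-key prefixing also covers this case, a possible workaround (a sketch, assuming plain_metrics is a MetricCollection holding only the non-bootstrapped metrics) is to keep the BootStrapper outside the collection and rename its output keys manually:

# Sketch: compute the BootStrapper separately, prefix its statistic keys,
# and merge them with the results of the plain collection
# (`plain_metrics` is assumed to hold only the non-bootstrapped metrics).
boot = BootStrapper(F1Score(task, num_classes=num_classes, average=average))
boot_out = {f"bootf1_micro_{k}": v for k, v in boot(preds, target).items()}
results = {**plain_metrics(preds, target), **boot_out}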
