[ENH] Adds class_weight to those classifier that support it. Required for imbalanced datasets. #1776

Merged
merged 3 commits into main from adds_class_weights
Jul 9, 2024


patrickzib
Contributor

@patrickzib patrickzib commented Jul 9, 2024

  • This PR adds a class_weight parameter to the classifiers whose underlying sklearn estimators support it. class_weight is intended for training on imbalanced datasets.

From ExtraTreesClassifier:

"""
class_weight : {“balanced”, “balanced_subsample”}, dict or list of dicts, default=None
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.
Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
The “balanced_subsample” mode is the same as “balanced” except that weights are computed based on the bootstrap sample for every tree grown.
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
"""
  • Fixes a bug in Quant where the random_state parameter was not passed to the internal estimator.
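
As a rough illustration of the “balanced” formula quoted above, the sketch below computes n_samples / (n_classes * count) per class in plain Python. The helper name `balanced_class_weights` and the toy labels are assumptions made for this example only; they are not part of this PR or of sklearn, and the sketch assumes every class appears at least once in y.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class_label: weight} with weight = n_samples / (n_classes * count).

    Hypothetical helper for illustration; it mirrors the formula in the
    quoted docstring rather than calling sklearn.
    """
    counts = Counter(y)        # number of samples per class label
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy imbalanced labels: 90 samples of class 0 and 10 samples of class 1.
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)
```

The minority class ends up with the larger weight, which is how class_weight compensates for imbalance during training.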

@patrickzib patrickzib added the enhancement and classification labels Jul 9, 2024
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I would have added the following labels to this PR based on the changes made: [ classification ], however some package labels are already present.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.


@patrickzib
Contributor Author

patrickzib commented Jul 9, 2024

There was also a bug in Quant (not Hydra): the random_state parameter was not passed to the classifier. Now fixed.
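
To make the kind of fix concrete, here is a minimal sketch of a wrapper that forwards both class_weight and random_state to an internal sklearn estimator. `ForwardingClassifier` is a hypothetical class written for this example; it is not the Quant implementation, and the choice of ExtraTreesClassifier with 20 trees and random toy data are assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier


class ForwardingClassifier:
    """Hypothetical wrapper for illustration, not aeon's Quant classifier."""

    def __init__(self, class_weight=None, random_state=None):
        self.class_weight = class_weight
        self.random_state = random_state

    def fit(self, X, y):
        # Forward both parameters to the internal estimator. Leaving out
        # random_state here would make the forest unseeded, so results
        # could differ between runs even when the wrapper was given a seed.
        self.estimator_ = ExtraTreesClassifier(
            n_estimators=20,
            class_weight=self.class_weight,
            random_state=self.random_state,
        )
        self.estimator_.fit(X, y)
        return self

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)


# Toy imbalanced data: 90 samples of class 0, 10 samples of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 90 + [1] * 10)

proba_a = ForwardingClassifier("balanced", random_state=42).fit(X, y).predict_proba(X)
proba_b = ForwardingClassifier("balanced", random_state=42).fit(X, y).predict_proba(X)

# With the seed forwarded, two fits with the same random_state should agree.
same_seed_matches = bool(np.allclose(proba_a, proba_b))
print(same_seed_matches)
```

Comparing two fits with the same seed is a simple way to check that random_state actually reaches the internal estimator.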

@patrickzib patrickzib changed the title [ENH] Adds class_weight to those classifier that support it. [ENH] Adds class_weight to those classifier that support it. Required for imbalanced datasets. Jul 9, 2024
@patrickzib patrickzib self-assigned this Jul 9, 2024
@hadifawaz1999
Member

I think you mean you fixed the random_state thing in Quant, not Hydra, @patrickzib, no?

@patrickzib
Contributor Author

> I think you mean you fixed the random_state thing in Quant, not Hydra, @patrickzib, no?

Uh, yes :)

@hadifawaz1999
Member

I'll raise an issue so we keep in mind doing this for the rest of the classifiers, as lots of others use sklearn-based estimators.

hadifawaz1999
hadifawaz1999 previously approved these changes Jul 9, 2024
Member

@hadifawaz1999 hadifawaz1999 left a comment


LGTM! check #1777 for the future

TonyBagnall
TonyBagnall previously approved these changes Jul 9, 2024
Contributor

@TonyBagnall TonyBagnall left a comment


lgtm, be interested to see if it makes any difference. Just saw the conflict

@patrickzib patrickzib dismissed stale reviews from TonyBagnall and hadifawaz1999 via 1a93ef2 July 9, 2024 19:57
@patrickzib
Contributor Author

Thank you @TonyBagnall @hadifawaz1999 . Would you mind approving again? I had to merge main.

@patrickzib patrickzib merged commit 34cf7b5 into main Jul 9, 2024
13 of 14 checks passed
@patrickzib patrickzib deleted the adds_class_weights branch July 9, 2024 20:54