[ENH] Implement Proximity Forest classifier #1729

Open
wants to merge 21 commits into
base: main

Conversation

itsdivya1309
Contributor

@itsdivya1309 itsdivya1309 commented Jun 27, 2024

Reference Issues/PRs

Closes #159

What does this implement/fix? Explain your changes.

Implementation of Proximity Forest Algorithm using the Proximity Trees.

@aeon-actions-bot added the classification and enhancement labels Jun 27, 2024
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ enhancement ].
I have added the following labels to this PR based on the changes made: [ classification ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)

@itsdivya1309 itsdivya1309 marked this pull request as ready for review July 3, 2024 04:58
Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

Great work, we'll have to give it a run through on the UCR archive datasets as we discussed. Next steps are up to you really, we can discuss on Slack.

aeon/classification/distance_based/_proximity_forest.py (outdated, resolved)
aeon/classification/distance_based/_proximity_forest.py (outdated, resolved)
@MatthewMiddlehurst
Member

This needs to be included in the API documentation also.

Comment on lines 103 to 112
    def _fit_tree(self, X, y):
        clf = ProximityTree(
            n_splitters=self.n_splitters,
            max_depth=self.max_depth,
            min_samples_split=self.min_samples_split,
            random_state=self.random_state,
            n_jobs=self.n_jobs,
        )
        clf.fit(X, y)
        return clf
Member

@baraline baraline Jul 7, 2024

The same comment applies to predict: I think it would be better to define the function you parallelize with joblib outside of the object that calls it. If I remember right, joblib pickles the objects you parallelize, which might mean that a copy of the ProximityForest object is created every time _fit_tree is called.

To avoid that, you would define _fit_tree as a function outside ProximityForest.
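
The pickling concern can be illustrated with the standard-library pickle module alone. This is only a minimal sketch of the mechanism, not the PR's code: the `Forest` class, its `payload` attribute, and both `fit_tree` callables are hypothetical stand-ins. Serializing a bound method also serializes the instance it is bound to, whereas a module-level function is serialized by reference. Whether this matters in practice depends on the joblib backend, since thread-based execution shares memory and does not need to pickle the callable.

```python
import pickle


class Forest:
    """Hypothetical stand-in for an estimator that holds sizeable state."""

    def __init__(self, n_values):
        # Stands in for training data or fitted trees stored on the object.
        self.payload = list(range(n_values))

    def fit_tree(self, seed):
        # Bound method: pickling it also pickles `self`, payload included.
        return seed * 2


def fit_tree(seed):
    # Module-level function: pickled by reference (module and name only).
    return seed * 2


forest = Forest(100_000)

# Size in bytes of each serialized callable.
bound_size = len(pickle.dumps(forest.fit_tree))
function_size = len(pickle.dumps(fit_tree))

print(f"bound method: {bound_size} bytes, function: {function_size} bytes")
```

The bound method's pickle grows with the state stored on the instance, while the function's pickle stays small regardless of what the estimator holds.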

Contributor Author

Thanks for pointing this out.

Member

@MatthewMiddlehurst MatthewMiddlehurst Jul 9, 2024

Is this true? I think we have functions elsewhere that do this. Interesting to see if that needs to be changed.

Member

@baraline baraline left a comment

Other than this testing issue, the rest LGTM!

Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

You should investigate how we use joblib in other estimators in the classification module. Typically we use "threads" as the default backend, with a parameter to change that.

The docstring for n_jobs needs updating.

Member

@baraline baraline left a comment

Only this small parameter is missing, and then it should be good to go!

        )

    def _predict_proba(self, X):
        output_probas = Parallel(n_jobs=self._n_jobs, prefer="threads")(
Member

@baraline baraline Jul 14, 2024

We discussed the need for a parameter for the joblib backend (i.e. threads vs. processes). You should add a class parameter that defaults to threads.

It might be better to use the backend parameter instead of prefer (see the joblib docs) to have more fine-grained control over the chosen backend.
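
As a rough sketch of what this suggestion could look like, the example below exposes the joblib backend as a constructor parameter that defaults to threads and passes it to `Parallel` through `backend` rather than `prefer`. The class `TinyForest`, the helper `_fit_one`, and the parameter name `parallel_backend` are assumptions made for illustration; they are not taken from the PR.

```python
from joblib import Parallel, delayed


def _fit_one(seed):
    # Hypothetical stand-in for fitting a single tree.
    return seed * seed


class TinyForest:
    """Hypothetical estimator; only the joblib wiring is of interest here."""

    def __init__(self, n_trees=4, n_jobs=1, parallel_backend="threading"):
        self.n_trees = n_trees
        self.n_jobs = n_jobs
        # Name of a joblib backend, e.g. "threading" or "loky" (assumed name).
        self.parallel_backend = parallel_backend

    def fit(self):
        # `backend` selects the backend explicitly; `prefer` is only a hint.
        parallel = Parallel(n_jobs=self.n_jobs, backend=self.parallel_backend)
        self.results_ = parallel(
            delayed(_fit_one)(seed) for seed in range(self.n_trees)
        )
        return self


forest = TinyForest(n_trees=4, n_jobs=2).fit()
print(forest.results_)
```

A caller who wants process-based parallelism could then pass `parallel_backend="loky"` without any change to the estimator's internals.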

Labels
classification, enhancement
Successfully merging this pull request may close these issues.

[ENH] Implement the Proximity Forest classifier using aeon distances
3 participants