Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Rename transform_train to resample. #643

Open
anopsy opened this issue Mar 23, 2024 · 8 comments
Open

[BUG] Rename transform_train to resample. #643

anopsy opened this issue Mar 23, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@anopsy
Copy link
Contributor

anopsy commented Mar 23, 2024

The method in preprocessing.Outlier_Remover transform_train should be changed to fit_transform or transform.

@anopsy anopsy added the bug Something isn't working label Mar 23, 2024
@koaning
Copy link
Owner

koaning commented Mar 23, 2024

@FBruzzesi so this came up during the code spring. It indeed seems that our OutlierRemover doesn't have a transform method. And I also just noticed that we also only allow X as input.

I really forgot what we had in mind when we designed this one. But I'm wondering what we might want to do with it. @MBrouns do you remember? Is there a reason why we called it transform_train? Maybe related to the fact that we cannot really use it in a pipeline because it changes the shape of X?

@FBruzzesi
Copy link
Collaborator

FBruzzesi commented Mar 23, 2024

It should be related to #342.
The TL;DR is that the scikit learn Pipeline would not filter y and this would not work with supervised learning

@FBruzzesi FBruzzesi changed the title [BUG] change the name of the method transform_train to fit_transform or transform in preprocessing.Outlier_Remover [BUG] change the name of the method transform_train to fit_transform or transform in preprocessing.OutlierRemover Mar 23, 2024
@MBrouns
Copy link
Collaborator

MBrouns commented Mar 23, 2024

Yea I think that's indeed it. I'm up for calling it resample to make it work with the imblearn folk

@koaning koaning changed the title [BUG] change the name of the method transform_train to fit_transform or transform in preprocessing.OutlierRemover [BUG] Rename transform_train to resample. Mar 23, 2024
@koaning
Copy link
Owner

koaning commented Mar 23, 2024

That does feel better, yeah. Let's do that and update the docs accordingly.

@FBruzzesi
Copy link
Collaborator

FBruzzesi commented Mar 24, 2024

I am certainly not a user of imblearn. I tried to play around with it and it seems not to be so straightforward.
Curiously enough, one of the user guide explaining how to create a custom sampler implements an outlier detection.

To add more details, when adding resample and fit_resample methods, then I end up with the issue of having both fit_resample and transform implemented (the latter due to inheritance) which the imblearn Pipeline seems to not like.

@koaning
Copy link
Owner

koaning commented Mar 24, 2024

I think we can drop the TransformerMixin here if we want it to be more like a resampler.

Then again, part of me would also be "ok" with dropping this feature from this library. Doing sampling stuff in a pipeline really requires imblearn and I'm not sure if I like the idea of adding imblearn as a dependency for scikit-lego.

@FBruzzesi
Copy link
Collaborator

I definitely agree in not adding imblearn as a dependency. I honestly like the idea of having such feature, but maybe it is the wrong place for it

@koaning
Copy link
Owner

koaning commented Mar 26, 2024

We can also just make a utility function that just removes outliers. Something like:

X_new, y_new = remove_outliers(estimator, X, y)

Wouldn't this be the simplest/cleanest?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants