[MRG] Add SelectCols(cols) and DropCols(cols) transformers #804
I had a look at this PR, and I'm quite convinced that this is the right way to go. Great job @jeromedockes !
Can you add the two objects to the API documentation?
And maybe add a tiny section on https://skrub-data.org/stable/assembling.html, between "joining" and "going further". We should stress here that the benefit is to be able to build pipelines.
That's definitely an option, too. A few small things added up and made me lean towards the version with 2 classes:
Of course that's only a first impression and we'll have to see how it plays out in practice and in the examples. They're simple things so we can prototype several approaches and see what we prefer. The issue also suggested adding a "drop" parameter to most transformers. I thought it would be more work (especially if we want to add "rename" and "select" parameters), add parameters to already complex classes with other interactions to think about (do we drop before or after vectorizing?), and duplicate some of the work (or at least the parameter documentation) across the existing transformers. OTOH, it would result in shorter pipelines and fewer classes for users to discover, so it is still certainly worth discussing.
also pandas and polars both have separate select / getitem and drop
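To make the two-class option more concrete, here is a rough, dependency-free sketch of how separate select and drop steps could chain in a pipeline. Everything in it is hypothetical: `SelectColsSketch`, `DropColsSketch`, and `run_steps` are illustrative stand-ins, and a plain dict of column name to list stands in for a dataframe. It is not the implementation in this PR.

```python
class SelectColsSketch:
    """Keep only the listed columns (hypothetical stand-in, not skrub's class)."""

    def __init__(self, cols):
        self.cols = cols

    def fit(self, X, y=None):
        # Fail early if a requested column is absent from the table.
        missing = [c for c in self.cols if c not in X]
        if missing:
            raise ValueError(f"columns not found: {missing}")
        return self

    def transform(self, X):
        # Output follows the order given in `cols`.
        return {c: X[c] for c in self.cols}


class DropColsSketch:
    """Remove the listed columns (hypothetical stand-in, not skrub's class)."""

    def __init__(self, cols):
        self.cols = cols

    def fit(self, X, y=None):
        missing = [c for c in self.cols if c not in X]
        if missing:
            raise ValueError(f"columns not found: {missing}")
        return self

    def transform(self, X):
        dropped = set(self.cols)
        # Remaining columns keep their original order.
        return {c: v for c, v in X.items() if c not in dropped}


def run_steps(steps, X):
    """Fit and apply each step in turn, like a very small pipeline."""
    for step in steps:
        X = step.fit(X).transform(X)
    return X


table = {"id": [1, 2], "name": ["a", "b"], "city": ["x", "y"], "notes": ["", ""]}
result = run_steps(
    [DropColsSketch(["notes"]), SelectColsSketch(["name", "city"])], table
)
print(list(result))  # ['name', 'city']
```

Because each step only exposes `fit` and `transform`, it can sit between other transformers without those transformers needing their own "drop" or "select" parameters.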
I added the API entries and the paragraph in the user guide. I put the API section after "joining" to keep the order similar to the user guide.
do we want to allow passing a single string instead of a one-element list when we want to select or drop a single column?
I think that it would be great!
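One possible way to accept either form is to normalize the argument before using it. The helper below is a hypothetical sketch (the name `as_col_list` is made up for illustration and is not taken from this PR):

```python
def as_col_list(cols):
    """Return `cols` as a list of column names.

    A bare string is treated as one column name. Without this check,
    list("city") would split the name into single characters.
    """
    if isinstance(cols, str):
        return [cols]
    return list(cols)


single = as_col_list("city")
several = as_col_list(["name", "city"])
print(single)   # ['city']
print(several)  # ['name', 'city']
```

The explicit `isinstance(cols, str)` check matters because strings are themselves iterable in Python.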
ok it is ready to review
Also, I named it
Question: could it, should it, be used in https://skrub-data.org/stable/auto_examples/08_join_aggregation.html#sphx-glr-auto-examples-08-join-aggregation-py ?
I'm not sure; the only projection I see is the one that separates X and y from the dataset that contains both at the beginning. After that, X only has 3 columns, all of which are used. Looking at the features added by the transformers in the pipeline, I don't see any that should be dropped.
A few comments.
I am not sure if we have polars support. Is it easy to add? In any case, we should be explicit about it.
Co-authored-by: Gael Varoquaux <[email protected]>
yes you can see in the tests
Awesome!!
One tiny comment, but to me, this is good to go.
Very nice and useful features, thanks @jeromedockes!
OK, merging then. Thanks @jeromedockes !
closes #670
A decision on the API has not actually been made in the linked issue; this PR is meant to help make the discussion more concrete.