-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): Add require_all
parameter to the by_name
column selector
#15028
Conversation
2d39d0e
to
1509ab1
Compare
I was thinking of using |
It actually isn't, as we don't overwrite the global "any", or use "any" in the function. However, I do usually use "any_" as per that comment, as it's better practice overall - should probably do so here too. I'll change that when I finish in the gym ;)
Hmm, I don't think it's as obvious in this context what "strict" would imply? Not as obvious (to me) as "any" (or "any_"), at least. |
IMHO, |
1509ab1
to
9eca36e
Compare
by_name
selectorby_name
selector
Well now you're making me want to add a |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #15028 +/- ##
=======================================
Coverage 80.82% 80.82%
=======================================
Files 1393 1393
Lines 179341 179345 +4
Branches 2918 2920 +2
=======================================
+ Hits 144955 144964 +9
+ Misses 33884 33879 -5
Partials 502 502 ☔ View full report in Codecov by Sentry. |
+1 for |
This is similar to the discussion in #11913 I'm not sure there needs to be a parameter for this, but if there is one, I wouldn't want to call it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above (just tagging this with 'request changes' to organize my personal backlog).
Any updates on this one? @alexander-beedie |
9eca36e
to
51a1d9c
Compare
@ritchie46: Yup, just updated it and rebased against the latest code; left this dangling a bit longer than planned! I believe we do want a parameter here as it enables a clean selection pattern - apparently it just took me two months to think of an improved parameter name :)) cs.by_name("baz","foo","zap", match_any=True) This reads a lot better than before; you're matching by name, and it'll match any of the names. Final feels self-explanatory now? ( |
by_name
selectorby_name
selector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, I guess a parameter here could be helpful.
match_any
is definitely more clear than any_
, but it still doesn't make sense to me.
Without the flag, we're saying "Select the columns A, B, and C."
With the new flag, we're saying "Select columns A, B, and C, and if any of those don't exist, you can just ignore them.".
With that in mind, wouldn't ignore_missing
make much more sense? match_any
could mean anything. I would read it as "Select any column that matches these names", which is in fact the default behavior. It tells me nothing about what should happen if one of those columns doesn't exist.
Really happy to see something like this @alexander-beedie Seeing the new name, could this be generalised to the below? def matches_any(*names: str) -> SelectorType:
"""
Select any columns with an exact match in `names`.
Where no columns match, no operation is performed.
"""
return cs.matches(f"^{"|".join(names)}$") Where |
No, we're saying "match any of A, B or C", which is even more straightforward.
Not to me; labelling columns that weren't matched as "missing" isn't correct, as "missing" implies that they were expected to be there. But if you have set "match_any" then you clearly don't expect some of them to be there, so they aren't missing - they simply aren't present, which isn't the same thing 😜
That's not the default behaviour though; the default behaviour requires that you match all columns, not match any column. You only get to match any column if you set... |
😎👍
Not really, as it would tread on the existing (Your pattern match looks like it would fail on names with special chars in them? Could tweak like so: Footnotes
|
That makes sense. Looking forward to using the
Ooh good find! Thanks I hadn't tried this yet with any cases containing special chars. I'm only just seeing now that |
c213085
to
610804d
Compare
Rebased / resolved conflicts... |
610804d
to
ea1bd98
Compare
I really don't want to merge this with the current parameter name. It's extremely unclear to me. What are you matching? Any of what? What would it mean to set it to False, or True? I went ahead and asked ChatGPT for help:
Of course the prompt is leading it in that direction - but if you can show me a prompt that returns I propose we change it to |
I dislike the involvement of "missing" as it is incorrect. Unmatched columns are not missing, as that word implies an expectation that they should be there, which is wrong; if you expected them to be there then you wouldn't set a param that allows them not to be:
In general l also prefer parameter names to indicate something active/explicit...
...rather than passive/implicit:
ChatGPT 4o result:
It feels somewhat self-evident from the selector name; |
In case it helps, I have two additional suggestions for this argument:
(One could also replace |
What about the somewhat vague but rather all-purpose |
8f40dd4
to
cd8f700
Compare
by_name
selectorby_name
selector
by_name
selectorby_name
selector
cd8f700
to
1639789
Compare
@stinodego: ok, here we go! Thanks for the ideas all; we've discussed it thoroughly and are going with |
by_name
selectorrequire_all
parameter to the by_name
column selector
Closes #15023.
Closes #16219.
Adds a new (opt-in) "match_any" param to the
by_name
selector.Default behaviour remains unchanged.
Update: have changed the parameter name from "any" (and then "match_any"), to "require_all" and adjusted the default value accordingly 😎
Before
Default behaviour; expect all given columns to be present:
After
Enable permissive matching on any of the given columns: