-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(python): More accurate and helpful docs for user defined functions #15194
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #15194 +/- ##
==========================================
+ Coverage 80.81% 81.41% +0.59%
==========================================
Files 1464 1425 -39
Lines 192019 187964 -4055
Branches 2743 2704 -39
==========================================
- Hits 155185 153027 -2158
+ Misses 36323 34440 -1883
+ Partials 511 497 -14 ☔ View full report in Codecov by Sentry. |
Thanks for working on this!
The purpose of Regarding Numba: given that is' quite dangerous to use it when the Series has missing values, I don't think it should currently be recommended in the user guide. I'd suggest either keeping it out for now, or sorting out Numba first |
Merged forward, ready for review again @MarcoGorelli. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, looking better
I think map_elements
needs mentioning though. perhaps you could even start with this one? Something like:
- show that you can do
pl.col('a').map_elements(lambda x: math.log(x))
- then, say that it's better to use vectorised operations, and if you have a numpy function you want to apply, you can pass the column all in at once with
pl.col('a').map_batches(np.log)
- then, show how you can use numba to pass a custom vectorised function
docs/src/python/user-guide/expressions/user-defined-functions.py
Outdated
Show resolved
Hide resolved
Thank you, I will address these. |
@MarcoGorelli back to you, either addressed or commented above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing
FAILED tests/docs/test_user_guide.py::test_run_python_snippets[path10] - polars.exceptions.PolarsInefficientMapWarning:
Expr.map_elements is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
Replace this expression...
- pl.col("values").map_elements(my_log)
with this one instead:
+ pl.col("values").log()
pop up in the CI logs brought a smile to my face 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @itamarst , good one!
I made some minor comments, and made a commit to address them, as they're really minor
Leaving open a little in case others have objections, will merge if they don't
EDIT: I can't merge this one, but I think it's ready
@MarcoGorelli I merged forward to deal with conflicts. |
@MarcoGorelli can we get this merged? |
Can you do a rebase? |
Will do that now. |
My editor is having Issues so still need to manually verify that merge was correct. |
Head branch was pushed to by a user without write access
OK I think it's good now. Note that auto-merge won't happen because I edited something, you'll need re-enable it if you're happy. |
Fixes #14699
Fixes #14508
map_batches()
, given implementation changes.map_elements()
, for simplicity's sake. This can eventually be added in a new section, presuming I can figure out whymap_elements()
even exists.This adds Numba as documentation requirement, so when e.g. Python 3.13 is released, docs won't be buildable on 3.13 until Numba has a new release.