recipes.partitionby slower than itertools.groupby #99
Comments
When you say "run-length encoding", on what kind of data?
Hooray for an optimization issue! I forgot that partitionby is just a thin wrapper around groupby. To echo what both of you have said, it would be great to have sample benchmarks that reflect your use case.
Okay, 15 hours of travel later... here it goes! (end of vacation, yak shaving is my version of solitaire)
Make sure apples == apples, then put it to the test:
itertools.groupby gives 0.03869020694401115
partitionby gives 0.06634469900745898
Repeated tests for a few different dtypes; seems consistent.
edit: Thanks + mad props to all for your great work making fast-n-fun Python faster-n-more-fun.
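For context, a sketch of that kind of micro-benchmark (the array `a`, the key function, and the repeat counts are illustrative placeholders, not the code actually used here):

```python
import timeit
from itertools import groupby

import numpy as np
from cytoolz import partitionby

# Illustrative data: a numpy array made of runs of repeated values.
a = np.repeat(np.arange(1000), 5)
key = lambda x: x  # identity key, so each run of equal values is one group

# Note: these lambdas only *create* the generators and never consume them,
# so the numbers mostly reflect call overhead rather than grouping work.
t_groupby = timeit.timeit(lambda: groupby(a, key), number=10_000)
t_partitionby = timeit.timeit(lambda: partitionby(key, a), number=10_000)
print(t_groupby, t_partitionby)
```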
Hm, you're only timing how long it takes to create the generator, not how long it takes to exhaust it. However, if you want numbers that mean something, you have to consume both iterators completely.
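As a sketch, draining the iterators inside the timed statement measures the full grouping cost; collections.deque with maxlen=0 is just a cheap way to exhaust an iterator, and `a` and `key` are the same illustrative placeholders as above:

```python
import timeit
from collections import deque
from itertools import groupby

import numpy as np
from cytoolz import partitionby

a = np.repeat(np.arange(1000), 5)
key = lambda x: x

def drain(it):
    # Consume an iterator completely without keeping its results.
    deque(it, maxlen=0)

# Tuple up groupby's lazy groups so both sides do comparable work,
# since partitionby yields each group as a tuple.
t_groupby = timeit.timeit(
    lambda: drain(tuple(g) for _, g in groupby(a, key)), number=100)
t_partitionby = timeit.timeit(
    lambda: drain(partitionby(key, a)), number=100)
print(t_groupby, t_partitionby)
```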
At this point the bottleneck is that numpy arrays are slow when iterated over element by element, even with an otherwise fast grouping implementation.
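A sketch of the kind of change that helps here (assuming the data starts out as a numpy array `a`): convert it to a plain Python list once, up front, and group that instead.

```python
import numpy as np
from cytoolz import partitionby

a = np.repeat(np.arange(1000), 5)

# Iterating a numpy array creates a fresh Python object for every element.
# tolist() pays that conversion cost a single time, so the pure-Python
# grouping that follows runs noticeably faster.
groups = list(partitionby(lambda x: x, a.tolist()))
```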
The remaining time difference is mostly due to the fact that partitionby materializes every group into a tuple on top of what groupby does, so there is simply more work per group.
Still a bit slower, but the difference isn't all that much anymore. You're working with homogeneous array data, so you should consider switching to a numpy-based solution instead of working in pure Python (or using PyPy?). Just to give an example using plain numpy operations:
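A sketch of one such numpy-based run-length encoding, built on np.flatnonzero and np.diff (the function name and exact formulation are illustrative):

```python
import numpy as np

def run_length_encode(arr):
    """Return (values, counts) such that np.repeat(values, counts)
    reconstructs arr, without any Python-level loop over elements."""
    arr = np.asarray(arr)
    if arr.size == 0:
        return arr[:0], np.zeros(0, dtype=np.intp)
    # Index 0 plus every position where the value changes.
    starts = np.concatenate(([0], np.flatnonzero(arr[1:] != arr[:-1]) + 1))
    values = arr[starts]
    counts = np.diff(np.concatenate((starts, [arr.size])))
    return values, counts

values, counts = run_length_encode(np.array([7, 7, 7, 1, 1, 3]))
# values -> array([7, 1, 3]), counts -> array([3, 2, 1])
```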
So this can speed up the function by a factor of 100. I'm not sure what the takeaway message of this comment should be; I think for homogeneous array data a numpy-based approach (or numba / Cython / PyPy) is the way to go.
A failure in the design of my provided test case; thanks for working through it. The observation stems from a software system that exhausts the iterator, and it looks like it holds up under tighter scrutiny, although the gap is narrower than I first measured.
I've built a prototype of a soft-realtime data-processing system in Python. I'll rewrite it in Rust when I have nothing better to do, but surprisingly Python is more than "fast enough."
Besides that y'all are awesome? I think "check out pypy / numba / cython" is about what I got. Cheers!
Not surprising after reading the source and discovering partitionby merely wraps groupby, but my use of groupby is a bottleneck for run-length encoding and I was hoping to see gains by leveraging cytoolz. I welcome suggestions and would be glad to provide sample code when at my desk.
Thanks!
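For illustration, the run-length-encoding pattern in question looks roughly like this (toy data; a sketch rather than the actual code from the system):

```python
from itertools import groupby
from cytoolz import partitionby

data = "aaabccdd"

# Run-length encoding with itertools.groupby ...
rle_groupby = [(value, sum(1 for _ in group)) for value, group in groupby(data)]

# ... and the equivalent with cytoolz.partitionby, which wraps groupby.
rle_partitionby = [(run[0], len(run)) for run in partitionby(lambda x: x, data)]

assert rle_groupby == rle_partitionby == [("a", 3), ("b", 1), ("c", 2), ("d", 2)]
```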