-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster computation of quantiles in describe
#2909
Conversation
Computing all quantiles when we only need the median is signficantly slower. Also avoid trying to compute quantiles for string columns since the failure only happens after sorting the vector, which is almost all of the work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. We could add a correctness test.
Also note that this assumes that some custom subtype of AbstractString
does not define arithmetic operations on it, but I assume it is OK to make this assumption in practice.
We already cover this AFAICT.
Yes, in theory that's possible, though I'm not sure what sense it would make... An alternative would be to call |
Can we be even smarter about this and use |
"Partially sorted" refers to |
Ah I see. Nevermind, this looks good. |
I've pushed a commit implementing the more general approach, let me know what you think. It's funny to note that on Julia >= 1.7, thanks to https://github.com/JuliaLang/Statistics.jl/pull/72, a test on |
Computing all quantiles when we only need the median is signficantly slower (and it's even worse than it should due to https://github.com/JuliaLang/Statistics.jl/issues/84).
Also avoid trying to compute quantiles for string columns since the failure only happens after sorting the vector, which is almost all of the work.