20x Slowdown in hypergeom.cdf #923
Here are the values using current develop with different compilers:
Apple Clang 14 w/libc++ using C++11 on M1
Apple Clang 14 w/libc++ using C++14 on M1
GCC 12 w/libstdc++ using C++11 on M1
GCC 12 w/libstdc++ using C++14 on M1
GCC 12 w/libstdc++ using C++17 on M1
GCC 12 w/libstdc++ using C++20 on M1
And here are values on x86-64 (i7-1255U) using Ubuntu 20.04.5:
GCC 10 w/libstdc++ using C++11
GCC 10 w/libstdc++ using C++14
GCC 10 w/libstdc++ using C++17
The x86-64 slowdown is 10x instead of the 20x seen on M1. Edit: Boost 1.77 and 1.75 yield the same results.
Hmmm... A couple of ideas.
Vague recollection about something
This is weird (and rather worrying). Just stepping through the code, I see nothing that would obviously change with the std version; there's no constexpr stuff in there either, @ckormanyos. It is very computationally intensive though, with lots of prime-number table lookups too. Ah... I wonder if this could be a thread-safety thing in the table lookup? Some hidden locking going on, maybe?
Cygwin x86: C++11: real 0m0.734s; C++14: real 0m18.668s. Wow!
Reproduced with PDF as well as CDF (the former is called by the latter internally).
It looks like it might be related to prime number lookup, which uses std::array internally. Were there any changes between C++11 and C++14? I also noticed that boost::math::prime is NOT expanded inline, in C++14 at least, even though it's a fairly trivial function.
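To make the lookup pattern concrete, here is a minimal sketch of a table-driven prime lookup built on `std::array`. It is not Boost.Math's code: the function name `sketch::prime`, the split into a byte-sized table and a 16-bit table, and the (abridged) second table are assumptions for illustration. It only shows how small such a function is, which is why one would expect the compiler to inline it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical, abridged prime lookup: returns the n-th prime (0-based).
// Primes below 256 fit in one byte, so they get their own table; later
// primes need 16 bits. Both tables are static: built once, not per call.
inline std::uint32_t prime(std::size_t n)
{
    // All 54 primes below 256.
    static const std::array<std::uint8_t, 54> small_primes = {{
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
        31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
        73, 79, 83, 89, 97, 101, 103, 107, 109, 113,
        127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
        179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
        233, 239, 241, 251
    }};
    // Abridged continuation (a real table would hold thousands of entries).
    static const std::array<std::uint16_t, 8> larger_primes = {{
        257, 263, 269, 271, 277, 281, 283, 293
    }};

    assert(n < small_primes.size() + larger_primes.size());
    if (n < small_primes.size())
        return small_primes[n];
    return larger_primes[n - small_primes.size()];
}

} // namespace sketch
```

For example, `sketch::prime(53)` returns 251, the largest prime that fits in a byte, and `sketch::prime(54)` returns 257 from the second table.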
There was something seemingly trivial involving the member
The only thing that would have changed for
If you remove all of the contextual C++11
C++14
C++17
Yup, just found that too: it's the constexpr std::arrays that are totally killing things.
No change by making them C style arrays. |
Maybe because the storage isn't treated as
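One plausible explanation, consistent with C-style arrays behaving no better than `std::array`, is storage duration rather than the container type. Before C++23 a constexpr function may not declare `static` locals, so a table inside one is an automatic variable, and when the function runs at run time the compiler may rebuild that whole array in the stack frame on every call. The sketch below contrasts that with a `static constexpr` table. The names, the table size, and the squared-integer contents are hypothetical, and `make_table` needs C++17.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical size, large enough that copying the table per call would hurt.
constexpr std::size_t table_size = 4096;

// Fills the table at compile time (C++17: non-const std::array::operator[]
// is constexpr from C++17 on).
constexpr std::array<std::uint32_t, table_size> make_table()
{
    std::array<std::uint32_t, table_size> t{};
    for (std::size_t i = 0; i < table_size; ++i)
        t[i] = static_cast<std::uint32_t>(i * i);
    return t;
}

// Pattern 1: a constexpr function cannot declare a static local before
// C++23, so the table is an automatic variable. At run time the compiler
// may materialize all 4096 entries on the stack on each call.
constexpr std::uint32_t lookup_local(std::size_t n)
{
    constexpr std::array<std::uint32_t, table_size> table = make_table();
    return table[n];
}

// Pattern 2: the same data with static storage duration, initialized once
// (here at compile time) and shared by every call.
inline std::uint32_t lookup_static(std::size_t n)
{
    static constexpr std::array<std::uint32_t, table_size> table = make_table();
    assert(n < table.size());
    return table[n];
}

} // namespace sketch
```

Both functions return the same values (`lookup_local(10)` and `lookup_static(10)` are both 100); only where the array lives differs, which is the kind of difference a side-by-side look at the generated code would expose.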
Sometimes with big tables such as ... This is a pending issue that I have on microcontroller code in completely different domains. I tend to avoid
For posterity, here's a comparison of the code generated by the C++11 and C++14 versions.
Thanks Matt. Is there a Boost.Math/Multiprecision coding rule emerging? For me it's actually more than posterity. I'm wondering what the takeaway is here.
I know that we can't hash this all out here, but I've encountered (and forgotten) and encountered (and forgotten) this problem a few times. Cc: @jzmaddock and @mborland
That's a good question. I guess I have falsely assumed that a
Big slowdown in evaluation of constexpr tables. Fixes: #923
@mborland After reading this: https://quuxplusone.github.io/blog/2022/07/08/inline-constexpr/
These aren't global variables; they're members of a class template, so I think, and certainly hope, they have been merged at link time ever since C++98. Wouldn't hurt to check though!
I mean arrays a1, a2, a3: they are global variables declared in namespace boost::math::detail, outside of the function template prime.
No more, they're class members:
Oops, sorry, I looked at the linked PR at the top. |
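Storing the tables as static data members of a class template gives them static storage duration, and because such a member is a templated entity, every translation unit that instantiates it refers to a single definition that the linker deduplicates. A minimal sketch, with hypothetical names (`prime_table`, `small_prime`) and a truncated table; it is meant to compile as C++11 or later.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical holder: the table is a static data member of a class
// template, so it is initialized once and shared across translation units.
template <class T = void>
struct prime_table
{
    static constexpr std::array<std::uint16_t, 8> values = {{
        2, 3, 5, 7, 11, 13, 17, 19
    }};
};

// Before C++17, an ODR-used static constexpr data member also needs this
// out-of-class definition. From C++17 the member is implicitly inline and
// this redeclaration is redundant (deprecated, but still valid).
template <class T>
constexpr std::array<std::uint16_t, 8> prime_table<T>::values;

// Looks up the n-th (0-based) entry of the shared table.
inline std::uint16_t small_prime(std::size_t n)
{
    assert(n < prime_table<>::values.size());
    return prime_table<>::values[n];
}

} // namespace sketch
```

For comparison, a plain namespace-scope `constexpr` variable is implicitly `const` and therefore has internal linkage, so each translation unit gets its own copy; in C++17, `inline constexpr` at namespace scope is the other way to get one shared definition.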
Does it make sense to move the array values into macros (A1_VALUES, A2_VALUES, A3_VALUES) to avoid repetition and reduce file size?
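A minimal sketch of that suggestion, assuming the values currently appear twice (once in a constexpr branch and once in a non-constexpr fallback). The macro `SKETCH_SMALL_PRIME_VALUES` and both functions are hypothetical stand-ins for the A1_VALUES-style names proposed above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical macro: the table contents are written exactly once.
#define SKETCH_SMALL_PRIME_VALUES 2, 3, 5, 7, 11, 13, 17, 19

namespace sketch {

// Assumed constexpr branch: a local constexpr array usable in constant
// expressions, expanded from the shared list.
constexpr std::uint32_t small_prime_constexpr(std::size_t n)
{
    constexpr std::array<std::uint32_t, 8> t = {{SKETCH_SMALL_PRIME_VALUES}};
    return t[n];
}

// Assumed fallback branch: a static const array built from the same list.
inline std::uint32_t small_prime_static(std::size_t n)
{
    static const std::array<std::uint32_t, 8> t = {{SKETCH_SMALL_PRIME_VALUES}};
    assert(n < t.size());
    return t[n];
}

} // namespace sketch

// Keep the helper macro from leaking into code that includes this header.
#undef SKETCH_SMALL_PRIME_VALUES
```

The trade-off is that the values become harder to read for tools and people that do not expand macros, and the macro has to be `#undef`ined so it does not leak into users' code.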
See: scipy/scipy#16079
Minimal example with benchmarks
It's interesting that the slowdown appears when going from C++11 to C++14. I am hoping this will not evolve into a larger issue as we move to C++14 as the minimum standard.