20x Slowdown in hypergeom.cdf #923

mborland · 2023-01-23T16:14:40Z

It's interesting that the slowdown is reproduced by C++11 vs C++14. I am hoping this will not evolve into a larger issue as we move to C++14 as the minimum standard.

mborland · 2023-01-23T17:55:42Z

Here are the values using current develop with different compilers:

Apple Clang 14 w/libc++ using C++11 on M1

./hypergeom1  0.59s user 0.00s system 67% cpu 0.880 total

Apple Clang 14 w/libc++ using C++14 on M1

./hypergeom2  11.84s user 0.02s system 97% cpu 12.108 total

GCC 12 w/libstdc++ using C++11 on M1

./hypergeom3  0.70s user 0.01s system 72% cpu 0.983 total

GCC 12 w/libstdc++ using C++14 on M1

./hypergeom4  12.36s user 0.03s system 97% cpu 12.745 total

GCC 12 w/libstdc++ using C++17 on M1

./hypergeom5  12.44s user 0.03s system 98% cpu 12.660 total

GCC 12 w/libstdc++ using C++20 on M1

./hypergeom6  12.72s user 0.03s system 97% cpu 13.123 total

mborland · 2023-01-23T18:16:48Z

And here are values on x86-64 (i7-1255U) using Ubuntu 20.04.5:

GCC 10 w/libstdc++ using C++11

real	0m0.619s
user	0m0.618s
sys	0m0.000s

GCC 10 w/libstdc++ using C++14

real	0m6.886s
user	0m6.881s
sys	0m0.004s

GCC 10 w/libstdc++ using C++17

real	0m5.766s
user	0m5.766s
sys	0m0.000s

The x86-64 slowdown is 10x instead of 20x on M1.

Edit: Using Boost 1.77 and 1.75 yield the same results.

ckormanyos · 2023-01-23T18:23:11Z

Hmmm...

Couple ideas.

Try to rule out optimization level? Try -Os and maybe try -O2 (which is not so radical)?
I've seen slowdowns with constexpr previously, but a factor of 20 seems huge.
the clang++ compiler allows you to actually specify stuff like -fconstexpr-steps=8 or also -fconstexpr-depth=8, which I expect wil lead to compilation errors, but may give you insight into what is or is not all getting a constexpr hit?

ckormanyos · 2023-01-23T18:24:07Z

Vague recolection about something chrono-esque going on in there.

jzmaddock · 2023-01-23T18:27:53Z

This is weird (and rather worrying), just stepping through the code I see nothing that would obviously change with std version - there's no constexpr stuff in there either @ckormanyos. It is very computationally intensive though, lot's of table lookup of the prime numbers too. Ah... I wonder if this could be a thread safety thing doing table lookup? Some hidden locking going on maybe?

jzmaddock · 2023-01-23T18:31:06Z

Cygwin x86:

C++11:

real 0m0.734s
user 0m0.640s
sys 0m0.000s

c++14:

real 0m18.668s
user 0m17.734s
sys 0m0.016s

Wow!

jzmaddock · 2023-01-23T18:33:18Z

Reproduced with PDF as well as CDF (the former being called by the latter internally).

jzmaddock · 2023-01-23T18:42:23Z

It looks like it might be related to prime number lookup - which uses std::array internally - were there any changes between c++11 and 14? Also noted that boost::math::prime is NOT expanded inline in C++14 at least even though it's a fairly trivial function.

ckormanyos · 2023-01-23T18:46:17Z

uses std::array internally - were there any changes between c++11 and 14

There was something seemingly trivial involving the member at() (which may throw). It became constexpr, but that seems irrelevant?

mborland · 2023-01-23T18:47:31Z

It looks like it might be related to prime number lookup - which uses std::array internally - were there any changes between c++11 and 14? Also noted that boost::math::prime is NOT expanded inline in C++14 at least even though it's a fairly trivial function.

The only thing that would have changed for std::array between C++11 and C++14 would be aggregate initialization rules. I doubt that would be in-play here.

mborland · 2023-01-23T18:53:29Z

If you remove all of the contextual constexpr-ness from prime and make it all follow the C++11 static const and inline you get:

C++11

./hypergeom1  0.69s user 0.01s system 74% cpu 0.947 total

C++14

./hypergeom1  0.65s user 0.01s system 81% cpu 0.810 total

C++17

./hypergeom1  0.65s user 0.01s system 67% cpu 0.991 total

jzmaddock · 2023-01-23T18:54:26Z

Yup, just found that too: it's the constexpr std::array's that are totally killing things.

jzmaddock · 2023-01-23T18:59:52Z

No change by making them C style arrays.

mborland · 2023-01-23T19:00:10Z

Yup, just found that too: it's the constexpr std::array's that are totally killing things.

Maybe because the storage isn't treated as static? We don't get that as an option until C++23

ckormanyos · 2023-01-23T19:07:51Z

Maybe because the storage isn't treated as static

Sometimes with big tables such as static const std::array<std::uint16_t, 3458> a3 I've seen compilers make a lot of intermediate code in order to handle the constexpr-ness. I do not know if this is a compelling factor in this case.

This is a pending issue that I have on microcontroller code in completely different domains. I tend to avoid constexpr talbes with more than 8-16 elements.

mborland · 2023-01-23T23:13:48Z

For posterity here's a comparison of the code generated between the C++11 and C++14 version

https://godbolt.org/z/zds83adMd

ckormanyos · 2023-01-24T07:49:58Z

For posterity here's a comparison of the code generated between the C++11 and C++14 version

Thanks Matt.

Is there a Boost.Math/Multiprecision coding rule emerging?

For me it's actually more than posterity. I'm wondering what the take-away is here?

Use constexpr std::array<> with caution?
Use only small constexpr std::array<>s with dimension limited to what?
Don't use constexpr std::array<>?
Use constexpr std::array<> if (and only if) the use case is known to be unequivocally and exclusively constexpr in its context?

I know that we can't hash this all out here, but I've encountered (and forgotten) and encountered (and forgotten) this problem a few times.

Cc: @jzmaddock and @mborland

mborland · 2023-01-24T17:06:37Z

That's a good question. I guess I have falsely assumed that a constexpr std::array<> didn't have to be regenerated every time the function was called. In the liked PR I should be able to get a wrapper class working that would allow us to use static constexpr std::array<> with C++14 instead of C++23. It's kind of an ugly solution but it would be the best of both worlds for the end user. It helps that we are moving to C++14 in the next release anyway because it would definitely break C++11 constexpr guarantees.

Big slowdown in evaluation of constexpr tables. Fixes: #923

valiko-ua · 2023-03-17T17:50:35Z

@mborland After reading this: https://quuxplusone.github.io/blog/2022/07/08/inline-constexpr/
I wonder if we should mark those arrays as static constexpr inline?
Won't we have multiple copies of those arrays in every compilation unit if we don't mark them inline?

jzmaddock · 2023-03-17T17:58:22Z

These aren't global variables - they're members of a class template, so I think and certainly hope, they should be merged at link time ever since C++98. Wouldn't hurt to check though!

valiko-ua · 2023-03-17T18:05:12Z

These aren't global variables - they're members of a class template, so I think and certainly hope, they should be merged at link time ever since C++98. Wouldn't hurt to check though!

I mean arrays a1, a2, a3 - they are global variables declared in namespace boost::math::detail, outside of function template prime.

jzmaddock · 2023-03-17T18:30:32Z

I mean arrays a1, a2, a3 - they are global variables declared in namespace boost::math::detail, outside of function template prime.

No more, they're class members:

math/include/boost/math/special_functions/prime.hpp

Line 22 in 5376689

struct prime_data_imp

valiko-ua · 2023-03-17T18:46:41Z

Oops, sorry, I looked at the linked PR at the top.

valiko-ua · 2023-03-18T20:24:58Z

Does it make sense to move array values to some macros (A1_VALUES, A2_VALUES, A3_VALUES) to avoid repetition and reduce file size?

mborland mentioned this issue Jan 23, 2023

BUG: hypergeom.cdf slower in 1.8.0 than 1.7.3 scipy/scipy#16079

Closed

mborland linked a pull request Jan 23, 2023 that will close this issue

Fix for Scipy Issue 16079 #925

Closed

mborland changed the title ~~20x Slowdown in hypergeom.cdf~~ BREAKING CHANGE: Fix for Scipy Issue 16079 Jan 23, 2023

mborland changed the title ~~BREAKING CHANGE: Fix for Scipy Issue 16079~~ 20x Slowdown in hypergeom.cdf Jan 23, 2023

jzmaddock added a commit that referenced this issue Jan 27, 2023

Fix for Scipy Issue 16079.

c560d86

Big slowdown in evaluation of constexpr tables. Fixes: #923

jzmaddock mentioned this issue Jan 27, 2023

Fix for Scipy Issue 16079. #930

Merged

jzmaddock closed this as completed in #930 Jan 29, 2023

mborland mentioned this issue Jan 16, 2024

Template coefficient lists for tables cppalliance/decimal#380

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

20x Slowdown in hypergeom.cdf #923

20x Slowdown in hypergeom.cdf #923

mborland commented Jan 23, 2023

mborland commented Jan 23, 2023

mborland commented Jan 23, 2023 •

edited

Loading

ckormanyos commented Jan 23, 2023

ckormanyos commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

ckormanyos commented Jan 23, 2023 •

edited

Loading

mborland commented Jan 23, 2023

mborland commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

mborland commented Jan 23, 2023

ckormanyos commented Jan 23, 2023

mborland commented Jan 23, 2023

ckormanyos commented Jan 24, 2023 •

edited

Loading

mborland commented Jan 24, 2023

valiko-ua commented Mar 17, 2023

jzmaddock commented Mar 17, 2023

valiko-ua commented Mar 17, 2023

jzmaddock commented Mar 17, 2023

valiko-ua commented Mar 17, 2023

valiko-ua commented Mar 18, 2023

20x Slowdown in hypergeom.cdf #923

20x Slowdown in hypergeom.cdf #923

Comments

mborland commented Jan 23, 2023

mborland commented Jan 23, 2023

mborland commented Jan 23, 2023 • edited Loading

ckormanyos commented Jan 23, 2023

ckormanyos commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

ckormanyos commented Jan 23, 2023 • edited Loading

mborland commented Jan 23, 2023

mborland commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

jzmaddock commented Jan 23, 2023

mborland commented Jan 23, 2023

ckormanyos commented Jan 23, 2023

mborland commented Jan 23, 2023

ckormanyos commented Jan 24, 2023 • edited Loading

mborland commented Jan 24, 2023

valiko-ua commented Mar 17, 2023

jzmaddock commented Mar 17, 2023

valiko-ua commented Mar 17, 2023

jzmaddock commented Mar 17, 2023

valiko-ua commented Mar 17, 2023

valiko-ua commented Mar 18, 2023

mborland commented Jan 23, 2023 •

edited

Loading

ckormanyos commented Jan 23, 2023 •

edited

Loading

ckormanyos commented Jan 24, 2023 •

edited

Loading