-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mulmod_shoup and nmod_vec_scalar_mul_nmod #2055
Conversation
…preinv; on zen4 the threshold is clearly too high for switching to the Shoup precomp multiplication
…ove constraint t < p (or b < n); unify naming (a*b mod n like mulmod2_preinv or mulmod_precomp, instead of w*t mod p) in both doc and code; start explanations of correctness and restrictions
Excerpts from my measurements. For
And for mulmod routines (cycle counts should all be off by the same constant factor). The kind of useful feedback I was mentioning above is: are these "microbenchmarks" somehow accurate, can we for example deduce that on this machine, running
|
On x86, this makes a lot of sense since it basically just consists of a
Yeah, we would really need a test suite as hardware changes over time. Moreover (and I believe I have mentioned this in some issue), we don't know why the threshold of 10 was set (perhaps a specific CPU, whatever). I could only guess that it was, in general, faster on x86 fifteen years ago or so, but that it today is faster the other way around. |
umul_ppmm(p1, p2, a, b); | ||
p1 %= d; | ||
udiv_qrnnd(q, r2, p1, p2, d); | ||
|
||
result = (r1 == r2); | ||
int result = (r1 == r2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually declare variables as the good old days, i.e. in the beginning of the function. I know this differs somewhat (I know the fft_small
module declares them where needed), but I think we should be consistent. Perhaps @fredrik-johansson could give his input when he comes back from his vacation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm indeed interested in viewpoints on this. As I wrote in another comment above, for these test files and some specific variables in them (loop counter, result
, etc.) I totally understand that we want consistency and may want to declare them once for all at the beginning. But for core code (non-test, non-profile), I would not say the same, I find that in some places it makes the code harder to read and write because it makes less obvious --- sometimes far less when there many variables or long functions --- the scope of variables, whether they already have a value at some point, or even simply what their type is. Also in some pieces of code it even forbids us to do some things, like declaring some variables const
. This being said, I guess (well, I hope) that the compiler is good enough for this to make no difference, so if there is a strong standpoint on the "old C style" declaration, I will follow it; in this case this should probably be written somewhere (I don't see it in the current code conventions in contributing.html).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the aspect of the compiler, I don't think it makes a difference. For me, it is about being able to study how many registers will be used, sort of. However, this obviously goes against the thought of a higher-level language. Moreover, I suspect I am one of the few that still prefers this way of declaring variables. And yes, whatever way we choose to go, it would be good if it is documented.
…ar (closing FLINT_TEST_INIT) into FLINT_TEST_CLEAR
Co-authored-by: Albin Ahlbäck <[email protected]>
Co-authored-by: Albin Ahlbäck <[email protected]>
Co-authored-by: Albin Ahlbäck <[email protected]>
Done, in the last commits. |
… variable names for nmod matrix*scalar; minor fixes
…e nmod_mat scalar mul into functions
@albinahlback a quick check about the |
An extremely minor nitpick: I think I prefer threshold values to be defined so that one writes
rather than
Feel free to change this (or don't bother if you have better things to do). Other than that, positive review from me. |
I did this. It actually also simplifies away some ugly macro checks.
Ok! I've made minor changes and will merge if CI passes. I'll update broadwell's |
This PR touches things related to single word modular multiplication with precomputation (
n_mulmod_shoup
).ulong_extras/mulmod_precomp_shoup.c
(could be put elsewhere, because this file might disappear, see below)nmod_vec_scalar_mul/addmul
andnmod_mat_nmod_vec_mul
Two conclusions from the latter performance tests, done on two machines (zen4 / icelake):
n_mulmod_precomp_shoup
inline: any opinion against that?mulmod2_preinv
tomulmod_shoup
, which is 10, seems too high. In fact, I get better results withmulmod_shoup
as soon as the length exceeds... 1. For length 1, the results are quite close and sometimes in favor ofmulmod_shoup
(not really sure why). Some feedback could be useful about the accuracy of the new profiling files for small lengths, because I am not so familiar with benchmarking on so "small" computations. If this conclusion turns out correct, it seems that we could lower the threshold to 2, or even avoid the_generic
code entirely (the_fullword
one must be kept because this modular multiplication method does not accept 64-bit moduli).