Hello!
I was wondering if you were open to multi-threaded submissions, seeing as the problem is basically embarrassingly parallel. The only potentially tricky part would be keeping separate RNG streams/seeds for each thread (see e.g. Pierre L'Ecuyer, "Good Parameters and Implementations for Combined Multiple Recursive Random Number Generators" (1999), open-access PDF: https://www.iro.umontreal.ca/~lecuyer/myftp/papers/opres-combmrg2-1999.pdf).
This requirement (separate RNG streams) could be waived (seeing as it has no real impact on the benchmark speed).
I am asking about this in part because I believe that is the likely approach someone would take for speeding this up further (before going to something like AVX vector extensions).
Further, some of my preliminary benchmarks indicate that this could be worthwhile (the overhead of setting up threads is much smaller than the computation, and the speedups are almost proportional to the number of threads).
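To make the idea concrete, here is a minimal sketch in Python of the structure I have in mind. The workload is a stand-in (a Monte Carlo estimate of pi, not the actual benchmark), and the names `count_hits`, `estimate_pi`, and `base_seed` are hypothetical. Each worker gets its own generator seeded with a distinct value. Note that distinct seeds are only a simple approximation of separate streams; they do not give the guaranteed non-overlapping streams described in L'Ecuyer's paper. Also, CPython threads will not show the speedup because of the GIL, so this only illustrates how the work and the RNG state are split.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, n_samples):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # private generator: no RNG state shared between workers
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_samples, n_workers, base_seed=12345):
    """Split n_samples across n_workers, each with its own seed (assumed scheme)."""
    chunk = n_samples // n_workers
    sizes = [chunk] * n_workers
    sizes[-1] += n_samples - chunk * n_workers  # last worker takes the remainder
    seeds = [base_seed + i for i in range(n_workers)]  # one distinct seed per worker
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = sum(pool.map(count_hits, seeds, sizes))
    return 4.0 * hits / n_samples


if __name__ == "__main__":
    print(estimate_pi(1_000_000, 4))
```

Because every worker owns its generator and the seeds are fixed, the result is reproducible for a given `n_workers` regardless of how the threads are scheduled.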
Thoughts?