Add configuration option for choosing number of GPU streams when they're not per-thread #1222

msimberg · 2024-11-27T12:10:18Z

pika-org/pika#1294 will be part of pika 0.31.0 and changes the meaning of the "number of streams" parameters passed to the cuda_pool. Instead of specifying the number of streams per worker thread, they now specify the total number of streams. This is unfortunately a silent breaking change in pika, but this PR attempts to make it somewhat loud in DLA-Future.

This PR introduces two new configuration options: num_np_gpu_streams and num_hp_gpu_streams. They are used when pika is version 0.31.0 or newer; the old options are used with versions before 0.31.0. When an option is set that does not apply to the pika version in use, DLA-Future prints a warning that the option will be ignored. For compatibility, one can set both the old and the new options at the same time to cover any version of pika, at the cost of a warning.
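The selection rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `Version`, `StreamOptions`, `Resolved`, `kUnset`, `streams_are_total` and `resolve_streams` are hypothetical names, and the defaults (3 per worker thread, 32 in total) are taken from the description.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Hypothetical version triple; not a pika or DLA-Future type.
struct Version {
  int ver_major;
  int ver_minor;
  int ver_patch;
};

// Sentinel meaning "the user did not set this option".
const int kUnset = -1;

// Hypothetical option pair for one priority level (normal or high).
struct StreamOptions {
  int per_thread;  // old-style option: streams per worker thread
  int total;       // new-style option, e.g. num_np_gpu_streams
};

struct Resolved {
  int value;                          // number passed on to the GPU pool
  bool is_total;                      // how the pool interprets `value`
  std::vector<std::string> warnings;  // one entry per ignored option
};

// True for pika 0.31.0 and newer, where the count is a pool-wide total.
inline bool streams_are_total(const Version& v) {
  return std::make_tuple(v.ver_major, v.ver_minor, v.ver_patch) >=
         std::make_tuple(0, 31, 0);
}

// Pick the option that applies to `pika` and warn about the one that does not.
// Assumed defaults: 3 streams per worker thread (old), 32 in total (new).
inline Resolved resolve_streams(const StreamOptions& opts, const Version& pika) {
  Resolved r;
  r.is_total = streams_are_total(pika);
  if (r.is_total) {
    r.value = (opts.total != kUnset) ? opts.total : 32;
    if (opts.per_thread != kUnset)
      r.warnings.push_back("per-thread stream option is ignored with pika >= 0.31.0");
  }
  else {
    r.value = (opts.per_thread != kUnset) ? opts.per_thread : 3;
    if (opts.total != kUnset)
      r.warnings.push_back("total stream option is ignored with pika < 0.31.0");
  }
  return r;
}
```

In this sketch, setting both options always yields exactly one warning, whichever pika version is in use, which mirrors the "at the cost of a warning" compatibility path.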

I'm not too worried about this causing problems, since so far we've never had the need to change the defaults on different systems.

The new options default to 32 streams each for normal and high priority, which is the same as the default in pika. DLA-Future's miniapps show no meaningful performance difference with the new option compared to the old per-thread streams. 32 was chosen as a reasonable middle ground: going to something low like 4 or 8 showed a small slowdown, and going to something really high like 128 is of no use since the GPUs don't support that much concurrency anyway. Note that with the previous setup we would actually create 192 streams each for normal and high priority on e.g. Grace (three streams per worker thread with 64 worker threads), which was clearly overkill. Despite creating so many streams, we could still be limited by the three streams per worker thread.
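As a quick check of the numbers above, here is the arithmetic for both models. The function names are made up for illustration; only the figures (3 per thread, 64 worker threads, 32 total) come from the description.

```cpp
#include <cstddef>

// Streams created per priority level when the count is per worker thread.
constexpr std::size_t streams_per_thread_model(std::size_t worker_threads,
                                               std::size_t per_thread) {
  return worker_threads * per_thread;
}

// Streams created per priority level when the count is a pool-wide total.
// The worker thread count is accepted only to show that it has no effect.
constexpr std::size_t streams_total_model(std::size_t /*worker_threads*/,
                                          std::size_t total) {
  return total;
}

static_assert(streams_per_thread_model(64, 3) == 192,
              "64 worker threads x 3 streams each");
static_assert(streams_total_model(64, 32) == 32,
              "independent of the worker thread count");
```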

Note that the change in pika is really meant as a conceptual simplification (rather than a performance improvement), since it's easier to reason about how much concurrency the pool provides when the number of streams given is the total, rather than varying with the number of worker threads. It also now matches how we deal with the cuBLAS and cuSOLVER handles. However, it may allow corner cases to exploit more concurrency as well. In the case that @albestro encountered, where different continuations (launching CUDA work) end up running on the same worker thread, this new option allows the same worker thread to use all streams instead of being limited to the previous default of three streams per worker thread.
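The corner case can be illustrated with a toy model of the two pool layouts. This is a sketch under assumptions, not pika's implementation: the class names are hypothetical, streams are represented by plain indices, and round-robin selection is assumed for both layouts.

```cpp
#include <atomic>
#include <cstddef>
#include <set>
#include <vector>

// Old layout (assumed model): every worker thread owns a private slice.
class PerThreadStreams {
public:
  PerThreadStreams(std::size_t worker_threads, std::size_t per_thread)
      : per_thread_(per_thread), next_(worker_threads, 0) {}

  // Returns a global stream index, always inside the calling thread's slice.
  // Each counter is only touched by its own thread in this model.
  std::size_t next_stream(std::size_t thread) {
    const std::size_t local = next_[thread]++ % per_thread_;
    return thread * per_thread_ + local;
  }

private:
  std::size_t per_thread_;
  std::vector<std::size_t> next_;
};

// New layout (assumed model): one pool-wide set shared by all threads.
class SharedStreams {
public:
  explicit SharedStreams(std::size_t total) : total_(total), next_(0) {}

  // Any thread may receive any stream; the counter is shared, hence atomic.
  std::size_t next_stream() { return next_.fetch_add(1) % total_; }

private:
  std::size_t total_;
  std::atomic<std::size_t> next_;
};

// Distinct streams reachable when all requests come from worker thread 0.
inline std::size_t distinct_per_thread(std::size_t threads, std::size_t per_thread,
                                       std::size_t requests) {
  PerThreadStreams pool(threads, per_thread);
  std::set<std::size_t> seen;
  for (std::size_t i = 0; i < requests; ++i) seen.insert(pool.next_stream(0));
  return seen.size();
}

// Distinct streams reachable from a single caller with a shared pool.
inline std::size_t distinct_shared(std::size_t total, std::size_t requests) {
  SharedStreams pool(total);
  std::set<std::size_t> seen;
  for (std::size_t i = 0; i < requests; ++i) seen.insert(pool.next_stream());
  return seen.size();
}
```

With 64 worker threads and three streams per thread, a single thread in this model only ever reaches three of the 192 streams, whereas with a shared pool of 32 the same thread can reach all 32.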

Note that I've updated test_init to test with a different configuration option. This is simply to avoid having to do tests conditional on the pika version there. The actual configuration option used for testing was never the important part, just that some configuration option is used.

Until pika 0.31.0 is released, pika main identifies itself as 0.30.1, so DLA-Future will not use the correct option, despite pika main already having the change from pika-org/pika#1294. I recommend staying with pika 0.30.1 until then.

msimberg · 2024-11-27T12:10:28Z

cscs-ci run

msimberg · 2024-11-27T13:31:57Z

cscs-ci run

msimberg added this to the v0.7.0 milestone Nov 27, 2024

msimberg force-pushed the cuda-pool-shared-streams branch from dfa1c28 to f59aaa9 Compare November 27, 2024 13:30

Commit f601132: Add configuration option for choosing number of GPU streams when they're not per-thread

msimberg force-pushed the cuda-pool-shared-streams branch from f59aaa9 to f601132 Compare November 27, 2024 13:31

msimberg requested review from rasolca, albestro and RMeli and removed request for rasolca November 27, 2024 13:41
