Best strategies for multi-GPU and multi-CPU simulations? #3885

ali-ramadhan · 2024-10-30T13:27:17Z

ali-ramadhan
Oct 30, 2024
Maintainer

Been digging into the distributed and multi-region modules (really awesome work!) and I'm curious about the best strategies for running large simulations on multiple CPUs and GPUs.

I know @simone-silvestri has run simulations on hundreds to thousands (!!) of GPUs. I'm thinking more of the case of using 2-16 (maybe up to 64) GPUs to run hydrostatic and non-hydrostatic simulations on rectilinear and lat-lon grids.

Here's what I think I've figured out so far, but would love to hear any thoughts or experience from others.

For multi-GPU on a single node (usually up to 4-8 GPUs), it seems like you should use multi-region to avoid touching MPI?
For multi-GPU on multiple nodes, is it possible to use multi-region within each node and CUDA-aware MPI between nodes (one rank per node)?
For multiple cores on a single CPU on one node, I know we can use either multi-threading via KernelAbstractions.jl or MPI. If I remember correctly, MPI scales better here?
For multi-CPU on multiple nodes, has anyone tried this? Going full MPI (one rank per CPU core) seems like it would produce too many ranks and too much communication. So maybe the winning strategy is multi-threading with KernelAbstractions.jl on each node and MPI between nodes (one rank per node).

I don't think multi-CPU is really of much use or interest as you probably need multiple beefy CPUs to rival just one GPU (and deal with MPI), but I'm mentioning it out of personal curiosity I suppose.
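To put a rough number on the "too many ranks and too much communication" worry, here is a back-of-the-envelope sketch in Python (not Oceananigans code). It compares halo cells to interior cells per rank for a 2D horizontal decomposition. The grid size, the 4-node by 64-core layout, and the halo width of 3 are all made-up assumptions for illustration, and the ratio is only a proxy for communication volume, not a performance model.

```python
def halo_fraction(nx, ny, px, py, halo=3):
    """Halo cells divided by interior cells, per rank, per vertical level.

    Assumes an nx-by-ny horizontal grid split evenly into px-by-py ranks,
    with `halo` layers on both sides of every partitioned direction.
    The halo width of 3 is an illustrative assumption.
    """
    local_x, local_y = nx // px, ny // py
    interior = local_x * local_y
    # A direction that is not partitioned exchanges no halos between ranks.
    pad_x = 2 * halo if px > 1 else 0
    pad_y = 2 * halo if py > 1 else 0
    total = (local_x + pad_x) * (local_y + pad_y)
    return (total - interior) / interior


# Hypothetical setup: 1024 x 1024 horizontal grid on 4 nodes with 64 cores each.
layouts = {
    "one rank per core (16 x 16 ranks)": (16, 16),
    "one rank per node (2 x 2 ranks)": (2, 2),
}

for label, (px, py) in layouts.items():
    ratio = halo_fraction(1024, 1024, px, py)
    print(f"{label}: halo/interior = {ratio:.3f}")
```

Under these assumptions the per-rank halo overhead is several times larger with one rank per core than with one rank per node, which is the intuition behind threading within a node and using MPI only between nodes. Whether that translates into better wall-clock time would still need to be measured.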

Replies: 0 comments

Select a reply

Best strategies for multi-GPU and multi-CPU simulations? #3885

ali-ramadhan Oct 30, 2024 Maintainer

Replies: 0 comments

ali-ramadhan
Oct 30, 2024
Maintainer