Best strategies for multi-GPU and multi-CPU simulations? #3885
ali-ramadhan started this conversation in High performance computing
Been digging into the distributed and multi-region modules (really awesome work!) and I'm curious about the best strategies for running large simulations on multiple CPUs and GPUs.
I know @simone-silvestri has run simulations on hundreds to thousands (!!) of GPUs. I'm thinking more of the case of using 2-16 (maybe up to 64) GPUs to run hydrostatic and non-hydrostatic simulations on rectilinear and lat-lon grids.
Here's what I think I've figured out so far, but would love to hear any thoughts or experience from others.
For multi-GPU on a single node (usually up to 4-8 GPUs), it seems like you should use multi-region grids to avoid touching MPI?
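Conceptually, a single-node multi-region setup gives each GPU a contiguous chunk of the domain, and neighbouring chunks swap halos between devices inside one process instead of over MPI. Here is a minimal sketch of that kind of 1D split along x. It assumes every column costs the same to compute. `slab_partition` is a hypothetical helper for illustration, not Oceananigans' API:

```python
def slab_partition(n_cells: int, n_regions: int) -> list[range]:
    """Split n_cells grid columns into n_regions contiguous slabs (hypothetical helper).

    The first (n_cells % n_regions) slabs get one extra cell, so slab sizes
    differ by at most one and the load stays balanced.
    """
    if n_regions <= 0 or n_cells < n_regions:
        raise ValueError("need 1 <= n_regions <= n_cells")
    base, extra = divmod(n_cells, n_regions)
    slabs = []
    start = 0
    for r in range(n_regions):
        size = base + (1 if r < extra else 0)
        slabs.append(range(start, start + size))
        start += size
    return slabs


if __name__ == "__main__":
    # Example: 1000 columns in x split across 4 GPUs on one node.
    for device, slab in enumerate(slab_partition(1000, 4)):
        print(f"device {device}: x-indices {slab.start}..{slab.stop - 1} ({len(slab)} cells)")
```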
For multi-GPU on multiple nodes, is it possible to use multi-region on each node then use CUDA-aware MPI between nodes (one rank per node)?
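If that hybrid layout is possible, the bookkeeping would look roughly like this. Each node owns a contiguous block of subdomains and runs as one MPI rank. Inside that node, each GPU owns one region. The sketch assumes a 1D, non-periodic slab decomposition. `place` and `classify_interfaces` are hypothetical helpers. Whether Oceananigans can nest a multi-region grid inside a distributed one is exactly the open question here. The point is only that, with one rank per node, the sole MPI exchanges are at node boundaries:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    node: int       # which node (= which MPI rank under one rank per node)
    local_gpu: int  # which GPU on that node (= which region of the node's multi-region grid)


def place(subdomain: int, gpus_per_node: int) -> Placement:
    """Map a global slab index to (node, local GPU), filling each node before the next."""
    node, local_gpu = divmod(subdomain, gpus_per_node)
    return Placement(node, local_gpu)


def classify_interfaces(n_nodes: int, gpus_per_node: int) -> tuple[int, int]:
    """Count interior slab interfaces that stay on one node vs. cross between nodes.

    Assumes a 1D, non-periodic chain of n_nodes * gpus_per_node slabs.
    Returns (intra_node, inter_node): device-to-device vs. MPI exchanges.
    """
    n = n_nodes * gpus_per_node
    intra = inter = 0
    for left in range(n - 1):
        if place(left, gpus_per_node).node == place(left + 1, gpus_per_node).node:
            intra += 1
        else:
            inter += 1
    return intra, inter


if __name__ == "__main__":
    intra, inter = classify_interfaces(n_nodes=4, gpus_per_node=4)
    print(f"device-to-device exchanges: {intra}, MPI exchanges: {inter}")
```

With 4 nodes of 4 GPUs each, 12 of the 15 interfaces stay on-node and only 3 go through MPI.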
For multiple cores on a single CPU on one node, I know we can either use multi-threading via KernelAbstractions.jl or use MPI. If I remember correctly MPI scales better here?
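The simplest way to settle which scales better on a particular machine is to measure it. Run the same fixed-size problem with p threads (e.g. `julia --threads=p`) and with p MPI ranks (e.g. `mpiexec -n p`), then compare strong-scaling efficiency, where speedup is $S_p = T_1 / T_p$ and efficiency is $E_p = S_p / p$. The helper below does that arithmetic. It is hypothetical and not part of any package:

```python
def strong_scaling(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Compute speedup and parallel efficiency from wall-clock times.

    `timings` maps a worker count p (threads or MPI ranks) to the wall time T_p
    for the same fixed-size problem, and must include p = 1 as the baseline.
    Returns {p: (speedup, efficiency)} with speedup = T_1 / T_p and
    efficiency = speedup / p.
    """
    if 1 not in timings:
        raise ValueError("need a single-worker baseline (p = 1)")
    t1 = timings[1]
    return {p: (t1 / tp, t1 / (tp * p)) for p, tp in sorted(timings.items())}
```

Call it once with the threaded timings and once with the MPI timings. Whichever keeps its efficiency closer to 1 as p grows is the better fit for that node.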
For multi-CPU on multiple nodes, has anyone tried doing this? Going full MPI (one rank per CPU core) seems like you'll end up with too many ranks and too much communication. So maybe the winning strategy is multi-threading with KernelAbstractions.jl on each node and MPI between nodes (one rank per node).
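The "too much communication" intuition can be made concrete with a back-of-the-envelope halo estimate. Splitting the same grid over more ranks shrinks each subdomain, so more of each subdomain sits on its edges. The sketch assumes a 2D (x, y) decomposition with periodic boundaries and all vertical levels kept on each rank. It also assumes a halo width of 3 cells and ignores corner halos. The grid size and core counts are hypothetical:

```python
import math


def near_square_grid(p: int) -> tuple[int, int]:
    """Factor p into (px, py) with px <= py and px as close to sqrt(p) as possible."""
    px = math.isqrt(p)
    while p % px != 0:
        px -= 1
    return px, p // px


def halo_overhead(nx: int, ny: int, ranks: int, halo: int = 3) -> dict[str, float]:
    """Estimate halo cost of splitting an nx x ny horizontal grid over `ranks` ranks.

    Each rank owns an (nx/px) x (ny/py) patch (all z levels). One halo fill moves
    `halo` cells deep across each of its 4 edges (corners ignored, periodic assumed).
    Returns the per-rank halo/interior ratio and the total halo cells per
    vertical level exchanged across all ranks.
    """
    px, py = near_square_grid(ranks)
    lx, ly = nx / px, ny / py  # local patch size (may be fractional in this estimate)
    per_rank_halo = 2 * halo * (lx + ly)
    return {
        "px": px,
        "py": py,
        "halo_to_interior": per_rank_halo / (lx * ly),
        "total_halo_cells_per_level": ranks * per_rank_halo,
    }


if __name__ == "__main__":
    # Hypothetical 1024 x 1024 grid on 4 nodes with 32 cores each:
    # one rank per core (128 ranks) vs. one rank per node (4 ranks).
    for ranks in (4 * 32, 4):
        print(ranks, halo_overhead(1024, 1024, ranks))
```

Under these assumptions, 128 ranks exchange 147456 halo cells per level per fill, compared with 24576 for 4 ranks, which is 6 times as much. The message count also grows in proportion to the rank count (4 per rank here), so latency costs rise even faster than volume.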
I don't think multi-CPU is really of much use or interest, as you probably need several beefy CPUs to rival a single GPU (and you'd have to deal with MPI), but I'm mentioning it out of personal curiosity, I suppose.