Can we make Oceananigans + MPI less painful? #2345
Replies: 6 comments 2 replies
-
A note on …: I can make it work on …
-
Thanks for opening the discussion, Greg! A few thoughts:
My two cents: it would be great to have MPI parallelism 'operational'. I'm not saying anyone should make this a priority over what they are already working on, since there is clearly a lot of great work going on and many directions to pursue. Perhaps we can help contribute to the distributed code (although it will take us some time to learn the code before I/we can contribute in a meaningful way).
-
I'm still getting an error when I run the distributed nonhydrostatic benchmark and test scripts (but not the distributed shallow water ones). I'll create an issue and post the output there so that we can keep this discussion general.
-
Agreed! I don't think all that much more work is needed for reasonable CPU parallelism either (mostly because of the awesomeness of …). Worth noting too that the work we're doing to implement a buffered communication abstraction for #2253, and to "fuse" the halo-filling kernels / fuse communication via #2335, might also make distributed models more performant (we'll see).
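For readers unfamiliar with the idea, here is an illustrative sketch of buffered halo communication with `MPI.jl`; this is not Oceananigans' internal implementation, and the function name, halo width, ranks, and tags are all assumptions. The point is just that packing a halo region into a contiguous buffer lets each neighbor exchange happen as a single dense-array send/receive.

```julia
# Illustrative sketch only (not Oceananigans' actual implementation):
# pack a halo slab into a contiguous buffer so the neighbor exchange is
# one dense send/receive. Calls follow MPI.jl's positional v0.19-style API.
using MPI

function exchange_x_halos!(field::Array, west_rank, east_rank, comm; Hx = 1)
    # Pack the western interior slab into a contiguous send buffer
    # (slicing an Array copies, so `sendbuf` is dense).
    sendbuf = field[Hx+1:2Hx, :, :]
    recvbuf = similar(sendbuf)

    sreq = MPI.Isend(sendbuf, west_rank, 0, comm)    # send west
    rreq = MPI.Irecv!(recvbuf, east_rank, 0, comm)   # receive from east
    MPI.Waitall!([sreq, rreq])

    # Unpack the received slab into the eastern halo.
    # (A full exchange also sends east / receives from west; omitted here.)
    field[end-Hx+1:end, :, :] .= recvbuf
    return nothing
end
```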
-
@johnryantaylor, as @francispoulin and @glwagner note, there is some good scalability hiding in there! We don't have many good regression tests to check for things that might interfere, or examples to follow. If you are happy to share a couple of possibly useful setups, we could build reference cases on them: configurations we maintain tests against, plus end-to-end MPI setup examples.
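As a concrete starting point for such a reference case, here is a hedged sketch of what an end-to-end distributed run might look like. The constructor and module names follow recent Oceananigans releases (the distributed API has been renamed across versions), and the rank count, resolution, and domain are placeholders.

```julia
# Hypothetical end-to-end MPI reference setup of the kind proposed here.
# Launch with e.g.:  mpiexec -n 4 julia --project reference_setup.jl
# API names follow recent Oceananigans releases and may differ in older ones.
using MPI, Oceananigans
using Oceananigans.DistributedComputations: Distributed, Partition

MPI.Initialized() || MPI.Init()

arch = Distributed(CPU(); partition = Partition(x = 4))  # 4 ranks along x

grid = RectilinearGrid(arch;
                       size = (256, 256, 32),      # placeholder resolution
                       extent = (1e5, 1e5, 1e3))   # placeholder domain (m)

model = NonhydrostaticModel(; grid, advection = WENO())

simulation = Simulation(model; Δt = 60, stop_iteration = 100)
run!(simulation)
```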
-
Hi @johnryantaylor, @francispoulin, @glwagner. I share @johnryantaylor's situation: access to a large number of CPUs, but generally memory-limited on GPU for large-scale problems. I (naively) started running simulations using up to 12 threads on a local machine after seeing the weak-scaling results that @francispoulin mentioned above. This week I have been interested in a setup that requires more memory than fits on a Tesla V100, so I did some tests on the Stampede2 supercomputer's ICX nodes (Ice Lake nodes with 2 × 40 cores and 160 hardware threads total). If this works reasonably well, would it be interesting to get some basic weak/strong scaling tests on a "real life" application (including outputting large 3D data sets at regular intervals, for instance)?
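Continuing the hedged reference-setup sketch above, regular 3D output could be attached with one of the standard output writers. The field choice, filename, and interval here are illustrative, and the keyword names follow recent Oceananigans releases (older versions used `prefix` rather than `filename`).

```julia
# Sketch: write full 3D velocity snapshots once per model hour.
# Attaches to the `simulation`/`model` from the reference-setup sketch above.
simulation.output_writers[:velocities] =
    JLD2OutputWriter(model, model.velocities;
                     filename = "velocities.jld2",   # placeholder name
                     schedule = TimeInterval(3600))  # every 3600 s
```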
-
I'm hoping we can have a general conversation about using Oceananigans and Julia with MPI, and maybe glean some tips and tricks from people. Ease of use issues span from

- installing `MPI.jl` on various platforms (often not trivial!), to
- launching and debugging MPI jobs (can `tmpi` help? To use it we have to set up both `tmpi` and `MPI.jl` with the same MPI implementation).

Many of the developers use GPUs for research, which has allowed us to put this issue off for a bit. But distributing across CPUs is an important use case too (note that distributing across GPUs is not performant right now, because `Distributed` does not use the buffered communication needed for performant CUDA-aware communication between GPU devices; a solution to that is in progress at #2253).
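To make the setup pain points above concrete, here is a minimal sketch of one way to keep `MPI.jl` and the launcher (including `tmpi`) on the same MPI implementation. It assumes the `MPIPreferences` mechanism from recent `MPI.jl` versions; the rank count and script name are illustrative.

```julia
# One-time project setup: point MPI.jl at the system MPI library, so the
# same implementation backs both Julia and the external mpiexec/tmpi launcher.
# (MPIPreferences is part of the MPI.jl ecosystem in recent versions.)
using MPIPreferences
MPIPreferences.use_system_binary()   # locates libmpi via the system paths

# A job can then be launched with the matching launcher, e.g.
#   mpiexec -n 4 julia --project my_script.jl
# or interactively, with one tmux pane per rank:
#   tmpi 4 julia --project my_script.jl
```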