Usability concerns and possible improvements for MultiArch
#3022
Replies: 12 comments
-
Can someone explain the purpose of `MPI.Init()`?
-
I guess changing to `MultiArchitecture` sounds good. Do we need default ranks? We can always find the number of ranks in any run, so why not just check? Sorry if I am missing something. To answer the above question, you can see here, but the short version is that it initializes MPI in the current process. I guess this tells each process that it should get ready for the possibility of sharing information; otherwise, we probably get an error. It sounds cleaner to me to hide it in `MultiArch`.
-
Just to follow on from what @francispoulin wrote, we normally use MPI_Init() at the start and MPI_Finalize() at the end. All code that calls MPI needs to be between these statements. To your question about whether you can call MPI_Init() within MultiArch (or MultiArchitecture), I think the question is whether it would ever be useful to use any MPI routines before calling MultiArch. For example, you can't find out the local rank until you call MPI_Init(). Also, it is good practice to use both MPI_Init() and MPI_Finalize(), so if MPI_Init() is in MultiArch, it leaves the question of where MPI_Finalize() should be. This might be an argument for having them both outside of MultiArch, where they are visible to the user.
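To make the ordering concrete, here is a minimal sketch of the pattern described above, using only MPI.jl calls; the Oceananigans part is just a placeholder comment, not a real setup:

```julia
using MPI

MPI.Init()                      # must come first: rank information is only available after this

comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)     # local rank of this process
nproc = MPI.Comm_size(comm)     # total number of ranks

# ... build the architecture, grid, and model, and run the simulation here ...
println("Hello from rank $rank of $nproc")

MPI.Finalize()                  # closes the MPI session; all MPI calls happen before this
```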
-
I agree with @johnryantaylor: even though it might make for shorter code, there are good reasons for keeping `MPI.Init()` and `MPI.Finalize()` outside of `MultiArch`.
-
It does make sense, so we have to live with that necessary distinction for now. But let's continue to think about how to bring the syntax for distributed and single-process scripts as close together as possible.
-
@simone-silvestri suggests removing the redundant information between the architecture and the grid. I'm not 100% sure how to solve it, though. One possibility is to 1) rebuild the architecture inside the grid constructor.
-
After some discussions I also think "Distributed" is a confusing name for this module. The module is really specific to "multi-process" execution with MPI, rather than distributed-memory execution (though it supports distributed-memory setups). I think drawing a closer connection to MPI in the name might help; or, if we want to be even more explicit, we could go with a longer, fully spelled-out name.
-
(1) should already be implemented (rebuilding `arch` in the grid constructor), i.e., `grid.architecture` is necessarily equal to the input architecture. I think the way to go is to specify a partitioning object. That will define how you split the grid. The connectivity itself (which core you communicate with) should go in the BCs.
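A toy illustration of the separation being proposed; `XPartition` comes up later in this thread, but the struct and `local_size` below are made up for illustration, not the actual Oceananigans types:

```julia
# The partition only says *how* the grid is split...
struct XPartition
    ranks :: Int          # number of ranks the x-direction is divided across
end

local_size(p::XPartition, Nx) = Nx ÷ p.ranks

p = XPartition(4)
println(local_size(p, 256))   # each of the 4 ranks owns 256 / 4 = 64 points in x

# ...while *which* neighbor each rank exchanges halos with (the connectivity)
# would be decided later, when boundary conditions are built.
```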
-
I guess I think it makes sense to delay building the full distributed architecture until the grid is constructed. One approach could produce code like

```julia
arch = MPIArchitecture() # note no arguments
grid = RectilinearGrid(arch, grid_kw..., partition=XPartition())
```

Internally, the constructors maybe take the form

```julia
function RectilinearGrid(arch, size, other_grid_kw, architecture_kwargs...)
    arch = rebuild_arch(arch; architecture_kwargs...)
    # etc
end
```

That's possibility (1). Possibility (2) is to use a wrapper for MPI jobs, perhaps completely eliminating `MultiArch`:

```julia
arch = CPU()
grid = RectilinearGrid(arch, kw...)
grid = MultiProcessGrid(grid, partition=XPartition())
```

Possibility (3) is to change all the grid types so that the distributed information lives in the grid itself. The downside to (2) is that it introduces yet another level of indirection, on top of both the architecture and the grid.
-
I am in favor of option (3). It will allow us to embed a little more multi-process logic in the code.
Then the connectivity will be made explicit when you build boundary conditions.
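A made-up illustration of what "connectivity in the boundary conditions" could mean: the neighboring rank is recorded on the boundary condition itself rather than on the architecture. The names here are hypothetical, not the actual Oceananigans types:

```julia
# Boundary condition that knows which rank to exchange halo data with.
struct HaloCommunicationBC
    local_rank    :: Int   # rank that owns this side of the domain
    neighbor_rank :: Int   # rank to send/receive halo points to/from
end

# Rank 1's west boundary communicates with rank 0; its east boundary with rank 2.
west_bc = HaloCommunicationBC(1, 0)
east_bc = HaloCommunicationBC(1, 2)
```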
-
Can you explain this point? To me it looks like the main difference / improvement of (3) is the user interface rather than the functionality, because every grid already carries an architecture. The primary tradeoff against the user-interface change is an increase in code complexity: new developers have to wrestle with the extra machinery, and users have to puzzle over it. But maybe I am missing something?
-
I'll convert this to a discussion since there are no immediate action items.
-
First of all... can we change `MultiArch` to `MultiArchitecture`? It's not a big deal, but `MultiArch` does seem to violate our "verbosity" rules.

Some observations and questions:

- Do we need default `ranks`? Maybe `ranks = (1, MPI.Comm_size(MPI.COMM_WORLD), 1)` or something? Or is this a bad idea? I foresee boilerplate for scripts that are intended to automatically scale...
- There is redundancy between `MultiArch` and `RectilinearGrid`; the grid constructor for `MultiArch` can grab topology from `arch`.
- `MultiArch` should take a similar approach as the grid constructor; i.e., we can't distribute a grid in `Flat` directions, so `ranks` should be a 2-tuple for 2D topology, 1-element for 1D topology.
- It would be nice to have utilities for `arch` (and also `grid`, `model`, `sim`):
  - `communicator(grid)` returns eg `MPI.COMM_WORLD`
  - `local_rank(grid)` returns eg `MPI.Comm_rank(communicator(grid))` (or maybe just `rank(grid)`, or something)
- This may belong in `MPI.jl`, but it does seem like it'd be nice to have a macro that a) prints normally for non-MPI and b) prints just from rank 0 if using MPI. That way we only have to change `arch` when switching from single-process runs to distributed runs. (See the first sketch below.)
- We need `Base.summary(::MultiArch)` so logs don't get mutilated.
- Should `mean` over all dims reduce everything, or should we do a local reduction (as we do now)? Perhaps we want a special `mean` for `DistributedField`, and `mean(interior(field))` can still be used for local reductions? We already have to redefine any reductions on `Field`, so it makes sense that we further extend them for `Field` on `MultiArch`. (See the second sketch below.)
- Computing `ReducedField` across the partition is also hard. We can reduce only locally, which could be fine for output in some cases. But if `ReducedField` are used in `AbstractOperation` we clearly need to gather and scatter for that to work. Probably the right thing is to implement gather and scatter by default, and then to add features for "local" `ReducedField` that could maybe be used to optimize I/O performance for the biggest problems.

Note: a macro / logger manipulation that avoids "extra" logging for distributed simulations might actually be essential, because Oceananigans is pretty chatty. The startup log is already long for 2 ranks; imagine we scale to thousands of ranks...
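A rough sketch of what the rank-aware logging macro could look like; `@rank0info`, `communicator`, and `local_rank` are the hypothetical names from the list above, not existing Oceananigans or MPI.jl API:

```julia
using MPI

# Placeholder accessors following the naming suggested above; a real implementation
# would pull the communicator from the architecture or grid instead of hard-coding it.
communicator(grid) = MPI.COMM_WORLD
local_rank(grid) = MPI.Comm_rank(communicator(grid))

# Prints normally when MPI is not initialized, and only from rank 0 when it is.
macro rank0info(grid, msg)
    return quote
        if !MPI.Initialized() || local_rank($(esc(grid))) == 0
            @info $(esc(msg))
        end
    end
end

# Usage: the same line works for single-process and distributed runs.
# @rank0info grid "Starting simulation..."
```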
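And a sketch of how a partition-wide `mean` could combine local reductions with an `MPI.Allreduce`; `global_mean` and the way the local data is passed in are assumptions for illustration, not the actual `DistributedField` implementation:

```julia
using MPI

# Mean over the whole partitioned field: each rank reduces its local interior data,
# then partial sums and counts are combined across the communicator.
function global_mean(local_data::AbstractArray, comm::MPI.Comm)
    local_sum   = sum(local_data)
    local_count = length(local_data)
    total_sum   = MPI.Allreduce(local_sum, +, comm)
    total_count = MPI.Allreduce(local_count, +, comm)
    return total_sum / total_count
end

# A purely local reduction remains available as mean(interior(field)), as suggested above.
```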