Handoff data to other distributed services #223

mrocklin · 2017-03-10T15:00:54Z

My end goal is to share data between distributed Dask.array and Elemental. For most of this question though Dask-specifics shouldn't be important. Instead, consider the case where I have several Python processes running in an MPI world and that each Python process has a few numpy arrays which, when arranged together, form the chunks of a distributed array. To be concrete, here is a case with a world of size two:

Process 1

>>> MPI.COMM_WORLD.Get_rank()
0
>>> arrays
{(0, 0): np.array(...),
 (0, 1): np.array(...),
 (1, 1): np.array(...)}

>>> elemental_array = give_arrays_to_elemental(arrays)

Process 2

>>> MPI.COMM_WORLD.Get_rank()
1
>>> arrays
{(1, 0): np.array(...)}

>>> give_arrays_to_elemental(arrays)

I also know the shape of the array, the datatypes, etc. My chunks aren't necessarily uniformly arranged. In this case rank 0 has three chunks while rank 1 has one. I would be willing to rearrange chunks arbitrarily if necessary, but would prefer to avoid the communication if possible.

I would like to take all of these numpy arrays hand them to Elemental within each process, and then do more computationally intense operations using Elemental's algorithms. Afterwards, I would like to reverse the process and get back a bunch of NumPy arrays in all of my Python processes:

Process 1

>>> do stuff with elemental_array
>>> arrays = get_local_chunks_from_elemental(...)

Process 2

>>> participate in MPI/elemental computations
>>> arrays = get_local_chunks_from_elemental(...)

Is this feasible? If so then what is the best way to go about it?

poulson · 2017-03-10T15:39:51Z

There is a generic (but usually not the fastest) El::DistMatrix<T>::QueueUpdate( El::Int row, El::Int column, const T& value ) interface that can queue an update to an arbitrary entry of a distributed matrix from any process. The setup routine is El::DistMatrix<T>::Reserve( El::Int numUpdates ) and the collective finalization routine is El::DistMatrix<T>::ProcessQueues().

I would recommend starting with these interfaces for loading data into Elemental, and there are equivalent "pull" analogues to the above "puts": El::DistMatrix<T>::ReservePulls, El::DistMatrix<T>::QueuePull( El::Int row, El::Int column ), and El::DistMatrix<T>::ProcessPullQueue( T* pullBuf ).

Please see

Elemental/include/El/core/DistMatrix/Abstract.hpp

Line 226 in 73c658c

// Batch updating of remote entries

for the prototypes.

Once this is working, we could look into ways of avoiding all of the unnecessary row and column metadata sent by said routines.

poulson · 2017-03-10T15:42:09Z

And, as luck would have it, I already exposed a Python interface to the above:

Elemental/python/core/DistMatrix.py

Line 1675 in 73c658c

lib.ElDistMatrixQueueUpdate_i.argtypes = [c_void_p,iType,iType,iType]

mrocklin · 2017-03-10T15:43:16Z

That is indeed fortunate. I suspect we might end up being bound by python for loops, but I agree that this is probably enough to demonstrate feasibility and that that should come before optimization.

poulson · 2017-03-10T15:55:16Z

I haven't yet added C/Python interfaces to El::DoubleDouble, El::QuadDouble, El::BigFloat, or their complex variants, but I am assuming single and double-precision are okay for you for now.

mrocklin mentioned this issue Jun 29, 2017

Write blogpost showing how to interface with an MPI application dask/distributed#1220

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handoff data to other distributed services #223

Handoff data to other distributed services #223

mrocklin commented Mar 10, 2017 •

edited

Loading

poulson commented Mar 10, 2017

poulson commented Mar 10, 2017

mrocklin commented Mar 10, 2017

poulson commented Mar 10, 2017

Handoff data to other distributed services #223

Handoff data to other distributed services #223

Comments

mrocklin commented Mar 10, 2017 • edited Loading

Process 1

Process 2

Process 1

Process 2

poulson commented Mar 10, 2017

poulson commented Mar 10, 2017

mrocklin commented Mar 10, 2017

poulson commented Mar 10, 2017

mrocklin commented Mar 10, 2017 •

edited

Loading