
MPI3 RMA Goals and Constraints

William Gropp edited this page Dec 3, 2015 · 2 revisions


There are many competing goals and constraints for an MPI RMA model. This page is intended to capture, in a concise fashion, those issues, including references where appropriate. One problem in discussions to date is that different uses may have different, incompatible requirements. Note that MPI may provide ways to optimize for certain choices, or may decide not to support certain programming models.

Support for BSP-style RMA

The Bulk Synchronous Parallel (BSP) programming style (Refs. 1 and 2) provides a model of one-sided operations with active-target, fence-like synchronization.

Support for PGAS Languages

See the paper by Bonachea and Duell (Ref. 3). Issues include:

Certain operations should be undefined rather than erroneous 

Support for ARMCI and/or Global Arrays

  1. MPI-3 RMA as a drop-in replacement for ARMCI is favored by many Global Arrays developers.
  2. When a feature not currently available in ARMCI would enable a better implementation of certain Global Arrays operations, that feature should be added to MPI-3 RMA. For example, GA_Sync (http://www.emsl.pnl.gov/docs/global/c_nga_ops.html#ga_sync) is a collective flush operation plus a barrier; hence, providing a collective flush operation in MPI-3 RMA is likely to ease the implementation burden and improve the performance of Global Arrays on some networks. 
    
  3. Although a common use pattern of Global Arrays resembles BSP, GA requires waiting on and probing for the local completion of individual non-blocking operations. (http://www.emsl.pnl.gov/docs/global/c_nga_ops.html#ga_nbwait) 
    
  4. Global Arrays blocking operations are ordered with respect to each destination, whereas non-blocking operations are not. (http://www.emsl.pnl.gov/docs/global/um/one-side.html) 
    

Forward-Looking Hardware Issues

Support for both Cache-Coherent and Non-Cache-Coherent Hardware

Support for Heterogeneous Data Representations

MPI supports communication among processes with different data representations, and this must also be possible with MPI RMA. However, optimizations for systems with the same data representation may be considered (either as an optimization in the implementation or through additional user-visible MPI routines that are erroneous to use in the heterogeneous case).

Bibliography

[1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

[2] Valiant, L. G. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (Aug. 1990), 103-111. (http://doi.acm.org/10.1145/79173.79181)

[3] D. Bonachea and J. Duell, "Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations." 2nd Workshop on Hardware/Software Support for High Performance Scientific and Engineering Computing, SHPSEC-PACT03, 2003. (http://www.cs.berkeley.edu/~bonachea/upc/bonachea-duell-mpi.pdf) Journal version: Int. J. High Performance Computing and Networking, Vol. 1, Nos. 1/2/3, pp. 91-99, 2004. (http://www.cs.berkeley.edu/~bonachea/upc/IJHPCN_Bonachea_duell_MPI.pdf)