Meeting 2018 09
Tomislav Janjusic edited this page Jan 6, 2024 · 1 revision

    (yes, the "2018 09" title on this wiki page is a bit of a lie -- cope)
- 9am, Tuesday, Oct 16, 2018 - noonish Thursday, Oct 18, 2018.
- Cisco buildings 2 and 3 (right next to each other), San Jose, CA, USA.
  - Tuesday: Cisco building 2 (Google Maps link)
    - NOTE 1: The Tuesday meeting is immediately after the weekly Webex. You're welcome to show up after 7:30am US Pacific to be in the Cisco conference room for the weekly Webex.
    - NOTE 2: There is no Lobby Ambassador in Cisco building 2. You need to iMessage or SMS text Jeff Squyres, and I'll come escort you to the meeting room.
  - Wednesday, Thursday: Cisco building 3 (Google Maps link)
    - There is a Lobby Ambassador in Cisco building 3; they will alert me when you arrive (but iMessaging or SMS texting me wouldn't hurt, either).
 
Please put your name on this list if you plan to attend any/all of the meeting so that Jeff can register you for a Cisco guest badge+wifi.
- Jeff Squyres, Cisco
 - George Bosilca, UTK
 - Arm Thananon Patinyasakdikul, UTK
 - Andrew Friedley, Intel
 - Geoff Paulsen, IBM
 - Howard Pritchard, LANL
 - Edgar Gabriel, UH
 - Shinji Sumimoto, Fujitsu
 - Thomas Naughton, ORNL
 - Matias Cabral, Intel (16th only)
 - Neil Spruit, Intel (16th only)
 - Brian Barrett, Amazon (16th only)
 - Akshay Venkatesh, NVIDIA
 - Xin Zhao, Mellanox
 - Artem Polyakov, Mellanox
 
- Nathan/Brian: Vader bug cleanups
  - Want to strengthen the recent vader fixes to be fully bulletproof
 
- OFI (Libfabric):
  - OFI Presentation
  - Scalable endpoints support in MTL and BTL
  - Registering specialized communication functions based on provider capabilities
    - m4 generated C code to avoid code duplication?
  - Discussion: OFI components to set their priority based on the provider found.
  - OFI Common module creation.
- PR 5241: Add MCA param for multithread opal_progress() (George, Arm)
- Multithreading stuff (George, Arm)
- r2 / BTLs are initialized even when they are not used (Jeff)
- 4.0.x status / roadmap
- TCP bric-a-brac:
  - Discuss TCP multilink support: what is possible, what we want to do, and how we can do it.
  - Discuss TCP with multiple IP addresses on the same interface: what we want to do, and how we plan to do it.
  - TCP BTL progress thread. Does 1 IP interface vs. >1 IP interface matter?
- Should we limit the number of C compilers that can be used to compile the OMPI core (e.g., to limit the amount of assembly/atomic stuff we need to support)?
  - E.g., PGI doesn't give us the guarantees we need.
  - Probably need to add some extra wrapper glue: e.g., compile OMPI core with C compiler X and use C compiler Y in mpicc.
    - Are there any implications for Fortran? Probably not, but Jeff worries that there may be some assumption(s) about LDFLAGS/LIBS (and/or other things?) such that: "if it works for the C compiler, it works for the Fortran compiler".
- PMIx as "first class" citizen?
  - Shall we remove the OPAL pmix framework and directly call PMIx functions?
    - Require all non-PMIx environments to provide a plugin that implements PMIx functions with their non-PMIx library.
    - In other words, invert the current approach that abstracted all PMIx-related interfaces.
- ORTE support model
  - See https://docs.google.com/document/d/1VwqUVAhkeJt7PmaQCBQpUXBYdy9Elg5m-TBzY-6lLQY/edit
  - Should we remove ORTE from the Open MPI repo and make it a separate project?
  - How do we resolve the "one package" philosophy we have embraced from day one?
- Should we publish Open MPI release tarballs to https://github.com/open-mpi/ompi/releases?
  - Per https://github.com/open-mpi/ompi/issues/5604, I posted a bunch of "Official releases aren't here..." notices on the GitHub releases page.
- Remove orte-dvm and redirect users to PRRTE? (Ralph)
- Public ompi-tests repository for easier sharing of test suites among collaborators (Edgar)
- Ralph+Jeff: discuss PMIx compatibility issues and how to communicate them
  - https://pmix.org/support/faq/how-does-pmix-work-with-containers/ addresses (some of) PMIx-to-PMIx compatibility issues.
  - But what about OMPI to PMIx compatibility?
  - And what about RM to PMIx compatibility?
  - How do we convey what this multi-dimensional variable space means to users in terms of delivered MPI features?
  - Case in point: https://github.com/open-mpi/ompi/issues/5260#issuecomment-421407400 (OMPI v3.0.x used with external PMIx 1.2.5, which resulted in some OMPI features not working).
- Discuss memory utilization/scalability (ThomasN)
- Mellanox/Xin: Performance optimization on OMPI/OSC/UCX multithreading
- Fujitsu's status
  - Fujitsu MPI for Post-K Computer
  - Development Status in Fujitsu Going ARM
  - QA Activity in Fujitsu
- 5.0.x roadmap
- PMIx Roadmap
  - Review of v3 and v4 features
  - Outline changes in OMPI required to support them
  - Outline changes for minimizing footprint (modex pointers instead of copies)
  - Decide which features OMPI wants to use
- Mail service (mailman, etc.) discussion. Here are the lists we could consolidate down to:
  - OMPI core
  - OMPI devel/packagers
  - OMPI users/announce
  - OMPI commits
  - HWLOC commits
  - HWLOC devel/users/announce
  - MTT users/devel
  - Do we want to move to a commercial hosting site? It would cost about $21/month, or about $250/year.
  - Additionally: The Mail Archive is going through some changes. It is not yet clear whether this will affect us.
  - As of 4 Sep 2018, Open MPI has $575 in our account at SPI.
- Debugger transition from MPIR to PMIx
  - How to orchestrate it?
- ABI-changing commit on master (after v4.0.x branch) which will affect future v5.0.x branch: https://github.com/open-mpi/ompi/commit/11ab621555876e3f116d65f954a6fe184ff9d522.
  - Do we want to keep it? It's a minor update / could easily be deferred.
- Need vendors to reply to their issues on the GitHub issue tracker
- openib: persistent error reported by multiple users
  - https://github.com/open-mpi/ompi/issues/5914
  - Feels like this should be easy to fix...?