WeeklyTelcon_20230509
Geoffrey Paulsen edited this page Jul 25, 2023 · 2 revisions

    - Dialup Info: (Do not post to public mailing list or public wiki)
 
- Geoff Paulsen (IBM)
 - Howard Pritchard (LANL)
 - Edgar Gabriel (AMD)
 - Luke Robison (Amazon)
 - Joseph Schuchart
 - Thomas Huber
 - Thomas Naughton (ORNL)
 - Todd Kordenbrock (Sandia)
 - Tommy Janjusic (nVidia)
 
- https://github.com/open-mpi/ompi/pull/11649 - an OFI callback scoped incorrectly.
- This probably affects main, v5.0.x, and v4.x
 - Could craft a test case. Is this only seen with multi-threaded (MT) apps? No, not only MT.
- Can be hit through re-entrancy on a single thread.
 - Got more than one completion because an array was overwritten.
 - Luke will search for test case
 
 - Blocker for v5.0.0
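
The failure mode described above (single-threaded re-entrancy overwriting an array, leading to a repeated completion) can be illustrated with a small simulation. This is only a sketch of the general pattern, not the code touched by the PR: `Progress`, `read_events`, `on_complete`, the batch size, and the request ids are all hypothetical names and values chosen for illustration.

```python
from collections import Counter

BATCH = 2  # events read per progress call (illustrative value)


class Progress:
    """Toy progress engine; every name here is hypothetical."""

    def __init__(self, request_ids, use_local_buffer):
        self.queue = list(request_ids)
        self.use_local_buffer = use_local_buffer
        self.shared_events = [None] * BATCH  # buffer scoped too widely
        self.completions = Counter()

    def read_events(self, events):
        """Fill `events` in place from the queue; return how many were read."""
        n = 0
        while n < BATCH and self.queue:
            events[n] = self.queue.pop(0)
            n += 1
        return n

    def progress(self):
        # With a per-call buffer, a nested call cannot touch our events.
        events = [None] * BATCH if self.use_local_buffer else self.shared_events
        n = self.read_events(events)
        for i in range(n):
            self.on_complete(events[i])

    def on_complete(self, request_id):
        self.completions[request_id] += 1
        if request_id == 1:
            # The completion callback re-enters progress on the same thread.
            self.progress()


def run(use_local_buffer):
    engine = Progress([1, 2, 3, 4], use_local_buffer)
    engine.progress()
    return dict(engine.completions)
```

With the shared buffer, the nested call overwrites the outer call's unread entry, so request 4 completes twice and request 2 is never completed. With a buffer scoped to each call, every request completes exactly once. No threads are involved in either case.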
 
 
- No new updates
 
- MCA params issues are the biggest issues now
- https://github.com/open-mpi/ompi/issues/11532
 - https://github.com/openpmix/prrte/issues/1731
 - Only 2 of the 3 fixes are wanted for v5.0.0; the 3rd issue can wait for v5.0.x
 - Quincy was going to take this over, but is busy with other things.
 
 - Might be in PMIx base / framework
- https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20230425#pmix-mca-parameter-issues
 - Ralph volunteered to help, but might take a month
 - Luke will check with Ralph
 
 - Need to cherry-pick NIC selection to v5.0.x
- The commit that went into main broke some AWS configurations
 - It also caused some Coverity issues; the fix has already been PRed against main
 
 - 2 MTT issues
- UCX and DSO - a fix may need to be cherry-picked back to v5.0.x
- Issue 11632 - Fix provided in re-review
 
 
 - Good to retest ABI with v4.1.x before v5.0.0
- Geoff will do this this week or next week
 
 - SMCuda should disqualify itself if no CUDA hardware is available.
- Want this for v5.0
 - Seen when one rank or a singleton closes itself early.
 - Edge case in SMCuda: it attempts to clean up and tie into the framework.
- When it gets unloaded, there are dangling pointers.
 - Fix: don't set up callback functions unless Cuda_Init succeeds.
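
The shape of that fix (register callbacks only after device initialization succeeds, so a disqualified component leaves nothing behind) can be sketched as below. This is a language-neutral illustration, not the SMCuda code: `Framework`, `Component`, `device_init`, and the `gpu_shm` name are hypothetical stand-ins.

```python
class Framework:
    """Toy framework that owns a table of event callbacks (hypothetical)."""

    def __init__(self):
        self.callbacks = {}

    def select(self, components):
        """Open each component; keep only those that report themselves usable."""
        return [c.name for c in components if c.open(self)]

    def unload(self, component):
        component.close(self)


class Component:
    """Toy component; `device_init` stands in for a hardware init call."""

    def __init__(self, name, device_init):
        self.name = name
        self.device_init = device_init  # callable returning True on success

    def open(self, framework):
        if not self.device_init():
            # Disqualify before registering anything, so nothing is left
            # behind that could dangle after the component is unloaded.
            return False
        framework.callbacks[self.name] = self.on_event
        return True

    def close(self, framework):
        # Safe whether or not open() registered a callback.
        framework.callbacks.pop(self.name, None)

    def on_event(self, event):
        return f"{self.name} handled {event}"


def demo(hardware_present):
    framework = Framework()
    gpu = Component("gpu_shm", device_init=lambda: hardware_present)
    selected = framework.select([gpu])
    return selected, sorted(framework.callbacks)
```

When `device_init` fails, the component is not selected and the framework's callback table stays empty, so unloading it later has nothing to clean up.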
 
 - Edgar's PR (PR 11617) still tries to always compile the CUDA collective
- Waiting for review
 - Summary: we want both
 
 
 - Doc work still remains; will enumerate any remaining issues next week
- A fix went in 20 minutes ago; other than that, there's a PMIx cross-version problem (11658)
 - These same doc fixes will trickle through pmix/prrte and
 
 - New issue: nVidia's internal MTT found an async-modex issue
- global dstore has an issue.
 - Triggered if you set async-modex or dstore-hash.
 - Scale-dependent: the minimum needed to reproduce is 4 nodes x 4 ppn.
 - UCX and ob1 are both affected.
 - Just MPI_Init + MPI_Finalize can trigger it
 - v5.0 blocker.
 
 
- Behavior of MPI_Comm_disconnect - a lot of discussion with George
 - MPI_Finalize - what happens to persistent communication handles that the user didn't explicitly free?
 - Option number 3 is under
 - C const in header files: Open MPI and MPICH are both doing what's acceptable for ABI definitions, but this has not been discussed in this forum