WeeklyTelcon_20170410
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

    - Dialup Info: (Do not post to public mailing list or public wiki)
 
- Geoff Paulsen
 - Jeff Squyres
 - Artem Polyakov
 - Brian Barrett
 - David Bernholdt
 - Geoffroy Vallee
 - Howard
 - Josh Hursey
 - Joshua Ladd
 - Ralph
 - Thomas Naughton
 - Todd Kordenbrock
 
Review All Open Blockers
Review Milestones v2.1.0
- https://github.com/open-mpi/ompi/issues/3267 - a v2.1.1 based blocker
- Jeff seems to remember some persistent one-sided failure.
 - Looks like the issue is still open, but the PRs were pulled in?
 - Cisco can turn on MTT for master.
 
 - https://github.com/open-mpi/ompi/issues/3268
- Artem was still seeing this, but hasn't seen it since Nathan's merge.
 
 - Segfault when trying to launch under a debugger, specific to v2.1.1
- Ralph created a PR with a fix that should go into a v2.x release.
 
 
Review Milestones v3.0
- Load Leveler support was removed, but code remains. IBM approves removal on master.
 - v3.0 Support items:
- 64-bit
 - Mac OS X 10.12
 - FreeBSD
 - Cisco MTT is doing -m32 builds.
 
 
Review Master Pull Requests
Review Master MTT testing
- Git PRs: why do we merge, rather than rebase and merge?
- Merging shows empty (or sometimes non-empty) merge commits.
 - The idea is that we merge exactly what the CI tested.
 - It can be very hard to line up PRs.
 - Good to periodically audit what we're doing, and discuss.
 - The merge commit is not signed off (and gets flagged a bunch in CI).
 
 - https://github.com/open-mpi/ompi/pull/3288
- Ralph noticed that a bunch of OMPI_ env vars were being propagated that shouldn't be.
 - All OMPI_* vars were being propagated, but we really should be propagating only OMPI_MCA_*.
- We do set some OMPI_UNIVERSE_SIZE type env vars.
 - Surprised: it was forwarding env vars that it shouldn't have been.
 - Document that users should stop doing this.
 
 - We'll continue to discuss next week.
 - There are times when you need to capture something prior to calling OPAL_Init, e.g. something influencing STDOUT.
- These can't be MCA params, because the MCA system won't be open yet.
 
 
 - Ralph has an issue when using -btl sm.
- Could add an abort when it can't find an endpoint, but this is in BML R2; the error message is coming from there.
 - A portion of the code is in end_procs; an abort will give a stack trace, and we can figure it out from there.
 - This communication removes the advantage of not doing a full modex, because they then do an on-demand modex when trying to see who they can talk to.
- This shouldn't be happening. Ralph will look into R2 and try to figure out who's communicating and why.
 
 - Ralph will give a presentation next time. Looks really good, minus a Kernel issue with KNL.
 
 - FYI: you will see lots of Jenkins jobs; that's Brian adding stuff. On jenkins.open-mpi.org you will see lots of builder things. Amazon is fiddling with the Jenkins settings.
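
For reference, the prefix filtering discussed under PR 3288 above (forward only OMPI_MCA_* rather than every OMPI_* variable) can be sketched as follows. This is only an illustration, not the code from the PR: Open MPI itself is written in C, and the function name `filter_forwarded_env` and the sample environment values are hypothetical.

```python
def filter_forwarded_env(env, prefix="OMPI_MCA_"):
    """Return only the variables whose names start with the given prefix.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    return {name: value for name, value in env.items() if name.startswith(prefix)}


# Illustrative environment: one MCA parameter, one other OMPI_ variable,
# and one unrelated variable.
example_env = {
    "OMPI_MCA_btl": "self,sm",
    "OMPI_UNIVERSE_SIZE": "4",
    "PATH": "/usr/bin",
}

forwarded = filter_forwarded_env(example_env)
print(sorted(forwarded))
```

With this filter, OMPI_UNIVERSE_SIZE is no longer forwarded just because it starts with OMPI_; only names beginning with OMPI_MCA_ pass through.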
 
- Cisco, ORNL, UTK, NVIDIA
 - Mellanox, Sandia, Intel
 - LANL, Houston, IBM, Fujitsu