WeeklyTelcon_20170627
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision
      
    - Dialup Info: (Do not post to public mailing list or public wiki)
 
- Geoff Paulsen
 - Jeff Squyres (Cisco)
 - Howard Pritchard
 - Josh Hursey
 - Todd Kordenbrock
 - David Bernholdt (ORNL)
 - Nathan Hjelm
 - Ralph
 - Brian Barrett (Amazon)
 - Artem
 
Review All Open Blockers
- Targeting the next 2.0.x release for October.
 
- Targeting the next 2.1.x release for mid-August.
 
Review Milestones v3.0
- No RC last week.
 - Going to merge in last couple of Ralph's PMIx changes.
 - Josh Hursey needs to review PR 3754.
 - Will create an Open MPI v3.0.0 RC1 today.
 - Focus of RC1 testing will be around Orte launching.
- Some orteds are still getting killed sometimes.
 - Some complaints in killing changes
 
 - Larger picture schedule for v3.0?
- Like to get feedback on RC1.
 - The v3.0 branch hasn't had a lot of testing so far.
 
 - There are a bunch of MPI layer PRs (some require review).
- two PRs
 - ROMIO PR (requires REVIEW)
 - RDMA PR (requires REVIEW)
 - Any special features for NEWS? Only responses from Mellanox.
 
 - MTT: Cisco turned off Leave Session Attached; it is busted.
 - IBM added some MPI dependencies in OPAL layer, but no CI caught it.
- Running autogen.pl with -nompi and some other flag would catch some abstraction layer violations like this.
 
 - Branch for the next release will happen at the end of the Face to Face in July.
 - Expectations for Folks to test RC.
- Down the road we should make a release tarball each night, and have MTT test THAT nightly.
 - Very different in how they're built, until they call 'make dist'.
 
 
Review Master Pull Requests
- Some corruption in Cray PMIX component on Master, about a week ago.
 - Monitoring components - replaces ptraces stuff.  Some segv in this.
- Don't think they're supposed to be on by default. Possibly bug in GLUE.
 
 
Review Master MTT testing
- Mellanox was having some MTT testing issue, Artem will look at it.
- Mellanox might be seeing it because of deprecated build status stuff.
 
 - Some tests run successfully but then hang at the end of their output and die due to a timeout.
 - Right now, PR testing builds exactly what the person submitted in the PR,
- but it could build the result AFTER merging the PR into the target branch and test THAT.
 - IBM has seen internally that this method caught a failure before it was merged to the branch.
 - Amazon likes this approach also.
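
The "build AFTER a merge" idea can be sketched as follows. This is only an illustration, assuming the project is hosted on GitHub, which publishes a `refs/pull/<N>/merge` ref holding a trial merge of the PR into its base branch. The function name `merge_test_commands` and the PR number in the usage note are hypothetical.

```python
def merge_test_commands(pr_number, remote="origin"):
    """Return git commands that check out a PR as merged into its base branch.

    Assumption: the remote is a GitHub repository, which exposes the
    trial-merge commit for each open PR at refs/pull/<N>/merge.
    """
    merge_ref = f"refs/pull/{pr_number}/merge"
    return [
        # Fetch the trial merge instead of the contributor's branch head.
        ["git", "fetch", remote, merge_ref],
        # Build and test from that merge commit without creating a branch.
        ["git", "checkout", "--detach", "FETCH_HEAD"],
    ]
```

A CI job could pass each command list to `subprocess.run(cmd, check=True)` before configuring and building, e.g. `merge_test_commands(1234)`.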
 
 
- Intel is pushing content somewhat regularly, but unclear how much longer.
- Not seeing much benefit.
 
 - Howard - Trying to use it and trying to work on the viewer.
 
- Face2Face Meeting-2017-07
- Date: July 11-13 (9am Tuesday - noon on Thursday).
 - Cisco has booked space in Chicago.
 - Jeff will see about setting up a Web-Ex for those who are interested.
- Please email him if you are interested in attending via Web-Ex.
 
 - No Fees at this face to face.
 
 - From mailing list (From SuSE) - Reproducibility of the build.
- They want to be able to binary-compare two builds to see if they're the same, but can't because of the embedded date.
 - Lots of pros / cons to having date in build.
 - Put it in ompi_info - build host, build date, Manpages (stamped at make dist).
 - Maybe add some DATE env variable to force the date, post v3.0.
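
One way the "DATE env to force the date" idea could look is sketched below. The variable name `SOURCE_DATE_EPOCH` is an assumption borrowed from the reproducible-builds convention; the notes do not name a variable. The helpers `build_timestamp` and `format_build_date` are hypothetical.

```python
import os
import time


def build_timestamp(environ=None):
    """Return the Unix time to stamp into a build.

    Assumption: SOURCE_DATE_EPOCH (seconds since the epoch) is the override
    variable. When it is set, every rebuild embeds the same date; otherwise
    the current time is used, as today.
    """
    env = os.environ if environ is None else environ
    forced = env.get("SOURCE_DATE_EPOCH")
    if forced is not None:
        return int(forced)
    return int(time.time())


def format_build_date(timestamp):
    # Format in UTC so the string does not depend on the build host's timezone.
    return time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(timestamp))
```

With the variable set, two builds of the same source would report the same build date, which removes one obstacle to binary comparison.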
 
 - dlopen LOCAL is painful - Issue 3705
- Each MCA library should be linked against the libraries it actually depends on.
 - We used to link the components against the libraries, but then we stopped.
- Jeff Recalls: But then we stopped because we'd link MPI components against both MPI and ORTE.
 - Jeff Recalls: But if you do an upgrade, then you're screwed...
 - Brian Recalls: OSX namespacing issue...
 - need to do some archeology
 - Ralph remembers there was SOME reason we don't do this linkage.
 
 - Not for v3.0 - to be discussed at the Face to Face.
 - Maybe add a configure option to do this.
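
For context on why local-scope dlopen hurts: when a host process loads a library with `RTLD_LOCAL`, that library's symbols are not visible to plugins loaded later unless those plugins are themselves linked against it. A minimal sketch of the usual host-side workaround, loading with `RTLD_GLOBAL`, is shown below using Python's `ctypes`. The helper name is hypothetical, and the library name in the usage note is an assumption about the install.

```python
import ctypes


def load_with_global_symbols(library_name):
    """dlopen a shared library so its symbols are visible to later loads.

    ctypes normally defaults to RTLD_LOCAL; passing RTLD_GLOBAL lets plugins
    that were not linked against this library still resolve its symbols.
    """
    return ctypes.CDLL(library_name, mode=ctypes.RTLD_GLOBAL)
```

A host might call `load_with_global_symbols("libmpi.so")`, assuming that name resolves on the loader path. Linking each component against its real dependencies, as proposed above, would make this workaround unnecessary.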
 
 - For v4.0 do we want to keep hwloc internal, or just use external?
- Compromise would be to change precedence to use external over internal for all of our libs?
- Then in a future release, remove internals (or some at least) completely?
 
 - RHEL5 doesn't have hwloc.
 - Fixed something that now allows Open MPI to use older hwloc 1.3, 1.4, 1.5 or something, but still not v1.0.
 
 - Compromise would be to change precedence to use external over internal for all of our libs?
 - What to do about libevent? - look at all of them at face to face.
 
- Mellanox, Sandia, Intel
 - LANL, Houston, IBM, Fujitsu
 - Amazon
 - Cisco, ORNL, UTK, NVIDIA