
Threads WG Meeting 06 26 2018

Manjunath Gorentla Venkata edited this page Oct 2, 2018 · 1 revision

(Thanks Swaroop for the notes)



  • Issue 223

    • What is the OSH-safe way to create child processes?
      • Users have used vfork(): it creates a child process and suspends the parent until the child calls _exit or exec.
        • Child has lower memory overhead
        • Child shares the parent's memory
        • POSIX.1-2001 deprecated it; it is dangerous if the child does not call _exit
        • posix_spawn(): works with all OSH implementations
    • Should OSH require implementations to support forking mechanisms such as posix_spawn()?
      • ORNL: Leave it undefined, as combining multithreading with forking can get complicated.
        • What would we need to standardize this behavior ?
      • Intel: Does the forked child call into OSH?
        • Children are not expected to call shmem_init().
      • Cray: If it does not work, something is wrong with the implementation.
        • Cray already supports it.
      • Intel: Concerns regarding its interaction with other components of the software stack.
      • Intel: This should work on all commodity distributions.
        • Are we trying to specify the semantics wrt symmetric variables (updates by child process) ?
      • ORNL: Does OMPI-OSH support it ?
        • Segfaults.
      • DoD: Use case: children copy symmetric memory (perhaps at a checkpoint) but do not use it as symmetric memory.
      • ORNL: Need to understand the implications in greater detail.
    • Action Items:
      • Nick: Create ticket, supply test code for everyone to try.
      • Manju: Find out what support Open MPI has for fork.
  • PR 103

    • Discussing the changes to PR from last meeting.
      • Text changes - more to come with chapter edits
      • Language clarification in shmem_wait_nbe
    • Rename API - shmem_wait_nbe and shmem_test_nbe
      • since wait is actually blocking
    • Merged handles with multiple requests have been removed from this PR
    • Questions:
      • DoD:
        • Is there no ordering guarantee within the same merged request, or between different merged requests?
          • Both
        • The relationship between memory allocation behind the scenes and state of the request handle is not clear.
        • What does the data structure look like?
          • Opaque to user
          • State should be query-able
            • ORNL: We have the distinction internally.
        • Does wait uninitialize the handle?
          • Yes, if the associated operations are completed.
        • API is not clear about handle allocation
          • There is both implicit and explicit support in the OSH-X implementation
            • Explicit was removed with the merged handle semantics.
            • Exposing state makes sense for explicit.
        • How much space is required to track a request?
          • Small: no allocation request is needed
          • Significant: provide an allocation call
          • A: 2 words, so an allocation request is not required.
        • Context variants of non-blocking calls ?
          • Not at this time.
            • Con: There will be a ctx and req object.
      • Intel:
        • Is the return value of shmem_request_allocate passed by reference or by value?
          • By reference (for allocate and put)
        • Could the put routine be allocating a request?
          • Yes
          • This is confusing.
            • Why reuse handles? What is the stale value that wait resets?
              • Allocate provides hints to the runtime
        • Does quiet affect the completion ?
          • Wait gives remote completion.
          • Why do it this way? Usually wait gives local completion.
        • Throughput vs. tracking
          • Cray has a way to chain nbe operations.
        • Add use cases to the proposal.
    • Action Items:
      • Swen: Make requirements and states more explicit.
      • Nick and Jim: Comments on GitHub
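
For the Issue 223 discussion of vfork versus posix_spawn(), the following is a minimal sketch of creating a child with posix_spawn() and collecting its exit status. It is not the test code mentioned in the action items. The helper name `spawn_and_wait` is hypothetical, and the sketch assumes the child never calls into OpenSHMEM, in line with the note that children are not expected to call shmem_init(). In a real OpenSHMEM program the parent would already have called shmem_init(); that call is omitted here so the example builds without an OpenSHMEM library.

```c
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not part of OpenSHMEM): run the program at argv[0]
 * as a child process and return its exit status, or -1 on failure.
 * argv[0] must be a full path because posix_spawn() does not search PATH. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawn() returns an error number directly instead of setting errno. */
    int rc = posix_spawn(&pid, argv[0], NULL, NULL, argv, environ);
    if (rc != 0) {
        fprintf(stderr, "posix_spawn: %s\n", strerror(rc));
        return -1;
    }

    /* Unlike vfork, the parent keeps running until it chooses to wait. */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }

    /* Report only normal termination; treat signals as failure. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector, for example `{"/bin/sh", "-c", "exit 7", NULL}`, and receive the child's exit status as the return value.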