PRRTE Process Naming Convention

Ralph Castain edited this page Jan 15, 2021 · 1 revision

PRRTE uses the PMIx naming convention for processes: each process is identified by a pmix_proc_t structure containing a character array nspace plus a uint32_t rank. Assigning the nspace/rank values for each process remains the responsibility of the PRRTE infrastructure.
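As a rough illustration of that two-field name, the sketch below defines a stand-in structure and two helpers. The names sketch_proc_t, sketch_proc_load, sketch_proc_equal, and the SKETCH_MAX_NSLEN buffer size are assumptions made for this example only; real code would include the PMIx headers and use pmix_proc_t directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for pmix_proc_t so the sketch compiles without
 * the PMIx headers. The buffer size is an assumption of this sketch. */
#define SKETCH_MAX_NSLEN 255

typedef struct {
    char nspace[SKETCH_MAX_NSLEN + 1]; /* NUL-terminated namespace string */
    uint32_t rank;                     /* rank within that namespace */
} sketch_proc_t;

/* Fill in a process name. Returns 0 on success, -1 if nspace is too long. */
static int sketch_proc_load(sketch_proc_t *p, const char *nspace, uint32_t rank)
{
    assert(p != NULL && nspace != NULL);
    if (strlen(nspace) > SKETCH_MAX_NSLEN) {
        return -1;
    }
    strcpy(p->nspace, nspace); /* safe: length was checked above */
    p->rank = rank;
    return 0;
}

/* Two names refer to the same process only if both fields match. */
static int sketch_proc_equal(const sketch_proc_t *a, const sketch_proc_t *b)
{
    return a->rank == b->rank && strcmp(a->nspace, b->nspace) == 0;
}
```

Two daemons in the same DVM would, under this sketch, carry the same nspace string and differ only in rank, so sketch_proc_equal reports them as distinct processes.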

There are three avenues for nspace assignment:

  1. The Distributed Virtual Machine (DVM) master must define the nspace for itself and all of its daemons. Unless the host environment specifies one, the DVM master defines its base nspace as "<cmd-basename>-<hostname>-<pid>". For example, this could result in a base nspace of "prte-myhost01-12345". The complete DVM nspace is formed by appending ":0" to the base nspace. The trailing zero indicates that this is the initial job within the DVM. All daemons started by a given master share the same DVM nspace. The master is automatically assigned rank=0.

  2. The DVM master also defines the nspace for each job upon submission for execution. The nspace is created by taking the DVM's base nspace and appending a trailing number that increments with each successive job. Thus, the first job launched by the DVM will be assigned an nspace of "<DVM-base-nspace>:1", while the next job after that will be "<DVM-base-nspace>:2". The job number is defined as a uint32_t. Once the maximum value is reached, the DVM master will cycle through the valid job numbers (starting at 1) until it finds the first one that is no longer in use; this value will be assigned to the new job. If all UINT32_MAX jobs are in use, then PRRTE will return an error.

  3. Tools have two options. They can ask the daemon they connect to for an nspace/rank, or they can self-assign an nspace/rank and report it to the daemon when they connect. The second method is required because a tool may choose not to connect to a server immediately, or it may want to connect to multiple servers representing different DVMs. Thus, PRRTE has to support both methods when tracking tool connections.
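The nspace strings described in items 1 and 2 can be sketched as two small formatting helpers. This is a minimal sketch, not PRRTE's actual code: the function names sketch_base_nspace and sketch_job_nspace are hypothetical, and the hostname and pid are passed in as parameters so the example stays deterministic (a real caller might obtain them from gethostname() and getpid()).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build "<cmd-basename>-<hostname>-<pid>" into out.
 * Returns 0 on success, -1 if the result does not fit in len bytes.
 * Hypothetical helper; not part of PRRTE or PMIx. */
static int sketch_base_nspace(char *out, size_t len, const char *cmd,
                              const char *hostname, long pid)
{
    /* Strip any leading directory components from the command path. */
    const char *slash = strrchr(cmd, '/');
    const char *base = (slash != NULL) ? slash + 1 : cmd;
    int n = snprintf(out, len, "%s-%s-%ld", base, hostname, pid);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Build "<base>:<jobnum>" into out. Job number 0 names the DVM itself;
 * numbers from 1 upward name submitted jobs.
 * Returns 0 on success, -1 if the result does not fit in len bytes. */
static int sketch_job_nspace(char *out, size_t len, const char *base,
                             uint32_t jobnum)
{
    int n = snprintf(out, len, "%s:%" PRIu32, base, jobnum);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

With the inputs "/usr/local/bin/prte", "myhost01", and 12345, the first helper produces the base nspace "prte-myhost01-12345" used as the example in item 1, and passing job number 0 to the second yields the DVM nspace "prte-myhost01-12345:0".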
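The job-number reuse rule in item 2 can be illustrated with a small allocator. This is a sketch under stated assumptions rather than PRRTE's implementation: all names are hypothetical, active jobs are tracked in a small fixed array with a linear scan, and the maximum job number is a parameter (UINT32_MAX in the text) so that the wraparound path can be exercised with a small value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ACTIVE 64 /* hypothetical cap on concurrently tracked jobs */

typedef struct {
    uint32_t next;    /* next never-used job number */
    uint32_t max;     /* largest valid job number (UINT32_MAX in the text) */
    int wrapped;      /* nonzero once max has been handed out */
    uint32_t active[SKETCH_MAX_ACTIVE];
    size_t nactive;
} sketch_jobs_t;

static void sketch_jobs_init(sketch_jobs_t *j, uint32_t max)
{
    assert(j != NULL && max >= 1);
    j->next = 1; /* 0 is reserved for the DVM itself */
    j->max = max;
    j->wrapped = 0;
    j->nactive = 0;
}

static int sketch_in_use(const sketch_jobs_t *j, uint32_t num)
{
    for (size_t i = 0; i < j->nactive; i++) {
        if (j->active[i] == num) {
            return 1;
        }
    }
    return 0;
}

/* Returns the assigned job number, or 0 on error (every valid number is
 * in use, or this sketch's tracking array is full). */
static uint32_t sketch_job_alloc(sketch_jobs_t *j)
{
    uint32_t num = 0;
    if (j->nactive == SKETCH_MAX_ACTIVE) {
        return 0; /* limitation of the sketch, not of the scheme */
    }
    if (!j->wrapped) {
        /* Normal case: hand out strictly increasing numbers. */
        num = j->next;
        if (j->next == j->max) {
            j->wrapped = 1;
        } else {
            j->next++;
        }
    } else {
        /* After reaching max: take the first number from 1 that is free. */
        for (uint32_t c = 1;; c++) {
            if (!sketch_in_use(j, c)) {
                num = c;
                break;
            }
            if (c == j->max) {
                return 0; /* all valid job numbers are in use */
            }
        }
    }
    j->active[j->nactive] = num;
    j->nactive++;
    return num;
}

/* Mark a job number as no longer in use (no-op if it was not active). */
static void sketch_job_release(sketch_jobs_t *j, uint32_t num)
{
    for (size_t i = 0; i < j->nactive; i++) {
        if (j->active[i] == num) {
            j->nactive--;
            j->active[i] = j->active[j->nactive]; /* swap-remove */
            return;
        }
    }
}
```

Returning 0 as the error value works here because job number 0 is never assigned to a submitted job. A production allocator would likely replace the linear scan with a hash table or bitmap.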