With the project winding down, it is time to define a stable landing point where we can leave it for those wanting to use it. This means:
removing all stale code, particularly components that aren't actively used
collapsing frameworks into single code directories where multiple variations are not required (e.g., rtc)
reducing complexity wherever possible
We'll keep a checklist here as we work through the process. It will culminate in a new PRRTE v4 release series.
Code pruning and correction
Remove "likwid" mapper - never implemented
Remove "slurm" and "mpich" personalities - never fully implemented nor used
Collapse "rtc" framework
Collapse "oob" framework - consolidate the messaging system and refactor it
Remove "psched" tool - being replaced by external "dynasched" Python project
Revamp tool system - replace individual tools (e.g., "pterm") with options to "prte" itself to remove conflicts with other packages. This needs design work, as we must retain "prterun" and "prun" as separate cmds
Settle on a "permanent" solution to the Slurm plm problem - use the new launcher lib if it becomes available; otherwise we may need to remove envar support for the internal "srun" cmd line options (see also: Slurm integration #1974)
Enhancements
Add PRRTE-internal resiliency support - recover connections to grandparents when the parent connection is lost, restore the parent connection if/when the parent returns, and number collective messages to ensure replay when necessary
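To make the "number collective messages to ensure replay" idea concrete, here is a minimal sketch of the general technique, not PRRTE's design. The class and method names (`ReplayLog`, `record`, `ack`, `replay_after`) are hypothetical and exist only for illustration; the actual implementation would live in the C messaging layer.

```python
from collections import OrderedDict


class ReplayLog:
    """Hypothetical sender-side log of numbered collective messages."""

    def __init__(self):
        self._next_seq = 0              # sequence number for the next message
        self._unacked = OrderedDict()   # seq -> payload, oldest first

    def record(self, payload):
        """Assign the next sequence number and keep the payload for replay."""
        seq = self._next_seq
        self._next_seq += 1
        self._unacked[seq] = payload
        return seq

    def ack(self, seq):
        """Drop every stored message numbered <= seq (receiver confirmed them)."""
        for done in [s for s in self._unacked if s <= seq]:
            del self._unacked[done]

    def replay_after(self, last_seen):
        """Return (seq, payload) pairs the peer has not seen, in send order."""
        return [(s, p) for s, p in self._unacked.items() if s > last_seen]
```

After a lost connection is restored, the reconnecting side would report the last sequence number it processed, and the sender would resend whatever `replay_after` returns for that number.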
Scheduler integration
Resolve question of moving scheduler integration support into separate branch
Complete node extension support for adding nodes on-the-fly
Complete session directive support - e.g., session/job preemption
* [ ] Resolve "permanent" solution to the Slurm plm problem - use new launcher lib _if_ it becomes available, otherwise may need to remove envar support for the internal "srun" cmd line options
Quick follow-up after the 3 Oct 2024 teleconf: I was mistaken, and SLURM_VERSION is not exported as an environment variable within the allocation. It appears you must go through one of the utilities (e.g., `srun --version` or `scontrol show config | grep SLURM_VERSION`).
If you just get an allocation (salloc and no srun), is there anything you can see that might give us a hint as to the version, even if it doesn't give us a direct value?
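For reference, a rough sketch of querying the version through those utilities. It assumes `srun --version` prints a line like `slurm 23.11.1` and that `scontrol show config` contains a line like `SLURM_VERSION = 23.11.1`; the helper names `parse_slurm_version` and `query_slurm_version` are made up for this example.

```python
import re
import subprocess

# Matches "23.11" or "23.11.1"; the patch component is optional.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def parse_slurm_version(text):
    """Return (major, minor, patch) parsed from utility output, or None."""
    for line in text.splitlines():
        lowered = line.strip().lower()
        # Assumed formats: "slurm 23.11.1" or "SLURM_VERSION = 23.11.1".
        if lowered.startswith("slurm ") or lowered.startswith("slurm_version"):
            match = _VERSION_RE.search(line)
            if match:
                return tuple(int(p) if p else 0 for p in match.groups())
    return None


def query_slurm_version():
    """Try each utility in turn; return None if none is usable."""
    for cmd in (["srun", "--version"], ["scontrol", "show", "config"]):
        try:
            out = subprocess.run(
                cmd, capture_output=True, text=True, timeout=10, check=True
            ).stdout
        except (OSError, subprocess.SubprocessError):
            continue  # utility missing, failed, or timed out
        version = parse_slurm_version(out)
        if version is not None:
            return version
    return None
```

Returning `None` rather than raising lets a caller fall back to a conservative default when neither utility is on the path.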