Support multiple jobs on the same node #41

stephenlienharrell · 2023-06-13T15:54:00Z

Currently we collect all metrics at the node level. We need to determine which metrics can be split out (on a per-core or per-socket basis), which cannot, and whether splitting them out is useful.

stephenlienharrell · 2023-06-20T15:57:41Z

For CPU:
Need core affinity matched to job ID.
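
One way this could look, as a rough sketch only: assume we can get a CPU list per job in the usual Linux list format (e.g. "0-3,8") and a per-core busy value from the existing collection. The names `parse_cpu_list`, `cpu_by_job`, and the "unassigned" bucket are hypothetical, and where the per-job CPU list comes from is left open.

```python
from __future__ import annotations

from collections import defaultdict


def parse_cpu_list(spec: str) -> set[int]:
    """Expand a Linux-style CPU list such as "0-3,8" into a set of core IDs."""
    cores: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return cores


def cpu_by_job(job_cores: dict[str, str], core_busy: dict[int, float]) -> dict[str, float]:
    """Sum per-core busy values per job (hypothetical helper).

    job_cores maps a job ID to its CPU list; core_busy maps a core ID to
    whatever per-core value is already collected. Cores that no job owns
    are reported under "unassigned".
    """
    owner: dict[int, str] = {}
    for job_id, spec in job_cores.items():
        for core in parse_cpu_list(spec):
            owner[core] = job_id

    totals: dict[str, float] = defaultdict(float)
    for core, busy in core_busy.items():
        totals[owner.get(core, "unassigned")] += busy
    return dict(totals)
```

This only works cleanly when jobs are pinned to disjoint cores; if two jobs share a core, the last one in `job_cores` wins here, which is probably not what we want.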

For Memory:
Need to programmatically find all memory usage that stems from the primary job starter. Find the job starter, then get the memory of all its child processes: ps -o pid,ppid,pgid,comm,%cpu,%me

Snapshot this at the same time as the rest of the metrics. Find out whether there is a way to get the job ID, then match the job ID to specific processes on the node to get a snapshot of memory usage.
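
A minimal sketch of the tree walk, assuming the job starter's PID is already known and that the snapshot is plain text with three columns per line: pid, ppid, and resident set size as reported by ps. The function names and the sample data are made up for illustration; on a live node the text could probably come from something like `ps -e -o pid=,ppid=,rss=`.

```python
from __future__ import annotations

from collections import defaultdict


def parse_ps(output: str) -> list[tuple[int, int, int]]:
    """Parse lines of "pid ppid rss" into (pid, ppid, rss) tuples."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            rows.append((int(fields[0]), int(fields[1]), int(fields[2])))
    return rows


def job_rss(rows: list[tuple[int, int, int]], root_pid: int) -> int:
    """Sum RSS over root_pid and every descendant found via ppid links."""
    children: dict[int, list[int]] = defaultdict(list)
    rss: dict[int, int] = {}
    for pid, ppid, value in rows:
        children[ppid].append(pid)
        rss[pid] = value

    total = 0
    stack = [root_pid]
    seen: set[int] = set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        total += rss.get(pid, 0)
        stack.extend(children.get(pid, []))
    return total


# Made-up snapshot: PID 100 is the job starter, 200 is unrelated.
SAMPLE = """\
 100   1  2000
 101 100  3000
 102 101   500
 200   1  9000
"""

print(job_rss(parse_ps(SAMPLE), 100))
```

Summing RSS counts shared pages once per process, so the total can overstate real usage. It also misses anything that is no longer a descendant of the job starter.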

Can we do this programmatically for any other statistics?

stephenlienharrell · 2023-06-21T18:11:55Z

Regarding the approach above, we need to make sure we can capture detached processes.
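
One possible way around this, sketched under a big assumption: if the scheduler places each job's processes in a cgroup whose path contains a `job_<id>` component, then a process that detaches and gets reparented still carries the job ID in /proc/<pid>/cgroup. Whether our nodes are set up that way is unverified; the `job_<id>` pattern, the function names, and the sample paths below are all hypothetical.

```python
from __future__ import annotations

import re

# Assumed naming convention: a path component of the form "job_<digits>".
JOB_RE = re.compile(r"/job_(\d+)(?:/|$)")


def job_id_from_cgroup(cgroup_text: str) -> str | None:
    """Return the job ID found in the contents of /proc/<pid>/cgroup, if any."""
    for line in cgroup_text.splitlines():
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = JOB_RE.search(parts[2])
        if match:
            return match.group(1)
    return None


def group_pids_by_job(cgroup_by_pid: dict[int, str]) -> dict[str, list[int]]:
    """Group PIDs by job ID; PIDs without a job cgroup are skipped."""
    jobs: dict[str, list[int]] = {}
    for pid in sorted(cgroup_by_pid):
        job_id = job_id_from_cgroup(cgroup_by_pid[pid])
        if job_id is not None:
            jobs.setdefault(job_id, []).append(pid)
    return jobs
```

Because membership comes from the cgroup instead of the parent PID, this does not depend on the process tree staying intact. It does depend on cgroup tracking being enabled for jobs, which would need to be checked first.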

sanga1999 · 2024-08-20T15:37:44Z

Duplicate of #46

sanga1999 marked this as a duplicate of #46 Aug 20, 2024

sanga1999 closed this as completed Aug 20, 2024
