Currently we collect everything at the node level. We need to examine which metrics can be split out (per core or per socket), which cannot, and whether such a split would be useful.
For Memory:
We need to find, programmatically, all memory used by the primary job starter and its children. Find the job starter, then get the memory of all its child processes: ps -o pid,ppid,pgid,comm,%cpu,%me
Snapshot this at the same time as the rest of the metrics. Find out whether there is a way to get the job ID, then match the job ID to specific processes on the node to get a snapshot of memory usage.
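A rough sketch of the "find the job starter, then sum its children" step, assuming the job starter's PID is already known (mapping a job ID to that PID depends on the scheduler and is not covered here). The function names are made up for illustration, and it reads resident set size via `ps -e -o pid=,ppid=,rss=` instead of the percentage column in the command above, since absolute values are easier to add up:

```python
import subprocess
from collections import defaultdict


def parse_ps(text):
    """Parse lines of 'pid ppid rss_kib' into a list of (pid, ppid, rss) int tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        rows.append(tuple(int(p) for p in parts))
    return rows


def descendants(rows, root_pid):
    """Return the set containing root_pid and every process below it in the tree."""
    children = defaultdict(list)
    for pid, ppid, _ in rows:
        children[ppid].append(pid)
    seen, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        stack.extend(children.get(pid, []))
    return seen


def job_rss_kib(rows, root_pid):
    """Sum RSS (KiB) over the job starter and all of its descendants."""
    pids = descendants(rows, root_pid)
    return sum(rss for pid, _, rss in rows if pid in pids)


def snapshot():
    """Take one process-table snapshot from ps (assumes a procps-style ps)."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=,ppid=,rss="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```

Calling `job_rss_kib(snapshot(), starter_pid)` at the same moment the other metrics are sampled would give one per-job number. Note that summing RSS counts shared pages once per process, so the total can overstate real usage.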
Can we do this programmatically for any other statistics?
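One statistic that does seem splittable per core on Linux is CPU time, since `/proc/stat` has a `cpuN` line for each core. A minimal sketch, assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are hypothetical:

```python
def parse_proc_stat(text):
    """Return {core_index: [jiffy counters]} from the per-core 'cpuN' lines."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep 'cpu0', 'cpu1', ... and skip the aggregate 'cpu' line and other keys.
        if not parts or not parts[0].startswith("cpu") or not parts[0][3:].isdigit():
            continue
        cores[int(parts[0][3:])] = [int(x) for x in parts[1:]]
    return cores


def busy_fraction(before, after):
    """Per-core busy fraction between two snapshots taken by parse_proc_stat."""
    result = {}
    for core, a in after.items():
        b = before.get(core)
        if b is None:
            continue  # core not present in the earlier snapshot
        delta = [x - y for x, y in zip(a, b)]
        total = sum(delta[:8])     # user..steal; guest time is already counted in user
        idle = sum(delta[3:5])     # idle + iowait
        result[core] = (total - idle) / total if total > 0 else 0.0
    return result


def read_proc_stat(path="/proc/stat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_proc_stat(f.read())
```

Two calls to `read_proc_stat()` a sampling interval apart, passed to `busy_fraction`, give a per-core utilization. Grouping cores into sockets would need the topology as well, which this sketch does not attempt.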