Large spikes for throughput metrics at beginning of job #103

moebiusband73 · 2023-04-12T06:11:42Z

This ticket was migrated from cc-backend: ClusterCockpit/cc-backend#86

Metrics show spikes with impossibly high values at the start of jobs, especially LIKWID metrics.

Solution:
A quick fix would be to filter out values that are impossibly high, especially at the start of jobs.
In the long term, the real cause of these spikes has to be investigated.

TomTheBear · 2023-05-01T10:53:34Z

The issue is caused by permission changes on the likwid.lock file. Only the owner of the lock file is allowed to use the LIKWID library. To detect these permission changes early, the LikwidCollector uses fsnotify. The fsnotify callback fired not only when the owner actually changed, but any time a chown/chmod was executed on the file, even when nothing changed.

The fundamental problem is in the LIKWID library and the LIKWID team is working on it.
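
One way to guard against such no-op events on the collector side is to compare the lock file's current owner with the last one seen and react only when it differs. The sketch below is an assumption about how this could look, not the LikwidCollector code: `ownerTracker` and `fileOwner` are hypothetical names, only the Go standard library is used, and reading the UID via `syscall.Stat_t` works on Linux and other Unix systems only.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// ownerTracker (hypothetical) remembers the last observed owner of a file.
type ownerTracker struct {
	known bool
	uid   uint32
}

// changed reports whether uid differs from the last observed owner.
// The first observation always counts as a change.
func (t *ownerTracker) changed(uid uint32) bool {
	if t.known && t.uid == uid {
		return false // chown/chmod ran, but the owner is the same
	}
	t.known = true
	t.uid = uid
	return true
}

// fileOwner (hypothetical) returns the UID that owns path (Unix only).
func fileOwner(path string) (uint32, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("no stat information for %s", path)
	}
	return st.Uid, nil
}
```

In a file-event callback, the collector would call `fileOwner` on the lock file and skip reinitialization whenever `changed` returns false.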

moebiusband73 mentioned this issue Apr 12, 2023

Large spikes for throughput metrics at beginning of job ClusterCockpit/cc-backend#86

Closed
