Large spikes for throughput metrics at beginning of job #103

Open
moebiusband73 opened this issue Apr 12, 2023 · 1 comment

Comments

@moebiusband73
Member

This ticket was migrated from cc-backend: ClusterCockpit/cc-backend#86

There are spikes with impossibly high values at the start of jobs, especially for LIKWID metrics.

Solution:
A quick fix would be to filter out values that are impossibly high, especially at the start of jobs.
In the long term, the real cause of those spikes has to be investigated.

@TomTheBear
Member

The issue is caused by permission changes to the likwid.lock file. Only the owner of the lock file is allowed to use the LIKWID library. To detect these permission changes early, the LikwidCollector uses fsnotify. However, the fsnotify callback fired not only when the owner actually changed, but every time a chown/chmod was executed on the file, even when nothing changed.

The fundamental problem is in the LIKWID library and the LIKWID team is working on it.


2 participants