Slow node performance due to audit configuration #3669
Additionally we are using
We need to migrate those using:
Solution for Docker rules, increasing the performance
Another idea would be changing the
Currently
Auditd seems to have been integrated by @giantswarm/team-atlas, but I'm not sure what the reason behind it was. @QuentinBisson do you remember why?
After talking with Quentin, a quick solution for the affected Vintage clusters: apply k8s-initiator-app on the node pools where Jenkins is running, removing and reloading the rules without those
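The "remove and reload" step could be sketched roughly as follows. This is only an illustration: it assumes the rules being dropped are the `execve` syscall rules described in the issue body, and the helper name `filter_execve_rules` and the file paths are hypothetical, not taken from k8s-initiator-app.

```shell
#!/bin/sh
# Hypothetical helper: read audit rules on stdin and print them without
# the execve syscall rules.
# Assumption: the expensive rules are the ones containing "-S execve".
# Note that this pattern also drops rules for "-S execveat".
filter_execve_rules() {
    # grep exits non-zero when nothing is left; treat that as success.
    grep -v -e '-S execve' || true
}

# Example usage (needs root; paths are assumptions):
#   filter_execve_rules < /etc/audit/audit.rules > /tmp/audit-no-execve.rules
#   auditctl -D                               # delete all loaded rules
#   auditctl -R /tmp/audit-no-execve.rules    # load the filtered rule set
```

Filtering into a separate file rather than editing the rules in place keeps the original rule set available, so the change is easy to revert if the audit coverage is needed again.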
For CAPI we would need to integrate a toggle that enables auditd when needed; it should be disabled by default. PR to disable it: giantswarm/cluster#325
For Vintage we have exhausted our options for getting around a new release, so we need a new v20 release. Prepare a new
@T-Kukawka Could Phoenix start working on it next week, please? I'm off for the next few days, otherwise I would have jumped in.
For tracking: Adidas issue
@T-Kukawka it looks like we don't need to do it. It was a final test, and it seems we can get around doing a new Vintage release: https://gigantic.slack.com/archives/C062HB29BDG/p1726086709037459?thread_ts=1725542379.870169&cid=C062HB29BDG Daniel figured out setting
Everything should be covered. Marco created a new release for CAPA ❤
Slack Thread: https://gigantic.slack.com/archives/C6L8J93N0/p1724419948903269
TL;DR
Slowed Jenkins operations take longer, generating even more audit events over time.
Context
After upgrading release v19.3.0 to v20.1.2
to
The customer noticed a heavy impact on node pools where Jenkins agents are running. Those nodes were becoming extremely slow. We were able to identify writing audit messages as the bottleneck:
We identified the audit rules that track what the system is doing:
This rule audits all program executions (via the execve system call) made through the 64-bit syscall ABI. It's a broad rule that captures when any program is run.
Similar to the previous rule, but for the 32-bit syscall ABI. It also audits all program executions.
When Jenkins is running, nodes become unresponsive for seconds at a time.
Example:
When flushing all audit rules, the node becomes instantly responsive again:
We're still not sure why this happens now. It might be that Jenkins is now executing more commands, or that auditd has changed since the last release.
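One way to check the "Jenkins is executing more commands" hypothesis would be to count execve audit records per executable. The sketch below is an assumption-laden illustration, not something used in this investigation: it expects raw audit log lines on stdin, assumes x86_64 (where execve is syscall number 59), and the helper name `count_execve_by_exe` and the log path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarise execve audit records per executable.
# Input: raw audit log lines on stdin (one record per line).
# Assumption: x86_64, where the execve syscall number is 59.
count_execve_by_exe() {
    awk '/type=SYSCALL/ && / syscall=59 / {
        # Find the exe= field and count occurrences of its value.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^exe=/) {
                sub(/^exe=/, "", $i)
                n[$i]++
            }
        }
    }
    END { for (e in n) print n[e], e }' | sort -rn
}

# Example usage (path is an assumption; on some systems audit records
# go to the journal instead of a log file):
#   count_execve_by_exe < /var/log/audit/audit.log | head
```

Comparing this summary between a node on the old release and one on the new release would show whether the number of executions changed or only the cost of auditing them.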