-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[utilities,plugins] speedup journal collection (v2) #3879
base: main
Are you sure you want to change the base?
Conversation
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment. |
cc1c0a8
to
b4534fd
Compare
I can confirm VERY evident speedup on a tested RHEL8 system with:
where current upstream executes Some test is failing - can you @champtar review it or do you need some help? I might get to the failure later today /tomorrow. |
I'm off until the 27th so no rush for the review, and I was thinking about even implementing 'head' in Python |
needs to be updated to reflect changed command exec The
And With this PR, I see:
That suggests the |
Hi @arif-ali and @TurboTurtle , is this approach worth to try? My answer is "carefully yes". Hi @champtar , do you need some help with updating the tests? |
b4534fd
to
63c2a52
Compare
b23de87
to
04f3361
Compare
tac_logs() reverses the order of the logs in a text file but keeps multiline logs in order. It is aimed at reversing 'journalctl --reverse'. This avoids depending on GNU coreutils version of tac (busybox tac is missing some flags) and is way faster. Testing with 100MB of logs: - tac -brs '^[^ ]' (handle multiline): ~30s - tac_logs (our python implementation): ~0.7s - tac (not handling multiline logs): ~0.3s If we are concerned about disk usage, we could call mmap.resize() to truncate the input file while processing. Signed-off-by: Etienne Champetier <[email protected]>
We are going to introduce HeadReader. Signed-off-by: Etienne Champetier <[email protected]>
Will be used to keep the first sizelimit MB from a program stdout. Signed-off-by: Etienne Champetier <[email protected]>
89fc059
to
2ff50e2
Compare
Using sizelimit with to_file now keep the start of the command output instead of being ignored. You can use tac with and without sizelimit. Signed-off-by: Etienne Champetier <[email protected]>
Instead of generating all the logs and tailing the last 100M, we get the first 100M of 'journalctl --reverse' that we then reverse again using tac_logs(). On journalctl timeout we now get the most recents logs where previously we were getting some random old logs. During collection, logs are now buffered on disk, so we use 2xsizelimit. Previously buffering was in RAM (also 2xsizelimit). On my test server, logs plugin runtime goes from 34s to 9.5s. Signed-off-by: Etienne Champetier <[email protected]>
2ff50e2
to
5fb7009
Compare
@pmoravec cirrus-ci is broken right now, but this should be ready for review |
Instead of generating all the logs and tailing the last 100M, we get the first 100M of 'journalctl --reverse' that we then reverse again using our own implementation of tac.
To handle multiline logs we would need to use "tac -brs '^[^ ]'" that takes ~30s on 100M of logs when plain 'tac' takes ~0.3s. Our simple implementation in python takes 0.7s, and avoid an extra dependency.
On journalctl timeout we now get the most recents logs.
During collection logs are now buffered on disk, so we use 2xsizelimit. While running our tac we could actually truncate the source file to limit disk usage. Previously buffering was in RAM (also 2xsizelimit).
On my test server, logs plugin runtime goes from 34s to 9.5s.
Fixes #3615
Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines