Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[utilities,plugins] speedup journal collection (v2) #3879

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

champtar
Copy link
Contributor

Instead of generating all the logs and tailing the last 100M, we get the first 100M of 'journalctl --reverse' that we then reverse again using our own implementation of tac.

To handle multiline logs we would need to use "tac -brs '^[^ ]'" that takes ~30s on 100M of logs when plain 'tac' takes ~0.3s. Our simple implementation in python takes 0.7s, and avoid an extra dependency.

On journalctl timeout we now get the most recents logs.

During collection logs are now buffered on disk, so we use 2xsizelimit. While running our tac we could actually truncate the source file to limit disk usage. Previously buffering was in RAM (also 2xsizelimit).

On my test server, logs plugin runtime goes from 34s to 9.5s.

Fixes #3615


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname [email protected]?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?
  • Are all passwords or private data gathered by this PR obfuscated?

@champtar champtar changed the title [plugins] speedup journal collection [plugins] speedup journal collection (v2) Dec 13, 2024
@champtar champtar marked this pull request as draft December 13, 2024 05:00
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3879
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@pmoravec
Copy link
Contributor

I can confirm VERY evident speedup on a tested RHEL8 system with:

# journalctl --disk-usage 
Archived and active journals take up 560.0M in the file system.
#

where current upstream executes python3 bin/sos report -o logs --batch --build in 18s and this PR shrinks it to 3s.

Some test is failing - can you @champtar review it or do you need some help? I might get to the failure later today /tomorrow.

@champtar
Copy link
Contributor Author

I'm off until the 27th so no rush for the review, and I was thinking about even implementing 'head' in Python

@pmoravec
Copy link
Contributor

tests/report_tests/plugin_tests/defaults.py:40:            if cmd['exec'] == 'journalctl --no-pager  --unit cups':

needs to be updated to reflect changed command exec "sh -c 'journalctl --no-pager --unit cups --reverse | head -c 104857600'" - I vote for 'journalctl --no-pager --unit cups' in cmd['exec'] test.

The 2-tests/report_tests/plugin_tests/logs.py:JournalSizeLimitTest.test_journal_tailed_and_linked failure: The PR breaks part of tailit functionality: when a command output is bigger than --log-size (or --journal-size here), sosreport looks like:

# ll /var/tmp/sosreport-pmoravec-rhel9-8675309-2024-12-18-dusvggk/sos_commands/logs/
total 24
-rw-r--r--. 1 root root    63 Dec 18 15:50 journalctl_--disk-usage
lrwxrwxrwx. 1 root root    51 Dec 18 15:50 journalctl_--no-pager -> ../../sos_strings/logs/journalctl_--no-pager.tailed
lrwxrwxrwx. 1 root root    58 Dec 18 15:50 journalctl_--no-pager_--boot -> ../../sos_strings/logs/journalctl_--no-pager_--boot.tailed
-rw-r--r--. 1 root root    82 Dec 18 15:50 journalctl_--no-pager_--boot_-1
-rw-r--r--. 1 root root 13355 Dec 18 15:50 ls_-alRh_.var.log
# ll /var/tmp/sosreport-pmoravec-rhel9-8675309-2024-12-18-dusvggk/sos_strings/logs/
total 24824
-rw-r--r--. 1 root root 10484923 Dec 18 15:50 journalctl_--no-pager_--boot.tailed
-rw-r--r--. 1 root root 10484923 Dec 18 15:50 journalctl_--no-pager.tailed
-rw-r--r--. 1 root root  4445393 Dec 18 15:50 var.log.messages-20241201.tailed
#

And manifest.json tracks that via "truncated": true,.

With this PR, I see:

# ll /var/tmp/sosreport-pmoravec-rhel9-8675309-2024-12-18-zagarel/sos_commands/logs/
total 20464
-rw-r--r--. 1 root root       63 Dec 18 15:50 journalctl_--disk-usage
-rw-r--r--. 1 root root 10460043 Dec 18 15:50 journalctl_--no-pager
-rw-r--r--. 1 root root 10460043 Dec 18 15:50 journalctl_--no-pager_--boot
-rw-r--r--. 1 root root       82 Dec 18 15:50 journalctl_--no-pager_--boot_-1
-rw-r--r--. 1 root root    22730 Dec 18 15:50 ls_-alZR_.var.log
#

That suggests the journalctl outputs are not truncated, which is misleading.

@pmoravec
Copy link
Contributor

Hi @arif-ali and @TurboTurtle , is this approach worth to try? My answer is "carefully yes".

Hi @champtar , do you need some help with updating the tests?

sos/utilities.py Fixed Show fixed Hide fixed
@champtar champtar force-pushed the journal-speed2 branch 6 times, most recently from b23de87 to 04f3361 Compare January 26, 2025 02:28
sos/utilities.py Fixed Show fixed Hide fixed
sos/utilities.py Fixed Show fixed Hide fixed
tac_logs() reverses the order of the logs in a text
file but keeps multiline logs in order.
It is aimed at reversing 'journalctl --reverse'.

This avoids depending on GNU coreutils version of tac
(busybox tac is missing some flags) and is way faster.
Testing with 100MB of logs:
- tac -brs '^[^ ]' (handle multiline): ~30s
- tac_logs (our python implementation): ~0.7s
- tac (not handling multiline logs): ~0.3s

If we are concerned about disk usage, we could call
mmap.resize() to truncate the input file while processing.

Signed-off-by: Etienne Champetier <[email protected]>
We are going to introduce HeadReader.

Signed-off-by: Etienne Champetier <[email protected]>
Will be used to keep the first sizelimit MB from a program stdout.

Signed-off-by: Etienne Champetier <[email protected]>
@champtar champtar force-pushed the journal-speed2 branch 2 times, most recently from 89fc059 to 2ff50e2 Compare January 26, 2025 03:26
@champtar champtar marked this pull request as ready for review January 26, 2025 03:31
@champtar champtar changed the title [plugins] speedup journal collection (v2) [utilities,plugins] speedup journal collection (v2) Jan 26, 2025
Using sizelimit with to_file now keep the start of
the command output instead of being ignored.
You can use tac with and without sizelimit.

Signed-off-by: Etienne Champetier <[email protected]>
Instead of generating all the logs and tailing the last 100M,
we get the first 100M of 'journalctl --reverse' that we then
reverse again using tac_logs().

On journalctl timeout we now get the most recents logs
where previously we were getting some random old logs.

During collection, logs are now buffered on disk, so we use 2xsizelimit.
Previously buffering was in RAM (also 2xsizelimit).

On my test server, logs plugin runtime goes from 34s to 9.5s.

Signed-off-by: Etienne Champetier <[email protected]>
@champtar
Copy link
Contributor Author

@pmoravec cirrus-ci is broken right now, but this should be ready for review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFE] speedup journal export
2 participants