Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cleaner] Use data filter for extraction #3330

Conversation

pmoravec
Copy link
Contributor

@pmoravec pmoravec commented Aug 8, 2023

Since Python 3.12, archive.extractall would require setting an extraction filter otherwise an exception is raised. See PEP 706 for more.

Since sos extracts just tarballs that should be compliant with safe extraction (no extracted file outside the target directory etc.), we should stick to the data filter.

Relevant: #3319
Closes: #3330


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname [email protected]?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?

@pmoravec
Copy link
Contributor Author

pmoravec commented Aug 8, 2023

Sadly, Python 3.12 is not yet out sufficiently to let me test the data filter on it. Just backward compatibility tested on a few archives against a few Python versions.

@packit-as-a-service
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3330
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

1 similar comment
@packit-as-a-service
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3330
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@pmoravec
Copy link
Contributor Author

pmoravec commented Aug 9, 2023

Ubuntu, we have a problem.

The failure behind ubuntu-22.04 is as follows:

Failed to extract contents of /tmp/sos.fwtzjpae/cleaner: 'tmp/sosreport-442d729a9e0a-2023-08-09-okmvfti/sys/class/net/lo' would link to '/tmp/devices/virtual/net/lo', which is outside the destination

While the file to extract is:

# file /tmp/sosreport-442d729a9e0a-2023-08-09-okmvfti/sys/class/net/lo
/tmp/sosreport-442d729a9e0a-2023-08-09-okmvfti/sys/class/net/lo: symbolic link to ../../devices/virtual/net/lo
#

Which matches the original file:

# file /sys/class/net/lo
/sys/class/net/lo: symbolic link to ../../devices/virtual/net/lo
#

But the problem is, Python 3.10.12 on Ubuntu 22.04 treats the destinations of extracted symlinks relative to the working directory - hence ../../devices/virtual/net/lo is treated as /tmp/sos.fwtzjpae/cleaner/../../devices/virtual/net/lo, i.e. the complained outside-archive /tmp/devices/virtual/net/lo.

Any idea how to react on the Python(-on-Ubuntu) bug? (the best I figured out is to fallback to the "ignore the errors" fully_trusted filter, but I dont like it much). Cc @arif-ali

@arif-ali
Copy link
Member

arif-ali commented Aug 9, 2023

Interesting issue, will need to double back, and check this out in an environment in collaboration with my team. Will get back to you on this

@arif-ali
Copy link
Member

arif-ali commented Aug 9, 2023

So I have been playing around with this issue overnight, to understand it, and jammy (22.04), lunar(23.04) and mantic (23.10 -- development release) all fail once this is added as data_filter. As a side, I also created a fedora-38 container and ran the same test, and I had the same issues as well, so I think this issue will appear in any new python3 from what I can see, but I could be wrong.

[root@fedora-38 ~]# sos clean -v --batch --tmp-dir /var/tmp/avocado_au4njzovsos_tests.py.ExistingArchiveCleanTest -v sos/tests/test_data/sosreport-cleanertest-2021-08-03-qpkxdid.tar.xz
[cleaner] Cleaner initialized. From cmdline: True

sos clean (version 4.5.6)

This command will attempt to obfuscate information that is generally considered
to be potentially sensitive. Such information includes IP addresses, MAC
addresses, domain names, and any user-provided keywords.

Note that this utility provides a best-effort approach to data obfuscation, but
it does not guarantee that such obfuscation provides complete coverage of all
such data in the archive, or that any obfuscation is provided to data that does
not fit the description above.

Users should review any resulting data and/or archives generated or processed by
this utility for remaining sensitive content before being passed to a third
party.

[cleaner:sosreport-cleanertest-2021-08-03-qpkxdid] Loaded sos/tests/test_data/sosreport-cleanertest-2021-08-03-qpkxdid.tar.xz as type sos report archive
[cleaner] Pre-loading all archives into obfuscation maps
[cleaner] Prepping hostname mapping with items from sosreport-cleanertest-2021-08-03-qpkxdid
[cleaner] Prepping ip parser with file sos_commands/networking/ip_-o_addr from sosreport-cleanertest-2021-08-03-qpkxdid
[cleaner] Prepping ipv6 parser with file sos_commands/networking/ip_-o_addr from sosreport-cleanertest-2021-08-03-qpkxdid
[cleaner] Failed to prep ipv6 map from sos_commands/networking/ip_-o_addr: '6 ::1/64' does not appear to be an IPv4 or IPv6 network
[cleaner] Prepping mac parser with file sos_commands/networking/ip_-d_address from sosreport-cleanertest-2021-08-03-qpkxdid
[cleaner:sosreport-cleanertest-2021-08-03-qpkxdid] Unable to retrieve etc/cron.allow: no such file in archive
[cleaner:sosreport-cleanertest-2021-08-03-qpkxdid] Unable to retrieve etc/cron.deny: no such file in archive
[cleaner] Prepping username mapping with items from sosreport-cleanertest-2021-08-03-qpkxdid
Found 1 total reports to obfuscate, processing up to 4 concurrently

sosreport-cleanertest-2021-08-03-qpkxdid :         Extracting...
Exception while processing sosreport-cleanertest-2021-08-03-qpkxdid: A process in the process pool was terminated abruptly while the future was running or pending.
No reports obfuscated, aborting...

Could it be that fedora's old image works, and the new one doesn't?

Maybe worth updating the gce images for cirrus to latest ones, and see how we get on?

So, how did I test this

  • download the repo with the added filter
  • created a fedora38 container
lxc launch images:fedora/38 fedora-38
lxc file push sos.tgz fedora-38/root/
lxc exec fedora-38 -- tar xf sos.tgz
lxc exec fedora-38 bash
  • installed some deps, python3-file-magic and python3-setuptools via dnf
  • Then ran the same sos clean command that is being run, and we get the above error

Even the tarball that is in the test folder does have absolute links, and it would throw the AbsoluteLinkError, as per the documentation https://docs.python.org/3/library/tarfile.html#tarfile.data_filter. At least my understanding from this, is that it is doing the right thing.

Based on what I am reading, we need to have fully_trusted_filter. I am willing to be wrong, and others may need to chime in here

below are the versions of python3, that from the respective containers I tested on tonight

[00:08:36] [~] ❱❱❱ lxc exec jammy01 -- python3 --version
Python 3.10.12

[00:08:51] [~] ❱❱❱ lxc exec lunar01 -- python3 --version
Python 3.11.4

[00:08:56] [~] ❱❱❱ lxc exec mantic01 -- python3 --version
Python 3.11.4

[00:09:01] [~] ❱❱❱ lxc exec fedora-38 -- python3 --version 
Python 3.11.4

@pmoravec
Copy link
Contributor Author

Indeed, 3.11.4 on Fedora38 (the combination I tested) is also affected - interesting that CI tests didnt reveal it..

So really, until this works well in Python, we must fallback to fully_trusted_filter.

Since Python 3.12, archive.extractall would require setting an
extraction filter otherwise an exception is raised. See PEP 706 for
more.

Since Python 3.10 and 3.11 can raise false alarms as exceptions (see sosreport#3330
for examples), we must use the legacy fully_trusted filter.

Relevant: sosreport#3319
Closes: sosreport#3330

Signed-off-by: Pavel Moravec <[email protected]>
@pmoravec pmoravec force-pushed the sos-pmoravec-archive-extractall-data-filter branch from 4e42e5f to 8db4861 Compare August 10, 2023 17:19
@pmoravec
Copy link
Contributor Author

Hi @TurboTurtle , we would like to get this to the next sos upstream release/tag. Could you please review the PR? Thanks in advance.

@TurboTurtle TurboTurtle merged commit d71e6b5 into sosreport:main Aug 15, 2023
30 checks passed
@jjansky1 jjansky1 mentioned this pull request Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants