Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 721: Use the data_filter when extracting tarballs, if it's available. #12214

Merged
merged 4 commits into from
May 6, 2024

Conversation

encukou
Copy link
Contributor

@encukou encukou commented Aug 8, 2023

Previous behaviour is used on Python without PEP-720 tarfile filters. (Note that the feature is now in security releases of all supported versions.)

A custom filter (which wraps data_filter) is used to retain pip-specific behaviour:

  • Removing a common leading directory
  • Setting the mode (Unix permissions)

Compared to the previous behaviour, if a file can't be unpacked, the unpacking operation will fail with InstallError, rather than skipping the individual file with a logger.warning. This means that "some corrupt tar files" now can't be unpacked.
IMO, this is the correct path forward, but maybe there should be a CLI option to use the old behaviour? Sadly I don't know enough about pip to add one and propagate the setting to untar_file.

The new code path avoids internal API (TarFile._extract_member).

Note that PEP 721 limits itself to sdists, this change affects unpacking any other tar file.

I hope I got the nox/mypy/ruff stuff right, guess CI will tell...


This PR indents a big block of code. To hide whitespace changes, use git diff -w or add ?w=1 to the GitHub URL.

Fixes: #12111

@uranusjr
Copy link
Member

uranusjr commented Aug 8, 2023

I feel it may be a good idea to pull the except and else blocks into separate functions for readability. It’s difficult for me to find where the except block ends now.

@encukou encukou marked this pull request as draft August 8, 2023 12:23
@henryiii
Copy link
Contributor

henryiii commented Sep 6, 2023

Needs same fix as pypa/build#675 to avoid breakage.

@encukou
Copy link
Contributor Author

encukou commented Sep 7, 2023

(This is still on my TODO list, it just keeps getting pushed back by conferences.)

@encukou
Copy link
Contributor Author

encukou commented Nov 15, 2023

Sorry for the delay.
A bug in my initial implementation of tarfile filters means that specific CPython versions raise spurious exceptions. Filtering those out, the “new” way unfortunately grew to be about as many lines as the “old” way -- although it does have more comments and a big chunk can be cleanly removed when support for specific versions ends.

@encukou encukou marked this pull request as ready for review November 15, 2023 14:53
Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this would break some existing code. Do we want to add a flag to allow users to disable data_filter temporarily? Or should we treat those as security issues and allow breakage?

…ailable.

Previous behaviour is used on Python without PEP-720 tarfile filters. (Note that the
feature is now in security releases of all supported versions.)

A custom filter (which wraps `data_filter`) is used to retain pip-specific behaviour:
- Removing a common leading directory
- Setting the mode (Unix permissions)

Compared to the previous behaviour, if a file can't be unpacked, the unpacking operation
will fail with `InstallError`, rather than skipping the individual file with
a `logger.warning`. This means that "some corrupt tar files" now can't be unpacked.

Note that PEP 721 limits itself to sdists, this change affects unpacking any
other tar file.
@pradyunsg pradyunsg merged commit 12c171f into pypa:main May 6, 2024
28 checks passed
@pradyunsg
Copy link
Member

Do we want to add a flag to allow users to disable data_filter temporarily? Or should we treat those as security issues and allow breakage?

The latter, IMO.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use tarfile filters (PEP 706) when unpacking?
6 participants