Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python: Add extract filter for tarfile.extractall #3340

Merged
merged 3 commits into from
Jan 24, 2024

Conversation

echoix
Copy link
Member

@echoix echoix commented Jan 8, 2024

Fixed a CodeQL CWE-22 type issue (named Arbitrary file write during tarfile extraction), found when running an extended analysis (instead of only default).

The solution suggested in the alert didn’t seem appropriate (extracting files one by one), but I found that this issue was fixed in Python 3.12 and backported in the versions from 3.8 to 3.11. The extraction filter chosen is the strictest, since the archives that are supposed to be extracted are purely data, we do not expect any rare or advanced behaviour.

https://docs.python.org/3/library/tarfile.html#extraction-filters

https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractall

@github-actions github-actions bot added Python Related code is in Python libraries labels Jan 8, 2024
@neteler
Copy link
Member

neteler commented Jan 8, 2024

Should the (few) other uses fixed as well?

ag --python -l tar.extractall
python/grass/temporal/stds_import.py
python/grass/utils/download.py
scripts/g.extension/g.extension.py
scripts/r.unpack/r.unpack.py
scripts/v.unpack/v.unpack.py

@nilason
Copy link
Contributor

nilason commented Jan 8, 2024

As we can't be sure if the user's Python version is the latest (i.e. a backported version), at least the call should be in a try-catch clause falling back to supported call. Perhaps make a wrapper for more general use?

@echoix
Copy link
Member Author

echoix commented Jan 8, 2024

I’ll copy the full alert here:

Tool
CodeQL
Rule ID
py/tarslip
Query
View source

Extracting files from a malicious tar archive without validating that the destination file path is within the destination directory can cause files outside the destination directory to be overwritten, due to the possible presence of directory traversal elements (..) in archive paths.

Tar archives contain archive entries representing each file in the archive. These entries include a file path for the entry, but these file paths are not restricted and may contain unexpected special elements such as the directory traversal element (..). If these file paths are used to determine an output file to write the contents of the archive item to, then the file may be written to an unexpected location. This can result in sensitive information being revealed or deleted, or an attacker being able to influence behavior by modifying unexpected files.

For example, if a tar archive contains a file entry ..\sneaky-file, and the tar archive is extracted to the directory c:\output, then naively combining the paths would result in an output file path of c:\output\..\sneaky-file, which would cause the file to be written to c:\sneaky-file.

Recommendation

Ensure that output paths constructed from tar archive entries are validated to prevent writing files to unexpected locations.

The recommended way of writing an output file from a tar archive entry is to check that ".." does not occur in the path.

Example

In this example an archive is extracted without validating file paths. If archive.tar contained relative paths (for instance, if it were created by something like tar -cf archive.tar ../file.txt) then executing this code could write to locations outside the destination directory.

import sys
import tarfile

with tarfile.open(sys.argv[1]) as tar:
    #BAD : This could write any file on the filesystem.
    for entry in tar:
        tar.extract(entry, "/tmp/unpack/")

To fix this vulnerability, we need to check that the path does not contain any .. elements in it.

import sys
import tarfile
import os.path

with tarfile.open(sys.argv[1]) as tar:
    for entry in tar:
        #GOOD: Check that entry is safe
        if os.path.isabs(entry.name) or ".." in entry.name:
            raise ValueError("Illegal tar archive entry")
        tar.extract(entry, "/tmp/unpack/")

References

Snyk: Zip Slip Vulnerability.
OWASP: Path Traversal.
Python Library Reference: TarFile.extract.
Python Library Reference: TarFile.extractall.
Common Weakness Enumeration: CWE-22.

@echoix
Copy link
Member Author

echoix commented Jan 8, 2024

Should the (few) other uses fixed as well?


ag --python -l tar.extractall

python/grass/temporal/stds_import.py

python/grass/utils/download.py

scripts/g.extension/g.extension.py

scripts/r.unpack/r.unpack.py

scripts/v.unpack/v.unpack.py

Yes they should

@echoix
Copy link
Member Author

echoix commented Jan 8, 2024

As we can't be sure if the user's Python version is the latest (i.e. a backported version), at least the call should be in a try-catch clause falling back to supported call. Perhaps make a wrapper for more general use?

To really fix that specific issue reported, it should probably be something like the loop of the example. But using the 'data' filter solves way more.

@neteler
Copy link
Member

neteler commented Jan 8, 2024

(I have opened the current macOS g.extension doctest.sh CI issue here: #3341)

@echoix
Copy link
Member Author

echoix commented Jan 8, 2024

Is there some usages of shutil.unpack_archive() too?
They also accept the filter argument, released in the same patch version (June 6):

https://www.python.org/downloads/release/python-3817/ -> no installers are made since 3.8.10 (source only as in security fixes only)

https://www.python.org/downloads/release/python-3917/ -> no installers since 3.9.13

https://www.python.org/downloads/release/python-31012/ -> no installers since 3.10.11

https://www.python.org/downloads/release/python-3114/ -> installers available

@neteler
Copy link
Member

neteler commented Jan 8, 2024

Is there some usages of shutil.unpack_archive() too?

... not (yet?):

ag --python unpack_archive 
python/grass/utils/download.py
44:# TODO: Possibly migrate to shutil.unpack_archive.
74:# TODO: Possibly migrate to shutil.unpack_archive.

@echoix
Copy link
Member Author

echoix commented Jan 10, 2024

I didn't forget about this one, it's just a little too involved to do correctly only on a phone in the evening. It's on my weekend list.

@github-actions github-actions bot added vector Related to vector data processing raster Related to raster data processing module docs general labels Jan 13, 2024
@echoix echoix changed the title grass.temporal.stds_import: Add extract filter for tar.extractall python: Add extract filter for tarfile.extractall Jan 13, 2024
@echoix
Copy link
Member Author

echoix commented Jan 13, 2024

I think a good summary for this PR (maybe the final commit message contents) could be:

Extraction filters were added in Python 3.12,
and backported to 3.8.17, 3.9.17, 3.10.12, and 3.11.4
See https://docs.python.org/3.12/library/tarfile.html#tarfile-extraction-filter
and https://peps.python.org/pep-0706/
In Python 3.12, using `filter=None` triggers a DepreciationWarning,
and in Python 3.14, `filter='data'` will be the default

@echoix
Copy link
Member Author

echoix commented Jan 13, 2024

I took a look at how the changes were implemented, first suggested, and what other changes were made elsewhere to use this security backport. I found a couple issues/PRs of the author of the cpython fix or reviewed by him.

One of these is pypa/build#675, and had a comment with a suggestion here pypa/build#675 (comment) but I didn't like both, it's quite long. pypa/pip#12214 is even longer.
https://github.com/pypa/build/pull/609/files is different

Another reference https://packaging.python.org/en/latest/specifications/source-distribution-format/#unpacking-with-the-data-filter

But then I found the PEP 706 that was made for this issue, which suggests some usages:
https://peps.python.org/pep-0706/#backporting-forward-compatibility
I chose to use the data filter is available, or warn and use old behavior otherwise.
I think its cleaner for now.

On one file, I used debug instead of warning, since there was a debug function that wrapped the grass.script to prevent circular dependencies.

@echoix
Copy link
Member Author

echoix commented Jan 13, 2024

It seems only r.unpack has tests that uses any of the files I changed. But they are not found by gunittest, since the shell script and data are in a test_suite directory instead of testsuite. Locally, I renamed the folder and was able to see that the test script was picked up and passed. To make sure, I added flags to both code paths, and they were handled correctly for Python 3.8.16 (before the fix in 3.8.17) and 3.12.1. Let's assume that it works for the other files, since CI can't really prove it to us.

@echoix
Copy link
Member Author

echoix commented Jan 13, 2024

Ready to review!

@neteler neteler added this to the 8.4.0 milestone Jan 14, 2024
Copy link
Member

@ninsbl ninsbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Good solution!
Addressing security related issues is important in general, even if the risk of exploytation seems very low, Especially if there is such a good fix available.

@echoix echoix merged commit e276098 into OSGeo:main Jan 24, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs general libraries module Python Related code is in Python raster Related to raster data processing vector Related to vector data processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants