Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extractcode errors out on #61

Open
mloeser21 opened this issue Apr 27, 2024 · 3 comments
Open

extractcode errors out on #61

mloeser21 opened this issue Apr 27, 2024 · 3 comments

Comments

@mloeser21
Copy link

mloeser21 commented Apr 27, 2024

Hi,
I'm running into a problem with certain .lz4 and also .jar files. Example (lz4):

$:~/SCAN_IMAGES/release-1.13.zip-extract$ ~/scancode-toolkit/extractcode ./release/deploy_artifacts/router.tar.gz-extract/router.tar-extract/0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar-extract/var/lib/apt/lists/deb.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4
Extracting archives...
[####################] 4
ERROR extracting: /home/joe/SCAN_IMAGES/release-1.13.zip-extract/release/deploy_artifacts/router.tar.gz-extract/router.tar-extract/0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar-extract/var/lib/apt/lists/deb.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4: Unrecognized archive format
Extracting done.

But the file has substance and can be decompressed using the lz4 utility:

$:~/SCAN_IMAGES/release-1.13.zip-extract$ ls -al ./release/deploy_artifacts/router.tar.gz-extract/router.tar-extract/0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar-extract/var/lib/apt/lists/deb
.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4
-rw-r--r-- 1 joe users 17315708 Apr 29  2023 ./release/deploy_artifacts/router.tar.gz-extract/router.tar-extract/0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar-extract/var/lib/apt/lists/deb.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4

$:~/SCAN_IMAGES/release-1.13.zip-extract$ lz4 -t ./release/deploy_artifacts/router.tar.gz-extract/router.tar-extract/0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar-extract/var/lib/apt/lists/deb.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4
./release/deploy_art : decoded 45545571 bytes
$:~/SCAN_IMAGES/release-1.13.zip-extract$ lz4 --list ./release/deploy_artifacts/router.tar.gz-extract/router.tar-extract/0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar-extract/var/lib/apt/lists
/deb.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4
    Frames           Type Block  Compressed  Uncompressed     Ratio   Filename
         1       LZ4Frame   B4D      16.51M             -         -   deb.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4
$:~/SCAN_IMAGES/release-1.13.zip-extract$ lz4 -dv ./release/deploy_artifacts/router.tar.gz-extract/router.tar-extract/0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar-extract/var/lib/apt/lists/de
b.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4
*** LZ4 command line interface 64-bits v1.9.3, by Yann Collet ***
Decoding file ./release/deploy_artifacts/router.tar.gz-extract/router.tar-extract/0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar-extract/var/lib/apt/lists/deb.debian.org_debian_dists_bullseye_main_binary-amd64_Packages
./release/deploy_art : decoded 45545571 bytes

Following is what the file header looks like:

$:~/SCAN_IMAGES/release-1.13.zip-extract$ hexdump ./release/deploy_artifacts/router.tar.gz-extract/router.tar-extract/0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar-extract/var/lib/apt/lists/deb.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4 | head
0000000 2204 184d 4040 cdc0 0078 f200 5003 6361
0000010 616b 6567 203a 6130 0a64 6f53 7275 0c63
0000020 f600 2008 3028 302e 322e 2e33 2d31 2935
0000030 560a 7265 6973 6e6f 203a 0015 7cf5 622b
0000040 0a31 6e49 7473 6c61 656c 2d64 6953 657a
0000050 203a 3032 3632 0a38 614d 6e69 6174 6e69
0000060 7265 203a 6544 6962 6e61 4720 6d61 7365
0000070 5420 6165 206d 703c 676b 672d 6d61 7365
0000080 642d 7665 6c65 6c40 7369 7374 612e 696c
0000090 746f 2e68 6564 6962 6e61 6f2e 6772 0a3e

The magic bytes are correct, pls refer to https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md

Why can lz4 decode it properly but extractcode cannot?

Regards,
Matthias

@pombredanne
Copy link
Member

pombredanne commented Apr 27, 2024

@mloeser21 Thanks for the report.

http://deb.debian.org/debian/dists/bullseye/main/binary-amd64/ does not seem to use lz4 .... just curious how this was made? This looks like a container image. Which base do you use?

Also could you attach the file in question? ( wrapped in a zip to make GH happy)?

@mloeser21
Copy link
Author

mloeser21 commented Apr 30, 2024

Hi @pombredanne Thanks for the quick response.
The container image was created as follows:

docker save ghcr.io/apollographql/router:v1.29.1 -o /tmp/
tar zcvf release/deploy_artifacts/router.tar.gz /tmp/router.tar

As part of router.tar you get various layer.tar archives:

$:~/SCAN_IMAGES/tmp$ tar xvf router.tar
0b1d3fcc8ae40e41993edcb2760b68907f0b94e2525bc1fb58537d5ef5c28018.json
0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/
0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/VERSION
0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/json
0f6a1467d0c8a8fce8ea65eedd0d2ee6e23f979498d128a0101318c7549f90a6/layer.tar
594bbcbc832bef41ae6140bdd25a807811947156b22df5d8ccb141ff1758e02a/
594bbcbc832bef41ae6140bdd25a807811947156b22df5d8ccb141ff1758e02a/VERSION
594bbcbc832bef41ae6140bdd25a807811947156b22df5d8ccb141ff1758e02a/json
594bbcbc832bef41ae6140bdd25a807811947156b22df5d8ccb141ff1758e02a/layer.tar
5ab70ccc1fc958e7cf78cadd641c642e8f4f5bc267797887bc1a636fb69b8a87/
[...]

And those layer.tar archives contain various .lz4 compressed files:

[...]
var/lib/apt/lists/deb.debian.org_debian-security_dists_bullseye-security_InRelease
var/lib/apt/lists/deb.debian.org_debian-security_dists_bullseye-security_main_binary-amd64_Packages.lz4
var/lib/apt/lists/deb.debian.org_debian_dists_bullseye-updates_InRelease
var/lib/apt/lists/deb.debian.org_debian_dists_bullseye-updates_main_binary-amd64_Packages.lz4
var/lib/apt/lists/deb.debian.org_debian_dists_bullseye_InRelease
var/lib/apt/lists/deb.debian.org_debian_dists_bullseye_main_binary-amd64_Packages.lz4
[...]
$ file var/lib/apt/lists/deb.debian.org_debian-security_dists_bullseye-security_main_binary-amd64_Packages.lz4
var/lib/apt/lists/deb.debian.org_debian-security_dists_bullseye-security_main_binary-amd64_Packages.lz4: LZ4 compressed data (v1.4+)

Does this help?

@pombredanne
Copy link
Member

@mloeser21 re:

Does this help?

Yes, this is exactly what's needed to track this down!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants