pixz | xz -d corrupts data #65
Comments
While dealing with data corruption, can I suggest running American Fuzzy Lop against every combination of pixz and xz? People have found loads of bugs in compressors with relatively little effort. Can I also suggest a major version change when the data corruption bug gets fixed? At this point we have decided not to use pixz at all. Not ever. The danger of someone installing a broken version somewhere and having to spend weeks tracking down the problem trumps any speed gains. |
@JoernEngel Please try to reproduce with version 1.0.6. |
Also, @JoernEngel, is it possible for you to provide the file you are testing with as a download so we can try to work with it? For such a download, please also provide us with the md5 of its original, uncompressed version, and then compress it via another compression tool like gzip to exclude any xz-related issues. |
On Tue, Apr 26, 2016 at 12:41:20PM -0700, Christian Krause wrote:
I tried and failed.
There is no ./configure checked in. Autoconf is throwing errors:
The file I picked contains confidential data. But seriously, I
Jörn
A surrounded army must be given a way out. |
configure is in here. You probably downloaded one of the GitHub-generated tarballs. These do not contain the |
On Tue, Apr 26, 2016 at 01:02:40PM -0700, Christian Krause wrote:
I used the git tree. Anyway, 1.0.6 reproduced the problem, file sizes
And I found yet another fun bug.
Jörn
It's not that I'm so smart, it's just that I stay with problems longer. |
pixz -0 | pixz -d seems to work correctly, modulo the "Illegal Seek"
Trouble is that files generated by pixz get detected as regular xz files
But xz came first and pixz, for better or worse, has to be compatible.
Jörn
Measure. Don't tune for speed until you've measured, and even then |
I just tried with a large file and couldn't reproduce:
pixz 1.0.6 |
@vasi any insights? |
@JoernEngel : Could it be that the thing you compressed is a tar file? When it sees a tarball, pixz will by default add an index, so that it's possible to seek to a specific file. This is done in a safe way, nothing is "corrupted":
If you've discovered a type of file that is not a tarball, but can fool libarchive's checks, please let us know. Also, please try to be kind. pixz is a tool that other people wrote for free, and just gave to you (and everyone else), and are now supporting for free as well. Nobody is forcing you to use it. If you want to be helpful, that's great! But if you want to act aggressively, you're welcome to find someone offering paid support instead of using our issue tracker. |
On Tue, Apr 26, 2016 at 02:00:45PM -0700, Dave Vasilevsky wrote:
Indeed, the file is a tarball and the extra data looks like filenames
I would still call this corruption. If the archive looks like regular
Tarball mode clearly has its appeal. I can see why you did it. But I
Historically there have been consumer flash devices that worked in
The problem with autodetection is that things can go wrong and
Anyway, another interesting question is whether the pixz archive can be
Jörn
Functionality is an asset, but code is a liability. |
On Tue, Apr 26, 2016 at 02:00:45PM -0700, Dave Vasilevsky wrote:
Would you consider paid support? I believe pixz is exactly what we need and would save us money. Sending
Jörn
The cost of changing business rules is much more expensive for software |
Yeah, unfortunately the xz file format doesn't allow for arbitrary extra data to be carried along, like gzip does. Or at least I haven't found a way to do that. While nowadays most people probably use pixz for its multi-core support, it was actually originally developed for the indexing feature! Once that was implemented, it turned out that parallelizing things was trivial, so that happened. But indexing is a pretty core part of pixz's mission. Sorry it doesn't suit you. You might want to create an alias pixz='pixz -t' or something, so you don't have to deal with it. I do think it would be nice if pixz eventually supported a configuration file, so folks could say whether or not they wanted this feature. (Maybe also for features like number of CPUs! I'm sure some people don't appreciate us maxing out all their cores.) |
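For anyone who wants the behaviour of the suggested alias inside scripts as well: bash does not expand aliases in non-interactive shells by default, so a shell function is one alternative. This is only a sketch. The `-t` flag is the one mentioned in this thread, the wrapper itself is hypothetical, and it assumes you never want tarball mode from this shell.

```shell
# Hypothetical wrapper: always run the real pixz with -t (the flag suggested
# in this thread), followed by whatever arguments the caller passed.
pixz() {
    # "command" skips this function and runs the pixz found on PATH.
    command pixz -t "$@"
}
```

With this defined, `cat foo | pixz -0 | xz -d > bar` runs `pixz -t -0` under the hood. Drop the function (`unset -f pixz`) when you do want the index.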
@JoernEngel Would you please verify that the |
On Tue, Apr 26, 2016 at 02:55:46PM -0700, Dave Vasilevsky wrote:
Interesting. That might also explain why "pixz | xz -d" is slower than
Looks like "pixz | xz -d" keeps about 3.5 cpus busy on my notebook,
Anyway, thank you for this interesting piece of software. Extra thanks
Jörn
When in doubt, use brute force. |
On Tue, Apr 26, 2016 at 03:02:30PM -0700, Christian Krause wrote:
Verified. "pixz -t -0 | xz -d" works correctly.
Jörn
The trouble with the world is that the stupid are cocksure and |
I checked whether pixz and xz format were compatible. Looks like they are. But on a large testfile, I don't get the same data back as before. In particular, the output file is larger than the input file.
All bytes up to the end of input file are identical. Output file simply has additional data. Additional data seems derived from input file, not all zeroes or such.
$ ls -l foo
-rw-r--r-- 1 joern joern 4380151296 Apr 26 11:30 foo
$ cat foo | pixz -0 | xz -d > bar
$ ls -l bar
-rw-r--r-- 1 joern joern 4384062468 Apr 26 12:08 bar
pixz 1.0.2
xz (XZ Utils) 5.1.0alpha
Both from Debian, running on x86_64.
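The claim that the output is the input plus trailing data can be checked mechanically rather than by eye. A minimal sketch, assuming a POSIX-ish shell with `head -c`, `wc` and `cmp`; the helper names `is_prefix_of` and `extra_bytes` are made up for illustration and are not part of pixz or xz.

```shell
# is_prefix_of SMALL BIG
# Hypothetical helper: succeeds if BIG begins with exactly the bytes of SMALL.
is_prefix_of() {
    small=$1
    big=$2
    # Arithmetic expansion strips the padding some wc implementations emit.
    size=$(( $(wc -c < "$small") ))
    # Compare the first $size bytes of BIG against all of SMALL.
    head -c "$size" "$big" | cmp -s - "$small"
}

# extra_bytes SMALL BIG
# Hypothetical helper: prints how many bytes BIG has beyond the length of SMALL.
extra_bytes() {
    echo $(( $(wc -c < "$2") - $(wc -c < "$1") ))
}
```

Usage would be `is_prefix_of foo bar && extra_bytes foo bar`. Going by the two `ls -l` sizes above, the difference is 4384062468 - 4380151296 = 3911172 bytes.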