
Java FemtoZip gets into a nasty infinite loop on corrupt input data #6

Open
ehrmann opened this issue Jan 17, 2013 · 4 comments

@ehrmann
Contributor

ehrmann commented Jan 17, 2013

I have a slightly proprietary example I can give you (contact me at ehrmann+1923 <at> gmail).

What happened was that I base64-encoded a byte array after compressing it, converted it to lowercase, decoded it back to a byte array, then tried to decompress it. compressionModel.decompress(data) took much longer than it should have, and then the JVM ran out of memory. There's a chance FemtoZip is correctly decoding what becomes a massive byte array, but it could also be a bug.
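For concreteness, the round trip looks roughly like this. A minimal sketch only: the org.toubassi.femtozip package path and the model setup are assumed, and the compress/decompress signatures are the byte[]-in, byte[]-out ones used above.

```java
import java.util.Base64;
import org.toubassi.femtozip.CompressionModel;  // package path assumed

class Repro {
    // Compress, base64-encode, lowercase, decode, decompress.
    static byte[] corruptAndDecompress(CompressionModel model, byte[] original) {
        byte[] compressed = model.compress(original);

        // Lowercasing folds the uppercase half of the base64 alphabet into
        // the lowercase half, so the decode succeeds but yields corrupt bytes.
        String encoded = Base64.getEncoder().encodeToString(compressed);
        byte[] corrupted = Base64.getDecoder().decode(encoded.toLowerCase());

        // This is the call that spins and eventually exhausts the heap.
        return model.decompress(corrupted);
    }
}
```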

A nice workaround might be to have a maxExpectedSize parameter on decompress to guard against this.

@gtoubassi
Owner

Hi Dave,

I'd like to dig into the specific case but don't expect to have time over the next few weeks. I believe gzip avoids this problem by encoding an Adler-32 checksum to verify the sanity of the data. I chose to avoid doing something like that to keep from growing the output (since fz is designed for small payloads, adding 2 or 4 bytes would be significant). I assume that for your purposes putting your own checksum around the payload would be unacceptable? I like the idea of having a maxExpectedSize as a hint.
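For what it's worth, wrapping the payload yourself can be as small as the sketch below: a 4-byte CRC-32 (from java.util.zip) prefixed to the compressed bytes, verified before the payload ever reaches decompress(). Everything here is illustrative caller-side code, not FemtoZip API.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public final class ChecksummedPayload {
    // Prefix the compressed bytes with a CRC-32 of themselves.
    public static byte[] wrap(byte[] compressed) {
        CRC32 crc = new CRC32();
        crc.update(compressed);
        return ByteBuffer.allocate(4 + compressed.length)
                .putInt((int) crc.getValue())
                .put(compressed)
                .array();
    }

    // Verify the checksum and return the compressed bytes, or throw
    // before the corrupt payload is ever handed to decompress().
    public static byte[] unwrap(byte[] wrapped) {
        ByteBuffer buf = ByteBuffer.wrap(wrapped);
        int expected = buf.getInt();
        byte[] compressed = new byte[buf.remaining()];
        buf.get(compressed);
        CRC32 crc = new CRC32();
        crc.update(compressed);
        if ((int) crc.getValue() != expected) {
            throw new IllegalArgumentException("corrupt payload: CRC mismatch");
        }
        return compressed;
    }
}
```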

@ehrmann
Contributor Author

ehrmann commented Jan 18, 2013

At least for now, I could prefix it with my own checksum. What I really want to make sure of is that there isn't a bug that's leading to this.

The other nice feature would be a decompressInterruptibly() method that can be aborted.
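In the meantime, the closest approximation from the outside is to push decompress() onto a worker thread with a deadline, roughly like the sketch below (again assuming the org.toubassi.femtozip package path). It's only a stopgap: cancel(true) just interrupts the thread, and if the decode loop never checks its interrupt status, the runaway work and its allocations keep going.

```java
import java.util.concurrent.*;
import org.toubassi.femtozip.CompressionModel;  // package path assumed

public final class BoundedDecompress {
    public static byte[] decompress(CompressionModel model, byte[] data,
                                    long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<byte[]> result = exec.submit(() -> model.decompress(data));
            try {
                return result.get(timeout, unit);
            } catch (TimeoutException e) {
                result.cancel(true);  // best effort; the loop may not notice
                throw e;
            }
        } finally {
            exec.shutdownNow();
        }
    }
}
```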

@gtoubassi
Owner

Yes, the first-order issue is diagnosing the case at hand. Beyond that, it would be nice if the decompressor were more of a generator that you invoke repeatedly to pump out the bytes; gzip has somewhat this architecture. It added significant complexity to the code, and I figured for small payloads it wasn't worth it.

I'm open to all approaches here.
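To make the generator idea concrete, the shape might be something like the following. The PullDecompressor interface and its pump() method are purely hypothetical, but they show how a pull API would give callers both a size cap and a natural abort point.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical pull-style API; nothing like this exists in FemtoZip yet.
interface PullDecompressor {
    /** Fills buf with up to buf.length decompressed bytes; returns the
     *  count written, or -1 once the stream is exhausted. */
    int pump(byte[] buf);
}

final class BoundedPull {
    // With a pull API, the caller enforces the budget and can bail out
    // at any iteration instead of being stuck inside one decompress() call.
    static byte[] decompress(PullDecompressor d, int maxExpectedSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = d.pump(buf)) != -1; ) {
            if (out.size() + n > maxExpectedSize)
                throw new IllegalStateException("output exceeds maxExpectedSize");
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```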


@ehrmann
Contributor Author

ehrmann commented Mar 7, 2013

When you get a chance, let me know if you'd like an example of the model and byte[] that cause the issue.
