Continue reading a stream after ZlibDecoder streams finishes #401

Closed
marxin opened this issue Apr 23, 2024 · 8 comments · Fixed by #402

Comments

@marxin
Contributor

marxin commented Apr 23, 2024

I'm implementing the parsing of the git pack file format as part of the coding challenge:
https://app.codecrafters.io/courses/git/stages/7

It seems the git pack format is a binary file format where each object consists of a header followed by a zlib-compressed stream. Unpleasantly, the size of the compressed block is not known up front. Is it possible to get back the underlying stream (with into_inner or get_mut), including any data already buffered by ZlibDecoder, so that I can carry on reading the next object header?
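
For context, here is a minimal sketch of why the compressed length is unknown. It assumes the usual description of a pack entry header: the object type and the inflated (uncompressed) size are stored as a variable-length integer, and the compressed size is not stored at all. `read_entry_header` is a hypothetical helper written for illustration, not part of flate2 or of any git library.

```rust
use std::io::{self, Read};

/// Reads one pack entry header and returns (object type, inflated size).
/// Assumed layout: the first byte holds a continuation flag (bit 7), a 3-bit
/// type (bits 6..4) and the low 4 bits of the size; each following byte adds
/// 7 more size bits while its bit 7 is set on the previous byte.
fn read_entry_header<R: Read>(reader: &mut R) -> io::Result<(u8, u64)> {
    let mut byte = [0u8; 1];
    reader.read_exact(&mut byte)?;
    let kind = (byte[0] >> 4) & 0x07;
    let mut size = u64::from(byte[0] & 0x0f);
    let mut shift = 4;
    while byte[0] & 0x80 != 0 {
        if shift >= 64 {
            // Guard against malformed input that would overflow the shift.
            return Err(io::Error::new(io::ErrorKind::InvalidData, "size too large"));
        }
        reader.read_exact(&mut byte)?;
        size |= u64::from(byte[0] & 0x7f) << shift;
        shift += 7;
    }
    Ok((kind, size))
}
```

The returned size only says how many bytes the object inflates to, so the end of the compressed data can only be found by running the decoder until it reports the end of the stream.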

@jongiddy
Contributor

This should work with bufread::ZlibDecoder. See this test for bufread::GzDecoder. The same code, adapted for zlib, should behave the same way, allowing trailing data to be read from the BufRead after calling into_inner().

Note that the same test does not work for read::GzDecoder and similarly I do not expect it to work with read::ZlibDecoder.
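
To illustrate why the BufRead-based decoders can hand back trailing data while the Read-based ones cannot, here is a std-only sketch. It does not use flate2: the "compressed stream" is replaced by a toy record terminated by a zero byte, which is purely an assumption for the demo, and all function names are hypothetical. The point is only the buffering: a decoder built on `fill_buf`/`consume` marks exactly the bytes it used as consumed, whereas a decoder that buffers internally over a plain `Read` may have pulled bytes past the end of its stream, and those are lost when the inner reader is recovered.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read};

/// Toy stand-in for a stream of unknown length: payload bytes followed by a
/// single 0 byte. Consumes exactly the record and its terminator.
fn read_record<R: BufRead>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "missing terminator"));
        }
        match buf.iter().position(|&b| b == 0) {
            Some(i) => {
                out.extend_from_slice(&buf[..i]);
                reader.consume(i + 1); // leave everything after the terminator
                return Ok(out);
            }
            None => {
                let n = buf.len();
                out.extend_from_slice(buf);
                reader.consume(n);
            }
        }
    }
}

/// Buffer owned by the "decoder": into_inner() discards what was read ahead.
fn leftover_after_internal_buffering(data: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut internal = BufReader::new(Cursor::new(data.to_vec()));
    let record = read_record(&mut internal)?;
    let mut source = internal.into_inner(); // buffered bytes are dropped here
    let mut rest = Vec::new();
    source.read_to_end(&mut rest)?;
    Ok((record, rest))
}

/// Buffer owned by the caller: the trailing bytes are still available.
fn leftover_with_caller_buffer(data: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut shared = BufReader::new(Cursor::new(data.to_vec()));
    let record = read_record(&mut shared)?;
    let mut rest = Vec::new();
    shared.read_to_end(&mut rest)?;
    Ok((record, rest))
}
```

Called with `b"abc\0next"`, both functions return `abc` as the record, but only `leftover_with_caller_buffer` still sees `next` afterwards; the internally buffered variant returns an empty remainder because the whole input had already been read into the dropped buffer.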

@jongiddy
Contributor

#402 adapts the gzip test to demonstrate that this also works for the deflate and zlib BufRead decoders.

Byron linked a pull request Apr 24, 2024 that will close this issue
@marxin
Contributor Author

marxin commented Apr 24, 2024

Thank you very much for the fast response! It's great that the current bufread::ZlibDecoder works the way I need.
I can confirm it works for me in my particular test case.

I have two comments:

  • Would it be possible to document the behavior here: https://docs.rs/flate2/latest/flate2/bufread/struct.ZlibDecoder.html#method.into_inner ? I would also add a caveat to read::ZlibDecoder::into_inner saying that the same does not work there.
  • Just out of curiosity: why does e.g. bufread::ZlibDecoder not implement std::io::BufRead? It would be handy, as one would not have to wrap the bufread::ZlibDecoder in another BufReader::new().

@Byron
Member

Byron commented Apr 24, 2024

Since the original question was answered with tests, I think it's fair to close this issue. Feel free to continue the conversation here, though.

Regarding documentation, please feel free to open a PR with the docs improvements you would have wanted to see. Maybe you can also experiment with implementing BufRead on ZlibDecoder. Perhaps even more improvements will arise from that :).

Byron closed this as completed Apr 24, 2024
@jongiddy
Contributor

There is an existing discussion on why the bufread decoders do not implement BufRead.

@jongiddy
Contributor

The docs for bufread and write GzDecoder have text describing this behaviour. This can be copied to the docs for the other decoders.

@marxin
Contributor Author

marxin commented Apr 25, 2024

The docs for bufread and write GzDecoder have text describing this behaviour.

Can you please send me a link to the behavior description? I can't find it :)

@jongiddy
Contributor

bufread:

flate2-rs/src/gz/bufread.rs

Lines 171 to 174 in 8a502a7

/// After reading a single member of the gzip data this reader will return
/// Ok(0) even if there are more bytes available in the underlying reader.
/// If you need the following bytes, call `into_inner()` after Ok(0) to
/// recover the underlying reader.

write:

flate2-rs/src/gz/write.rs

Lines 174 to 176 in 8a502a7

/// After decoding a single member of the gzip data this writer will return the number of bytes up to
/// to the end of the gzip member and subsequent writes will return Ok(0) allowing the caller to
/// handle any data following the gzip member.

And there is an equivalent paragraph for the read decoder to say that this does not work:

flate2-rs/src/gz/read.rs

Lines 97 to 101 in 8a502a7

/// After reading a single member of the gzip data this reader will return
/// Ok(0) even if there are more bytes available in the underlying reader.
/// `GzDecoder` may have read additional bytes past the end of the gzip data.
/// If you need the following bytes, wrap the `Reader` in a `std::io::BufReader`
/// and use `bufread::GzDecoder` instead.
