
Document whether it makes sense to use a BufReader or BufWriter with zip #167

Open
e00E opened this issue Jul 20, 2023 · 1 comment


e00E commented Jul 20, 2023

Let's say I want to read a zip file. I open a std::fs::File to pass into zip::ZipArchive::new. Should I wrap the file in a BufReader first?

Later I get a zip::read::ZipFile that I want to read from the archive. ZipFile implements Read. Should I wrap it in a BufReader?

The same questions hold for the writing side. The std docs [1] [2] explain why these adapters generally make sense. The question is whether the structs in this crate already buffer internally; if they do, adding another layer of buffering is detrimental. The documentation should address this.

[1] https://doc.rust-lang.org/std/io/struct.BufReader.html
[2] https://doc.rust-lang.org/std/io/struct.BufWriter.html
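
For concreteness, here is a minimal sketch of the two wrapping variants in question (the file name and entry index are placeholders; whether either buffer actually helps is exactly what should be documented):

use std::fs::File;
use std::io::{BufReader, Read};

fn read_first_entry() -> zip::result::ZipResult<Vec<u8>> {
    // Variant 1: buffer the underlying file. BufReader<File> still
    // implements Read + Seek, which ZipArchive::new requires.
    let file = File::open("archive.zip")?;
    let mut archive = zip::ZipArchive::new(BufReader::new(file))?;

    // Variant 2: buffer the per-entry reader, which only implements Read.
    let mut reader = BufReader::new(archive.by_index(0)?);
    let mut contents = Vec::new();
    reader.read_to_end(&mut contents)?;
    Ok(contents)
}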

@axnsan12

The zip crate appears to do explicit buffering of reads for zstd decompression: https://github.com/zip-rs/zip/blob/21a20584bc9e05dfa4f3c5b0bc420a1389fae2c3/src/read.rs#L135
The bzip2::read::BzDecoder and flate2::read::DeflateDecoder structs appear to implicitly buffer their reads under the hood: https://docs.rs/bzip2/0.4.4/src/bzip2/read.rs.html#23
Nothing appears to do buffering for Stored reading, though.
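
As a sketch of one way to act on that last observation: add a buffer only for Stored entries (buffered_if_stored is a hypothetical helper, not crate API; ZipFile::compression() does exist):

use std::io::{BufReader, Read};
use zip::CompressionMethod;

// Hypothetical helper: the compressed backends seem to buffer internally,
// so only wrap Stored (uncompressed) entries in our own BufReader.
fn buffered_if_stored<'a>(file: zip::read::ZipFile<'a>) -> Box<dyn Read + 'a> {
    if file.compression() == CompressionMethod::Stored {
        Box::new(BufReader::new(file))
    } else {
        Box::new(file)
    }
}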

On the writer side, the zip crate does no explicit buffering of its own.
The bzip2::write::BzEncoder and flate2::write::DeflateEncoder do not buffer implicitly here either.

Empirically, for writing with zstd compression I saw great speedups by wrapping the ZipWriter in a BufWriter for each file, like so:

archive.start_file("file", opts)?;
let writer = BufWriter::with_capacity(2 * 1024 * 1024, &mut archive);

writer.write_all(b"blah blah")?;

writer.flush()?;
drop(writer);
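
(Flushing explicitly before the drop matters: BufWriter also flushes in its Drop implementation, but any error there is silently ignored. Dropping the wrapper then releases the &mut borrow so the ZipWriter can be used for the next file.)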

Overall, I don't think this question is simple to answer and document without extensive benchmarks across all supported compression backends.
It's not just a matter of reducing file I/O; some compression algorithms might benefit from being fed larger chunks even when working memory-to-memory.
Maybe the zip crate should itself add buffering where it makes sense?

Pr0methean transferred this issue from zip-rs/zip-old on Jun 2, 2024