
Document whether it makes sense to use a BufReader or BufWriter with zip #167

Open
e00E opened this issue Jul 20, 2023 · 1 comment


e00E commented Jul 20, 2023

Let's say I want to read a zip file. I open a std::fs::File to pass into zip::ZipArchive::new. Should I wrap the file in a BufReader first?

Later I get a zip::read::ZipFile that I want to read from the archive. ZipFile implements Read. Should I wrap it in a BufReader?

The same questions hold for the writing side. The std docs [1] [2] explain why these adapters generally make sense. The question is whether the structs in this crate already buffer internally; if they do, adding another layer of buffering is detrimental. The documentation should address this.

[1] https://doc.rust-lang.org/std/io/struct.BufReader.html
[2] https://doc.rust-lang.org/std/io/struct.BufWriter.html
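
For concreteness, here is a minimal sketch of the two wrapping variants in question (the file name and entry index are placeholders; whether either buffer actually helps is exactly what should be documented):

use std::fs::File;
use std::io::{BufReader, Read};

fn read_first_entry() -> zip::result::ZipResult<Vec<u8>> {
    // Variant 1: buffer the underlying file. BufReader<File> still
    // implements Read + Seek, which ZipArchive::new requires.
    let file = File::open("archive.zip")?;
    let mut archive = zip::ZipArchive::new(BufReader::new(file))?;

    // Variant 2: buffer the per-entry reader, which only implements Read.
    let mut reader = BufReader::new(archive.by_index(0)?);
    let mut contents = Vec::new();
    reader.read_to_end(&mut contents)?;
    Ok(contents)
}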

@axnsan12

The zip crate appears to do explicit buffering of reads for zstd decompression: https://github.com/zip-rs/zip/blob/21a20584bc9e05dfa4f3c5b0bc420a1389fae2c3/src/read.rs#L135
The bzip2::read::BzDecoder and flate2::read::DeflateDecoder structs appear to implicitly buffer their reads under the hood: https://docs.rs/bzip2/0.4.4/src/bzip2/read.rs.html#23
Nothing appears to do buffering for Stored reading, though.
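
As a sketch of one way to act on that last observation: add a buffer only for Stored entries (buffered_if_stored is a hypothetical helper, not crate API; ZipFile::compression() does exist):

use std::io::{BufReader, Read};
use zip::CompressionMethod;

// Hypothetical helper: the compressed backends seem to buffer internally,
// so only wrap Stored (uncompressed) entries in our own BufReader.
fn buffered_if_stored<'a>(file: zip::read::ZipFile<'a>) -> Box<dyn Read + 'a> {
    if file.compression() == CompressionMethod::Stored {
        Box::new(BufReader::new(file))
    } else {
        Box::new(file)
    }
}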

On the writer side, the zip crate does no explicit buffering of its own.
The bzip2::write::BzEncoder and flate2::write::DeflateEncoder do not buffer implicitly here either.

Empirically, for writing with zstd compression I saw great speedups by wrapping the ZipWriter in a BufWriter for each file, like so:

archive.start_file("file", opts)?;
let writer = BufWriter::with_capacity(2 * 1024 * 1024, &mut archive);

writer.write_all(b"blah blah")?;

writer.flush()?;
drop(writer);
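
(Flushing explicitly before the drop matters: BufWriter also flushes in its Drop implementation, but any error there is silently ignored. Dropping the wrapper then releases the &mut borrow so the ZipWriter can be used for the next file.)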

Overall, I don't think this question is simple to answer and document without extensive benchmarks across all supported compression backends.
It's not just a matter of reducing file I/O; some compression algorithms might benefit from being fed larger chunks even when working memory-to-memory.
Maybe the zip crate should itself add buffering where it makes sense?

Pr0methean transferred this issue from zip-rs/zip-old on Jun 2, 2024