Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse entries from a Readable stream #97

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

mojodna
Copy link

@mojodna mojodna commented Nov 13, 2018

This replaces multiple (2 x entry count) small reads with a single read stream containing the full central directory and eliminates many intermediate Buffer allocations and copies in the process, which were resulting in increased time spent reading entries. Readable stream implementations are responsible for determining appropriate buffer sizes.

This is a port of the old entry parsing code to a Transform stream (as an ES6 class; available since Node 6.0.0--I can change this if desired). The central directory size is read from the ZIP64 end of central directory record (if available, otherwise it's calculated from the size of the file) and used to bound the size of the requested Readable stream. (This was necessary to prevent the ZIP64 eocdr from being parsed as an entry with an invalid signature.)

lazyEntries: true is implemented by pausing the stream and reading individual Entry objects as requested, recursively via setImmediate. This is a little weird (I'd tried resuming and pausing again using .once("data"), but that didn't work), but seems to be the only way to ensure that only a single entry is emitted at a time.

Fixes #96

This replaces multiple (2 x entry count) small reads with a single read
stream containing the full central directory and eliminates many
intermediate Buffer allocations and copies in the process, which were
resulting in increased time spent reading entries.

This is a port of the old entry parsing code to a Transform stream (as
an ES6 class; available since Node 6.0.0). The central directory size is
read from the ZIP64 end of central directory record (if available) and
used to bound the size of the requested Readable stream.

lazyEntries: true is implemented by pausing the stream and reading
individual Entry objects as requested.
@thejoshwolfe
Copy link
Owner

Thanks for the PR! Did you get a chance to benchmark this in your AWS environment?

Regarding ES6, Node 6, and "All checks have failed", I'm not really sure what to do about this. Is it time to drop support for Node 0 yet? I don't use Node 0. 🤷‍♂️

@mojodna
Copy link
Author

mojodna commented Nov 15, 2018

Yep. Things look great on Lambda; effectively the same as #95 but without the downsides of unbounded buffer sizes.

Forcing index.js into strict mode gets things working down to Node 4.x, so I guess the question is whether you want to set 4.x as a minimum version (I do/will care about browser support, but I've already seen it work with rollup). Node 4 was EOLed at the end of April and the newest version actually being dropped was EOLed at the end of 2016.

@mojodna mojodna force-pushed the entry-readable-stream branch from f0ddcbf to 985a3f5 Compare December 1, 2018 19:21
@mojodna
Copy link
Author

mojodna commented Dec 1, 2018

I've made some changes to get this green on 0.10.

@mojodna
Copy link
Author

mojodna commented Dec 13, 2018

@thejoshwolfe nudge.

@thejoshwolfe
Copy link
Owner

thanks again for the contribution. And sorry, but it's going to take me some time to get around to this.

@mojodna
Copy link
Author

mojodna commented Dec 18, 2018

No worries. Much appreciated! If there’s anything I can do, let me know!

@jfsiii
Copy link

jfsiii commented Jun 6, 2019

@thejoshwolfe any ETA/prognosis for this? Anything others can do to help?

@gmaclennan
Copy link

Hey this would be great to add. @mojodna have you been using this fork in production and does it speed things up for archives with many files?

@mojodna
Copy link
Author

mojodna commented Nov 6, 2019

Not production per se, but I have tested a service that uses this patch quite heavily. It works great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read central directory from a stream instead of multiple small random access reads
4 participants