Use zstd instead of zip #398

asymmetric · 2020-04-13T18:09:55Z

Logs can sometimes be gigabytes in size, so transferring and decompressing them
can take a while.

zstd is very fast at all operations (compression, decompression, extraction) out of the box.

It's possible there are parameters we can tweak for our specific use case, but I
haven't looked into that.

For the simple Vat_subui_pass_rough proof in examples/heal, the archive is 40 KB with zstd vs 3.8 MB with zip.

On a 1.1 GB zipped proof file I found on buildbot, unzip took around 3 minutes.

zstd compressed the contents of the same file to 30 MB (!!) and decompressed it in 1m36s.

asymmetric · 2020-04-13T18:11:54Z

libexec/klab-compress

-    "bin_runtime.k"
+  \( -name "*$spec_hash*" \
+  -o -name "*$spec_name*" \
+  -o -name config.tmp.json \


I changed this from config.json to config.tmp.json, because the former is not in $KLAB_OUT while the latter is, and this made the code cleaner (since I only have to search in one dir).

AFAICT the former is a subset of the latter, so everything should be fine.

@mhhf true story?
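For reference, here is a minimal sketch of the file selection with the hunks from this review reassembled (the `bin_runtime.k` clause and the `*.tar`/`*.zip` exclusions come from the second hunk on libexec/klab-compress). It prints the matches instead of handing them to tar, so the selection can be checked on its own. The function name `select_proof_files` and the starting directory `.` are assumptions; the rest of the script isn't shown in the PR.

```shell
#!/bin/sh
# Sketch of the klab-compress file selection, reassembled from the PR hunks.
# select_proof_files is a hypothetical name; the real script may start from a
# different directory or use additional predicates.
set -eu

select_proof_files() {
  spec_hash=$1
  spec_name=$2
  # POSIX find operators: \( \) for grouping, -o for OR, -a for AND, \! for NOT.
  find . \( -name "*$spec_hash*" \
         -o -name "*$spec_name*" \
         -o -name config.tmp.json \
         -o -name bin_runtime.k \) \
       -a \! -name "*.tar" \
       -a \! -name "*.zip" \
       -print
}
```

The `\! -name "*.tar"` clause matters because the output archive `./log/$spec_hash.tar` would itself match `*$spec_hash*`. Two things worth double-checking in the real script: with `-exec tar --create ... {} +`, find may split a very long file list across several tar invocations, and each `--create` would overwrite the archive (piping `-print0` into `tar --null --files-from=-` avoids that); and an empty `$spec_name` turns the second pattern into `**`, which matches everything.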

asymmetric · 2020-04-13T18:16:18Z

I tested this manually by doing

klab compress $PROOF_HASH
mkdir foo && tar xf $KLAB_OUT/log/$PROOF_HASH.tar -C foo
KLAB_OUT=foo klab debug $PROOF_HASH

asymmetric · 2020-04-13T21:52:34Z

libexec/klab-compress

+  -o -name bin_runtime.k \) \
+  -a \! -name "*.tar" \
+  -a \! -name "*.zip" \
+  -exec tar --create --zstd --file "./log/$spec_hash.tar" {} +


Here we could pass the --threads=N argument to zstd, to control how many threads are used for compression, in case we deemed it necessary.

Do you have a rough idea of what kind of speedup we could get from this?

On the test archive on buildbot, using -T0, 29 vs 33 seconds, so negligible. Maybe there are other bottlenecks.
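If we do want to experiment further: tar's `--zstd` switch doesn't forward options to zstd, so one way to pass `-T0` (zstd's `--threads=0`, i.e. auto-detect the number of cores) is GNU tar's `--use-compress-program`. A sketch, assuming GNU tar and the zstd CLI are on PATH; `compress_tree` is a made-up name, and it archives every regular file rather than klab's actual selection.

```shell
#!/bin/sh
# Sketch: archive every regular file under $1 into $2 with multi-threaded zstd.
# Assumes GNU tar and the zstd CLI; compress_tree is a hypothetical helper.
set -eu

compress_tree() {
  src=$1  # directory to archive
  out=$2  # output archive; use an absolute path outside $src
  (
    cd "$src"
    # -print0 plus --null --files-from=- gives tar the whole list at once,
    # so tar runs exactly once no matter how many files there are.
    find . -type f -print0 |
      tar --create --null --files-from=- \
          --use-compress-program='zstd -T0' \
          --file "$out"
  )
}
```

Given the 29 vs 33 second numbers above, this probably isn't worth it unless compression turns out to be the bottleneck elsewhere.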

shell.nix

asymmetric · 2020-04-13T21:57:02Z

This is backwards-incompatible as is. Previously generated zip files are not fetchable anymore. Not sure if we should deal with that.

d-xo

In general lgtm. One small question:

I remember previously @mhhf mentioning in chat that zipping was additive, and so it should be pretty easy to start running klab zip every X minutes during prove-all to allow fetching of running proofs. Is this also the case for zstd?

d-xo · 2020-04-14T06:01:08Z

This is backwards-incompatible as is. Previously generated zip files are not fetchable anymore. Not sure if we should deal with that.

It seems like it would be fairly easy to add some conditional logic in klab fetch that can handle both?

asymmetric · 2020-04-14T07:42:00Z

@xwvvvvwx

I remember previously @mhhf mentioning in chat that zipping was additive, and so it should be pretty easy to start running klab zip every X minutes during prove-all to allow fetching of running proofs. Is this also the case for zstd?

No, that's not the case with tar: it doesn't allow adding files to an already compressed tarball.

I'm still not sure if that solution you mention above is the best though - I think it would make sense to have logic that, when receiving the HTTP GET from klab fetch, zips up the file if it's not already there (and also if it's there but the proof is still running)
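A quick way to see the restriction: GNU tar happily `--append`s to a plain tar, but rejects `--append` combined with a compression option. A sketch, assuming GNU tar. If an additive workflow is ever needed, one option (not something this PR does) would be to append to an uncompressed tar during prove-all and compress a snapshot only when it's fetched.

```shell
#!/bin/sh
# Sketch: appending works on a plain tar but is refused for compressed ones.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
printf 'one\n' > a.log
printf 'two\n' > b.log

# Appending to an uncompressed tar works.
tar --create --file proof.tar a.log
tar --append --file proof.tar b.log

# GNU tar rejects --append together with a compression option.
if tar --append --zstd --file proof.tar.zst b.log 2>/dev/null; then
  compressed_append=allowed
else
  compressed_append=refused
fi
echo "append to compressed archive: $compressed_append"
tar --list --file proof.tar
```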

d-xo · 2020-04-14T07:52:14Z

I'm still not sure if that solution you mention above is the best though - I think it would make sense to have logic that, when receiving the HTTP GET from klab fetch, zips up the file if it's not already there (and also if it's there but the proof is still running)

Not really sure if this issue is the best place to bikeshed this potential feature, but I'm not sure how I feel about adding a long running server process to our CI infrastructure, feels like quite a lot of moving parts.

asymmetric · 2020-04-14T08:03:42Z

Not really sure if this issue is the best place to bikeshed this potential feature, but I'm not sure how I feel about adding a long running server process to our CI infrastructure, feels like quite a lot of moving parts.

Thanks for bringing this up in this issue, as it's relevant.

The way I see it, we wouldn't be adding any moving parts. The long-running process is already there, it's nginx. When a request arrives at a certain endpoint, it starts a (Lua or njs) script that does the tarballing and returns the tarball.
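A rough sketch of the staleness check such a script might make before tarballing, written in shell for illustration rather than Lua/njs. "Still running" is approximated as "some file in the log directory is newer than the archive"; `needs_rebuild` and the paths are hypothetical and not part of klab.

```shell
#!/bin/sh
# Hypothetical helper: should the archive for a proof be (re)built before
# serving it? Not part of klab; names and layout are illustrative only.
set -eu

needs_rebuild() {
  archive=$1  # archive that would be served
  dir=$2      # directory whose files go into the archive
  # No archive yet: build it.
  [ -e "$archive" ] || return 0
  # Any regular file modified after the archive (e.g. a proof that is
  # still writing logs) means the archive is stale.
  newer=$(find "$dir" -type f -newer "$archive" -print | head -n 1)
  [ -n "$newer" ]
}
```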

d-xo · 2020-04-14T08:13:49Z

The long-running process is already there, it's nginx. When a request arrives at a certain endpoint, it starts a (Lua or njs) script that does the tarballing and returns the tarball.

Didn't know that you could use nginx like that. I like it. It should even allow us to klab fetch accepted proofs, which is something I have often wanted.

I guess there is also the complication of having different branches / projects potentially using different versions of klab, which could cause some issues if we modify the proof serialization format between klab versions.

asymmetric · 2020-04-14T14:27:48Z

It seems like it would be fairly easy to add some conditional logic in klab fetch that can handle both?

@xwvvvvwx yes, we could do that if needed, it's true. Do we want to?

d-xo · 2020-04-15T05:00:13Z

yes, we could do that if needed, it's true. Do we want to?

On balance I think I'm fine with leaving it out and adding it back if it starts to become annoying.

asymmetric · 2020-04-15T09:30:59Z

yes, we could do that if needed, it's true. Do we want to?

On balance I think I'm fine with leaving it out and adding it back if it starts to become annoying.

I've come back around on this: I think it would be much nicer if it worked transparently with zip files, without users having to dig up old versions of klab. So I'll implement that in this PR.

asymmetric · 2020-04-15T14:50:50Z

OK, should be done.
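For context, a minimal sketch of how this kind of dispatch can work: zip files start with the local-file-header magic `PK\003\004`, and everything else goes to tar, which detects gzip/xz/zstd compression on its own when extracting from a file. This is an assumption about the approach, not a copy of commit 156918a; `archive_kind` and `extract_archive` are made-up names, and `unzip` (plus `zstd` for zstd archives) is assumed to be installed.

```shell
#!/bin/sh
# Sketch: extract either an old zip log or a new (compressed) tar log.
# Function names are hypothetical; this is not the klab-fetch implementation.
set -eu

archive_kind() {
  # Read the first 4 bytes as hex, e.g. "504b0304" for a zip file.
  magic=$(dd if="$1" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
  case $magic in
    504b0304) echo zip ;;
    *)        echo tar ;;
  esac
}

extract_archive() {
  archive=$1 dest=$2
  mkdir -p "$dest"
  case $(archive_kind "$archive") in
    zip) unzip -q "$archive" -d "$dest" ;;
    # tar auto-detects the compression (zstd, gzip, ...) when reading a file.
    tar) tar --extract --file "$archive" --directory "$dest" ;;
  esac
}
```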

libexec/klab-fetch

d-xo

one trivial nit. I'm fine to merge either way. lgtm 💖

asymmetric added 2 commits April 13, 2020 18:06

libkexec/klab-zip: rename to klab-compress

fb790e4

Since we might want to change the compression method.

libexec/klab-compress: using POSIX-compliant args

cdbcbd5

See find's manual.

asymmetric requested a review from a team April 13, 2020 18:10

asymmetric commented Apr 13, 2020

asymmetric added 2 commits April 13, 2020 20:12

libexec/klab-compress: use zstd

96ef42d

Wins at benchmarks.

*: replace occurrencies of zip with tar

50dd2ae

asymmetric force-pushed the zstd branch from 85a5def to 50dd2ae April 13, 2020 18:12

asymmetric commented Apr 13, 2020

shell.nix (resolved)

d-xo previously approved these changes Apr 14, 2020

asymmetric mentioned this pull request Apr 14, 2020

fetch running proofs #400

Open

asymmetric dismissed d-xo’s stale review via 256c006 April 15, 2020 14:49

asymmetric requested a review from a team April 15, 2020 14:51

asymmetric added 2 commits April 15, 2020 16:56

libexec/klab-fetch: handle both zip and tar

156918a

Or actually, anything tar can handle.

libexec/klab-fetch: print usage on missing arg

98bb5ad

asymmetric force-pushed the zstd branch from 256c006 to 98bb5ad April 15, 2020 14:56

d-xo reviewed Apr 16, 2020

libexec/klab-fetch (resolved)

d-xo approved these changes Apr 16, 2020

asymmetric merged commit 9a42e21 into master Apr 16, 2020

asymmetric deleted the zstd branch April 16, 2020 08:06
