Conversation
Since we might want to change the compression method.
See find's manual.
"bin_runtime.k" | ||
\( -name "*$spec_hash*" \ | ||
-o -name "*$spec_name*" \ | ||
-o -name config.tmp.json \ |
I changed this from `config.json` to `config.tmp.json`, because the former is not in `$KLAB_OUT` while the latter is, and this made the code cleaner (since I only have to search in one dir). AFAICT the former is a subset of the latter, so everything should be fine. @mhhf true story?
Wins at benchmarks.
I tested this manually by doing:
```
       -o -name bin_runtime.k \) \
       -a \! -name "*.tar" \
       -a \! -name "*.zip" \
       -exec tar --create --zstd --file "./log/$spec_hash.tar" {} +
```
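For readability, the two hunks above combine into roughly the following command (a sketch: the `find` starting point and the surrounding script are not shown in this diff, so they are assumptions here):

```sh
# Hypothetical reconstruction of the archiving step, run from $KLAB_OUT:
# collect everything related to the spec, skip existing archives, and
# pack the results into one zstd-compressed tarball.
find . \( -name "*$spec_hash*" \
       -o -name "*$spec_name*" \
       -o -name config.tmp.json \
       -o -name bin_runtime.k \) \
       -a \! -name "*.tar" \
       -a \! -name "*.zip" \
       -exec tar --create --zstd --file "./log/$spec_hash.tar" {} +
```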
Here we could pass the `--threads=N` argument to zstd to control how many threads are used for compression, in case we deem it necessary.
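Since tar's `--zstd` flag doesn't forward options to zstd, a sketch of two ways this could be wired up (the thread counts are illustrative; `--use-compress-program` with arguments needs GNU tar, and `ZSTD_NBTHREADS` needs zstd >= 1.4.1):

```sh
# Option 1: invoke zstd explicitly via GNU tar's --use-compress-program (-I).
tar --create --use-compress-program="zstd --threads=4" \
    --file "./log/$spec_hash.tar" some/files

# Option 2: keep --zstd and set the thread count via the environment.
ZSTD_NBTHREADS=4 tar --create --zstd --file "./log/$spec_hash.tar" some/files
```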
Do you have a rough idea of what kind of speedup we could get from this?
On the test archive on buildbot, using `-T0`: 29 vs 33 seconds, so negligible. Maybe there are other bottlenecks.
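For reference, a comparison like this can be timed along these lines (paths are assumptions):

```sh
# Baseline: single-threaded zstd (the default with tar --zstd).
time tar --create --zstd --file /tmp/baseline.tar "$KLAB_OUT"

# Multi-threaded: -T0 lets zstd use all available cores.
time tar --create --use-compress-program="zstd -T0" \
     --file /tmp/threaded.tar "$KLAB_OUT"
```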
This is backwards-incompatible as is. Previously generated zip files are not fetchable anymore. Not sure if we should deal with that.
In general LGTM. One small question: I remember @mhhf previously mentioning in chat that zipping was additive, so it should be pretty easy to start running `klab zip` every X minutes during `prove-all` to allow fetching of running proofs. Is this also the case for `zstd`? It seems like it would be fairly easy to add some conditional logic in
No, it's not the case with tar, as it doesn't allow adding files to compressed tarballs ex post. I'm still not sure the solution you mention above is the best, though. I think it would make sense to have logic that builds the tarball on demand when receiving the HTTP request.
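To illustrate the difference (using the standard zip and GNU tar CLIs; file names are illustrative): zip can add files to an existing archive in place, while a compressed tarball has to be rebuilt:

```sh
# zip is additive: this adds or updates one file inside an existing archive.
zip proofs.zip out/new-result.json

# tar --append (-r) refuses to work on compressed archives, so a
# zstd-compressed tarball must be recreated from the full file list.
tar --create --zstd --file proofs.tar out/
```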
Not really sure if this issue is the best place to bikeshed this potential feature, but I'm not sure how I feel about adding a long-running server process to our CI infrastructure; it feels like quite a lot of moving parts.
Thanks for bringing this up in this issue, as it's relevant. The way I see it, we wouldn't be adding any moving parts: the long-running process is already there, and it's nginx. When a request arrives at a certain endpoint, it starts a (Lua or njs) script that does the tarballing and returns the tarball.
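A minimal sketch of what such an endpoint could shell out to (the handler, paths, and parameter are all hypothetical):

```sh
#!/usr/bin/env sh
# Hypothetical handler: given a spec hash, stream a freshly built
# zstd-compressed tarball of the matching proof outputs to stdout.
spec_hash="$1"
cd "$KLAB_OUT" || exit 1
find . -name "*$spec_hash*" \
     -a \! -name "*.tar" -a \! -name "*.zip" \
  | tar --create --zstd --file - --files-from -
```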
Didn't know that you could use nginx like that. I like it. It should even allow us to
I guess there is also the complication of having different branches / projects potentially using different versions of klab, which could cause some issues if we modify the proof serialization format between klab versions.
@xwvvvvwx yes, we could do that if needed, it's true. Do we want to?
On balance I think I'm fine with leaving it out and adding it back if it starts to become annoying.
I circled back on this. I think it would be much nicer if it transparently worked with zip files, without users having to dig up old versions of klab. So I'll implement that in this PR.
OK, should be done.
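A sketch of the kind of fallback logic this implies on the fetch side (file names and locations are assumptions):

```sh
# Hypothetical fetch fallback: prefer the new zstd tarball, but transparently
# extract a legacy zip archive if that's all that exists for this spec.
if [ -f "log/$spec_hash.tar" ]; then
  tar --extract --zstd --file "log/$spec_hash.tar"
elif [ -f "log/$spec_hash.zip" ]; then
  unzip -o "log/$spec_hash.zip"
else
  echo "no archive found for $spec_hash" >&2
  exit 1
fi
```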
Or actually, anything tar can handle.
one trivial nit. I'm fine to merge either way. lgtm 💖
Logs can sometimes be gigabytes large, so transferring and decompressing them can take a while.
zstd is very fast at all operations (compression, decompression, extraction) out of the box. It's possible there are parameters we can tweak for our specific use case, but I haven't looked into that.
For the simple `Vat_subui_pass_rough` proof in `examples/heal`, it's 40 KB using zstd vs 3.8 MB using zip. On a 1.1 GB zipped proof file I found on buildbot, unzip needed around 3 minutes. zstd compressed the contents of the same file to 30 MB (!!) and decompressed it in 1m36s.
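For reference, numbers like these can be reproduced with something along these lines (paths are assumptions):

```sh
# Compress the same proof output directory with both tools and compare sizes.
time zip -qr proof.zip "$KLAB_OUT"
time tar --create --zstd --file proof.tar "$KLAB_OUT"
ls -lh proof.zip proof.tar

# Compare decompression times.
mkdir -p /tmp/unzipped /tmp/untarred
time unzip -qo proof.zip -d /tmp/unzipped
time tar --extract --zstd --file proof.tar -C /tmp/untarred
```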