Streaming; PGO #19
I'd like to support streaming, but I'm not sure what the best way to do so is, other than the traditional zlib way. It needs more investigation. I haven't had much luck with profile-guided optimizations in the past. I just did a basic test of them with libdeflate, but they didn't make much of a difference. I already use branch prediction hints a lot, so it's possible that PGO is redundant where it would matter most. STOKE looks very interesting!
Are you talking about the API or the internals? Assuming the former, zlib-style would make it very easy to port code from zlib to libdeflate…
Internals, mostly. I am familiar with zlib's API.
I thought you might have been considering an alternative to zlib's streaming API. For example, something more like zstd or density where you pass the input and output buffers as arguments instead of communicating through the stream struct, or callbacks for readers and writers like libzpaq, or libslz has some intriguing ideas to avoid unnecessary buffering…
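For illustration, the callback-style shape mentioned above (the caller supplies a reader and a writer) might look roughly like this. It is a minimal sketch built on Python's standard `zlib` module, not on libdeflate, and the name `decompress_with_callbacks` and its parameters are hypothetical, chosen only for this example.

```python
import io
import zlib


def decompress_with_callbacks(read, write, chunk_size=8192):
    """Hypothetical callback-style API: pull input via read(), push output via write().

    read(n) must return up to n bytes, or b"" at end of input.
    Returns True if a complete zlib stream was decoded.
    """
    inflater = zlib.decompressobj()
    while True:
        chunk = read(chunk_size)
        if not chunk:
            break
        piece = inflater.decompress(chunk)
        if piece:
            write(piece)
    # Emit anything still held inside the decompressor.
    tail = inflater.flush()
    if tail:
        write(tail)
    return inflater.eof


# Usage: any file-like object's read/write methods can serve as the callbacks.
original = b"streaming example " * 4000
source = io.BytesIO(zlib.compress(original))
sink = io.BytesIO()
complete = decompress_with_callbacks(source.read, sink.write, chunk_size=1024)
```

The caller never touches a stream struct here; all state lives inside the function, which is the main contrast with the zlib-style interface.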
Yes, other options for the API should be considered too.
It would be an advantage in some scenarios to be able to decompress in smaller blocks. I use libdeflate in a PNG decoder, and it would be more efficient to decompress progressively as the data becomes available (the working set would very often fit into L1). It's common for PNG files to store the compressed data in multiple IDAT chunks, typically in 8192-byte blocks (other sizes are also used, but this is easily the most common). As it stands, I have to make "multiple passes" over the data, which keeps the cache colder and increases temporary memory usage. I care about these things because the code is often deployed on 32-bit low-power devices; on desktop I don't care so much. :)
On the bright side, even with colder caches, libdeflate outperforms miniz and zlib by so much that it's still a net win regardless. :)
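The progressive, chunk-at-a-time decompression described above can be sketched as follows. This uses Python's standard `zlib` module (which wraps zlib, not libdeflate), and the helper names `split_into_idat_payloads` and `inflate_progressively` are hypothetical. The 8192-byte payload size is the common IDAT size mentioned in the comment; the input data is synthetic, not a real PNG.

```python
import random
import zlib


def split_into_idat_payloads(zlib_stream, size=8192):
    """Cut one zlib stream into fixed-size pieces, as a PNG encoder might for IDAT chunks."""
    return [zlib_stream[i:i + size] for i in range(0, len(zlib_stream), size)]


def inflate_progressively(payloads):
    """Yield decompressed data piece by piece as each payload arrives."""
    inflater = zlib.decompressobj()
    for payload in payloads:
        piece = inflater.decompress(payload)
        if piece:
            # A decoder could unfilter these scanline bytes while they are still hot in cache.
            yield piece
    tail = inflater.flush()
    if tail:
        yield tail


# Synthetic, poorly compressible data so the stream spans several payloads.
rng = random.Random(0)
raw = bytes(rng.getrandbits(8) for _ in range(50000))
payloads = split_into_idat_payloads(zlib.compress(raw))
restored = b"".join(inflate_progressively(payloads))
```

With a whole-buffer API, all payloads would first have to be concatenated into one temporary buffer before decompression could start, which is the extra pass and memory cost the comment refers to.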
Hi, has there been any further consideration to this? I.e. is it on your roadmap? Btw, thanks heaps for making this fantastic library. Great to see the recent improvements for Apple Silicon.
I don't plan to add streaming support. I think there's no easy way to do it, and it doesn't make too much sense to add it without going all the way and providing zlib API compatibility. I don't have time for that, though. This project is for "fun", so I focus on what I'm most interested in, which are the actual algorithms. I'm aware that zopfli (and ECT which uses a modified version of zopfli) can produce a slightly better compression ratio than libdeflate level 12 on many inputs, mainly because zopfli spends a lot more time doing block splitting. I haven't tried to match that exactly yet; instead the focus of libdeflate level 12 is near-optimal with much better performance. libdeflate v1.9 included improvements to block splitting that don't affect performance very much.
Good answer! Perhaps you should close this issue? The thing I find interesting about ECT is it manages to achieve zopfli-level compression but is actually incredibly fast. Like, it would fit perfectly on the end of libdeflate's performance curve as a hypothetical level 13 :)
Hi Eric – libdeflate is hitting incredible performance numbers now. Are you still opposed to a streaming implementation? It would come in very handy, especially since the other zlib forks (Cloudflare's, zlib-ng, Intel's) don't seem to be usable on typical servers like nginx.
Also, have you tried profile-guided optimization in gcc? If not, do you anticipate any wins there? libdeflate is already hitting incredible numbers, so maybe there isn't much headroom left, but I've been reading up on PGO in gcc 5/6 and was curious. I'll probably try it at some point. I've also discovered the source code annotations that gcc supports, but I doubt they would make much difference here. (The final frontier might be STOKE: https://github.com/StanfordPL/stoke)