
[FEATURE] Support Zstandard #519

Open
Kern-- opened this issue Mar 22, 2023 · 4 comments
Labels
feature New feature or request tracking issue Issue that doesn't get worked on directly but tracks overall effort of multiple related issues

Comments

@Kern--
Contributor

Kern-- commented Mar 22, 2023

Description

Zstandard is an alternative compression format to gzip that gets faster compression/decompression speeds for the same compression ratio. The containerd community is in the early phases of adoption, but we already have runtime support in containerd and build time support through buildkit.

Customers are reporting both smaller images and faster launches.

Describe the solution you'd like

We should consider implementing ZInfo for zstandard so that customers can get both the speedup from zstandard and lazy loading.

Describe any alternative solutions/features you've considered

No response

Any additional context or information about the feature request

The OCI Image-spec added zstd as a layer media type suffix in 2019 opencontainers/image-spec#788
Containerd has supported running zstd images since 2020 containerd/containerd#4809

Blog posts about the speedups:
https://aws.amazon.com/blogs/containers/reducing-aws-fargate-startup-times-with-zstd-compressed-container-images/

@Kern-- Kern-- added the feature New feature or request label Mar 22, 2023
@github-project-automation github-project-automation bot moved this to ❓ Ungroomed in soci-snapshotter Mar 22, 2023
@Kern-- Kern-- moved this from ❓ Ungroomed to 📋 Backlog in soci-snapshotter Mar 22, 2023
@djdongjin djdongjin self-assigned this Mar 22, 2023
@djdongjin djdongjin removed their assignment Jun 2, 2023
@Kern-- Kern-- added the tracking issue Issue that doesn't get worked on directly but tracks overall effort of multiple related issues label Oct 26, 2023
@uhaiderdev

+1, this would let us get the benefits of both zstandard and lazy loading. Otherwise we have to give up one feature for the other.

@aochagavia

Is there any fundamental limitation blocking zstd support? It's reasonably widespread nowadays and I assume the target audience of SOCI would very much welcome it (to get even shorter container boot times).

@Kern--
Contributor Author

Kern-- commented Apr 9, 2024

The limitation is that zstandard needs a lot more state. Gzip uses a 32 KiB window, so in order to resume decompression from the middle of a file, you only need the previous 32 KiB of uncompressed data. Zstandard uses variable-sized windows: the RFC (https://datatracker.ietf.org/doc/html/rfc8878#name-window-descriptor) recommends that decoders support windows of at least 8 MB and allows windows of up to 3.75 TB, and Facebook's implementation supports windows of up to 2 GB (https://engineering.fb.com/2018/12/19/core-infra/zstandard/).

Right now we divide images into 4 MiB spans that can be independently decompressed. With 32 KiB of state per span, the compression state is < 1% of the image size (32 KiB / 4 MiB ≈ 0.8%). If we did the same for zstd and the image used 8 MB windows, the index would be 2x the compressed image size 🙃.
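To illustrate why 32 KiB of history is enough for gzip, here is a minimal Python sketch using the standard zlib module on raw deflate data. It forces a byte-aligned block boundary with Z_SYNC_FLUSH (which, unlike Z_FULL_FLUSH, keeps the compressor's history, so later data can still refer back across the boundary) and then resumes decompression at that boundary, passing only the last 32 KiB of preceding output as zdict. The helper names are hypothetical and this is not how SOCI builds its index: indexing an existing gzip stream generally means resuming at bit-aligned block boundaries (zlib's zran.c example uses inflatePrime for that), which Python's zlib does not expose, so the sketch cheats by creating a byte-aligned boundary itself.

```python
import zlib
from typing import Tuple

WINDOW = 32 * 1024  # deflate never refers back further than 32 KiB


def compress_with_checkpoint(part1: bytes, part2: bytes) -> Tuple[bytes, int]:
    """Raw-deflate part1 + part2 with a byte-aligned block boundary between them.

    Z_SYNC_FLUSH ends the current deflate block on a byte boundary but keeps
    the compressor's history, so part2 can still contain back-references into
    part1. Returns the compressed stream and the byte offset where part2 starts.
    """
    c = zlib.compressobj(level=6, wbits=-15)  # wbits < 0: raw deflate, no header
    head = c.compress(part1) + c.flush(zlib.Z_SYNC_FLUSH)
    tail = c.compress(part2) + c.flush()
    return head + tail, len(head)


def resume(stream: bytes, offset: int, history: bytes) -> bytes:
    """Decompress from a byte-aligned checkpoint, seeding the decoder's
    history with the last 32 KiB of uncompressed data that precedes it."""
    d = zlib.decompressobj(wbits=-15, zdict=history[-WINDOW:])
    return d.decompress(stream[offset:]) + d.flush()


# Repetitive input so that part2 really does refer back into part1.
part1 = b"".join(b"record %05d\n" % (i % 997) for i in range(8000))
part2 = b"".join(b"record %05d\n" % (i % 997) for i in range(8000, 12000))
stream, offset = compress_with_checkpoint(part1, part2)
assert resume(stream, offset, part1) == part2
print(f"resumed at byte {offset} of {len(stream)} using {WINDOW} bytes of history")
```

Decompressing the same tail without zdict fails with "invalid distance too far back", because it refers into data the decoder never saw. A comparable zstd checkpoint would need up to a full zstd window of history instead of 32 KiB, which is where the size problem comes from.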

So we probably can't build a general purpose index for zstandard files that's smaller than the compressed file. I think we need to do analysis on container images in the wild to see what sort of window sizes are used - maybe they're small enough that SOCI could still be useful. Or maybe there's some way to compress the compression state to make this all work.
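The back-of-the-envelope numbers above can be reproduced with a small Python calculation. It assumes, as a simplification, that the index stores one uncompressed window per span and nothing else, and that span size is measured in compressed bytes; index_overhead is a hypothetical helper, not SOCI code, and the 1 MiB row is only an illustrative middle point, not a measured value from real images.

```python
import math

KiB = 1024
MiB = 1024 * KiB


def index_overhead(compressed_size: int, span_size: int, window_size: int) -> float:
    """Index size as a fraction of the compressed image size, assuming the
    index holds exactly one saved window (decoder history) per span."""
    spans = math.ceil(compressed_size / span_size)
    return spans * window_size / compressed_size


for label, window in [("gzip, 32 KiB", 32 * KiB),
                      ("zstd, 1 MiB", 1 * MiB),
                      ("zstd, 8 MiB", 8 * MiB)]:
    ratio = index_overhead(1024 * MiB, 4 * MiB, window)
    print(f"{label:>14}: index is {ratio:.2%} of the compressed image")
```

For a 1 GiB compressed image with 4 MiB spans this prints 0.78%, 25.00% and 200.00%. Under these assumptions the overhead is simply window size divided by span size, so only smaller windows, larger spans, or compressing the saved state can bring it down.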

Alternatively, there is the zstd seekable format: https://github.com/facebook/zstd/blob/v1.5.6/contrib/seekable_format/zstd_seekable_compression_format.md. It works pretty much the same way as stargz (https://github.com/containerd/stargz-snapshotter/blob/main/docs/estargz.md), which means it requires either conversion or build-time support.
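To make the comparison with stargz concrete: a seekable-format file is a sequence of independently compressed zstd frames followed by a seek table stored in a skippable frame, so a reader can map an uncompressed offset to a frame without any window state. Below is a minimal Python sketch of reading that seek table, based on my reading of the format document linked above (per-frame Compressed_Size and Decompressed_Size, an optional 4-byte checksum, then a 9-byte footer ending in the magic 0x8F92EAB1). parse_seek_table, locate and build_seek_table are hypothetical helpers, not functions from zstd or any other library; build_seek_table only exists to produce test input.

```python
import bisect
import struct
from typing import List, Tuple

SKIPPABLE_MAGIC = 0x184D2A5E    # magic of the skippable frame holding the seek table
SEEKABLE_MAGIC = 0x8F92EAB1     # Seekable_Magic_Number at the very end of the file
FOOTER = struct.Struct("<IBI")  # Number_Of_Frames, Seek_Table_Descriptor, magic (9 bytes)


def parse_seek_table(blob: bytes) -> List[Tuple[int, int]]:
    """Return (compressed_size, decompressed_size) for every frame."""
    n_frames, descriptor, magic = FOOTER.unpack(blob[-FOOTER.size:])
    if magic != SEEKABLE_MAGIC:
        raise ValueError("no seekable-format footer")
    entry_size = 12 if descriptor & 0x80 else 8  # Checksum_Flag adds 4 bytes per entry
    start = len(blob) - FOOTER.size - n_frames * entry_size
    # The skippable frame header (magic + Frame_Size) sits just before the entries.
    skip_magic, frame_size = struct.unpack_from("<II", blob, start - 8)
    if skip_magic != SKIPPABLE_MAGIC or frame_size != n_frames * entry_size + FOOTER.size:
        raise ValueError("malformed seek table frame")
    return [struct.unpack_from("<II", blob, start + i * entry_size) for i in range(n_frames)]


def locate(entries: List[Tuple[int, int]], offset: int) -> Tuple[int, int, int]:
    """Map an uncompressed offset to (frame index, compressed start, offset in frame)."""
    c_starts, d_starts = [0], [0]
    for c_size, d_size in entries:
        c_starts.append(c_starts[-1] + c_size)
        d_starts.append(d_starts[-1] + d_size)
    if not 0 <= offset < d_starts[-1]:
        raise IndexError("offset is past the end of the decompressed data")
    i = bisect.bisect_right(d_starts, offset) - 1
    return i, c_starts[i], offset - d_starts[i]


def build_seek_table(entries: List[Tuple[int, int]], checksums: bool = False) -> bytes:
    """Serialize a seek table frame (used here only to create test input)."""
    body = b"".join(struct.pack("<II", c, d) + (b"\x00" * 4 if checksums else b"")
                    for c, d in entries)
    footer = FOOTER.pack(len(entries), 0x80 if checksums else 0, SEEKABLE_MAGIC)
    return struct.pack("<II", SKIPPABLE_MAGIC, len(body) + len(footer)) + body + footer


frames = [(100, 400), (120, 400), (90, 250)]
blob = b"\x00" * sum(c for c, _ in frames) + build_seek_table(frames)  # fake frame bytes
print(locate(parse_seek_table(blob), 500))
```

This prints (1, 100, 100): uncompressed byte 500 lives in the second frame, which starts at compressed offset 100, so a lazy loader would fetch only that frame's 120 compressed bytes and decompress them on their own. The catch, as noted above, is that the image has to be produced in this format in the first place.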

Overall, zstandard is harder to index than gzip and isn't possible to do efficiently in all cases.

We're interested in hearing more use cases to help get this work prioritized. If you can share any public zstd images that you're interested in, that would be helpful when we do the feasibility analysis.
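For that kind of feasibility analysis, one cheap signal is the Window_Size each frame declares in its header, decoded as described in the RFC 8878 window descriptor section linked earlier in this thread. A minimal Python sketch, assuming the data starts with a regular zstd frame (not a skippable frame) and looking only at the first frame; a real survey of image layers would walk every frame in each layer. zstd_window_size is a hypothetical helper.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # bytes 28 B5 2F FD on disk


def zstd_window_size(frame: bytes) -> int:
    """Return the Window_Size declared by the zstd frame header at the start of `frame`."""
    if len(frame) < 5 or struct.unpack_from("<I", frame)[0] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    fhd = frame[4]                   # Frame_Header_Descriptor
    fcs_flag = fhd >> 6              # Frame_Content_Size_Flag (bits 7-6)
    single_segment = (fhd >> 5) & 1  # Single_Segment_Flag (bit 5)
    did_flag = fhd & 0x3             # Dictionary_ID_Flag (bits 1-0)
    if not single_segment:
        # Window_Descriptor: 5-bit exponent, 3-bit mantissa.
        wd = frame[5]
        exponent, mantissa = wd >> 3, wd & 0x7
        window_base = 1 << (10 + exponent)
        return window_base + (window_base // 8) * mantissa
    # Single segment: no Window_Descriptor; the window equals Frame_Content_Size.
    pos = 5 + (0, 1, 2, 4)[did_flag]     # skip Dictionary_ID
    fcs_size = (1, 2, 4, 8)[fcs_flag]    # flag 0 means a 1-byte field here
    fcs = int.from_bytes(frame[pos:pos + fcs_size], "little")
    return fcs + 256 if fcs_size == 2 else fcs  # 2-byte field is stored minus 256


header = bytes.fromhex("28b52ffd") + bytes([0x00, 0x68])  # exponent 13, mantissa 0
print(zstd_window_size(header))
```

The example header declares a 2^23 byte (8 MiB) window. Reading the first 18 bytes of a blob (the 4-byte magic plus the largest possible 14-byte frame header) is enough for this check, e.g. zstd_window_size(open(path, "rb").read(18)).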

@aochagavia

Thanks! Your comment gives a lot of clarity about the trade-offs involved here. I'll stick to the zstd-variant of eStargz for the time being and see how far that gets us.

Projects
Status: 📋 Backlog
Development

No branches or pull requests

4 participants