-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix c/storage destination with partial pulls #2288
Conversation
I am manually testing it, and it seems to work fine so far, I've found only one issue (caused by a recent change I've done in c/storage where a converted image returns the UncompressedDigest since it was fully validated instead of the TOCDigest), or do you think we should address it in c/storage and return both the TOCDigest and the UncompressedDigest (and then use from c/image the one we prefer): https://github.com/containers/storage/blob/d1bf4f0cf1d648fdd2e92f8841229a2da0bc3249/pkg/chunked/storage_linux.go#L1669-L1673? I've added this patch to fix a pull of a converted to zstd:chunked image: diff --git a/storage/storage_dest.go b/storage/storage_dest.go
index f73a3080..25fe0686 100644
--- a/storage/storage_dest.go
+++ b/storage/storage_dest.go
@@ -289,8 +289,8 @@ func (s *storageImageDestination) PutBlobPartial(ctx context.Context, chunkAcces
return private.UploadedBlob{}, err
}
- if out.TOCDigest == "" {
- return private.UploadedBlob{}, errors.New("TOC digest is empty")
+ if out.TOCDigest == "" && out.UncompressedDigest == "" {
+ return private.UploadedBlob{}, errors.New("digest is empty")
}
blobDigest := srcInfo.Digest
@@ -299,9 +299,12 @@ func (s *storageImageDestination) PutBlobPartial(ctx context.Context, chunkAcces
s.fileSizes[blobDigest] = 0
s.filenames[blobDigest] = ""
s.diffOutputs[blobDigest] = out
- if index != nil {
+ if index != nil && out.TOCDigest != "" {
s.indexToTocDigest[*index] = out.TOCDigest
}
+ if out.UncompressedDigest != "" {
+ s.blobDiffIDs[blobDigest] = out.UncompressedDigest
+ }
s.lock.Unlock()
return private.UploadedBlob{ |
9bc6a75
to
d8d3103
Compare
Thanks; that seems to be out of scope of this correctness cleanup, but mentioning it is immediately valuable in how it is affecting the design of this cleanup. I have also added a link to #2189 .
There’s actually a deeper design question here: Assume
both with the same image (in this order, or maybe in some other order), in the same store: should the two pulls share the layer (and result in the same My first instinct is to prefer the identification by Either way, probably c/storage should return both values, so that c/image can do the right thing for such mixed-identification layers. And possibly to reuse the just-created layer in the rare case that a single image contains the same layer twice — whether that reuse happens by DiffID or by TOC. |
296516c
to
fa76546
Compare
386c41d
to
7355ab8
Compare
7e56b36
to
141a84f
Compare
I've manually tested it and it seems to work fine. I've some zstd:chunked images ready under quay.io/giuseppe/zstd-chunked if you'd like to try it as well. Just make sure to add the following snippet to your
|
Signed-off-by: Giuseppe Scrivano <[email protected]>
This update introduces an enhancement in the blob handling mechanism, specifically by separating the TOC digest from the uncompressed/compressed digest. Follow-up for: containers#1080. Signed-off-by: Giuseppe Scrivano <[email protected]>
Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
ApplyDiffFromStagingDirectory Rename()s diffOutput.Target to move it into the destination layer, so the diffOutput is not reusable. That might mean that we re-pull (partially?) the same partial layer again... but the same layer used more than once in an image should be rare, or at least very small. Signed-off-by: Miloslav Trmač <[email protected]>
This will allow us to name the more obscure parameters, and to change their names/semantics without having to update the 4 trivial implementations. Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
... because it is always available, and this allows us to remove a condition. Also rename it to LayerIndex, and make the Cache option first, for consistency with other private.*Options types. Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
It has no users at all; and transports should not be in the business in specifically managing that value, compression/decompression/ updates belong in transport-independent code. Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
The caller must already have provided options.TOCDigest, so we don't really need to return a value; the UI only needs a boolean. Also, document, again, that the non-TOC digest is a mandatory field. Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
141a84f
to
c0e3748
Compare
... to follow Go conventions a bit more closely. Also add a comment about the general trust design of layer storage/lookup/reuse mechanisms. Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
- It is unnecessary: We now only set diffOutputs in PutBlobPartial, and that also sets indexToTOCDigest - It is incorrect: it is indexing by tocDigest, but it is (currently) set by compressedDigest So, right now, it can only hurt things. Signed-off-by: Miloslav Trmač <[email protected]>
- Take TOC digests into account even if we are converting from OCI to another format - Compute the image ID based on whether we _used_ the TOC, not whether it just exists. - Also don't forget to include the config digest in the ID computation... Signed-off-by: Miloslav Trmač <[email protected]>
- Fix incorrect uncompressedDigest parameter name; it is actually the manifest-originated probably-compressed digest - Rename the function to not suggest it returns a storage.Layer.ID value - Use the same parameter order as commitLayer Signed-off-by: Miloslav Trmač <[email protected]>
- If the first layer uses a TOC, don't use a non-hex string as the layer ID for c/storage - Fix a panic on "".Hex() if we trigger the fallback "layer never seen" path Signed-off-by: Miloslav Trmač <[email protected]>
... to separate the concerns a bit. Now we have the updates of indexToStorageID closer together. Signed-off-by: Miloslav Trmač <[email protected]>
Signed-off-by: Miloslav Trmač <[email protected]>
... and identify them using UncompressedDigest, not TOCDigest On pushes, also use the trusted UncompressedDigest if available instead of preferring the untrusted value when a TOC digest is present. Signed-off-by: Miloslav Trmač <[email protected]>
- Setting filenames to "" is clearly useless - It is unnecessary: We set diffOutputs, so filenames are not consumed in commitLayer - The data from diffOutputs can only be used once, so setting filenames to affect TryReusingBlob... doesn't help Signed-off-by: Miloslav Trmač <[email protected]>
… Layer If we get the layer using LayersByUncompressedDigest, that value should always match. Using the value we have directly is trivially faster, and more importantly we don't have to worry at all about Layer.UncompressedDigest being unset in that location, making maintenance easier. Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
- When we extract a layer, allow reusing it only by the DiffID, not by the compressed digest; we don't have the compressed data, and reusing by compressed digest would result (via PutLayer LayerOptions.OriginalDigest) in a layer with an compressed CompressedDigest value, but an uncompressed CompressedSize value. Reuse by DiffID is quite a bit less likely to lead to a match in TryReusingBlob, probably causing us to find the reused layer and having to extract it again. We could improve on this by recording more data; for now, let's just assume that images which reuse the same compressed layer twice are pretty rare, and prefer simpler code. - On the positive side, record the item in fileSizes, so that we actually do find the layer in TryReusing, and not happen to reuse the file purely by accident. Signed-off-by: Miloslav Trmač <[email protected]>
That will allow us to read more data out of it. Signed-off-by: Miloslav Trmač <[email protected]>
We don't need it for anything, so shorten the scope. Should not change behavior, diffID is actually set on that path anyway. Signed-off-by: Miloslav Trmač <[email protected]>
c0e3748
to
e26dae2
Compare
- Ensure layers have an ID on every path before commitLayer, and it is consistently set before making data available. - Remove possibly misleading "for completeness" comments, ensuring identification is a basic responsibility. Signed-off-by: Miloslav Trmač <[email protected]>
Should not change behavior. Signed-off-by: Miloslav Trmač <[email protected]>
60a55d8
to
cef6c3f
Compare
When reusing contents from another layer, don't set - CompressedDigest = compressed digest (provided by us) - CompressedSize = uncompressed size (computed by PutLayer) because it is inconsistent. A c/storage API extension would be required to do that. Signed-off-by: Miloslav Trmač <[email protected]>
cef6c3f
to
a1f60f7
Compare
Look up the layer by TOC; and don't abort when diffID is not set. We could, instead, look up the layers only in tryReusingBlobAsPending, and record the layer metadata at that time. That would be simpler, but it would also widen a race with concurrent image pulls/deletions: the current code can find one layer (ChainID) to reuse, and when that layer is deleted, it can find some other layer (ChainID) to actually consume. The time between tryReusingBlobAsPending and createNewLayer can be fairly significant, so opening a ~deterministic race singificantly more might lead to reproducible issues. Even if anyone encountering such issues has fundamental workflow problems that should be fixed; it is our tools that would look bad first. Signed-off-by: Miloslav Trmač <[email protected]>
a1f60f7
to
e7a8eba
Compare
@giuseppe Now ready for review. PTAL. The FIXME? comments could be improved with containers/storage#1830 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…1114) This PR contains the following updates: Package Change Age Adoption Passing Confidence github.com/containers/image/v5 v5.29.2 -> v5.30.0 WarningSome dependencies could not be looked up. Check the warning logs for more information. Release Notes containers/image (github.com/containers/image/v5) v5.30.0 Compare Source What's Changed A fair number of improvements when working with zstd and zstd:chunked-compressed images. Note that make install now installs policy.json and registries.d/default.yaml. Refuse compression to zstd when using schema1 by @mtrmac in containers/image#2196 Don't expose local account details in oci-archive tar files by @mtrmac in containers/image#2202 Trigger a conversion to OCI when compressing to Zstd by @mtrmac in containers/image#2204 Add buildtags to avoid fulcio and rekor dependencies by @siretart in containers/image#2180 copy: do not fail if digest mismatches by @giuseppe in containers/image#1980 Moving policy.json and default.yaml from containers/skopeo by @rahilarious in containers/image#2215 Embrace codespell: config, workflow (to alert when new typos added) and get typos fixed by @yarikoptic in containers/image#2214 Fix raspberry pi zero cpu variant recognition by @lstolcman in containers/image#2086 storage: validate images converted to zstd:chunked by @giuseppe in containers/image#2243 Make blob reuse choices manifest-format-sensitive, and allow conversions when writing to format-agnostic transports by @mtrmac in containers/image#2213 Edit the manifest when pushing uncompressed data from c/storage by @mtrmac in containers/image#2273 Random storage-related cleanups by @mtrmac in containers/image#2287 Improve storage transport documentation, primarily about locking by @mtrmac in containers/image#2291 Fix c/storage destination with partial pulls by @mtrmac in containers/image#2288 Fix manifest updates when we match a layer by TOC digest by @mtrmac in containers/image#2294 Cleanly fail when trying to obtain a DiffID of a non-OCI image by @mtrmac in containers/image#2295 Beautify TOC-related parts of storageImageSource by @mtrmac in containers/image#2296 storage: use the new ApplyStagedLayer interface by @giuseppe in containers/image#2301 Also annotate image instances using zstd:chunked as using zstd by @mtrmac in containers/image#2302 Support editing ArtifactType, preserve it in lists by @nalind in containers/image#2304 Provide data to correctly report throughput on partial pulls by @mtrmac in containers/image#2308 Add validation error to digesting reader by @saschagrunert in containers/image#2312 Fix handling of errors when fetching layers by URLs by @mtrmac in containers/image#2310 Improve handling of zstd vs. zstd:chunked matching by @mtrmac in containers/image#2317 New Contributors @rahilarious made their first contribution in containers/image#2215 @yarikoptic made their first contribution in containers/image#2214 @lstolcman made their first contribution in containers/image#2086 @bainsy88 made their first contribution in containers/image#2260 Full Changelog: containers/[email protected] Configuration 📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied. ♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 Ignore: Close this PR and you won't be reminded about this update again. If you want to rebase/retry this PR, check this box This PR has been generated by Mend Renovate. View repository job log here.
This is #2218 rebased on top of #2287, + my WIP work fixing issues as they come up.
Absolutely untested (do we have relevant CI in any project?).
See individual commits for details.
@giuseppe FWIW .