Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy: do not fail if digest mismatches #1980

Merged
merged 8 commits into from
Dec 6, 2023

Conversation

giuseppe
Copy link
Member

@giuseppe giuseppe commented Jun 1, 2023

if the computed digest mismatches the expected one for a partial image. This is expected since partial images are stored by their TOC digest, not the diffID for the layer.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrestricted like this, the change would completely break the security of digest references, and signatures.

It’s reasonable to contemplate that in some cases, we need a non-digest layer identifier of some kind; currently, in some transports we need to compute a digest just to have a digest in a manifest and create a consumable ImageSource. But the assumption that layers are content-addressable and always have a digest is deeply and widely spread throughout the codebase, and we would need to both find a replacement for the current image integrity assumptions (or to decide that one isn’t necessary), and to carefully migrate code away from the existing content-addressability assumption.

@giuseppe
Copy link
Member Author

giuseppe commented Jun 5, 2023

Unrestricted like this, the change would completely break the security of digest references, and signatures.

It’s reasonable to contemplate that in some cases, we need a non-digest layer identifier of some kind; currently, in some transports we need to compute a digest just to have a digest in a manifest and create a consumable ImageSource. But the assumption that layers are content-addressable and always have a digest is deeply and widely spread throughout the codebase, and we would need to both find a replacement for the current image integrity assumptions (or to decide that one isn’t necessary), and to carefully migrate code away from the existing content-addressability assumption.

I've changed the implementation to use this behavior only for layers that were retrieved using the partial pull feature.

It needs this change in c/storage: containers/storage#1627 in particular the patch "store: store if the layer was created with a partial pull"

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most importantly, I’m much more worried about the pull path and layer deduplication. Let’s figure out how to do that first, and that will clarify the digests situation. Making the push path work should only come afterwards.

types/types.go Outdated Show resolved Hide resolved
@giuseppe
Copy link
Member Author

giuseppe commented Jun 5, 2023

Most importantly, I’m much more worried about the pull path and layer deduplication. Let’s figure out how to do that first, and that will clarify the digests situation. Making the push path work should only come afterwards.

the pull should not be a problem (just a little bit more expensive) since we will end up requesting the TOC for the layer but then all files are already present locally, so it won't be a problem.

If you refer to having the zstd:chunked version and the gzip version being deduplicated, I think we should not worry about this case, the long term goal is to replace the gzip format, so even if for now we end up in the worse case with two layers, that is fine.

@giuseppe giuseppe force-pushed the use-computed-digest branch 2 times, most recently from 9a87d2e to 851f840 Compare June 6, 2023 13:57
@giuseppe giuseppe force-pushed the use-computed-digest branch 4 times, most recently from 87f1a5d to ee10af7 Compare June 20, 2023 10:57
@giuseppe giuseppe marked this pull request as ready for review June 20, 2023 14:02
@giuseppe
Copy link
Member Author

@mtrmac what do you think of this approach?

@giuseppe giuseppe changed the title [RFC] copy: do not fail if digest mismatches copy: do not fail if digest mismatches Jun 20, 2023
@giuseppe giuseppe force-pushed the use-computed-digest branch 2 times, most recently from 3cd6552 to efc51d3 Compare July 4, 2023 15:43
@giuseppe
Copy link
Member Author

giuseppe commented Jul 4, 2023

if there are no big objections can we move forward with this? We can rework it again if necessary, but without this fix a partial layer is pulled every time we do a pull

@giuseppe
Copy link
Member Author

giuseppe commented Jul 4, 2023

@mtrmac @rhatdan @vrothberg PTAL

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM but I want @mtrmac to chime in. He's partially unavailable this week.

copy/blob.go Outdated Show resolved Hide resolved
@giuseppe giuseppe force-pushed the use-computed-digest branch 3 times, most recently from e503165 to d3e4c0b Compare July 24, 2023 10:08
@giuseppe giuseppe marked this pull request as draft July 25, 2023 10:49
@giuseppe giuseppe marked this pull request as ready for review July 25, 2023 12:24
@giuseppe
Copy link
Member Author

rebased

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Nit: the last commit message still points to the warning that has been dropped

copy/single.go Outdated Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
copy/single.go Outdated Show resolved Hide resolved
storage/storage_dest.go Outdated Show resolved Hide resolved
storage/storage_dest.go Outdated Show resolved Hide resolved
@giuseppe giuseppe force-pushed the use-computed-digest branch 3 times, most recently from e600ada to cbea9be Compare December 1, 2023 21:11
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As partially a note to self, and to be clear there is progress:

Looking only at storage_src.go: overall approach and security LGTM.

storage/storage_src.go Outdated Show resolved Hide resolved
storage/storage_src.go Outdated Show resolved Hide resolved
storage/storage_src.go Outdated Show resolved Hide resolved
storage/storage_src.go Show resolved Hide resolved
storage/storage_src.go Outdated Show resolved Hide resolved
storage/storage_src.go Outdated Show resolved Hide resolved
Signed-off-by: Giuseppe Scrivano <[email protected]>
add a new field to TryReusingBlobOptions to store the digest of the
TOC for the layer.  It will be used by later commits to enable looking
up a partial layer.

Signed-off-by: Giuseppe Scrivano <[email protected]>
and generalize its naming to cover also the TOC digest, that is added
by a later commit.

Signed-off-by: Giuseppe Scrivano <[email protected]>
this is a preparation for the next commit to clarify that the map can
contain either the uncompressed digest for the layer (DiffID), or the
digest for the TOC of the layer.

Signed-off-by: Giuseppe Scrivano <[email protected]>
when an image using partial layers is pulled, generate a different
imageID that combines the digest of the TOC for the partial layers
since the partial pull didn't validate the diffID.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the use-computed-digest branch 3 times, most recently from 0bd2e0c to 1032469 Compare December 5, 2023 21:23
when copying a partial image, store the expected diffID so that it can
be later used to validate the obtained layer stream.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe marked this pull request as ready for review December 6, 2023 08:36
if err != nil {
return nil, fmt.Errorf("parsing expected diffID %q for layer %q: %w", expectedDigest, layerID, err)
}
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking nit: Returning early on this failure, and leaving the non-failure case less indented, would be a tiny bit more readable.

Comment on lines +314 to +315
// if the layer is stored by its TOC, report the expected diffID as the layer Digest
// but store the TOC digest so we can later retrieve it from the storage.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is no longer accurate now that we record digestToLayerID.

Comment on lines +121 to +122
// If the digest was overriden by LayerInfosForCopy, then we need to use the TOC digest
// to retrieve it from the storage.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs updating.

storage/storage_src.go Show resolved Hide resolved
storage/storage_src.go Show resolved Hide resolved
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation of storage_src.go LGTM.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another partial note, going to read storage_dest.go now.

copy/single.go Show resolved Hide resolved
manifest/oci.go Show resolved Hide resolved
copy/single.go Show resolved Hide resolved
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for containers/storage#1719 ; with that, I think there is a clean path for storageImageDestination to be unambiguous, see below about primarily keeping the state in a map[int]addedLayerInfo.

Admittedly it’s all a little vague/tentative at this point, do let me know if I’m missing something essential or if the attempt to sketch the idea is not clear enough to be understood.

storage/storage_dest.go Show resolved Hide resolved
storage/storage_dest.go Show resolved Hide resolved
storage/storage_dest.go Show resolved Hide resolved
storage/storage_dest.go Show resolved Hide resolved
storage/storage_dest.go Show resolved Hide resolved
storage/storage_dest.go Show resolved Hide resolved
storage/storage_dest.go Show resolved Hide resolved
@rhatdan
Copy link
Member

rhatdan commented Dec 6, 2023

As I understand it during the meeting last week, we came to the agreement that merging this PR plus fixups from @giuseppe going forward, is the easiest path forward. I will merge, and then we can continue working on small fixups.

@rhatdan rhatdan merged commit 28a299f into containers:main Dec 6, 2023
9 checks passed
atmosphere-ci bot pushed a commit to vexxhost/atmosphere that referenced this pull request Apr 21, 2024
…1114)

This PR contains the following updates:



Package
Change
Age
Adoption
Passing
Confidence




github.com/containers/image/v5
v5.29.2 -> v5.30.0








WarningSome dependencies could not be looked up. Check the warning logs for more information.


Release Notes

containers/image (github.com/containers/image/v5)
v5.30.0
Compare Source
What's Changed
A fair number of improvements when working with zstd and zstd:chunked-compressed images.
Note that make install now installs policy.json and registries.d/default.yaml.

Refuse compression to zstd when using schema1 by @​mtrmac in containers/image#2196
Don't expose local account details in oci-archive tar files by @​mtrmac in containers/image#2202
Trigger a conversion to OCI when compressing to Zstd by @​mtrmac in containers/image#2204
Add buildtags to avoid fulcio and rekor dependencies by @​siretart in containers/image#2180
copy: do not fail if digest mismatches by @​giuseppe in containers/image#1980
Moving policy.json and default.yaml from containers/skopeo by @​rahilarious in containers/image#2215
Embrace codespell: config, workflow (to alert when new typos added) and get typos fixed by @​yarikoptic in containers/image#2214
Fix raspberry pi zero cpu variant recognition by @​lstolcman in containers/image#2086
storage: validate images converted to zstd:chunked by @​giuseppe in containers/image#2243
Make blob reuse choices manifest-format-sensitive, and allow conversions when writing to format-agnostic transports by @​mtrmac in containers/image#2213
Edit the manifest when pushing uncompressed data from c/storage by @​mtrmac in containers/image#2273
Random storage-related cleanups by @​mtrmac in containers/image#2287
Improve storage transport documentation, primarily about locking by @​mtrmac in containers/image#2291
Fix c/storage destination with partial pulls by @​mtrmac in containers/image#2288
Fix manifest updates when we match a layer by TOC digest by @​mtrmac in containers/image#2294
Cleanly fail when trying to obtain a DiffID of a non-OCI image by @​mtrmac in containers/image#2295
Beautify TOC-related parts of storageImageSource by @​mtrmac in containers/image#2296
storage: use the new ApplyStagedLayer interface by @​giuseppe in containers/image#2301
Also annotate image instances using zstd:chunked as using zstd by @​mtrmac in containers/image#2302
Support editing ArtifactType, preserve it in lists by @​nalind in containers/image#2304
Provide data to correctly report throughput on partial pulls by @​mtrmac in containers/image#2308
Add validation error to digesting reader by @​saschagrunert in containers/image#2312
Fix handling of errors when fetching layers by URLs by @​mtrmac in containers/image#2310
Improve handling of zstd vs. zstd:chunked matching by @​mtrmac in containers/image#2317

New Contributors

@​rahilarious made their first contribution in containers/image#2215
@​yarikoptic made their first contribution in containers/image#2214
@​lstolcman made their first contribution in containers/image#2086
@​bainsy88 made their first contribution in containers/image#2260

Full Changelog: containers/[email protected]


Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.


 If you want to rebase/retry this PR, check this box


This PR has been generated by Mend Renovate. View repository job log here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants