force copying an image #1854

Open
rittneje opened this issue Jan 13, 2023 · 16 comments

Labels
kind/feature (A request for, or a PR adding, new functionality), stale-issue

Comments

@rittneje
Contributor

From what we've observed, skopeo copy is smart and won't actually copy anything if the same image already exists at the destination. While this is desirable 99% of the time, we currently have a need to forcibly copy over the image anyway. Is there any way to do this?

@mtrmac
Collaborator

mtrmac commented Jan 13, 2023

Thanks for reaching out.

What do you mean by forcibly copy, and what are you actually trying to do? Is the goal to avoid layer reuse and to copy exactly the original layers? To somehow re-upload data that is corrupt on the destination? Something else?

Note that at least with the reference registry implementation, re-upload does nothing to fix pre-existing corrupt data: https://github.com/distribution/distribution/blob/362910506bc213e9bfc3e3e8999e0cfc757d34ba/registry/storage/blobwriter.go#L310-L314 .

@rittneje
Contributor Author

We are copying images to AWS ECR. ECR has a feature where it will scan images for vulnerabilities when you push them. However, it also has a misfeature where once the image is old enough, the scan result expires. According to AWS, once this happens, the only way to get it to scan again is to re-push the image. But since skopeo copy seems to be a no-op, we are stuck.
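
For context, the copy our build job runs is roughly of this shape (registry names, account, region, and credential handling are placeholders, not our exact setup):

    skopeo copy \
      --src-creds "$ARTIFACTORY_USER:$ARTIFACTORY_TOKEN" \
      --dest-creds "AWS:$(aws ecr get-login-password --region us-east-1)" \
      docker://artifactory.example.com/docker-local/app:1.2.3 \
      docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/app:1.2.3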

@mtrmac
Collaborator

mtrmac commented Jan 13, 2023

And that keys off of an upload of every individual layer, separately? The config? In Skopeo, redundant layer and config uploads are avoided, but manifests are always re-uploaded (except by skopeo sync, which assumes unchanged destinations).

Historically, an option to not skip the layer and config uploads wasn't offered because the reuse behavior seems unspecified; but on a second look that lack of specification might not actually be an issue — we already assume automatic reuse for blobs that are being compressed.

Does removing the blob info cache (see debug log for location) make a difference?
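
On a typical rootless setup that would be something like the following, but the exact path and file name vary by version, so go by what the --debug output prints:

    rm -f ~/.local/share/containers/cache/blob-info-cache-v1.boltdb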

@rittneje
Contributor Author

Unfortunately, AWS does not specify that level of detail. They just say you have to re-push. 😕

Is the blob info cache on the client side? We are pushing from within an ephemeral container, so it would not have any state preserved between calls to skopeo copy.

I might have to resort to skopeo delete + skopeo copy and hope for the best.
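
Something along these lines, with placeholder names:

    skopeo delete docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/app:1.2.3
    skopeo copy docker://artifactory.example.com/docker-local/app:1.2.3 \
      docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/app:1.2.3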

@mtrmac
Collaborator

mtrmac commented Jan 13, 2023

Yes, the blob info cache is client-side. It typically makes a difference when the push modifies data (e.g. when pushing a build output, and compressing it in the process); not when copying unmodified images around.

skopeo --debug copy should be fairly verbose about the HTTP requests it makes.

I’m open in principle to adding a tunable, but for that we need to understand what to tune.

@rittneje
Contributor Author

@mtrmac I've asked AWS Support if they can provide more details on what is specifically required. I'll let you know what they say.

@rittneje
Contributor Author

They say we need to re-push the "complete" image. I guess that means all the layers?

Also, I noticed something odd. I just ran skopeo copy to copy from the ECR image to itself for testing purposes, and it logged several lines of the form Copying blob dd1a79fb6ea3 skipped: already exists. However, when our build job copies from Artifactory to ECR, it instead logs Copying blob sha256:dd1a79fb6ea3e89d51a4e210777fdc20a6a65c5deb9226774e6b1ac94367c67b. But based on the build time it definitely seems like it is skipping. Do you know what could be causing this discrepancy? Also why is one log using the short digest and the other the full digest?

@mtrmac
Collaborator

mtrmac commented Jan 16, 2023

Compare the --debug log.

mtrmac added the kind/feature label Jan 20, 2023

@MrFoxPro

@mtrmac What if I need to push the same blob but with a different image or tag? It refuses: Copying blob 090d1abdb0e8 skipped: already exists

@mtrmac
Collaborator

mtrmac commented Jul 31, 2023

@MrFoxPro How does that make a difference? The blob exists on the registry, which is the effect you need, isn’t it?

If you want to push to a different tag, push to a different tag. This reuse just means that push is faster and uses less CPU, disk and network bandwidth.
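
For example, copying an existing image to a new tag in the same repository works fine even though every blob upload is skipped; only the manifest is uploaded again under the new tag (names are placeholders):

    skopeo copy docker://registry.example.com/app:1.0 docker://registry.example.com/app:1.0-copy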

Or is this also related to the AWS scanning trigger?

@MarienL1995

@rittneje, did you manage to find a solution for your problem? We are running into the same issue when trying to enable AWS Inspector enhanced scanning on older images.

@mtrmac
Collaborator

mtrmac commented Oct 11, 2023

Is it known what exactly needs to happen to trigger the AWS behavior?

I don’t know whether we would want to solve this by adding an option to the transport / CLI, or by building a separate (simple, slow) “upload all blobs” tool — but the first step needs to be understanding what makes the difference.

@rittneje
Contributor Author

@MarienL1995 We used the AWS CLI to delete the image (aws ecr batch-delete-image), and then skopeo to re-push it. That was enough to cause it to re-scan.
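
Roughly (repository name, tag, account, and region are placeholders):

    aws ecr batch-delete-image --repository-name app --image-ids imageTag=1.2.3
    skopeo copy docker://artifactory.example.com/docker-local/app:1.2.3 \
      docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/app:1.2.3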
