Unable to load docker desktop containerd managed images to cluster #3795

Open
iamvinov-atlassian opened this issue Nov 20, 2024 · 33 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/external upstream bugs

Comments

@iamvinov-atlassian

What happened:
I am trying to load a Docker image into a kind cluster with kind load docker-image busybox -n nebulae, but I get the following error:
❯ kind load docker-image busybox -n nebulae
Image: "busybox" with ID "sha256:5b0f33c83a97f5f7d12698df6732098b0cdb860d377f6307b68efe2c6821296f" not yet present on node "nebulae-control-plane", loading...
ERROR: failed to load image: command "docker exec --privileged -i nebulae-control-plane ctr --namespace=k8s.io images import --all-platforms --digests --snapshotter=overlayfs -" failed with error: exit status 1
Command Output: ctr: content digest sha256:83e82a8dd385e27d95f2118c1332d414684aa665552f7f837f86da33674308c4: not found

What you expected to happen:
I expected the image to load successfully. I already have this image pulled locally using docker pull busybox. Upon further investigation, it seems to me that kind (or containerd) expects the image content for all platforms to be present on the host for the load command to succeed.

How to reproduce it (as minimally and precisely as possible):

docker pull busybox
kind create cluster --name nebulae
kind load -v 10 docker-image -n nebulae busybox

Anything else we need to know?:
From looking at other answers on the internet, it seems this error generally occurs when the image arch doesn't match the host arch, but that is not the case here. I ran docker images --tree and made sure the image for my host's platform (M3 MacBook Pro) is present.

busybox:latest                                                                         5b0f33c83a97       12.6MB            4MB
├─ linux/arm64/v8                                                                      6ca1ac3927a1       6.02MB         1.85MB
├─ linux/amd64                                                                         a3e1b257b47c       6.56MB         2.16MB
├─ linux/arm/v5                                                                        3076001161ce           0B             0B
├─ linux/arm/v6                                                                        a9fc789b4096           0B             0B
├─ linux/arm/v7                                                                        fb632082f5cb           0B             0B
├─ linux/386                                                                           c0d2f0e7a91f           0B             0B
├─ linux/mips64le                                                                      0e1d386b0b5d           0B             0B
├─ linux/ppc64le                                                                       fc082c5fdd21           0B             0B
├─ linux/riscv64                                                                       d55b3027f77f           0B             0B
└─ linux/s390x                                                                         4bc8b19fe938           0B             0B

Environment:

  • kind version: kind v0.25.0 go1.23.3 darwin/arm64
  • Runtime info: (use docker info, podman info or nerdctl info):
❯ docker info
Client:
 Version:    27.3.1
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Ask Gordon - Docker Agent (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/vvelu/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.18.0-desktop.2
    Path:     /Users/vvelu/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.30.3-desktop.1
    Path:     /Users/vvelu/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.37
    Path:     /Users/vvelu/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /Users/vvelu/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/vvelu/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.27
    Path:     /Users/vvelu/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     /Users/vvelu/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.4.0
    Path:     /Users/vvelu/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/vvelu/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.15.0
    Path:     /Users/vvelu/.docker/cli-plugins/docker-scout

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 21
 Server Version: 27.3.1
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 472731909fa34bd7bc9c087e4c27943f9835f111
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.10.14-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 12
 Total Memory: 7.653GiB
 Name: docker-desktop
 ID: 794edb33-e6f7-4749-8c5c-edf7b3d5cf21
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/vvelu/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile
  • OS: MacOS Sonoma v14.7
  • Kubernetes version: (use kubectl version):
❯ kubectl version
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.31.2
  • Any proxies or other special environment settings?:
@iamvinov-atlassian iamvinov-atlassian added the kind/bug Categorizes issue or PR as related to a bug. label Nov 20, 2024
@iamvinov-atlassian
Author

More information. Unticking: "Use containerd for pulling and storing images" in Docker Desktop actually resolves this.

@BenTheElder
Member

That error message is coming from ctr when we ask it to import the image saved from docker.

Unfortunately I can't run docker desktop at work, will have to find another way to reproduce this.

Can you look at the same image exported the way kind does with docker save,

commandArgs := append([]string{"save", "-o", dest}, images...)

or provide a tarball from that somewhere? That would speed things up (can replicate the rest with kind load image-archive)

This sounds like a containerd/docker bug, but we need to confirm how before contacting them. The part kind is doing could be bugged, but it is pretty straightforward once we decide we need to load the image because it's not already available.

@BenTheElder BenTheElder changed the title from "Unable to load images to kind cluster" to "Unable to load docker desktop containerd managed images to cluster" Nov 20, 2024
@BenTheElder
Member

It could be a small sample image like busybox. If you can confirm the bug still applies to that image and share the versions saved in containerd mode and in dockerd mode, that would speed things up. Otherwise it may be difficult to reproduce due to the licensing of the application and/or my employer's policies, and I'll have to see if this is something I can replicate in some other way.

@porridge

I'm hitting what seems to be the same issue in kuttl's integration tests. Kuttl embeds kind, currently v0.25.0.

Interestingly this works on CI (GHA, ubuntu 20.04 runner) but on my desktop this fails with the same message as for @iamvinov-atlassian

What I've been able to figure out using skopeo inspect --raw docker://docker.io/library/busybox:latest|jq . and docker image save docker.io/library/busybox:latest is that:

  • the digest that ctr complains about is claimed to be an attestation-manifest for the linux/amd64 manifest
  • the busybox Docker image mentions it in the image index, but the blob itself is nowhere to be found

So it seems like from the PoV of ctr the image is incomplete since it's lacking the attestation blob. FWIW, here is how the integration test fetches and loads the image.
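The check described above (a digest listed in the index with no blob behind it) can be automated. Below is a minimal Python sketch, assuming the tarball from docker image save follows the OCI image layout, with an index.json at the root and blobs stored as blobs/<algorithm>/<hex>. The function name find_missing_blobs is made up for illustration and is not part of kind, Docker, or containerd.

```python
import json
import tarfile

# Media types whose blobs are JSON documents that reference further blobs.
INDEX_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}
MANIFEST_TYPES = {
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
}


def find_missing_blobs(tar):
    """Return digests reachable from index.json whose blobs are not in the archive."""
    names = set(tar.getnames())

    def read_json(name):
        with tar.extractfile(name) as fh:
            return json.load(fh)

    missing, seen = [], set()
    # Start from the top-level descriptors and walk the reference graph.
    stack = list(read_json("index.json").get("manifests", []))
    while stack:
        desc = stack.pop()
        digest = desc["digest"]
        if digest in seen:
            continue
        seen.add(digest)
        algorithm, _, hex_part = digest.partition(":")
        blob = f"blobs/{algorithm}/{hex_part}"
        if blob not in names:
            # Referenced but absent: this is what an importer would trip over.
            missing.append(digest)
            continue
        media_type = desc.get("mediaType", "")
        if media_type in INDEX_TYPES:
            stack.extend(read_json(blob).get("manifests", []))
        elif media_type in MANIFEST_TYPES:
            doc = read_json(blob)
            stack.append(doc["config"])
            stack.extend(doc.get("layers", []))
    return sorted(missing)
```

To try it on a saved image, open the archive with tarfile.open("busybox-latest.img") and pass the result to find_missing_blobs; any digest it returns is referenced somewhere in the index chain but was not written to the tarball.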

I'm running:

[kuttl]$ docker info
Client:
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.0
    Path:     /home/mowsiany/.docker/cli-plugins/docker-buildx

Server:
 Containers: 20
  Running: 0
  Paused: 0
  Stopped: 20
 Images: 3
 Server Version: 27.3.1
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: /usr/bin/tini-static
 containerd version: 2.fc41
 runc version: 
 init version: 
 Security Options:
  seccomp
   Profile: builtin
  selinux
  cgroupns
 Kernel Version: 6.11.7-300.fc41.x86_64
 Operating System: Fedora Linux 41 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 20
 Total Memory: 62.5GiB
 Name: mowsiany-thinkpadp1gen5.remote.csb
 ID: e8f36c79-610a-4647-8cc3-b734cebd7050
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: porridgerox
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

@porridge

@BenTheElder you might be able to reproduce this with:

git clone https://github.com/kudobuilder/kuttl
cd kuttl
make envtest
KUBEBUILDER_ASSETS=$(./bin/setup-envtest use 1.25.0 --bin-dir `pwd`/bin -p path) go test -tags integration ./pkg/test -v -mod=readonly -test.run TestAddContainers

Unfortunately the actual error is hidden as sigs.k8s.io/kind/pkg/cluster/nodeutils.LoadImageArchive is missing .SetStdout(os.Stdout).SetStderr(os.Stderr) at least in the version we use.

@dgl
Contributor

dgl commented Dec 3, 2024

We came across something similar in our environment. We're not using Docker Desktop, but have configured Docker to use containerd as the image store per these docs.

I'm not sure if this is exactly the same as what @porridge reports, as it doesn't involve an attestation blob. However I can simply reproduce this with:

$ docker save -o nginx.tar nginx:1.27.0
$ sudo ctr images import nginx.tar
ctr: content digest sha256:87c2c53ae6565cc48341389169745670320a22d39014ce861661e986e983342c: not found
Versions:
$ docker version
Client: Docker Engine - Community
 Version:           27.3.1
 API version:       1.47
 Go version:        go1.22.7
 Git commit:        ce12230
 Built:             Fri Sep 20 11:40:59 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.3.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.7
  Git commit:       41ca978
  Built:            Fri Sep 20 11:40:59 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.24
  GitCommit:        88bf19b2105c8b17560993bee28a01ddc2f97182
 runc:
  Version:          1.2.2
  GitCommit:        v1.2.2-0-g7cb3632
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ sudo ctr version
Client:
  Version:  1.7.24
  Revision: 88bf19b2105c8b17560993bee28a01ddc2f97182
  Go version: go1.22.9

Server:
  Version:  1.7.24
  Revision: 88bf19b2105c8b17560993bee28a01ddc2f97182
  UUID: f0862135-af1a-494b-b111-192071709ee5

Note this does need a Docker version that has multi-platform support, so I don't see this on Docker 24 on another system. I've not checked all versions, but it definitely happens with Docker 27 on the host.

For the nginx image this happens because there's a reference to something for platform 386, which in some cases matches amd64 (containerd's platform matching code).

Therefore a workaround is:

$ sudo ctr images import --platform amd64 nginx.tar
unpacking docker.io/library/nginx:1.27.0 (sha256:98f8ec75657d21b924fe4f69b6b9bff2f6550ea48838af479d8894a852000e40)...done

Obviously that's not ideal, because #2957 wanted --all-platforms (and note it happens even if --platform isn't specified, as the default is a mix of amd64/386). However I happened to notice ctr from containerd 2.0 doesn't exhibit this behaviour and bisected it to: containerd/containerd@eb123db

So that means --local=false works as a partial workaround and indeed that does work on the version of containerd kind is currently using. If anyone is experiencing this I made #3805 as a potential workaround -- that's not ready to be merged, in particular I haven't got a way to test this on Docker Desktop, but testing would be useful.

@BenTheElder
Member

Do we know why this works? Is it potentially fetching references from a remote (and therefore broken airgapped)?

@dgl
Contributor

dgl commented Dec 5, 2024

@BenTheElder It's not fetching from a remote. --local=false means it is using the containerd transfer service, which is a different implementation (more is done inside containerd than inside ctr).

cc @AkihiroSuda -- could you help decide if this is a containerd bug or a docker issue? i.e. is docker generating a bad OCI image or is containerd mishandling it? In order to reproduce this all that is needed is a Docker instance configured to use containerd (CE edition is fine), then run docker save and try to import that image to containerd using ctr (steps in this comment).

There are three potential issues here:

  1. Docker sets containerd.WithSkipMissing but ctr doesn't.
  2. The transfer service code path doesn't actually check skip missing, but also doesn't error on missing references in some cases (when there is a mixture of platforms like amd64 / 386). I haven't followed the full code flow but I did notice this todo in containerd which is maybe related.
  3. Docker generates the exported image with OnlyStrict, Containerd imports the image with Only (I suspect there's other places, I think that's the one that applies for transfer service).

(2 is why --local=false can workaround this, I think.)
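Point 3 can be illustrated with a toy model. This is a simplified Python sketch, not containerd's actual matcher: only and only_strict are hypothetical stand-ins for Only and OnlyStrict, and the compatibility table holds just the amd64/386 pairing mentioned earlier in this thread.

```python
# Toy compatibility table: extra architectures a lenient matcher also accepts.
# Only the amd64 -> 386 pairing comes from this thread; real matchers cover more cases.
COMPATIBLE = {"amd64": ["386"]}


def only_strict(arch):
    """Matcher that accepts exactly one architecture."""
    return lambda candidate: candidate == arch


def only(arch):
    """Matcher that accepts the architecture plus its compatible fallbacks."""
    accepted = {arch, *COMPATIBLE.get(arch, [])}
    return lambda candidate: candidate in accepted


def missing_on_import(index_archs, export_match, import_match):
    """Architectures the importer wants but the exporter never wrote to the tarball."""
    exported = {a for a in index_archs if export_match(a)}
    wanted = {a for a in index_archs if import_match(a)}
    return sorted(wanted - exported)


index_archs = ["amd64", "386", "arm64"]
# Strict export, lenient import: the importer also asks for 386.
print(missing_on_import(index_archs, only_strict("amd64"), only("amd64")))  # ['386']
# Same matcher on both sides: nothing is missing.
print(missing_on_import(index_archs, only_strict("amd64"), only_strict("amd64")))  # []
```

If this model is roughly right, the mismatch alone is enough to produce a "not found" error for a platform nobody asked for explicitly.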

@AkihiroSuda
Member

The transfer service code path doesn't actually check skip missing

Seems to be a bug of containerd.
Any image that can be imported with non-transfer API should be still importable with the transfer API.

@BenTheElder
Member

We should circle back to see if it's still a bug after upgrading, I am working on shipping v1.7.24 (current 1.7.x)

@BenTheElder
Member

At HEAD the default node image is on Kubernetes 1.32.0 + containerd 1.7.24, there are a lot of changes in containerd and I wonder if any of them fixed the bug.

#3768 tracks containerd 2.0, that one is a bigger can of worms for our users and might be a bit, but also AIUI the fix isn't necessarily in 2.0 if not in 1.7, it's just that this particular issue can be avoided by using the transfer service, but we should also report a bug to containerd if it's still present in current releases without using the transfer service.

@porridge

porridge commented Dec 16, 2024

We should circle back to see if it's still a bug after upgrading, I am working on shipping v1.7.24 (current 1.7.x)

@BenTheElder do I need to upgrade anything on my workstation to test whether the issue is fixed in my case? Or should it be enough to bump the version of kind to current HEAD? 🤔

Because just using the current kind snapshot is not helping:

[kuttl]$ vi go.mod 
[kuttl]$ go mod tidy
go: downloading sigs.k8s.io/controller-runtime v0.19.3
go: downloading github.com/docker/docker v27.4.0+incompatible
go: downloading sigs.k8s.io/kind v0.26.0-alpha.0.20241213223025-771fb17acbc3
go: downloading golang.org/x/exp v0.0.0-20230515195305-f3d0a9c9a5cc
[kuttl]$ git diff go.mod
diff --git a/go.mod b/go.mod
index 043cc93..fc6f434 100644
--- a/go.mod
+++ b/go.mod
@@ -21,7 +21,7 @@ require (
        k8s.io/code-generator v0.31.3
        sigs.k8s.io/controller-runtime v0.19.3
        sigs.k8s.io/controller-tools v0.16.5
-       sigs.k8s.io/kind v0.25.0
+       sigs.k8s.io/kind v0.26.0-alpha.0.20241213223025-771fb17acbc3
 )
 
 require (
[kuttl]$ make envtest
mkdir -p /home/mowsiany/tmp/20241216-kind-new-containerd-Pgp/kuttl/bin
test -s /home/mowsiany/tmp/20241216-kind-new-containerd-Pgp/kuttl/bin/setup-envtest || GOBIN=/home/mowsiany/tmp/20241216-kind-new-containerd-Pgp/kuttl/bin go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest
go: downloading sigs.k8s.io/controller-runtime/tools/setup-envtest v0.0.0-20241206182001-aea2e32a9365
go: sigs.k8s.io/controller-runtime/tools/setup-envtest@v0.0.0-20241206182001-aea2e32a9365 requires go >= 1.23.0; switching to go1.23.4
go: downloading go1.23.4 (linux/amd64)
[kuttl]$ KUBEBUILDER_ASSETS=$(./bin/setup-envtest use 1.25.0 --bin-dir `pwd`/bin -p path) go test -tags integration ./pkg/test -v -mod=readonly -test.run TestAddContainers
=== RUN   TestAddContainers
    kind_integration_test.go:66: {"status":"Pulling from library/busybox","id":"latest"}
    kind_integration_test.go:66: {"status":"Digest: sha256:2919d0172f7524b2d8df9e50066a682669e6d170ac0f6a49676d54358fe970b5"}
    kind_integration_test.go:66: {"status":"Status: Image is up to date for busybox:latest"}
    kind.go:69: Adding Containers to KIND...
    kind.go:78: Add image docker.io/library/busybox:latest to node test-control-plane
ctr: content digest sha256:4c6b3915ceab750f69555510444e80541e4c72e23130c748c6ce3315f603015e: not found
    kind_integration_test.go:74: failed to add container to KIND cluster: failed to load image: command "docker exec --privileged -i test-control-plane ctr --namespace=k8s.io images import --all-platforms --digests --snapshotter=overlayfs -" failed with error: exit status 1
    kind_integration_test.go:89: failed to find image docker.io/library/busybox:latest on node test-control-plane
--- FAIL: TestAddContainers (13.65s)
FAIL
FAIL	github.com/kudobuilder/kuttl/pkg/test	17.084s
FAIL

@BenTheElder
Member

@BenTheElder do I need to upgrade anything on my workstation to test whether the issue is fixed in my case? Or should it be enough to bump the version of kind to current HEAD? 🤔

It's the node image at HEAD, so if you use the default image then bumping to HEAD would test it, but if you're setting the node image to use then you'd have to change that.

Thanks for testing.

@BenTheElder
Member

xref #3828 (comment)

@BenTheElder
Member

FWIW we will be moving to the transfer API hopefully (since we prefer to use defaults and plan to upgrade to containerd 2.0), pending a fix to ctr import --all-platforms. It seems this may resolve the problem.

We should probably still report a bug with details to containerd. #3795 (comment)

I haven't had time to locally reproduce and I'd appreciate if one of you would file with containerd, thanks!

@porridge

I'd appreciate if one of you would file with containerd, thanks!

FTR Unfortunately I don't really understand the pieces involved so I'm not able to formulate a report in containerd terms. And I'm a bit swamped with other work so no time to learn this.

@BenTheElder
Member

We're working on upgrading to containerd 2.0.2 (after a 2.x fix related to image import landed), which uses the transfer service by default and I think may fix this.

@BenTheElder
Member

In the meantime if anyone hits this: recommend opting out of containerd managed images at least until the containerd 2.x upgrade is available, though you could instead try #3805 (which should be obviated by the 2.x upgrade)

@BenTheElder
Member

At HEAD / the latest commit we're using containerd 2.x with the default import transport. Can someone with this issue test it? You can check out the repo and run make build or make install to get a binary.

@porridge

At HEAD / the latest commit we're using containerd 2.x with the default import transport. Can someone with this issue test it? You can check out the repo and run make build or make install to get a binary.

@BenTheElder this didn't seem to help.

  • using these repro instructions and
  • assuming that bumping the sigs.k8s.io/kind dependency to main and running go mod tidy will ensure that a resulting kind cluster uses the updated kind image.

@BenTheElder
Member

assuming that bumping the sigs.k8s.io/kind dependency to main and running go mod tidy will ensure that a resulting kind cluster uses the updated kind image.

Yes, the default node image was updated at HEAD, so as long as you're using the default it should have containerd 2.x (this should also be visible in the node info with something like kubectl get no -o yaml)

That's unfortunate, FWIW when we import an image, aside from detecting if the image needs importing, we're just asking docker to save it, and containerd to import the saved image. So the implementations are primarily in those projects.

I would expect docker save to NOT start generating images with missing references, and IMHO that seems like a bug in docker. The whole point is to save to disk.

@BenTheElder
Member

NOTE: I cannot directly repro myself due to the licensing changes https://www.theregister.com/2021/08/31/docker_desktop_no_longer_free/ (not permitted at my employer)

So I will need help identifying a fix.
Or if someone can make an affected image available via some other means (maybe an uploaded tarball?) I might be able to look at some point.

That said, I think we'll most likely require a patch in containerd or docker desktop, and at most just updating containerd in kind to a patched release.

@porridge

porridge commented Feb 5, 2025

@BenTheElder I don't think you need docker desktop to reproduce this. 🤔

I'm facing it using the moby engine provided by Fedora 41, which I think is rather strict w.r.t. licensing:

$ rpm -qf /usr/bin/dockerd
moby-engine-27.3.1-2.fc41.x86_64
$ dnf info moby-engine
Updating and loading repositories:
Repositories loaded.
Installed packages
Name            : moby-engine
Epoch           : 0
Version         : 27.3.1
Release         : 2.fc41
Architecture    : x86_64
Installed size  : 102.2 MiB
Source          : moby-engine-27.3.1-2.fc41.src.rpm
From repository : fedora
Summary         : The open-source application container engine
URL             : https://github.com/moby/moby
License         : Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND ISC AND MIT AND MPL-2.0 AND (Apache-2.0 OR GPL-2.0-or-later)
Description     : Docker is an open source project to build, ship and run any application as a
                : lightweight container.
                : 
                : Docker containers are both hardware-agnostic and platform-agnostic. This means
                : they can run anywhere, from your laptop to the largest EC2 compute instance and
                : everything in between — and they do not require you to use a particular
                : language, framework or packaging system. That makes them great building blocks
                : for deploying and scaling web apps, databases, and backend services without
                : depending on a particular stack or provider.
Vendor          : Fedora Project
$ lsb_release -a
LSB Version:	n/a
Distributor ID:	Fedora
Description:	Fedora Linux 41 (Workstation Edition)
Release:	41
Codename:	n/a

Should be possible to do it in a VM...

@BenTheElder
Member

ACK, I'll try to get to this ...

In the meantime if anyone has time, we can ignore kind and focus on docker save ... | ctr images import -, I'm pretty sure we'll find that this is broken w/o kind and we'll find that we need to patch one of these.

@porridge

porridge commented Feb 6, 2025

@BenTheElder I tried the following and I think I succeeded with reproducing it in the end:

$ docker image save docker.io/library/busybox:latest > busybox-latest.img
$ file busybox-latest.img 
busybox-latest.img: POSIX tar archive
$ make -C ../../containerd/containerd bin/ctr # (after having checked out v2.0.2 there)
$ sudo ../../containerd/containerd/bin/ctr --namespace=k8s.io images import --all-platforms --digests - < busybox-latest.img
ctr: rpc error: code = NotFound desc = content digest sha256:4c6b3915ceab750f69555510444e80541e4c72e23130c748c6ce3315f603015e: not found
$ 

@porridge

porridge commented Feb 6, 2025

containerd/containerd#11344

@BenTheElder
Member

BenTheElder commented Feb 6, 2025

Thank you! ... subscribing

@BenTheElder BenTheElder added the kind/external upstream bugs label Feb 6, 2025
@lucacome

I'm having a similar issue (I think) since kindest/node:v1.32.1

❯ kind load -v 1 docker-image nginx
Image: "nginx" with ID "sha256:9b1b7be1ffa607d40d545607d3fdf441f08553468adec5588fb58499ad77fe58" not yet present on node "kind-control-plane", loading...
ERROR: failed to detect containerd snapshotter
Stack Trace:
sigs.k8s.io/kind/pkg/errors.New
	sigs.k8s.io/kind/pkg/errors/errors.go:28
sigs.k8s.io/kind/pkg/cluster/nodeutils.parseSnapshotter
	sigs.k8s.io/kind/pkg/cluster/nodeutils/util.go:107
sigs.k8s.io/kind/pkg/cluster/nodeutils.getSnapshotter
	sigs.k8s.io/kind/pkg/cluster/nodeutils/util.go:97
sigs.k8s.io/kind/pkg/cluster/nodeutils.LoadImageArchive
	sigs.k8s.io/kind/pkg/cluster/nodeutils/util.go:81
sigs.k8s.io/kind/pkg/cmd/kind/load/docker-image.loadImage
	sigs.k8s.io/kind/pkg/cmd/kind/load/docker-image/docker-image.go:205
sigs.k8s.io/kind/pkg/cmd/kind/load/docker-image.runE.func1
	sigs.k8s.io/kind/pkg/cmd/kind/load/docker-image/docker-image.go:190
sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
	sigs.k8s.io/kind/pkg/errors/concurrent.go:30
runtime.goexit
	runtime/asm_arm64.s:1223

Everything works fine in v1.32.0

@BenTheElder
Member

BenTheElder commented Feb 10, 2025

@lucacome that's something else, you're using an unsupported image, please see the docs about images (and the release notes) as well as the recent issue #3853

(basically we're mid transition to containerd 2.0, there's some issues with it in testing kubernetes discovered since that are pending a patch upstream, and I have limited bandwidth at the moment so we haven't actually released kind for that image yet, it will work at HEAD with the latest sources)

@lucacome

Sorry, I didn't think to search the closed issues.

I actually didn't know there could be compatibility issues with images. We've always updated to the latest image available for years, and fortunately this is the first time we run into any problems, but we probably have to rethink this.

Thanks!

@BenTheElder
Member

I actually didn't know there could be compatibility issues with images. We've always updated to the latest image available for years, and fortunately this is the first time we run into any problems, but we probably have to rethink this.

We try very hard not to break things, but sometimes we have to deal with ecosystem changes (in this case containerd 1.x => 2.x has some changes) and it becomes unreasonable to guarantee no issues, we might need to reiterate this more strongly in https://kind.sigs.k8s.io/docs/user/quick-start/#creating-a-cluster:~:text=This%20will%20bootstrap,a%20kind%20release. in addition to the release notes warnings.

(Similarly, Kubernetes makes breaking changes to bootstrapping, so while kind tries to isolate from that, we can't always guarantee that a new image with a version that didn't exist at the time of the kind release still works without changes in kind, it's just been unusually stable for the past year or two, there are more breaking changes from the ecosystem coming like #3847 ...)

@vvoland

vvoland commented Feb 11, 2025

Hi, Docker Engine maintainer here, let me give some context on how it works on the Docker side.

With the containerd image store integration enabled, Docker uses the digest of the image manifest index as the image's ID.

If you pull an image with Docker, by default it pulls only the native platform blobs, but still persists the whole image index to preserve the image identity (to keep stable image IDs). docker image save by default will do its best to maintain the image ID, so it will output the full OCI index.

However, since in most cases you don't pull all possible platforms for the image, a tarball from docker image save won't have some of the referenced blobs because they aren't available locally.
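To see why keeping the full index matters for the ID, here is a small Python sketch. It assumes only that the ID is the SHA-256 digest of the index bytes, as described above; the index contents and the digests inside it are made-up placeholders.

```python
import hashlib
import json


def digest(raw: bytes) -> str:
    """Content digest in the usual algorithm:hex form."""
    return "sha256:" + hashlib.sha256(raw).hexdigest()


# A made-up two-platform index; the digests are placeholders, not real blobs.
full_index = {
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:aaaa", "platform": {"os": "linux", "architecture": "arm64"}},
        {"digest": "sha256:bbbb", "platform": {"os": "linux", "architecture": "amd64"}},
    ],
}
# The same index with the non-native platform entry dropped.
trimmed_index = {**full_index, "manifests": full_index["manifests"][:1]}

full_id = digest(json.dumps(full_index).encode())
trimmed_id = digest(json.dumps(trimmed_index).encode())

# Different bytes give a different digest, so trimming the index would change the ID.
print(full_id != trimmed_id)  # True
```

So an export that drops the entries for platforms that were never pulled would no longer carry the same image ID, which is the trade-off behind the default behaviour.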

On the containerd side, such an image can still be loaded, but it requires passing an extra WithSkipMissing functional option to client.Import (which doesn't have a corresponding ctr flag yet). See: containerd/containerd#11344 (comment)
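The difference a skip-missing option makes can be sketched as follows. This is a hypothetical Python model of the behaviour described above, not containerd's implementation; import_manifests, BlobNotFound, and the skip_missing parameter are names invented for the example.

```python
class BlobNotFound(Exception):
    """Raised when a referenced blob is absent and skipping is not allowed."""


def import_manifests(referenced, available, skip_missing=False):
    """Return the digests that get imported from the referenced list.

    referenced: digests listed in the image index.
    available: digests whose blobs are actually in the tarball.
    """
    imported = []
    for digest in referenced:
        if digest in available:
            imported.append(digest)
        elif not skip_missing:
            raise BlobNotFound(f"content digest {digest}: not found")
        # With skip_missing=True, absent blobs are simply left out.
    return imported


referenced = ["sha256:native", "sha256:other-platform"]
available = {"sha256:native"}
print(import_manifests(referenced, available, skip_missing=True))  # ['sha256:native']
```

Calling the same function without skip_missing=True raises BlobNotFound on the second digest, which mirrors the "not found" failure reported in this issue.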

However, there's a way out of this on the Docker side too. docker image save now also has a --platform switch, which will only export a single-platform manifest for the specified platform.

@BenTheElder
Member

BenTheElder commented Feb 11, 2025

Thanks @vvoland! Will try to follow up more soon ... KEP season. Quickly:

If you pull an image with Docker, by default it pulls only the native platform blobs, but still persists the whole image index to preserve the image identity (to keep stable image IDs). docker image save by default will do its best to maintain the image ID, so it will output the full OCI index.

I think we'd actually like to preserve the ID if possible too, though I think that's currently not working either (it might after this change?) #2394

On the containerd side, such an image can still be loaded, but it requires passing an extra WithSkipMissing functional option to client.Import (which doesn't have a corresponding ctr flag yet). See: containerd/containerd#11344 (comment)

WithSkipMissing seems like what we'd like.

If we can't get it in ctr we could start using our own client, but then we have to break things a lot more, and I had hoped our relatively minimal exposure to ctr would be reasonable.

However, there's a way out of this on the Docker side too. docker image save now also has a --platform switch, which will only export a single-platform manifest for the specified platform.

That makes sense, at least for a quick workaround users can docker save --platform ... | kind load image-archive - in the meantime.
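For scripted setups, the workaround could be wrapped like this. It is a rough Python sketch under a few assumptions: the installed Docker has the --platform switch on docker image save mentioned above, the archive is written to a file instead of being piped, and the helper names and the image.tar default are made up for illustration.

```python
import subprocess


def single_platform_load_commands(image, platform, cluster, archive="image.tar"):
    """Build the two commands: save one platform to a tarball, then load it into kind."""
    save = ["docker", "image", "save", "--platform", platform, "-o", archive, image]
    load = ["kind", "load", "image-archive", archive, "--name", cluster]
    return [save, load]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Dry run by default: this only prints the commands for the original report's setup.
run_all(single_platform_load_commands("busybox", "linux/arm64", "nebulae"))
```

Pass dry_run=False to actually execute the commands. Exporting a single platform means the loaded image will not carry the multi-platform index, so its ID on the node may differ from the one Docker shows.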
