k3d: unable to start built container #46

Open · juliostanley opened this issue Nov 18, 2020 · 13 comments

@juliostanley

What steps did you take and what happened

  • Took a Dockerfile from some random project
  • Built it just fine (caching seems to be working well): kubectl build . -t test
    (build output screenshot)
  • Attempted to run it with
    kubectl run -i --tty test --image=test -- sh
  • Got ImagePullBackOff (confirmed with the check below)
    (pod status screenshot)
  • The image does not appear to be there, compared to the gif animation in the README
    • It didn't send the tar? (the last export line is the last entry in the output; see the build log below)
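
For reference, a quick way to confirm the failure state (pod name test taken from the run command above):

kubectl get pod test
kubectl describe pod test   # the Events section shows the ErrImagePull / ImagePullBackOff details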

What did you expect to happen
The container starts on the single-node k8s cluster.

Environment Details:

  • kubectl buildkit 0.1.0
  • Kubernetes v1.18.8+k3s1
  • k3s via k3d, on Docker for Windows (wsl2)
  • containerd v1.3.3-k3s2

Builder Logs

[+] Building 1.3s (9/9) FINISHED
 => [internal] load .dockerignore                                                                            0.1s 
 => [internal] load build definition from Dockerfile                                                         0.1s 
 => => transferring dockerfile: 32B                                                                          0.0s 
 => [internal] load metadata for docker.io/library/ubuntu:18.04                                              0.8s 
 => [1/5] FROM docker.io/library/ubuntu:18.04@sha256:646942475da61b4ce9cc5b3fadb42642ea90e5d0de46111458e100  0.0s 
 => => resolve docker.io/library/ubuntu:18.04@sha256:646942475da61b4ce9cc5b3fadb42642ea90e5d0de46111458e100  0.0s 
 => CACHED [2/5] RUN apt-get update; apt-get install wget -y                                                 0.0s 
 => CACHED [3/5] RUN  wget https://bin.equinox.io/c/VdrWdbjqyF/cloudflared-stable-linux-amd64.deb            0.0s 
 => CACHED [4/5] RUN  apt-get install ./cloudflared-stable-linux-amd64.deb                                   0.0s 
 => CACHED [5/5] RUN  useradd -s /usr/sbin/nologin -r -M cloudflared;rm -rf cloudflared-stable-linux-amd64.  0.0s 
 => exporting to image                                                                                       0.3s 
 => => exporting layers                                                                                      0.1s 
 => => exporting manifest sha256:eaf698c4058f45b5e9ba84ba994b269f6289ec5a5ba7b7b949c6b6cb0c6ec27d            0.0s 
 => => exporting config sha256:d9349451d21751b2bb3600b1ac00ecc5e4d53056fad7a01732aff58bf2fd5eeb              0.1s 

Dockerfile
N/A; this happens with any simple Dockerfile.

Vote on this request

This is an invitation to the community to vote on issues. Use the "smiley face" up to the right of this comment to vote.

  • 👍 "I would like to see this bug fixed as soon as possible"
  • 👎 "There are more important bugs to focus on right now"
@juliostanley juliostanley changed the title unable to start build container unable to start built container Nov 18, 2020
@juliostanley (Author)

Where can I increase the log level? Any tips on debugging?

@pdevine (Contributor) commented Nov 18, 2020

Can you rebuild but tag it something like test:mytest? I think this may be running into a problem with the semantics of how kubectl run works, where it will always attempt to pull an image if it's untagged or tagged latest.
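
If the pull policy turns out to be the culprit, it can also be forced explicitly; a minimal sketch, assuming the test:mytest tag suggested above:

kubectl build . -t test:mytest
kubectl run -i --tty test --image=test:mytest --image-pull-policy=Never -- sh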

@juliostanley (Author)

For the default buildkit pod created
kubectl logs buildkit-5b5d76d554-smdpc

The logs are

time="2020-11-18T16:32:50Z" level=info msg="auto snapshotter: using overlayfs"
time="2020-11-18T16:32:50Z" level=warning msg="using host network as the default"
time="2020-11-18T16:32:50Z" level=info msg="found worker \"lm0xod5ma4sm55gh6k87rtfpn\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:buildkit-5b5d76d554-smdpc org.mobyproject.buildkit.worker.snapshotter:overlayfs], platforms=[linux/amd64 linux/arm64 linux/riscv64 linux/ppc64le 
linux/s390x linux/386 linux/arm/v7 linux/arm/v6]"
time="2020-11-18T16:32:50Z" level=warning msg="skipping containerd worker, as \"/run/containerd/containerd.sock\" does not exist"
time="2020-11-18T16:32:50Z" level=info msg="found 1 workers, default=\"lm0xod5ma4sm55gh6k87rtfpn\""
time="2020-11-18T16:32:50Z" level=warning msg="currently, only the default worker can be used."
time="2020-11-18T16:32:50Z" level=info msg="running server on /run/buildkit/buildkitd.sock"

I looked at the k3s node, which has the containerd socket in a different location than expected:

⤷ lab  master > docker exec -it 72d sh
/ # ls -la /run/k3s/containerd
total 0
drwx--x--x 5 0 0 140 Nov 18 16:14 .
drwx--x--x 3 0 0  60 Nov 18 16:14 ..
srw-rw---- 1 0 0   0 Nov 18 16:14 containerd.sock
srw-rw---- 1 0 0   0 Nov 18 16:14 containerd.sock.ttrpc
drwxr-xr-x 4 0 0  80 Nov 18 16:14 io.containerd.grpc.v1.cri
drwx--x--x 2 0 0  40 Nov 18 16:14 io.containerd.runtime.v1.linux
drwx--x--x 3 0 0  60 Nov 18 16:14 io.containerd.runtime.v2.task
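
To confirm what the buildkit pod itself sees (the daemon warning above says /run/containerd/containerd.sock does not exist), a hedged check using the pod name from above:

kubectl exec buildkit-5b5d76d554-smdpc -- ls -la /run/containerd
# expect "No such file or directory", matching the skipped-containerd-worker warning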

@juliostanley (Author)

I gave it a try with a tag, but got the same result.


kubectl build . -t test:test -f .\Dockerfile.test

 => [internal] load .dockerignore                                                                            0.1s 
 => => transferring context: 2B                                                                              0.0s 
 => [internal] load build definition from Dockerfile.test                                                    0.1s 
 => => transferring dockerfile: 36B                                                                          0.0s 
 => [internal] load metadata for docker.io/library/alpine:latest                                             1.1s 
 => [1/2] FROM docker.io/library/alpine@sha256:c0e9560cda118f9ec63ddefb4a173a2b2a0347082d7dff7dc14272e7841a  0.0s 
 => => resolve docker.io/library/alpine@sha256:c0e9560cda118f9ec63ddefb4a173a2b2a0347082d7dff7dc14272e7841a  0.0s 
 => CACHED [2/2] RUN echo hi                                                                                 0.0s 
 => exporting to image                                                                                       0.1s 
 => => exporting layers                                                                                      0.0s 
 => => exporting manifest sha256:4d46a8b555ef3122328fdb79c2de6fe60f4b7126413912eea2dd8012a594efbc            0.1s 
 => => exporting config sha256:819a744a827582dee9e9e319081273666fbc26e8349df5ae6f605c003d1a4adb              0.0s

kubectl run -i --tty test --image=test:test -- sh


kubectl get po test -o yaml

....
spec:
  containers:
  - args:
    - sh
    image: test:test
    imagePullPolicy: IfNotPresent
    name: test
    resources: {}
    stdin: true
    stdinOnce: true
....

kubectl get po

test                        0/1     ImagePullBackOff   0          2m11s
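
To check whether the image actually landed in the node's containerd store, a hedged sketch (the node container name is an assumption for a default k3d cluster create; k3s ships a crictl binary, adjust if your image differs):

docker exec -it k3d-k3s-default-server-0 crictl images | grep test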

@pdevine (Contributor) commented Nov 18, 2020

Does the image get loaded back into Docker when you run docker images?

@MarcusAhlfors

I have the same problem when using microk8s. The build works fine, but the image is not uploaded to containerd. I tried setting the containerd.sock location:

kubectl buildkit create --runtime containerd --containerd-sock=/var/snap/microk8s/common/run/containerd.sock;

But after this, even building started to fail.

@pdevine (Contributor) commented Nov 18, 2020

@MarcusAhlfors could you file a separate issue for microk8s? We've tested it out on a lot of platforms, but clearly not enough! :-D

@juliostanley (Author)

I am running k3s, which uses containerd.

No, the image does not get loaded into containerd.


If I follow the flags used by @MarcusAhlfors with
kubectl buildkit create --runtime containerd --containerd-sock=/run/k3s/containerd/containerd.sock;

I get the following error

[+] Building 0.0s (0/1)
 => [internal] booting buildkit                                                                             23.1s 
 => => # Normal buildkit-54df6654c9 SuccessfulCreate Created pod: buildkit-54df6654c9-n4rpl
 => => # Normal buildkit-54df6654c9-n4rpl Scheduled Successfully assigned default/buildkit-54df6654c9-n4rpl to k3 
 => => # d-k3s-default-server-0
 => => # Warning buildkit-54df6654c9-n4rpl FailedMount MountVolume.SetUp failed for volume "var-lib-containerd" : 
 => => #  hostPath type check failed: /var/lib/containerd is not a directory
 => => waiting for 1 pods to be ready 

Pod events:

  Warning  FailedMount  95s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[var-lib-containerd], unattached volumes=[var-lib-containerd run-containerd var-log tmp default-token-s7jfq buildkitd-config containerd-sock var-lib-buildkit]: timed out waiting for the condition

@juliostanley (Author)

Regarding the previous error, I think the mount should be configurable? I modified the buildkit deployment while it was in the failed state (roughly as in the sketch after this list):

  • Scaled replicas to 0
  • Changed var-lib-containerd to hostPath /var/lib/rancher/k3s/agent/containerd
  • Scaled replicas back to 1
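
A hedged sketch of that manual edit (deployment name buildkit inferred from the pod names above; the exact volume layout in the manifest is an assumption):

kubectl scale deployment buildkit --replicas=0
kubectl edit deployment buildkit
# in the pod template, point the var-lib-containerd volume at the k3s path:
#   - name: var-lib-containerd
#     hostPath:
#       path: /var/lib/rancher/k3s/agent/containerd
kubectl scale deployment buildkit --replicas=1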

But I ended up with this error in the pod events:

Error: failed to generate container "e3c0f6613345a1a9d493a557951288c03e83ed22753aa52c51bd4d3b6388fcc8" spec: path "/tmp" is mounted on "/" but it is not a shared mount

This seems similar to k3d-io/k3d#206, and it is caused by the Bidirectional mount-propagation setting used to allow the mounts to be picked up.

I guess maybe k3d isn't a good environment for kubectl-buildkit? Any suggestions?

@juliostanley juliostanley changed the title unable to start built container k3s: unable to start built container Nov 18, 2020
@juliostanley juliostanley changed the title k3s: unable to start built container k3d: unable to start built container Nov 18, 2020
@pdevine (Contributor) commented Nov 18, 2020

As a workaround you can --push the built image to a registry and then pull it back, but that's definitely not ideal. We'll need to set up an environment to see if we can replicate the problem.
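
A hedged sketch of that workaround (the registry host is a placeholder; any registry the cluster can pull from works):

kubectl build . -t registry.example.com/test:test --push
kubectl run -i --tty test --image=registry.example.com/test:test -- sh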

@dhiltgen (Contributor)

@juliostanley you mention in the opening comment Docker for Windows (wsl2) and the containerd runtime. I'm trying to wrap my head around what moving parts are in your environment (maybe some of the auto-detection logic is getting confused).

Is your kubelet configured to use containerd or dockerd? (I'm assuming containerd, but please confirm)

Is dockerd also running inside your system, and is there a /var/run/docker.sock visible inside the wsl2 distro that's running the kubelet?

If there is a dockerd, and you're using containerd for kubernetes, it's possible the builder is incorrectly auto-selecting dockerd, assuming that's your runtime, and storing the images there, where they are not visible to kubernetes via containerd.

If that's what's going on, kubectl buildkit create --runtime containerd --containerd-sock ... should make it possible to work around this auto-detection glitch by forcing the right runtime and containerd socket path, but the trick will be figuring out what that path is inside the environment that runs the kubelet process. If you can find your kubelet config, it hopefully lists the path to the containerd socket for reference.
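
A hedged way to check both questions from inside the environment running the kubelet (k3s embeds the kubelet in the k3s process, so the flag may not show up; fall back to the kubelet config file in that case):

ls -la /var/run/docker.sock                                         # is a dockerd socket visible?
ps -ef | grep -o -- '--container-runtime-endpoint=[^ ]*' | sort -u  # the kubelet's CRI socket, if passed as a flag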

@juliostanley (Author) commented Nov 18, 2020

@dhiltgen Yeah, it may sound a little confusing, and it's actually part of the issue, due to the need for Bidirectional mounts.


So here is what I noticed (based on my previous comments):

  • I am using k3d https://k3d.io/
  • k3d uses docker (in this case Docker for Windows - wsl2) to spin up the "node(s)" of the k8s cluster. These run k3s, which comes with containerd (it's not a docker-in-docker situation). The socket and lib locations for containerd are on non-standard paths (verified in the sketch after this list).
    • This is the command to create the k8s cluster: k3d cluster create
    • var-lib-containerd is at /var/lib/rancher/k3s/agent/containerd inside the docker container node
    • The socket is at /run/k3s/containerd/containerd.sock
  • I can create buildkit with kubectl buildkit create --runtime containerd --containerd-sock=/run/k3s/containerd/containerd.sock
  • But it will still fail, for two reasons:
    • var-lib-containerd is assumed to be at /var/lib/containerd, which causes pod creation to fail due to a non-existent hostPath (the next bullet applies even after I manually edit the buildkit deployment)
    • The hostPath volumes on the deployment are set to Bidirectional, which throws an error about them not being shared mounts (see [BUG] Fail to start longhorn with k3d k3d-io/k3d#206 for reference; a different issue, but the same pattern). The pod errors out, failing to create the buildkit container due to the mounts.
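
A hedged one-liner to verify those paths inside the k3d node container (node name taken from the scheduler event in the earlier comment):

docker exec -it k3d-k3s-default-server-0 ls /run/k3s/containerd /var/lib/rancher/k3s/agent/containerd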

Basically, it seems like k3d is not a good environment for kubectl-buildkit, and the only option is to use a registry, as described by @pdevine, although that eliminates one of the use cases (not having to transfer bytes up and down to a registry, or to run a registry at all).

Hope this clarified the environment

@dhiltgen (Contributor)

Thanks for the clarification!

The way containerd works, its gRPC API requires the "client" to be "local" - it's not a network API like the kubernetes or dockerd APIs. The client libraries require access to specific host paths so that files can be placed there for child containers to access, hence the bidirectional mounts. This is only needed if we're using containerd to facilitate the containers used during the image build.
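
For illustration, the general shape of that pattern in a pod spec (a hedged sketch, not the actual buildkit deployment manifest; the container name is an assumption):

spec:
  containers:
  - name: buildkitd
    volumeMounts:
    - name: var-lib-containerd
      mountPath: /var/lib/containerd
      mountPropagation: Bidirectional   # requires a shared mount on the host
  volumes:
  - name: var-lib-containerd
    hostPath:
      path: /var/lib/containerd         # on k3s/k3d this lives at /var/lib/rancher/k3s/agent/containerd
      type: Directory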

It sounds like k3d isn't going to work unless/until those mounts are refined upstream for the containerd runtime model.

It's possible #26 might wind up building out an alternative strategy which could be employed here. We might be able to approach this by using the ~rootless model (not building inside containerd) and then loading the images directly through a proxy, which I believe can load images purely over the containerd.sock without having to touch the filesystem.
