do-not-merge: This will be used to test the kata-containers CI related stuff #263

Closed

fidencio
Member

@fidencio fidencio commented Oct 9, 2023

No description provided.

ChengyuZhu6 and others added 2 commits September 25, 2023 16:23
nydus-snapshotter / nydus will be used to get rid of the containerd fork
we have, allowing us to do both the image pulling on the host side and
inside the guest.

NOTE:
This PR should NOT be merged as is, as it breaks the s390x payload build.

Signed-off-by: ChengyuZhu6 <[email protected]>
Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio
Member Author

fidencio commented Oct 9, 2023

/test

3 similar comments
@fidencio
Member Author

fidencio commented Oct 9, 2023

/test

@fidencio
Member Author

fidencio commented Oct 9, 2023

/test

@fidencio
Member Author

fidencio commented Oct 9, 2023

/test

@fidencio fidencio closed this Oct 9, 2023
@fidencio fidencio reopened this Oct 9, 2023
@fidencio
Member Author

fidencio commented Oct 9, 2023

/test

@fidencio
Member Author

fidencio commented Oct 9, 2023

/test-tdx

@fidencio
Member Author

fidencio commented Oct 9, 2023

/test

@fidencio fidencio force-pushed the topic/test-kata-ci-stuff branch from 1d25728 to 3843fae Compare October 9, 2023 14:58
@fidencio
Member Author

fidencio commented Oct 9, 2023

/test

2 similar comments
@fidencio
Member Author

fidencio commented Oct 9, 2023

/test

@fidencio
Member Author

fidencio commented Oct 9, 2023

/test

@fidencio
Member Author

fidencio commented Oct 9, 2023

/test-tdx

@fidencio
Member Author

fidencio commented Oct 9, 2023

/test-tdx
/test

@fidencio fidencio force-pushed the topic/test-kata-ci-stuff branch from 35fc2e5 to ebc29f9 Compare October 11, 2023 15:36
@fidencio
Member Author

/test

1 similar comment
@fidencio
Member Author

/test

@fidencio fidencio force-pushed the topic/test-kata-ci-stuff branch from 583b325 to e71a696 Compare October 11, 2023 21:35
@fidencio
Member Author

/test

@huoqifeng
Contributor

@fidencio thanks for adding the snapshotter to the operator here. I know this is still in progress, but I want to share some findings from trying it locally, to help eliminate potential problems. From what I saw, we should only enable nydus for the kata-xxx shims in /etc/containerd/config.toml; otherwise, I get an error similar to the one below:

Error: failed to create containerd container: create instance 9: object with key "9" already exists: unknown

I guess this is because things get mixed up when both runc and kata pods refer to the same image, which I think is always the case for the pause image. A configuration similar to the one below worked well locally with containerd 1.7.6:

  • kata-xxx sections, revised:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-remote]
  runtime_type = "io.containerd.kata-remote.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-remote.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-remote.toml"
  • runc sections, kept as in the original:
      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        privileged_without_host_devices_all_devices_allowed = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""
        sandbox_mode = ""
        snapshotter = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          privileged_without_host_devices_all_devices_allowed = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          sandbox_mode = "podsandbox"
          snapshotter = ""

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = false

Sorry for the noise if you have already tried this or knew about it.

@fidencio
Member Author

@huoqifeng, first of all, thanks a whole lot for taking some time to test it out with the remote hypervisor!

This is what the Operator is deploying as the containerd configuration: https://gist.github.com/fidencio/3b9f1f8049d1b1b43b6c976987f497c9#file-containerd-config-toml-L225-L228

The important parts are:

  • proxy_plugins:
[proxy_plugins]
  [proxy_plugins.nydus]
    type = "snapshot"
    address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
  • runc configuration (where we're not changing anything):
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = false
  • default runtime configuration (where we're also not changing anything):
      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
  • kata-containers specific configuration:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh]
  runtime_type = "io.containerd.kata-clh.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-clh.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh-tdx]
  runtime_type = "io.containerd.kata-clh-tdx.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh-tdx.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-clh-tdx.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu]
  runtime_type = "io.containerd.kata-qemu.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-tdx]
  runtime_type = "io.containerd.kata-qemu-tdx.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-tdx.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu-tdx.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-sev]
  runtime_type = "io.containerd.kata-qemu-sev.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-sev.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu-sev.toml"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-snp]
  runtime_type = "io.containerd.kata-qemu-snp.v2"
  snapshotter = "nydus"
  privileged_without_host_devices = true
  pod_annotations = ["io.katacontainers.*"]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-snp.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu-snp.toml"

These configs worked for me in a very simple set of tests I ran, but we're still facing issues in the CI (which I'm debugging as we speak).

@huoqifeng, just to confirm, these configurations match yours, correct?

@fidencio
Member Author

Also, for reference, the way we're deploying the nydus-snapshotter is basically putting the binary on the host and adding a new systemd service, which we simply enable.

Here's the systemd unit:

ubuntu@operator:~/operator/tests/e2e$ systemctl cat nydus-snapshotter.service 
# /etc/systemd/system/nydus-snapshotter.service
[Unit]
Description=Nydus snapshotter
After=network.target local-fs.target
Before=containerd.service

[Service]
ExecStart=/opt/confidential-containers/bin/containerd-nydus-grpc --config /opt/confidential-containers/share/nydus-snapshotter/config-coco-guest-pulling.toml

[Install]
RequiredBy=containerd.service

The important parts here are the:

  • Before, which means it must be started before containerd, so we ensure that whenever containerd is up it'll be ready to use nydus, as long as the nydus-snapshotter is properly configured
  • RequiredBy, which enforces that the snapshotter will be started, in case it's not started yet, whenever we start / restart containerd.
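
To make the two directives above easier to check, here is a small sketch that reads a unit file's text and confirms both relations point at containerd.service. It is only an illustration: the function names are hypothetical, and `configparser` is a stand-in for a real systemd parser (with `strict=False` it keeps only the last value when a directive is repeated, which is fine for the unit shown above but not in general).

```python
import configparser

# The unit text quoted above, embedded so the sketch is self-contained.
UNIT = """\
[Unit]
Description=Nydus snapshotter
After=network.target local-fs.target
Before=containerd.service

[Service]
ExecStart=/opt/confidential-containers/bin/containerd-nydus-grpc --config /opt/confidential-containers/share/nydus-snapshotter/config-coco-guest-pulling.toml

[Install]
RequiredBy=containerd.service
"""


def unit_relations(unit_text: str) -> dict[str, list[str]]:
    """Extract the Before= and RequiredBy= targets from a unit file's text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep directive names case-sensitive
    parser.read_string(unit_text)
    return {
        "Before": parser.get("Unit", "Before", fallback="").split(),
        "RequiredBy": parser.get("Install", "RequiredBy", fallback="").split(),
    }


def guards_containerd(unit_text: str, target: str = "containerd.service") -> bool:
    """True when the unit is both ordered before and required by `target`."""
    relations = unit_relations(unit_text)
    return target in relations["Before"] and target in relations["RequiredBy"]


if __name__ == "__main__":
    print(unit_relations(UNIT))
    print(guards_containerd(UNIT))
```

Both checks matter because they cover different things: Before only orders startup, while RequiredBy is what pulls the snapshotter in when containerd is started.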

@huoqifeng
Contributor

@fidencio I'm not using a complete installation, just debugging locally. The steps (the commands are not fully listed yet) are described in confidential-containers/cloud-api-adaptor#1512, and the configuration file I'm using is https://raw.githubusercontent.com/huoqifeng/gistfiles/main/coco/containerd.toml, which works for me to run a runc pod and 2 PeerPods that use the same nginx image.

@fidencio
Member Author

/test

@fidencio
Member Author

Okay, I had a really good debug session with @stevenhorsman, where we noticed at least 2 issues coming from the kata-containers side:

  • Missing /etc/aa-offline_fs_kbc-keys.json file in the rootfs, which seems to make image-rs quite confused
  • Changes of containerd configuration as part of the tests repo, which should never be done when running the tests on the operator side.

With that we got more green tests on this front, and here's where I call it a week.

@stevenhorsman will talk to Ding and PRs will be opened by tomorrow, and on Monday I'll get back to this.

@stevenhorsman, sincerely, thanks a lot for the help!

@fidencio
Member Author

/test

1 similar comment
@fidencio
Member Author

/test

@fidencio fidencio force-pushed the topic/test-kata-ci-stuff branch from 106e445 to a16f423 Compare October 17, 2023 06:52
@fidencio
Member Author

/test

1 similar comment
@fidencio
Member Author

/test

Signed-off-by: Fabiano Fidêncio <[email protected]>
We're facing some errors in the CI (which I'm not able to reproduce
locally) when trying to build the s390x image.

Let's skip it for now just for the sake of having something tested.

Signed-off-by: Fabiano Fidêncio <[email protected]>
As we want to start testing and shipping with nydus / upstream
containerd.

Signed-off-by: Fabiano Fidêncio <[email protected]>
This is used in the install_nydus_snapshotter_artefacts() function
without being declared.

Signed-off-by: Fabiano Fidêncio <[email protected]>
For now we're only using nydus, let's make sure to set it properly
instead of depending on a non-existent env var.

Signed-off-by: Fabiano Fidêncio <[email protected]>
Now that we're relying on it.

Signed-off-by: Fabiano Fidêncio <[email protected]>
We will do it in a better way, rather than hardcoding it here, but for
now we want to know whether tests will pass with it on.

Signed-off-by: Fabiano Fidêncio <[email protected]>
This needs to be dropped, but it allows us to keep testing.
There's an error, still to be debugged, as the snapshotter is not being
properly set with the current operator logic.

Signed-off-by: Fabiano Fidêncio <[email protected]>
Let's rely on containerd's drop-in stuff, as it makes things way easier
for us than relying on (even more) seds all over the place.

Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio fidencio force-pushed the topic/test-kata-ci-stuff branch from 964a694 to 40787c9 Compare October 19, 2023 12:11
@fidencio
Member Author

/test

2 similar comments
@fidencio
Member Author

/test

@fidencio
Member Author

/test

@fidencio
Member Author

I'm closing this one as #267 got merged.

@fidencio fidencio closed this Oct 26, 2023
@fffmonkeyking

fffmonkeyking commented Jan 15, 2024

Why is it necessary to use "/usr/bin/containerd + /opt/confidential-containers/bin/containerd-nydus-grpc" instead of "/opt/confidential-containers/bin/containerd" by default from v0.7.0 to v0.8.0?

3 participants