Skip to content

Conversation

c3d
Copy link
Contributor

@c3d c3d commented Sep 18, 2025

In order for the Kata runtime to correctly deal with hot plugging and set a VM size that is compatible with the host cgroup size [1], we need to pass the known overhead (from the RuntimeClass) to the runtime using an annotation.

This code implements a mutating web hook to modify the pod so that the annotation is added when the runtime class is Kata.

Fixes: #1057

- Description of the problem which is fixed/What is the use case

When hot-plugging memory in Kata (typically because the container spec provides a memory limit), the host-side cgroup and the maximum VM size currently become inconsistent, because the host-side cgroup is defined as "pod overhead + container limit" whereas the actual maximum VM size is "default VM size + container limit".

The in-guest cgroup is insufficient to constrain the memory usage sufficiently, because it does not account for things like kernel buffers. This leads to situations where we end up hitting a host-side cgroup out-of-memory, which kills the VM in a nasty way: we lose connexion to the kata runtime itself, and therefore things like container exit status, etc.

- What I did

This adds a pod mutating web hook that injects an annotation that the runtime can use to adjust the amount of memory being hot-plugged to avoid the issue.

- How to verify it

There is some local testing of the operator's pod mutation logic which can be exercised by running the run_pod_overhead_mutators.sh

- Description for the changelog

Add pod-mutating web hook to inject memory overhead annotation for the kata runtime

Assisted-by: Cursor with gpt5 model (test generation)

@openshift-ci openshift-ci bot requested review from bpradipt and vvoronko September 18, 2025 13:40
@c3d
Copy link
Contributor Author

c3d commented Sep 18, 2025

I don't have the proper privileges to change labels on this repo, but this is still work-in-progress. The actual interaction with the runtime remains to be tested.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2025
@c3d c3d force-pushed the feature/overhead-annotation branch from 1604549 to 22e02e7 Compare September 25, 2025 09:12
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 25, 2025
In order for the Kata runtime to correctly deal with hot plugging and
set a VM size that is compatible with the host cgroup size [1], we
need to pass the known overhead (from the RuntimeClass) to the runtime
using an annotation.

This code implements a mutating web hook to modify the pod so that the
annotation is added when the runtime class is Kata.

Signed-off-by: Christophe de Dinechin <[email protected]>
@c3d c3d force-pushed the feature/overhead-annotation branch from 22e02e7 to 6f9797c Compare September 25, 2025 10:42
Copy link

openshift-ci bot commented Sep 25, 2025

@c3d: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

}
if *r.Spec.MemoryOverheadMB > 4096*1024 {
return nil, fmt.Errorf("memoryOverheadMB must be at most 4GB")
}
Copy link
Member

@gkurz gkurz Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify how you came up with these numbers ?

return nil, fmt.Errorf("memoryOverheadMB must be at most 4GB")
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the duplicate code block to some helper ?

default: 350
format: int32
maximum: 2048
minimum: 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do the min and max don't match what's tested in the code above ?

value: registry.redhat.io/openshift-sandboxed-containers/osc-podvm-builder-rhel9@sha256:517b1d76dad553871c745a315aa73c341f4672029704d50992a3eb663945e6dc
- name: RELATED_IMAGE_PODVM_PAYLOAD
value: registry.redhat.io/openshift-sandboxed-containers/osc-podvm-payload-rhel9@sha256:211a4291cd13302a18ae2766a8e680a2593450ce31f119e78ac3367c5858104a
value: registry.redhat.io/openshift-sandboxed-containers/osc-podvm-payload-rhel9@sha256:211a4291cd13302a18ae2766a8e680a2593450ce31f119e78ac3367c5858104a
Copy link
Member

@gkurz gkurz Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current value in the devel branch is :

registry.redhat.io/openshift-sandboxed-containers/osc-podvm-payload-rhel9@sha256:9c07280746952c7bd4ea811cd15e6dc2e35e0f6c2015e21436dfeaa418ec5799 

More generally, these lines with image digests are maintained by mintmaker. You should never have to modify them in a PR, so please just drop this change.

This being said, there's indeed a trailing space... even in the current HEAD of devel, Cc @littlejawa .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this space has been here for a very long time and mintmaker PRs gladly honor this formatting nit 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's fix that in a separate PR.


// Known Kata runtime classes
KataRuntimeClass = "kata"
KataRemoteRuntimeClass = "kata-remote"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to duplicate strings that we already have in the main controller file. The controller maintains a list of the known kata runtime class in the KataConfig.

https://github.com/openshift/sandboxed-containers-operator/blob/devel/api/v1/kataconfig_types.go#L51

Namespace: "default",
},
Spec: corev1.PodSpec{
RuntimeClassName: stringPtr("kata-remote"),
Copy link
Member

@gkurz gkurz Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to rebase to get the changes from #1069.

Namespace: "default",
},
Spec: corev1.PodSpec{
RuntimeClassName: stringPtr("kata"),
Copy link
Member

@gkurz gkurz Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnablePeerPods bool `json:"enablePeerPods"`

// +optional
// +kubebuilder:default:=350
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the answer is yes, should they stay in sync ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement pod-mutating web hook to pass overhead to Kata runtime

3 participants