-
Notifications
You must be signed in to change notification settings - Fork 61
controller: Add overhead annotation using pod mutating webhook #1058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
I don't have the proper privileges to change labels on this repo, but this is still work-in-progress. The actual interaction with the runtime remains to be tested. |
1604549
to
22e02e7
Compare
In order for the Kata runtime to correctly deal with hot plugging and set a VM size that is compatible with the host cgroup size [1], we need to pass the known overhead (from the RuntimeClass) to the runtime using an annotation. This code implements a mutating web hook to modify the pod so that the annotation is added when the runtime class is Kata. Signed-off-by: Christophe de Dinechin <[email protected]>
22e02e7
to
6f9797c
Compare
@c3d: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
} | ||
if *r.Spec.MemoryOverheadMB > 4096*1024 { | ||
return nil, fmt.Errorf("memoryOverheadMB must be at most 4GB") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify how you came up with these numbers ?
return nil, fmt.Errorf("memoryOverheadMB must be at most 4GB") | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the duplicate code block to some helper ?
default: 350 | ||
format: int32 | ||
maximum: 2048 | ||
minimum: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do the min and max don't match what's tested in the code above ?
value: registry.redhat.io/openshift-sandboxed-containers/osc-podvm-builder-rhel9@sha256:517b1d76dad553871c745a315aa73c341f4672029704d50992a3eb663945e6dc | ||
- name: RELATED_IMAGE_PODVM_PAYLOAD | ||
value: registry.redhat.io/openshift-sandboxed-containers/osc-podvm-payload-rhel9@sha256:211a4291cd13302a18ae2766a8e680a2593450ce31f119e78ac3367c5858104a | ||
value: registry.redhat.io/openshift-sandboxed-containers/osc-podvm-payload-rhel9@sha256:211a4291cd13302a18ae2766a8e680a2593450ce31f119e78ac3367c5858104a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current value in the devel
branch is :
registry.redhat.io/openshift-sandboxed-containers/osc-podvm-payload-rhel9@sha256:9c07280746952c7bd4ea811cd15e6dc2e35e0f6c2015e21436dfeaa418ec5799
More generally, these lines with image digests are maintained by mintmaker. You should never have to modify them in a PR, so please just drop this change.
This being said, there's indeed a trailing space... even in the current HEAD of devel
, Cc @littlejawa .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this space has been here for a very long time and mintmaker PRs gladly honor this formatting nit 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's fix that in a separate PR.
|
||
// Known Kata runtime classes | ||
KataRuntimeClass = "kata" | ||
KataRemoteRuntimeClass = "kata-remote" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to duplicate strings that we already have in the main controller file. The controller maintains a list of the known kata runtime class in the KataConfig.
https://github.com/openshift/sandboxed-containers-operator/blob/devel/api/v1/kataconfig_types.go#L51
Namespace: "default", | ||
}, | ||
Spec: corev1.PodSpec{ | ||
RuntimeClassName: stringPtr("kata-remote"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use https://github.com/openshift/sandboxed-containers-operator/blob/devel/controllers/peerpods.go#L45 ?
And fix other sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to rebase to get the changes from #1069.
Namespace: "default", | ||
}, | ||
Spec: corev1.PodSpec{ | ||
RuntimeClassName: stringPtr("kata"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And fix other sites.
EnablePeerPods bool `json:"enablePeerPods"` | ||
|
||
// +optional | ||
// +kubebuilder:default:=350 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the answer is yes, should they stay in sync ?
In order for the Kata runtime to correctly deal with hot plugging and set a VM size that is compatible with the host cgroup size [1], we need to pass the known overhead (from the RuntimeClass) to the runtime using an annotation.
This code implements a mutating web hook to modify the pod so that the annotation is added when the runtime class is Kata.
Fixes: #1057
- Description of the problem which is fixed/What is the use case
When hot-plugging memory in Kata (typically because the container spec provides a memory limit), the host-side cgroup and the maximum VM size currently become inconsistent, because the host-side cgroup is defined as "pod overhead + container limit" whereas the actual maximum VM size is "default VM size + container limit".
The in-guest cgroup is insufficient to constrain the memory usage sufficiently, because it does not account for things like kernel buffers. This leads to situations where we end up hitting a host-side cgroup out-of-memory, which kills the VM in a nasty way: we lose connexion to the kata runtime itself, and therefore things like container exit status, etc.
- What I did
This adds a pod mutating web hook that injects an annotation that the runtime can use to adjust the amount of memory being hot-plugged to avoid the issue.
- How to verify it
There is some local testing of the operator's pod mutation logic which can be exercised by running the run_pod_overhead_mutators.sh
- Description for the changelog
Add pod-mutating web hook to inject memory overhead annotation for the kata runtime
Assisted-by: Cursor with gpt5 model (test generation)