Issues with Kubevirt CPU throttling even with Guaranteed and static CPUManager #4954
I think this is very related, if not the same story as in #4319.
Looking at the "semantic" diff between the k0s and rke kubelet configs, there are some differences which I believe are the culprit here. I've omitted some non-relevant bits from the diff, such as key file locations.
Without a deeper refresher on CPU pinning and scheduling, I'd look into these:
Remember that with k0s, you can create a specialized worker profile, which basically allows you to customize the kubelet config. In that case, remember to start the worker with the matching `--profile` flag.
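For illustration, such a worker profile in the cluster config might look like the sketch below (the profile name `custom` and the kubelet fields are placeholders, not a recommendation), with the worker then started via e.g. `k0s worker --profile custom`:

```yaml
# Sketch only: a k0s worker profile overriding kubelet settings.
# The "values" map takes fields from the kubelet configuration API.
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: cluster
spec:
  workerProfiles:
    - name: custom
      values:
        cpuManagerPolicy: static      # example override
        systemReserved:
          cpu: "2"
          memory: 1000Mi
```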
Hey, we've been doing some testing with this. If we set the `cgroupDriver` to `systemd` via a custom worker profile with the following k0sctl config:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
name: cluster
spec:
hosts:
- ssh:
# server-a - Controller
address: 10.69.0.23
user: megaport
port: 22
role: controller
installFlags:
- --debug
- ssh:
# server-b - Worker
address: 10.69.0.27
user: megaport
port: 22
role: worker
installFlags:
- --debug
- --profile="custom"
k0s:
version: "v1.30.4+k0s.0"
versionChannel: stable
dynamicConfig: true
config:
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
name: cluster
spec:
workerProfiles:
- name: custom
values:
cgroupDriver: "systemd" The above config reliably causes the crashing behavior. We're going to look into the other options today. Any reason k0s uses the |
Hmm, did you also change it in the containerd config? IIRC that defaults to cgroupfs too, and as those would now differ between kubelet and containerd, things might go south... 😄
No real reason other than simplicity. Simplicity in the sense that k0s also runs on systems that aren't managed by systemd. We have better detection and logic planned to make it play nicer with different cgroup managers.
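As a quick sanity check of what a host is actually doing (plain standard tooling, nothing k0s-specific), something along these lines shows the cgroup version and the drivers kubelet and containerd ended up with:

```bash
# cgroup2fs => unified cgroup v2 hierarchy, tmpfs => legacy v1
stat -fc %T /sys/fs/cgroup

# Kubelet's effective config as seen through the API server (node name is a placeholder)
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" | jq '.kubeletconfig.cgroupDriver'

# containerd's runc cgroup setting from the drop-in discussed below
grep -R SystemdCgroup /etc/k0s/containerd.d/
```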
We've got the new kubelet config here, but unfortunately still running into throttling issues. kubelet config```json { "kubeletconfig": { "enableServer": true, "podLogsDir": "/var/log/pods", "syncFrequency": "1m0s", "fileCheckFrequency": "20s", "httpCheckFrequency": "20s", "address": "0.0.0.0", "port": 10250, "tlsCipherSuites": [ "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256", "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384", "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256", "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384", "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256" ], "tlsMinVersion": "VersionTLS12", "rotateCertificates": true, "serverTLSBootstrap": true, "authentication": { "x509": { "clientCAFile": "/var/lib/k0s/pki/ca.crt" }, "webhook": { "enabled": true, "cacheTTL": "2m0s" }, "anonymous": { "enabled": false } }, "authorization": { "mode": "Webhook", "webhook": { "cacheAuthorizedTTL": "5m0s", "cacheUnauthorizedTTL": "30s" } }, "registryPullQPS": 5, "registryBurst": 10, "eventRecordQPS": 0, "eventBurst": 100, "enableDebuggingHandlers": true, "healthzPort": 10248, "healthzBindAddress": "127.0.0.1", "oomScoreAdj": -999, "clusterDomain": "cluster.local", "clusterDNS": [ "10.96.0.10" ], "streamingConnectionIdleTimeout": "4h0m0s", "nodeStatusUpdateFrequency": "10s", "nodeStatusReportFrequency": "5m0s", "nodeLeaseDurationSeconds": 40, "imageMinimumGCAge": "2m0s", "imageMaximumGCAge": "0s", "imageGCHighThresholdPercent": 85, "imageGCLowThresholdPercent": 80, "volumeStatsAggPeriod": "1m0s", "kubeletCgroups": "/system.slice/containerd.service", "cgroupsPerQOS": true, "cgroupDriver": "systemd", "cpuManagerPolicy": "static", "cpuManagerReconcilePeriod": "10s", "memoryManagerPolicy": "Static", "topologyManagerPolicy": "restricted", "topologyManagerScope": "pod", "runtimeRequestTimeout": "2m0s", "hairpinMode": "promiscuous-bridge", "maxPods": 110, "podPidsLimit": -1, "resolvConf": "/etc/resolv.conf", "cpuCFSQuota": true, "cpuCFSQuotaPeriod": "100ms", "nodeStatusMaxImages": 50, "maxOpenFiles": 1000000, "contentType": "application/vnd.kubernetes.protobuf", "kubeAPIQPS": 50, "kubeAPIBurst": 100, "serializeImagePulls": true, "evictionHard": { "imagefs.available": "15%", "imagefs.inodesFree": "5%", "memory.available": "100Mi", "nodefs.available": "10%", "nodefs.inodesFree": "5%" }, "evictionPressureTransitionPeriod": "5m0s", "enableControllerAttachDetach": true, "makeIPTablesUtilChains": true, "iptablesMasqueradeBit": 14, "iptablesDropBit": 15, "failSwapOn": false, "memorySwap": {}, "containerLogMaxSize": "10Mi", "containerLogMaxFiles": 5, "containerLogMaxWorkers": 1, "containerLogMonitorInterval": "10s", "configMapAndSecretChangeDetectionStrategy": "Watch", "systemReserved": { "cpu": "2", "memory": "1000Mi" }, "kubeReserved": { "memory": "2000Mi" }, "kubeReservedCgroup": "system.slice", "enforceNodeAllocatable": [ "pods" ], "volumePluginDir": "/usr/libexec/k0s/kubelet-plugins/volume/exec", "logging": { "format": "text", "flushFrequency": "5s", "verbosity": 1, "options": { "text": { "infoBufferSize": "0" }, "json": { "infoBufferSize": "0" } } }, "enableSystemLogHandler": true, "enableSystemLogQuery": false, "shutdownGracePeriod": "0s", "shutdownGracePeriodCriticalPods": "0s", "reservedMemory": [ { "numaNode": 0, "limits": { "memory": "1550Mi" } }, { "numaNode": 1, "limits": { "memory": "1550Mi" } } ], "enableProfilingHandler": true, "enableDebugFlagsHandler": true, "seccompDefault": false, "memoryThrottlingFactor": 0.9, "registerNode": true, "localStorageCapacityIsolation": true, 
"containerRuntimeEndpoint": "unix:///run/k0s/containerd.sock" } } ``` After setting the containerd config to # /etc/k0s/containerd.d/00-cgroups.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true We were able to get things to start, but still running into the cgroup throttling issue. kubvirt pod```yaml apiVersion: v1 kind: Pod metadata: annotations: cni.projectcalico.org/containerID: f14bd217a120cc893dbadc4dce5237d40fb3be15cf8b3f7fd257402cdf689acc cni.projectcalico.org/podIP: 10.244.9.88/32 cni.projectcalico.org/podIPs: 10.244.9.88/32 descheduler.alpha.kubernetes.io/request-evict-only: "" kubectl.kubernetes.io/default-container: compute kubevirt.io/domain: fedora-test kubevirt.io/migrationTransportUnix: "true" kubevirt.io/vm-generation: "1" post.hook.backup.velero.io/command: '["/usr/bin/virt-freezer", "--unfreeze", "--name", "fedora-test", "--namespace", "default"]' post.hook.backup.velero.io/container: compute pre.hook.backup.velero.io/command: '["/usr/bin/virt-freezer", "--freeze", "--name", "fedora-test", "--namespace", "default"]' pre.hook.backup.velero.io/container: compute creationTimestamp: "2024-09-16T20:21:16Z" generateName: virt-launcher-fedora-test- labels: kubevirt.io: virt-launcher kubevirt.io/created-by: 76a6a7fb-2327-479e-bb06-b816d3e0b730 kubevirt.io/nodeName: protoklustr-2 vm.kubevirt.io/name: fedora-test name: virt-launcher-fedora-test-j575z namespace: default ownerReferences: - apiVersion: kubevirt.io/v1 blockOwnerDeletion: true controller: true kind: VirtualMachineInstance name: fedora-test uid: 76a6a7fb-2327-479e-bb06-b816d3e0b730 resourceVersion: "5611" uid: 2fdafb62-3de2-43e6-900a-eda4bf1982db spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: node-labeller.kubevirt.io/obsolete-host-model operator: DoesNotExist automountServiceAccountToken: false containers: - command: - /usr/bin/virt-launcher-monitor - --qemu-timeout - 356s - --name - fedora-test - --uid - 76a6a7fb-2327-479e-bb06-b816d3e0b730 - --namespace - default - --kubevirt-share-dir - /var/run/kubevirt - --ephemeral-disk-dir - /var/run/kubevirt-ephemeral-disks - --container-disk-dir - /var/run/kubevirt/container-disks - --grace-period-seconds - "25" - --hook-sidecars - "0" - --ovmf-path - /usr/share/OVMF - --run-as-nonroot env: - name: XDG_CACHE_HOME value: /var/run/kubevirt-private - name: XDG_CONFIG_HOME value: /var/run/kubevirt-private - name: XDG_RUNTIME_DIR value: /var/run - name: POD_NAME valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.name image: quay.io/kubevirt/virt-launcher:v1.3.1 imagePullPolicy: IfNotPresent name: compute resources: limits: cpu: "16" devices.kubevirt.io/kvm: "1" devices.kubevirt.io/tun: "1" devices.kubevirt.io/vhost-net: "1" hugepages-1Gi: 16Gi memory: "501219329" requests: cpu: "16" devices.kubevirt.io/kvm: "1" devices.kubevirt.io/tun: "1" devices.kubevirt.io/vhost-net: "1" ephemeral-storage: 50M hugepages-1Gi: 16Gi memory: "501219329" securityContext: allowPrivilegeEscalation: false capabilities: add: - NET_BIND_SERVICE drop: - ALL privileged: false runAsGroup: 107 runAsNonRoot: true runAsUser: 107 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/run/kubevirt-private name: private - mountPath: /var/run/kubevirt name: public - mountPath: /var/run/kubevirt-ephemeral-disks name: ephemeral-disks - mountPath: /var/run/kubevirt/container-disks mountPropagation: HostToContainer name: container-disks - mountPath: /var/run/libvirt name: libvirt-runtime - mountPath: /var/run/kubevirt/sockets name: sockets - mountPath: /dev/hugepages name: hugepages - mountPath: /dev/hugepages/libvirt/qemu name: hugetblfs-dir - mountPath: 
/var/run/kubevirt/hotplug-disks mountPropagation: HostToContainer name: hotplug-disks - args: - --copy-path - /var/run/kubevirt-ephemeral-disks/container-disk-data/76a6a7fb-2327-479e-bb06-b816d3e0b730/disk_0 command: - /usr/bin/container-disk image: kubevirt/fedora-cloud-container-disk-demo:latest imagePullPolicy: Always name: volumecontainerdisk resources: limits: cpu: 10m memory: 40M requests: cpu: 10m ephemeral-storage: 50M memory: 40M securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL runAsNonRoot: true runAsUser: 107 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/run/kubevirt-ephemeral-disks/container-disk-data/76a6a7fb-2327-479e-bb06-b816d3e0b730 name: container-disks - mountPath: /usr/bin name: virt-bin-share-dir - args: - --logfile - /var/run/kubevirt-private/76a6a7fb-2327-479e-bb06-b816d3e0b730/virt-serial0-log command: - /usr/bin/virt-tail env: - name: VIRT_LAUNCHER_LOG_VERBOSITY value: "2" image: quay.io/kubevirt/virt-launcher:v1.3.1 imagePullPolicy: IfNotPresent name: guest-console-log resources: limits: cpu: 15m memory: 60M requests: cpu: 15m memory: 60M securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL runAsNonRoot: true runAsUser: 107 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/run/kubevirt-private name: private readOnly: true dnsPolicy: ClusterFirst enableServiceLinks: false hostname: fedora-test initContainers: - command: - /usr/bin/cp - /usr/bin/container-disk - /init/usr/bin/container-disk env: - name: XDG_CACHE_HOME value: /var/run/kubevirt-private - name: XDG_CONFIG_HOME value: /var/run/kubevirt-private - name: XDG_RUNTIME_DIR value: /var/run image: quay.io/kubevirt/virt-launcher:v1.3.1 imagePullPolicy: IfNotPresent name: container-disk-binary resources: limits: cpu: 10m memory: 40M requests: cpu: 10m memory: 40M securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false runAsGroup: 107 runAsNonRoot: true runAsUser: 107 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /init/usr/bin name: virt-bin-share-dir - args: - --no-op command: - /usr/bin/container-disk image: kubevirt/fedora-cloud-container-disk-demo:latest imagePullPolicy: Always name: volumecontainerdisk-init resources: limits: cpu: 10m memory: 40M requests: cpu: 10m ephemeral-storage: 50M memory: 40M securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL runAsNonRoot: true runAsUser: 107 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/run/kubevirt-ephemeral-disks/container-disk-data/76a6a7fb-2327-479e-bb06-b816d3e0b730 name: container-disks - mountPath: /usr/bin name: virt-bin-share-dir nodeName: protoklustr-2 nodeSelector: cpumanager: "true" kubernetes.io/arch: amd64 kubevirt.io/schedulable: "true" preemptionPolicy: PreemptLowerPriority priority: 0 readinessGates: - conditionType: kubevirt.io/virtual-machine-unpaused restartPolicy: Never schedulerName: default-scheduler securityContext: fsGroup: 107 runAsGroup: 107 runAsNonRoot: true runAsUser: 107 serviceAccount: default serviceAccountName: default terminationGracePeriodSeconds: 40 tolerations: - effect: NoExecute key: node.kubernetes.io/not-ready operator: Exists tolerationSeconds: 300 - effect: NoExecute key: node.kubernetes.io/unreachable operator: Exists tolerationSeconds: 300 volumes: - emptyDir: {} name: private - 
emptyDir: {} name: public - emptyDir: {} name: sockets - emptyDir: {} name: virt-bin-share-dir - emptyDir: {} name: libvirt-runtime - emptyDir: {} name: ephemeral-disks - emptyDir: {} name: container-disks - emptyDir: medium: HugePages name: hugepages - emptyDir: {} name: hugetblfs-dir - emptyDir: {} name: hotplug-disks status: conditions: - lastProbeTime: "2024-09-16T20:21:16Z" lastTransitionTime: "2024-09-16T20:21:16Z" message: the virtual machine is not paused reason: NotPaused status: "True" type: kubevirt.io/virtual-machine-unpaused - lastProbeTime: null lastTransitionTime: "2024-09-16T20:21:17Z" status: "True" type: PodReadyToStartContainers - lastProbeTime: null lastTransitionTime: "2024-09-16T20:21:18Z" status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: "2024-09-16T20:21:19Z" status: "True" type: Ready - lastProbeTime: null lastTransitionTime: "2024-09-16T20:21:19Z" status: "True" type: ContainersReady - lastProbeTime: null lastTransitionTime: "2024-09-16T20:21:16Z" status: "True" type: PodScheduled containerStatuses: - containerID: containerd://d30a5394062be7df6b9d9297e4fb6e30561e1c5f949e66acded795d17f05528b image: quay.io/kubevirt/virt-launcher:v1.3.1 imageID: quay.io/kubevirt/virt-launcher@sha256:b15f8049d7f1689d9d8c338d255dc36b15655fd487e824b35e2b139258d44209 lastState: {} name: compute ready: true restartCount: 0 started: true state: running: startedAt: "2024-09-16T20:21:18Z" - containerID: containerd://8c28b3316e2ae0a6e2fdd49d0590f32ff509b62118d8002ca4a436f4f223a579 image: quay.io/kubevirt/virt-launcher:v1.3.1 imageID: quay.io/kubevirt/virt-launcher@sha256:b15f8049d7f1689d9d8c338d255dc36b15655fd487e824b35e2b139258d44209 lastState: {} name: guest-console-log ready: true restartCount: 0 started: true state: running: startedAt: "2024-09-16T20:21:19Z" - containerID: containerd://6f9b6afe9dcd8d02cfb90db1609d561fd929e5ab10b085df162c3e20f3640ff9 image: docker.io/kubevirt/fedora-cloud-container-disk-demo:latest imageID: docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4a0c3f9526551d0294079f1b0171a071a57fe0bf60a2e8529bf4102ee63a67cd lastState: {} name: volumecontainerdisk ready: true restartCount: 0 started: true state: running: startedAt: "2024-09-16T20:21:19Z" hostIP: 10.69.0.23 hostIPs: - ip: 10.69.0.23 initContainerStatuses: - containerID: containerd://151888cd26bd5ef8b7b9677b5f817a2e1437b579d952e926b94d4cdf98bb1a5c image: quay.io/kubevirt/virt-launcher:v1.3.1 imageID: quay.io/kubevirt/virt-launcher@sha256:b15f8049d7f1689d9d8c338d255dc36b15655fd487e824b35e2b139258d44209 lastState: {} name: container-disk-binary ready: true restartCount: 0 started: false state: terminated: containerID: containerd://151888cd26bd5ef8b7b9677b5f817a2e1437b579d952e926b94d4cdf98bb1a5c exitCode: 0 finishedAt: "2024-09-16T20:21:17Z" reason: Completed startedAt: "2024-09-16T20:21:16Z" - containerID: containerd://ac024ab1218d338ac40043c9c6cd3befe8011141c1f50f74e12e4ed54b31e233 image: docker.io/kubevirt/fedora-cloud-container-disk-demo:latest imageID: docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:4a0c3f9526551d0294079f1b0171a071a57fe0bf60a2e8529bf4102ee63a67cd lastState: {} name: volumecontainerdisk-init ready: true restartCount: 0 started: false state: terminated: containerID: containerd://ac024ab1218d338ac40043c9c6cd3befe8011141c1f50f74e12e4ed54b31e233 exitCode: 0 finishedAt: "2024-09-16T20:21:18Z" reason: Completed startedAt: "2024-09-16T20:21:18Z" phase: Running podIP: 10.244.9.88 podIPs: - ip: 10.244.9.88 qosClass: Guaranteed startTime: 
"2024-09-16T20:21:16Z" ``` We have this kubevirt pod deployed and are still seeing the throttling reported per container, and overall for the pod
Interestingly, if we run a pod like this:

apiVersion: v1
kind: Pod
metadata:
name: ubuntu
labels:
app: ubuntu
spec:
containers:
- image: ubuntu
command:
- "sleep"
- "604800"
imagePullPolicy: IfNotPresent
name: ubuntu
resources:
requests:
memory: "4Gi"
cpu: "4"
limits:
memory: "4Gi"
cpu: "4"
restartPolicy: Always

We don't see the throttling reported in that case. Anything else we could look into for this behavior?
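One more thing that might be worth comparing between the VM pod and the plain pod is what the static CPU manager actually assigned, and whether the CFS quota differs between the two. A rough check (the kubelet root dir is assumed to be `/var/lib/k0s/kubelet` on k0s; adjust if yours differs):

```bash
# Exclusive CPU assignments handed out by the static CPU manager, per container
jq . /var/lib/k0s/kubelet/cpu_manager_state

# Locate a container's cgroup by its container ID and dump its cpuset and quota
CID=<container-id>   # e.g. from: kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].containerID}'
CG=$(find /sys/fs/cgroup -type d -name "*${CID}*" | head -n1)
cat "$CG/cpuset.cpus.effective" "$CG/cpu.max"
```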
The issue is marked as stale since no activity has been recorded in 30 days.
Before creating an issue, make sure you've checked the following:
Platform
Version
v1.30.4+k0s.0
Sysinfo
`k0s sysinfo`
What happened?
When deploying a KubeVirt VM, we can't achieve 100% CPU usage without throttling. This is despite enabling the CPUManager `static` policy, and the VM pod being given the Guaranteed QoS class along with the KubeVirt options for dedicated CPU placement. We don't see this issue on other K8s distributions like rke2 with the same manifest, where we can achieve 100% utilization of the cores.

Steps to reproduce
Deploy k0s with kubelet extra args for CPU manager
k0s installflags
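(The collapsed block above isn't reproduced here. Reconstructed from the effective kubelet config later in this issue, the install flags would be roughly of this shape; values are illustrative, not necessarily the exact ones used in this report:)

```bash
# Illustrative only - reconstructed from the effective kubelet config
k0s install worker --token-file /etc/k0s/token \
  --kubelet-extra-args="--cpu-manager-policy=static \
    --memory-manager-policy=Static \
    --topology-manager-policy=restricted \
    --topology-manager-scope=pod \
    --system-reserved=cpu=2,memory=1000Mi \
    --kube-reserved=memory=2000Mi"
```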
Deploy a VM that should have dedicated CPU resources
test-vm.yaml
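The collapsed manifest above isn't shown, but for dedicated CPU placement the relevant part of such a spec looks roughly like this minimal sketch (not the reporter's actual test-vm.yaml; sizes mirror the virt-launcher pod above):

```yaml
# Minimal illustrative KubeVirt VMI with dedicated, pinned CPUs
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: fedora-test
spec:
  domain:
    cpu:
      cores: 16
      dedicatedCpuPlacement: true   # request exclusive CPUs from the static CPU manager
    memory:
      hugepages:
        pageSize: 1Gi
    resources:
      requests:
        memory: 16Gi
    devices:
      disks:
        - name: containerdisk
          disk:
            bus: virtio
  volumes:
    - name: containerdisk
      containerDisk:
        image: kubevirt/fedora-cloud-container-disk-demo:latest
```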
Observe in htop, as well as in the cgroup info, that the CPU is being throttled. You can run `for i in $(seq $(getconf _NPROCESSORS_ONLN)); do yes > /dev/null & done` on the VM to pin the CPUs at 100%.

Expected behavior
We would expect the CPU to actually be pinned at 100% with no throttling.
Actual behavior
CPU is throttled and the VM can't achieve 100% CPU utilization on the host. Interestingly, this doesn't seem to be an issue when deploying pods, and they can use 100% CPU. We also do see that the VM is active on all requested cores, and that the only processes scheduled on those cores are the kubevirt ones.
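A way to double-check the last point (which processes actually land on the pinned cores) is something like:

```bash
# pid, the CPU each thread last ran on (PSR), and the command name,
# filtered to one of the cores assigned to the VM (core 4 is just an example)
ps -eLo pid,psr,comm | awk '$2 == 4'
```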
Screenshots and logs
No response
Additional context
Here's the kubelet config for each host, extracted from the running cluster.
kubelet config k0s
kubelet config rke2 (working)