In clusters with hardware nodes, a new PVC and its workload can get stuck in `Pending` state if they are scheduled without a `nodeAffinity`.
Steps to reproduce:
1. Run a cluster that includes a hardware worker, and label the hardware node with `instance.hetzner.cloud/is-root-server=true` as mentioned in the README.
2. Install the CSI driver according to the instructions.
3. Apply the test PVC and pod mentioned in the README, using the default storageClass with `WaitForFirstConsumer` volumeBindingMode (a sketch follows below).
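For reference, a minimal sketch of such manifests (the object names and image are illustrative assumptions, not the exact README content):

```yaml
# Hypothetical test manifests along the lines of the README example.
# Label the hardware node first, as per the README:
#   kubectl label node hardwarenode.testcluster instance.hetzner.cloud/is-root-server=true
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: hcloud-volumes   # default class, volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: Pod
metadata:
  name: my-csi-app         # hypothetical name
spec:
  containers:
    - name: app
      image: busybox       # any image that keeps running works for the test
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /data
          name: data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: csi-pvc
```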
Expected Behaviour:
hcloud-csi-controller should provide the desired/required topology constraints to the k8s scheduler, which then schedules the pod on a node fulfilling the topology requirements.
As the hardware node does not run the csi-driver and cannot mount Hetzner Cloud volumes, the workload should not be scheduled there.
Observed Behaviour:
Both PVC and pod are stuck in `Pending` state.
The `csi-provisioner` container of the CSI controller deployment logs this error:

```
'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "hcloud-volumes": error generating accessibility requirements: no topology key found on CSINode hardwarenode.testcluster
```
More Info:
- The DaemonSet for `hcloud-csi-node` does not run on the hardware node.
- Because of this, the `csinode` object for the node lists no driver:

  ```
  kubectl get csinode
  NAME                       DRIVERS   AGE
  virtualnode.testcluster    1         1d
  hardwarenode.testcluster   0         1d
  ```

- The `csinode` object of the virtual node looks ok; the `csinode` object of the hardware node has no driver and therefore no topology key, as the node intentionally runs no `hcloud-csi-node` pod due to the DaemonSet's `nodeAffinity` (see the sketch below).
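For illustration, the two `csinode` objects would look roughly like this (a sketch; the driver name and topology key are assumptions based on the hcloud CSI driver, the node IDs are made up):

```yaml
# Sketch of the two CSINode objects, fields abbreviated.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: virtualnode.testcluster
spec:
  drivers:
    - name: csi.hetzner.cloud            # assumed driver name
      nodeID: "12345678"                 # hypothetical node ID
      topologyKeys:
        - csi.hetzner.cloud/location     # assumed topology key
---
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: hardwarenode.testcluster
spec:
  drivers: []                            # no driver registered, hence no topology key
```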
Theory
It seems we are hitting this issue in csi-provisioner.
As the hardware node has no csi-driver pod, and therefore no driver or topology key listed, the csi-provisioner breaks: while building the preferred topology to hand to the scheduler, it fails on the node that has no topology key. Pod and PVC cannot finish scheduling and remain in `Pending` state forever.
Workaround
This issue can be avoided by making sure the object that uses the PVC (StatefulSet, Pod, etc.) cannot be scheduled on the hardware node in the first place. This can be done by specifying a `nodeAffinity`:
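A minimal sketch, assuming the hardware nodes are labeled with `instance.hetzner.cloud/is-root-server=true` as in the README (`NotIn` also matches nodes that do not carry the label at all):

```yaml
# Pod spec fragment: keep the workload off nodes labeled as root servers.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: instance.hetzner.cloud/is-root-server
              operator: NotIn
              values:
                - "true"
```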
Proposed Solution
The external-provisioner issue lists a few possible solutions on the csi-driver side, such as running the csi-driver on all nodes, including hardware nodes. The CSI controller would then need to be aware of which nodes are virtual and which are hardware when providing the topology preferences to the k8s scheduler.