
Error: nodes \"my-vk-node\" not found | StorageError: invalid object #260

Open
antoinetran opened this issue Jul 17, 2024 · 6 comments

@antoinetran

Short Description of the issue

Deploying the interlink core component in a vcluster, on a shared Kubernetes environment, resulted in these errors:

time="2024-07-16T16:08:36Z" level=error msg="Error handling node status update" error="nodes \"my-vk-node\" not found"
time="2024-07-16T16:09:21Z" level=error msg="failed to update node lease" error="Operation cannot be fulfilled on leases.coordination.k8s.io \"my-vk-node\": StorageError: invalid object, Code: 4, Key: /regi
stry/leases/kube-node-lease/my-vk-node, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 4e3bc5f5-d27e-4449-8d09-23136d1258d9, UID in object meta: "

Environment

  • Operating System: Red Hat 8
  • Other related components versions:
  • Kubernetes version
kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.26.1

  • Restriction policy: this is a shared cluster, limited to one namespace, etc.

  • vcluster 0.18.1, to enable multi-namespace mode and some level of privilege on the Kubernetes cluster

Steps to reproduce

Logs, stacktrace, or other symptoms

time="2024-07-16T16:08:06Z" level=info msg="Loading Virtual Kubelet config from /etc/interlink/InterLinkConfig.yaml"
time="2024-07-16T16:08:06Z" level=info msg="Trying InCluster configuration"
time="2024-07-16T16:08:06Z" level=info msg="Loading Virtual Kubelet config from /etc/interlink/InterLinkConfig.yaml"
time="2024-07-16T16:08:06Z" level=info msg=nodeLoop
time="2024-07-16T16:08:06Z" level=debug msg="Starting leasecontroller" leaseController="&{0xc000215c20 300 75000000000 0x2dd6da0 0xc000573c20 <nil>}"
time="2024-07-16T16:08:06Z" level=debug msg="lease controller in use, updating at statusInterval" statusInterval=1m0s
time="2024-07-16T16:08:06Z" level=debug msg="Generated lease"
time="2024-07-16T16:08:06Z" level=debug msg="Successfully created lease"
time="2024-07-16T16:08:06Z" level=debug msg="Generated lease"
time="2024-07-16T16:08:06Z" level=debug msg="Generated lease"
time="2024-07-16T16:08:06Z" level=debug msg="Successfully updated lease" retries=0
time="2024-07-16T16:08:06Z" level=info msg="Pod cache in-sync"
time="2024-07-16T16:08:06Z" level=info msg="receive GetPods"
W0716 16:08:06.318048       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2024-07-16T16:08:06Z" level=info msg=statusLoop
time="2024-07-16T16:08:06Z" level=info msg="Starting the virtual kubelet HTTPs server listening on \"0.0.0.0:10250\""
time="2024-07-16T16:08:06Z" level=info msg="Retrieving ALL Pods registered to the cluster and owned by VK"
time="2024-07-16T16:08:07Z" level=info msg="starting workers"
time="2024-07-16T16:08:07Z" level=info msg="started workers"
time="2024-07-16T16:08:11Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:11Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:11Z" level=info msg=statusLoop
time="2024-07-16T16:08:16Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:16Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:16Z" level=info msg=statusLoop
time="2024-07-16T16:08:21Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:21Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:21Z" level=info msg=statusLoop
time="2024-07-16T16:08:26Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:26Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:26Z" level=info msg=statusLoop
time="2024-07-16T16:08:31Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:31Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:31Z" level=info msg=statusLoop
time="2024-07-16T16:08:36Z" level=info msg="Pinging: https://interlink-slurm-plugin:8443/pinglink"
time="2024-07-16T16:08:36Z" level=error msg="Ping Failed with exit code: -1"
time="2024-07-16T16:08:36Z" level=info msg=endNodeLoop
time="2024-07-16T16:08:36Z" level=debug msg="Received node status update"
time="2024-07-16T16:08:36Z" level=error msg="Error handling node status update" error="nodes \"my-vk-node\" not found"

...


time="2024-07-16T16:09:21Z" level=debug msg="Generated lease"
time="2024-07-16T16:09:21Z" level=debug msg="Generated lease"
time="2024-07-16T16:09:21Z" level=error msg="failed to update node lease" error="Operation cannot be fulfilled on leases.coordination.k8s.io \"my-vk-node\": StorageError: invalid object, Code: 4, Key: /regi
stry/leases/kube-node-lease/my-vk-node, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 4e3bc5f5-d27e-4449-8d09-23136d1258d9, UID in object meta: "
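
For what it's worth, the "Precondition failed" part suggests the Lease object the virtual kubelet created no longer matches the UID it remembers, as if the lease had been deleted or recreated in the meantime. Assuming access to the kube-node-lease namespace, the current object can be compared against the UID in the error:

kubectl -n kube-node-lease get lease my-vk-node -o yaml
# compare .metadata.uid with the "UID in precondition" value reported above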

Summary of proposed changes

@antoinetran
Author

I don't know whether the expected behavior is that the new node shows up in kubectl get node, because I don't see it.
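
For reference, this is roughly what I check against the vcluster API server (the node name is the one from the logs above; kube-node-lease is the default lease namespace):

# does the virtual node exist at all?
kubectl get node my-vk-node
# is a lease left behind for it?
kubectl -n kube-node-lease get lease my-vk-node -o yaml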

@dciangot
Collaborator

Not trivial to say; it would be interesting to see whether there is any permission-denied error at the level of the API server / vcluster sync service.

That would be my first guess: something like the node registration silently failing.
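
If it helps, something along these lines might surface it; the namespace, the app=vcluster label (the default of the vcluster Helm chart, if I remember correctly), and the service account name are placeholders to adapt:

# look for RBAC denials in the vcluster syncer logs
kubectl -n <vcluster-namespace> logs -l app=vcluster | grep -iE "forbidden|denied"

# check whether the service account running the virtual kubelet is allowed to register nodes
kubectl auth can-i create nodes --as=system:serviceaccount:<namespace>:<serviceaccount>
kubectl auth can-i patch nodes --as=system:serviceaccount:<namespace>:<serviceaccount>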

@antoinetran
Author

Good point! The only relevant log I can see for now is:

2024-07-17 10:05:41     INFO    fake-node.my-vk-node    syncer/fake_syncer.go:83        Delete fake node my-vk-node as it is not needed anymore {"component": "vcluster"}

But I will try on GCP without vcluster later, to see whether this comes from vcluster or from the Kubernetes restriction policies.

@antoinetran
Author

antoinetran commented Jul 17, 2024

Ok, I tested:

  • local Kubernetes with Kind + interlink core => the node appears with kubectl get node
  • local Kubernetes with Kind + vcluster + interlink core => the node does not appear and I have the same error:
time="2024-07-17T13:59:18Z" level=error msg="Error handling node status update" error="nodes \"my-vk-node\" not found"

and in the vcluster pod log:

2024-07-17 13:57:18     INFO    node.my-vk-node syncer/syncer.go:136    delete virtual node my-vk-node, because it is not needed anymore        {"component": "vcluster"}

Thus this is clearly related to vcluster, not to the shared Kubernetes restriction constraints.
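
For reference, a rough sketch of the local reproduction, with placeholder names; the interlink core component itself is deployed inside the vcluster as described in the Environment section above:

# host cluster
kind create cluster --name interlink-test

# virtual cluster on top of it (name and namespace are placeholders)
vcluster create my-vcluster --namespace vcluster-test
vcluster connect my-vcluster --namespace vcluster-test

# ... deploy the interlink core component inside the vcluster ...

# inside the vcluster, the virtual node never shows up:
kubectl get node my-vk-node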

@antoinetran
Author

The issue is encountered with both vcluster 0.18.1 and 0.19.4.

@antoinetran
Author

I opened an issue on the vcluster side (link below). I think there is nothing to do on the interlink side, but let's keep this issue open for now until there is new info :)
