
Error: nodes \"my-vk-node\" not found | StorageError: invalid object #260

Open
antoinetran opened this issue Jul 17, 2024 · 6 comments

@antoinetran

Short Description of the issue

Deploying the interlink core component in a vcluster, on a shared Kubernetes environment, resulted in these errors:

time="2024-07-16T16:08:36Z" level=error msg="Error handling node status update" error="nodes \"my-vk-node\" not found"
time="2024-07-16T16:09:21Z" level=error msg="failed to update node lease" error="Operation cannot be fulfilled on leases.coordination.k8s.io \"my-vk-node\": StorageError: invalid object, Code: 4, Key: /regi
stry/leases/kube-node-lease/my-vk-node, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 4e3bc5f5-d27e-4449-8d09-23136d1258d9, UID in object meta: "

Environment

  • Operating System: Red Hat 8
  • Other related components versions:
  • Kubernetes version
kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.26.1

  • Restriction policy: this is a shared cluster, limited to one namespace, etc.

  • vcluster 0.18.1, to enable multi-namespace mode and some level of privilege on the Kubernetes cluster

Steps to reproduce

Logs, stacktrace, or other symptoms

time="2024-07-16T16:08:06Z" level=info msg="Loading Virtual Kubelet config from /etc/interlink/InterLinkConfig.yaml"
time="2024-07-16T16:08:06Z" level=info msg="Trying InCluster configuration"
time="2024-07-16T16:08:06Z" level=info msg="Loading Virtual Kubelet config from /etc/interlink/InterLinkConfig.yaml"
time="2024-07-16T16:08:06Z" level=info msg=nodeLoop
time="2024-07-16T16:08:06Z" level=debug msg="Starting leasecontroller" leaseController="&{0xc000215c20 300 75000000000 0x2dd6da0 0xc000573c20 <nil>}"
time="2024-07-16T16:08:06Z" level=debug msg="lease controller in use, updating at statusInterval" statusInterval=1m0s
time="2024-07-16T16:08:06Z" level=debug msg="Generated lease"
time="2024-07-16T16:08:06Z" level=debug msg="Successfully created lease"
time="2024-07-16T16:08:06Z" level=debug msg="Generated lease"
time="2024-07-16T16:08:06Z" level=debug msg="Generated lease"
time="2024-07-16T16:08:06Z" level=debug msg="Successfully updated lease" retries=0
time="2024-07-16T16:08:06Z" level=info msg="Pod cache in-sync"
time="2024-07-16T16:08:06Z" level=info msg="receive GetPods"
W0716 16:08:06.318048       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2024-07-16T16:08:06Z" level=info msg=statusLoop
time="2024-07-16T16:08:06Z" level=info msg="Starting the virtual kubelet HTTPs server listening on \"0.0.0.0:10250\""
time="2024-07-16T16:08:06Z" level=info msg="Retrieving ALL Pods registered to the cluster and owned by VK"
time="2024-07-16T16:08:07Z" level=info msg="starting workers"
time="2024-07-16T16:08:07Z" level=info msg="started workers"
time="2024-07-16T16:08:11Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:11Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:11Z" level=info msg=statusLoop
time="2024-07-16T16:08:16Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:16Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:16Z" level=info msg=statusLoop
time="2024-07-16T16:08:21Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:21Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:21Z" level=info msg=statusLoop
time="2024-07-16T16:08:26Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:26Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:26Z" level=info msg=statusLoop
time="2024-07-16T16:08:31Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-07-16T16:08:31Z" level=info msg="statusLoop=end"
time="2024-07-16T16:08:31Z" level=info msg=statusLoop
time="2024-07-16T16:08:36Z" level=info msg="Pinging: https://interlink-slurm-plugin:8443/pinglink"
time="2024-07-16T16:08:36Z" level=error msg="Ping Failed with exit code: -1"
time="2024-07-16T16:08:36Z" level=info msg=endNodeLoop
time="2024-07-16T16:08:36Z" level=debug msg="Received node status update"
time="2024-07-16T16:08:36Z" level=error msg="Error handling node status update" error="nodes \"my-vk-node\" not found"

...


time="2024-07-16T16:09:21Z" level=debug msg="Generated lease"
time="2024-07-16T16:09:21Z" level=debug msg="Generated lease"
time="2024-07-16T16:09:21Z" level=error msg="failed to update node lease" error="Operation cannot be fulfilled on leases.coordination.k8s.io \"my-vk-node\": StorageError: invalid object, Code: 4, Key: /regi
stry/leases/kube-node-lease/my-vk-node, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 4e3bc5f5-d27e-4449-8d09-23136d1258d9, UID in object meta: "
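
For what it's worth, the "Precondition failed" part suggests the Lease object the virtual kubelet created no longer matches the UID it remembers, as if the lease had been deleted or recreated in the meantime. Assuming access to the kube-node-lease namespace, the current object can be compared against the UID in the error:

kubectl -n kube-node-lease get lease my-vk-node -o yaml
# compare .metadata.uid with the "UID in precondition" value reported above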

Summary of proposed changes

@antoinetran
Author

I don't know whether the expected behavior is that the new node shows up in kubectl get node, because I don't see it.
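
For reference, this is roughly what I check against the vcluster API server (the node name is the one from the logs above; kube-node-lease is the default lease namespace):

# does the virtual node exist at all?
kubectl get node my-vk-node
# is a lease left behind for it?
kubectl -n kube-node-lease get lease my-vk-node -o yaml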

@dciangot
Collaborator

Not trivial to say; it would be interesting to see whether there is any permission-denied error at the level of the API server / vcluster sync service.

That would be my first guess: something like the node registration silently failing.
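
If it helps, something along these lines might surface it; the namespace, the app=vcluster label (the default of the vcluster Helm chart, if I remember correctly), and the service account name are placeholders to adapt:

# look for RBAC denials in the vcluster syncer logs
kubectl -n <vcluster-namespace> logs -l app=vcluster | grep -iE "forbidden|denied"

# check whether the service account running the virtual kubelet is allowed to register nodes
kubectl auth can-i create nodes --as=system:serviceaccount:<namespace>:<serviceaccount>
kubectl auth can-i patch nodes --as=system:serviceaccount:<namespace>:<serviceaccount>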

@antoinetran
Author

Good point! The only relevant log I can see for now is:

2024-07-17 10:05:41     INFO    fake-node.my-vk-node    syncer/fake_syncer.go:83        Delete fake node my-vk-node as it is not needed anymore {"component": "vcluster"}

But I will try on GCP without vcluster later, to see whether this comes from vcluster or from the Kubernetes restriction policies.

@antoinetran
Author

antoinetran commented Jul 17, 2024

Ok, I tested:

  • local Kubernetes with Kind + interlink core => the node appears with kubectl get node
  • local Kubernetes with Kind + vcluster + interlink core => the node does not appear and I have the same error:
time="2024-07-17T13:59:18Z" level=error msg="Error handling node status update" error="nodes \"my-vk-node\" not found"

and in the vcluster pod log:

2024-07-17 13:57:18     INFO    node.my-vk-node syncer/syncer.go:136    delete virtual node my-vk-node, because it is not needed anymore        {"component": "vcluster"}

Thus this is clearly related to vcluster, not to the shared Kubernetes restriction constraints.
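
For reference, a rough sketch of the local reproduction, with placeholder names; the interlink core component itself is deployed inside the vcluster as described in the Environment section above:

# host cluster
kind create cluster --name interlink-test

# virtual cluster on top of it (name and namespace are placeholders)
vcluster create my-vcluster --namespace vcluster-test
vcluster connect my-vcluster --namespace vcluster-test

# ... deploy the interlink core component inside the vcluster ...

# inside the vcluster, the virtual node never shows up:
kubectl get node my-vk-node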

@antoinetran
Author

The issue is encountered with both vcluster 0.18.1 and 0.19.4.

@antoinetran
Author

I opened an issue on the vcluster side (link below). I think there is nothing to do on the interlink side, but let's keep this issue open for now until there is new info :)
