
Can't upgrade k8s using snap refresh #634

Open
mhalano opened this issue Aug 28, 2024 · 7 comments

Comments

@mhalano

mhalano commented Aug 28, 2024

Summary

I tried to upgrade my k8s snap installed on a single node and got this error:

mhalano@skynet:~$ sudo snap refresh k8s
2024-08-28T22:51:36Z INFO Waiting for "snap.k8s.kube-apiserver.service" to stop.
error: cannot perform the following tasks:
- Run configure hook of "k8s" snap if present (run hook "configure": Node is not part of a cluster: <nil>)

What Should Happen Instead?

The upgrade should complete correctly.

Reproduction Steps

  1. Install an older version of k8s (mine is 1.30.3).
  2. Execute sudo snap refresh k8s to get the latest version (1.31.0 at the time of writing).
  3. See the error (a condensed sketch of these steps is shown below).
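
A condensed sketch of the same flow on a single node; the exact install channel is an assumption (any channel carrying 1.30.3 should do):

sudo snap install k8s --classic --channel=1.30-classic/candidate   # channel name is an assumption
sudo k8s bootstrap                                                  # single-node cluster
sudo snap refresh k8s                                               # fails in the "configure" hook as shown above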

System information

inspection-report-20240828_225625.tar.gz

Can you suggest a fix?

No response

Are you interested in contributing a fix?

I don't know how to fix it, but I can help with troubleshooting and validate any possible solution. My cluster is not critical.

@bschimke95
Contributor

Hi @mhalano

Thanks for reporting this issue.
I think I addressed this issue with #633 yesterday evening.

For me, the upgrade to the latest version works:

➜ sudo snap refresh k8s --channel=1.30-classic/candidate --classic
k8s (1.30-classic/candidate) v1.30.3 from Canonical✓ refreshed

➜ sudo k8s bootstrap                                              
Bootstrapping the cluster. This may take a few seconds, please wait.
Bootstrapped a new Kubernetes cluster with node address "192.168.178.36:6400".
The node will be 'Ready' to host workloads after the CNI is deployed successfully.

➜ sudo snap refresh k8s --channel=latest/edge --classic           
2024-08-29T08:20:22+02:00 INFO Waiting for "snap.k8s.kube-apiserver.service" to stop.
k8s (edge) v1.31.0 from Canonical✓ refreshed

Would you mind retrying this upgrade on latest/edge (revision 991)? Thanks!
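
If pinning the exact revision is preferable to tracking the channel, snap refresh also accepts a revision number (991 is the one mentioned above):

sudo snap refresh k8s --channel=latest/edge --classic
# or pin the specific revision:
sudo snap refresh k8s --revision=991 --classic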

@mhalano
Author

mhalano commented Aug 29, 2024

@bschimke95 I did the test you mentioned, and the upgrade process works, but my cluster crashes afterwards:

root@skynet:/var/lib/snapd# k8s kubectl get pods
Error: Failed to retrieve the node status.

The error was: failed to GET /k8sd/node: Get "http://control.socket/1.0/k8sd/node": dial unix /var/snap/k8s/common/var/lib/k8sd/state/control.socket: connect: no such file or directory

I don't know if this problem is even still related to the snap. Any tips?

@bschimke95
Contributor

This looks like something inside k8sd crashed, leaving the unix socket unavailable.

Would you mind sharing an inspection report and the output of snap services k8s?
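
For reference, the service state and the most recent k8sd log lines can be collected with something like:

snap services k8s
sudo snap logs k8s.k8sd -n 50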

@mhalano
Author

mhalano commented Aug 29, 2024

Here it is:

root@skynet:/var/lib/snapd# snap services k8s
Service                      Startup   Current   Notes
k8s.containerd               enabled   active    -
k8s.k8s-apiserver-proxy      disabled  inactive  -
k8s.k8s-dqlite               enabled   active    -
k8s.k8sd                     enabled   inactive  -
k8s.kube-apiserver           enabled   active    -
k8s.kube-controller-manager  enabled   active    -
k8s.kube-proxy               enabled   active    -
k8s.kube-scheduler           enabled   active    -
k8s.kubelet                  enabled   active    -
root@skynet:/var/lib/snapd# 

@bschimke95
Contributor

Thanks! So, as assumed, k8sd is not running.
Could you share the inspection report for this run?
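
It may also be worth trying to start the service manually and then checking its logs to see why it exits, for example:

sudo snap start k8s.k8sd
sudo snap logs k8s.k8sd -n 100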

@mhalano
Author

mhalano commented Aug 29, 2024

Here it is.
inspection-report-20240829_142849.tar.gz

@bschimke95
Contributor

Thanks @mhalano.

It looks like the database migrations were not applied correctly.

Aug 29 12:19:31 skynet k8s.k8sd[77766]: Error: Failed to run k8sd: failed to run microcluster: Daemon stopped with error: Daemon failed to start: Failed to re-establish cluster connection: "SELECT\n    t.name, t.expiry\nFROM\n    worker_tokens AS t\nWHERE\n    ( t.token = ? )\nLIMIT 1\n": no such column: t.expiry

The expiry column was added to the worker_tokens table in latest/edge.
We will work on a fix.
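
Until that fix lands, a possible interim workaround, assuming the previously working revision is still installed locally, is to revert the snap to its prior revision:

sudo snap revert k8s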
