
microk8s is not running. microk8s.inspect showing no error #886

Open
ibigbug opened this issue Jan 2, 2020 · 77 comments

@ibigbug

ibigbug commented Jan 2, 2020

Please run microk8s.inspect and attach the generated tarball to this issue.

wtf@k8s-master:~$ microk8s.inspect
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster

Building the report tarball
Report tarball is at /var/snap/microk8s/1107/inspection-report-20200102_011315.tar.gz

inspection-report-20200102_011315.tar.gz

wtf@k8s-master:~$ microk8s.status
microk8s is not running. Use microk8s.inspect for a deeper inspection.

We appreciate your feedback. Thank you for using microk8s.

@balchua
Collaborator

balchua commented Jan 2, 2020

Your apiserver is complaining about an invalid bearer token.

Jan 02 01:13:06 k8s-master.syd.home microk8s.daemon-apiserver[4971]: E0102 01:13:06.280497    4971 authentication.go:104] Unable to authenticate the request due to an error: invalid bearer token
Jan 02 01:13:06 k8s-master.syd.home microk8s.daemon-apiserver[4971]: E0102 01:13:06.453439    4971 authentication.go:104] Unable to authenticate the request due to an error: invalid bearer token

Was this a fresh installation?

@ibigbug
Author

ibigbug commented Jan 2, 2020

@balchua no, it's not. I rebooted the machine after it had been running for a while.

@balchua
Collaborator

balchua commented Jan 2, 2020

Thanks @ibigbug. Can you try restarting microk8s (microk8s.stop then microk8s.start) to see if that resolves the issue?

@ibigbug
Author

ibigbug commented Jan 2, 2020

@balchua that doesn't seem to work:

wtf@k8s-master:~$ microk8s.stop
[sudo] password for wtf:
Stopped.
wtf@k8s-master:~$ microk8s.start
Started.
Enabling pod scheduling
wtf@k8s-master:~$ microk8s.status
microk8s is not running. Use microk8s.inspect for a deeper inspection.
wtf@k8s-master:~$ 

@ibigbug
Author

ibigbug commented Jan 5, 2020

Pod status:

admin@k8s-master:~$ kubectl get po -n kube-system
NAME                                              READY   STATUS        RESTARTS   AGE
coredns-9b8997588-kldbv                           0/1     Pending       0          2d14h
coredns-9b8997588-xllr9                           0/1     Terminating   0          14d
dashboard-metrics-scraper-687667bb6c-kg6zd        1/1     Terminating   0          14d
dashboard-metrics-scraper-687667bb6c-sqdj4        0/1     Pending       0          2d13h
filebeat-p6nfk                                    1/1     Running       0          14d
filebeat-w55z9                                    1/1     Running       1          14d
heapster-v1.5.2-5c58f64f8b-4dfw2                  4/4     Terminating   0          14d
heapster-v1.5.2-5c58f64f8b-v5699                  0/4     Pending       0          2d13h
hostpath-provisioner-7b9cb5cdb4-f7jh7             1/1     Terminating   0          14d
hostpath-provisioner-7b9cb5cdb4-wgmwq             0/1     Pending       0          2d13h
kubernetes-dashboard-5c848cc544-4rlxr             1/1     Terminating   1          14d
kubernetes-dashboard-5c848cc544-j2vzv             0/1     Pending       0          2d14h
metricbeat-55f4fc45cb-5whm2                       1/1     Terminating   1          14d
metricbeat-55f4fc45cb-l49zf                       0/1     Pending       0          2d14h
metricbeat-cw92z                                  1/1     Running       0          14d
metricbeat-kkq8s                                  1/1     Running       1          14d
monitoring-influxdb-grafana-v4-6d599df6bf-lzqtw   2/2     Terminating   2          14d
monitoring-influxdb-grafana-v4-6d599df6bf-pfsdx   0/2     Pending       0          2d14h

@balchua
Collaborator

balchua commented Jan 5, 2020

Are you running multi nodes?

@ibigbug
Author

ibigbug commented Jan 5, 2020

Yes, 1 master + 1 follower.

@balchua
Collaborator

balchua commented Jan 5, 2020

Can you go to the worker/follower node and do a microk8s.stop and microk8s.start?

@ibigbug
Author

ibigbug commented Jan 5, 2020

It doesn't actually let me:

admin@k8s-node1:~$ microk8s.stop
This MicroK8s deployment is acting as a node in a cluster. Please use the microk8s.stop on the master.

@balchua
Collaborator

balchua commented Jan 5, 2020

Is it possible to make it a single-node cluster, to see if it is still running? I think you may need to run microk8s.leave or microk8s.remove-node, something like that.

@ibigbug
Author

ibigbug commented Jan 5, 2020

Still not working. Maybe I'll just reinstall...

@balchua
Collaborator

balchua commented Jan 5, 2020

You may want to pin it to a particular channel, e.g. 1.16/stable.

@ktsakalozos
Member

@ibigbug I see that the kubelets cannot register with the apiserver. The last time they registered with the API server was on the 22nd of Dec. The error you have looks like this:

Jan 02 01:13:06 k8s-master.syd.home microk8s.daemon-kubelet[9551]: E0102 01:13:06.297471    9551 kubelet.go:2263] node "k8s-master.syd.home" not found
Jan 02 01:13:06 k8s-master.syd.home microk8s.daemon-kubelet[9551]: E0102 01:13:06.399105    9551 kubelet.go:2263] node "k8s-master.syd.home" not found
Jan 02 01:13:06 k8s-master.syd.home microk8s.daemon-kubelet[9551]: E0102 01:13:06.487449    9551 reflector.go:156] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Failed to list *v1.Node: Unauthorized

Any idea what might have changed around then?

@ibigbug
Author

ibigbug commented Jan 8, 2020

If it's saying node not found, might that be due to the reboot of the VM?

@pankajxyz

I also had the same issue. It happens with v1.17 only; other versions (v1.16, v1.15, v1.14) are OK. It also happens with v1.17 after I try to install kubeflow using
microk8s.enable kubeflow
which throws an error about Juju. To resolve that I installed Juju and lxd and ran
juju bootstrap
After this,
microk8s.status
reports that microk8s is not running.

I reproduced this behaviour in another machine as well.

@TribalNightOwl

TribalNightOwl commented Feb 2, 2020

Same error.
Running single node.
microk8s version:
installed: v1.17.2 (1173) 179MB classic

$ microk8s.start
Started.
Enabling pod scheduling
$ microk8s.status
microk8s is not running. Use microk8s.inspect for a deeper inspection.
$ microk8s.inspect 
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-flanneld is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster

Building the report tarball
  Report tarball is at /var/snap/microk8s/1173/inspection-report-20200202_114517.tar.gz

inspection-report-20200202_114517.tar.gz

@TribalNightOwl

Either removing and re-installing fixed the issue, or it was the version:
installed: v1.17.0 (1109) 179MB classic

$ snap remove microk8s 
microk8s removed

$ microk8s.status
bash: /snap/bin/microk8s.status: No such file or directory


$ sudo snap install microk8s --classic --channel=1.17/stable

microk8s (1.17/stable) v1.17.0 from Canonical✓ installed


$ microk8s.start
Started.
Enabling pod scheduling
node/blushy already uncordoned


$ microk8s.status
microk8s is running
addons:
cilium: disabled
dashboard: disabled
dns: disabled
fluentd: disabled
gpu: disabled
helm: disabled
ingress: disabled
istio: disabled
jaeger: disabled
juju: disabled
knative: disabled
kubeflow: disabled
linkerd: disabled
metallb: disabled
metrics-server: disabled
prometheus: disabled
rbac: disabled
registry: disabled
storage: disabled


@TribalNightOwl

After several deletes and re-installs, I narrowed it down to microk8s dying the moment I try to switch contexts.

I enabled DNS, then created two namespaces, then two contexts. I checked the status of microk8s after each command and it was running.

$ kubectl get namespaces 
NAME                 STATUS   AGE
default              Active   52s
jenkinsmaster-dev    Active   5s
jenkinsmaster-prod   Active   5s
kube-node-lease      Active   66s
kube-public          Active   66s
kube-system          Active   67s


$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://127.0.0.1:16443
  name: microk8s-cluster
contexts:
- context:
    cluster: microk8s
    namespace: jenkinsmaster-dev
    user: admin
  name: jenkinsmaster-dev
- context:
    cluster: microk8s
    namespace: jenkinsmaster-prod
    user: admin
  name: jenkinsmaster-prod
- context:
    cluster: microk8s-cluster
    user: admin
  name: microk8s
current-context: microk8s
kind: Config
preferences: {}
users:
- name: admin
  user:
    password: bCtlMTl6dUhSVXlFb1hVRXpYcWs0QUpzbFc4dFpPd2hsb3U4UVA0UFo0VT0K
    username: admin


$ kubectl config current-context 
microk8s

After I did:

$ kubectl config use-context jenkinsmaster-dev 
Switched to context "jenkinsmaster-dev".


$ microk8s.status
microk8s is not running. Use microk8s.inspect for a deeper inspection.

@balchua
Collaborator

balchua commented Feb 2, 2020

@TribalNightOwl thanks for the info. When you added the context, did you add it to the file /var/snap/microk8s/current/credentials/client.config?
And is the kubectl you are using an alias?
Thanks again.

@balchua
Collaborator

balchua commented Feb 2, 2020

@TribalNightOwl your contexts jenkinsmaster-dev and jenkinsmaster-prod are pointing to a non-existent cluster, microk8s.
It should be microk8s-cluster.

@TribalNightOwl

@TribalNightOwl thanks for the info. When you added the context, did you add it to the file /var/snap/microk8s/current/credentials/client.config?

No, I just used these commands:

microk8s.kubectl config set-context jenkinsmaster-dev --namespace=jenkinsmaster-dev   --cluster=microk8s   --user=admin

microk8s.kubectl config set-context jenkinsmaster-prod --namespace=jenkinsmaster-prod   --cluster=microk8s   --user=admin

And the kubectl you are using is an alias?

yes:

alias kubectl='microk8s.kubectl'

@TribalNightOwl

@TribalNightOwl your contexts jenkinsmaster-dev and jenkinsmaster-prod are pointing to a non-existent cluster, microk8s.
It should be microk8s-cluster.

I will try again and report back, although I would argue that microk8s shouldn't stop running (and refuse to start) because of this.

@TribalNightOwl

$ snap install microk8s --classic
microk8s v1.17.2 from Canonical✓ installed

$ microk8s.enable dns
Enabling DNS
Applying manifest
serviceaccount/coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created
clusterrole.rbac.authorization.k8s.io/coredns created
clusterrolebinding.rbac.authorization.k8s.io/coredns created
Restarting kubelet
[sudo] password for hugo: 
DNS is enabled

$ kubectl apply -f namespaces.yaml 
namespace/jenkinsmaster-dev created
namespace/jenkinsmaster-prod created


$ kubectl get namespaces 
NAME                 STATUS   AGE
default              Active   94s
jenkinsmaster-dev    Active   4s
jenkinsmaster-prod   Active   4s
kube-node-lease      Active   107s
kube-public          Active   107s
kube-system          Active   108s

$ microk8s.kubectl config set-context jenkinsmaster-dev --namespace=jenkinsmaster-dev \
>   --cluster=microk8s-cluster \
>   --user=admin
Context "jenkinsmaster-dev" created.


$ microk8s.kubectl config set-context jenkinsmaster-prod --namespace=jenkinsmaster-prod \
>   --cluster=microk8s-cluster \
>   --user=admin
Context "jenkinsmaster-prod" created.


$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://127.0.0.1:16443
  name: microk8s-cluster
contexts:
- context:
    cluster: microk8s-cluster
    namespace: jenkinsmaster-dev
    user: admin
  name: jenkinsmaster-dev
- context:
    cluster: microk8s-cluster
    namespace: jenkinsmaster-prod
    user: admin
  name: jenkinsmaster-prod
- context:
    cluster: microk8s-cluster
    user: admin
  name: microk8s
current-context: microk8s
kind: Config
preferences: {}
users:
- name: admin
  user:
    password: ZytkS1o5NVZhZWRTU0t3NnNReFhHaHpRcHRaaUxkaG1XNWFBTXFPbVNNaz0K
    username: admin

$ kubectl config use-context jenkinsmaster-dev 
Switched to context "jenkinsmaster-dev".

$ microk8s.status
microk8s is running
addons:
cilium: disabled
dashboard: disabled
dns: enabled
fluentd: disabled
gpu: disabled
helm3: disabled
helm: disabled
ingress: disabled
istio: disabled
jaeger: disabled
juju: disabled
knative: disabled
kubeflow: disabled
linkerd: disabled
metallb: disabled
metrics-server: disabled
prometheus: disabled
rbac: disabled
registry: disabled
storage: disabled

BINGO! It didn't die this time.

$ kubectl config current-context 
jenkinsmaster-dev

@balchua
Collaborator

balchua commented Feb 3, 2020

@TribalNightOwl microk8s is not actually dying. The status command uses the kubeconfig settings to verify the cluster's health, so if the kubeconfig is misconfigured it will not be able to gather Kubernetes resources, and hence it reports not running.

The message can be misleading, though.
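The mismatch balchua describes can be checked mechanically: every context's cluster field must name an entry in the kubeconfig's clusters list, otherwise the context dangles. A minimal sketch in Python (the dict mirrors the kubectl config view output in this thread; find_broken_contexts is an illustrative helper, not part of microk8s):

```python
# Illustrative sketch (not microk8s code): flag contexts whose "cluster"
# does not exist in the kubeconfig's clusters list.
def find_broken_contexts(kubeconfig):
    known = {c["name"] for c in kubeconfig.get("clusters", [])}
    return [
        ctx["name"]
        for ctx in kubeconfig.get("contexts", [])
        if ctx["context"].get("cluster") not in known
    ]

# Mirrors the broken config: a context points at "microk8s",
# but the only defined cluster is "microk8s-cluster".
config = {
    "clusters": [{"name": "microk8s-cluster"}],
    "contexts": [
        {"name": "jenkinsmaster-dev",
         "context": {"cluster": "microk8s", "namespace": "jenkinsmaster-dev", "user": "admin"}},
        {"name": "microk8s",
         "context": {"cluster": "microk8s-cluster", "user": "admin"}},
    ],
}
print(find_broken_contexts(config))  # ['jenkinsmaster-dev']
```

If the list is non-empty for the current context, `microk8s.status` would fail to reach the cluster and report "not running" even though all daemons are up.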

@TribalNightOwl

$ microk8s.kubectl config set-context jenkinsmaster-dev --namespace=jenkinsmaster-fail   --cluster=microk8s   --user=admin

$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://127.0.0.1:16443
  name: microk8s-cluster
contexts:
- context:
    cluster: microk8s
    namespace: jenkinsmaster-fail
    user: admin
  name: jenkinsmaster-dev
- context:
    cluster: microk8s-cluster
    namespace: jenkinsmaster-prod
    user: admin
  name: jenkinsmaster-prod
- context:
    cluster: microk8s-cluster
    user: admin
  name: microk8s
current-context: jenkinsmaster-dev
kind: Config
preferences: {}
users:
- name: admin
  user:
    password: ZytkS1o5NVZhZWRTU0t3NnNReFhHaHpRcHRaaUxkaG1XNWFBTXFPbVNNaz0K
    username: admin


$ microk8s.status
microk8s is not running. Use microk8s.inspect for a deeper inspection.
$ sudo vi /var/snap/microk8s/current/credentials/client.config

Manually deleted this section:

- context:
    cluster: microk8s
    namespace: jenkinsmaster-fail
    user: admin
  name: jenkinsmaster-dev


$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://127.0.0.1:16443
  name: microk8s-cluster
contexts:
- context:
    cluster: microk8s-cluster
    namespace: jenkinsmaster-prod
    user: admin
  name: jenkinsmaster-prod
- context:
    cluster: microk8s-cluster
    user: admin
  name: microk8s
current-context: jenkinsmaster-dev
kind: Config
preferences: {}
users:
- name: admin
  user:
    password: ZytkS1o5NVZhZWRTU0t3NnNReFhHaHpRcHRaaUxkaG1XNWFBTXFPbVNNaz0K
    username: admin

$ microk8s.status
microk8s is not running. Use microk8s.inspect for a deeper inspection.

@TribalNightOwl

Hold on, I got it:

My previous current-context was still pointing to something non-existent.

I did:

 kubectl config use-context jenkinsmaster-prod
Switched to context "jenkinsmaster-prod".

$ microk8s.status
microk8s is running
addons:
cilium: disabled
dashboard: disabled
dns: enabled
fluentd: disabled
gpu: disabled
helm3: disabled
helm: disabled
ingress: disabled
istio: disabled
jaeger: disabled
juju: disabled
knative: disabled
kubeflow: disabled
linkerd: disabled
metallb: disabled
metrics-server: disabled
prometheus: disabled
rbac: disabled
registry: disabled
storage: disabled

That completely proves your previous comment, thanks!

@TribalNightOwl

How about changing the message:

Currently:

microk8s is not running. Use microk8s.inspect for a deeper inspection.

New:

microk8s is not running. Verify your config is valid and use microk8s.inspect for a deeper inspection.

Or:

microk8s is not running in cluster $CLUSTERNAME. Use microk8s.inspect for a deeper inspection.

Something that would make the user think about having a misconfigured client rather than microk8s actually dying.

@ktsakalozos
Member

How about changing the message

We could also detect such problems and suggest a fix in microk8s.inspect https://github.com/ubuntu/microk8s/blob/master/scripts/inspect.sh#L106

@gavinB-orange

On my system I found that the problem went away after I updated the rather old kubectl installed in /usr/local/bin. I had assumed that microk8s would exclusively use its own kubectl, but apparently not.

@antsankov

Solved it for me, @gavinB-orange. I had to remove my previously installed kubectl, and then microk8s started working!

rm -rf /usr/local/bin/kubectl

@ktsakalozos
Member

@alexgleason you said the server was rebooted; was it a graceful reboot or was it due to a power failure? Could you share the output of ls -al /var/snap/microk8s/current/var/kubernetes/backend? It is possible this is a case of data corruption, which is why @balchua suggested deleting 0000000187009741-0000000187010158. Any other thoughts @MathieuBordere?

@MathieuBordere

I have an open issue for it in raft (canonical/raft#192), but haven't been able to reproduce it yet. The solution in most cases is indeed to remove the offending segment.

@alexgleason

Hey guys, thank you for the help, I appreciate it. I deleted the file and now I see this in the logs when starting:

Aug 10 13:21:26 tribes-doge microk8s.daemon-apiserver[2483466]: Error: start node: raft_start(): io: closed segment 0000000187010159-0000000187010479 is past last snapshot snapshot-1-187009805-10914062476

inspection-report-20210810_132505.tar.gz

Could you share the output of ls -al /var/snap/microk8s/current/var/kubernetes/backend?

tribes@tribes-doge:~$ ls -al /var/snap/microk8s/current/var/kubernetes/backend
total 198312
drwxrwx--- 2 root microk8s     4096 Aug 10 01:36 .
drwxr-xr-x 3 root root         4096 Jan 31  2021 ..
-rw-rw---- 1 root microk8s  8363696 Aug  7 01:50 0000000187001567-0000000187001905
-rw-rw---- 1 root microk8s  8364056 Aug  7 01:50 0000000187001906-0000000187002477
-rw-rw---- 1 root microk8s  8384000 Aug  7 01:51 0000000187002478-0000000187002927
-rw-rw---- 1 root microk8s  8365928 Aug  7 01:52 0000000187002928-0000000187003240
-rw-rw---- 1 root microk8s  8360528 Aug  7 01:54 0000000187003241-0000000187003592
-rw-rw---- 1 root microk8s  8372912 Aug  7 01:54 0000000187003593-0000000187004173
-rw-rw---- 1 root microk8s  8375864 Aug  7 01:54 0000000187004174-0000000187004738
-rw-rw---- 1 root microk8s  8362832 Aug  7 01:54 0000000187004739-0000000187005179
-rw-rw---- 1 root microk8s  8386952 Aug  7 01:55 0000000187005180-0000000187005556
-rw-rw---- 1 root microk8s  8376008 Aug  7 01:55 0000000187005557-0000000187006123
-rw-rw---- 1 root microk8s  8353400 Aug  7 01:56 0000000187006124-0000000187006547
-rw-rw---- 1 root microk8s  8387024 Aug  7 01:58 0000000187006548-0000000187006868
-rw-rw---- 1 root microk8s  8378888 Aug  7 01:59 0000000187006869-0000000187007247
-rw-rw---- 1 root microk8s  8384936 Aug  7 01:59 0000000187007248-0000000187007824
-rw-rw---- 1 root microk8s  8387888 Aug  7 01:59 0000000187007825-0000000187008385
-rw-rw---- 1 root microk8s  8361320 Aug  7 01:59 0000000187008386-0000000187008805
-rw-rw---- 1 root microk8s  8378312 Aug  7 02:00 0000000187008806-0000000187009176
-rw-rw---- 1 root microk8s  8379896 Aug  7 02:00 0000000187009177-0000000187009740
-rw-rw---- 1 root microk8s  8377592 Aug  7 02:01 0000000187009741-0000000187010158
-rw-rw---- 1 root microk8s  8374712 Aug  7 02:03 0000000187010159-0000000187010479
-rw-rw---- 1 root microk8s  2012312 Aug  7 02:03 0000000187010480-0000000187010555
-rw-rw---- 1 root microk8s     2220 Jan 31  2021 cluster.crt
-rw-rw---- 1 root microk8s     3272 Jan 31  2021 cluster.key
-rw-rw---- 1 root microk8s      126 Aug  7 02:03 cluster.yaml
-rw-rw-r-- 1 root microk8s        2 Aug 10 01:36 failure-domain
-rw-rw---- 1 root microk8s       57 Jan 31  2021 info.yaml
srw-rw---- 1 root microk8s        0 Aug  2 05:54 kine.sock
-rw-rw---- 1 root microk8s       32 Jan 31  2021 metadata1
-rw-rw---- 1 root microk8s 16934349 Aug  7 01:59 snapshot-1-187008781-10913992754
-rw-rw---- 1 root microk8s       96 Aug  7 01:59 snapshot-1-187008781-10913992754.meta
-rw-rw---- 1 root microk8s 16563766 Aug  7 02:00 snapshot-1-187009805-10914062476
-rw-rw---- 1 root microk8s       96 Aug  7 02:00 snapshot-1-187009805-10914062476.meta

was it a graceful reboot or was it due to some power failure?

It was indeed a power failure. 😕

@MathieuBordere

You'll have to delete 0000000187010159-0000000187010479 and 0000000187010480-0000000187010555 too to make it start; deleting 0000000187009741-0000000187010158 left a 'hole' in the segments.

Remember to always back up your data before you start deleting things ;-)
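The 'hole' is visible in the segment file names themselves: each closed segment in the ls output above is named START-END with zero-padded log indices, and the next segment should start at END + 1. A small illustrative check (find_holes is a hypothetical helper written for this sketch, assuming that naming convention):

```python
# Illustrative sketch: detect gaps in a sequence of closed raft segments
# named "START-END" (zero-padded indices), like the files listed above.
def find_holes(segment_names):
    # Parse each name into an (start, end) index pair and sort by start.
    spans = sorted(tuple(int(p) for p in name.split("-")) for name in segment_names)
    holes = []
    for (_start, end), (next_start, _next_end) in zip(spans, spans[1:]):
        if next_start != end + 1:
            holes.append((end, next_start))
    return holes

segments = [
    "0000000187009177-0000000187009740",
    # "0000000187009741-0000000187010158" was deleted
    "0000000187010159-0000000187010479",
    "0000000187010480-0000000187010555",
]
print(find_holes(segments))  # [(187009740, 187010159)]
```

With the middle segment deleted, the remaining two segments are past the last snapshot and no longer contiguous with it, which matches the "closed segment ... is past last snapshot" error in the apiserver log.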

@alexgleason

Wow that worked, thank you so much! I'm able to see my nodes and resources now. I really appreciate your help.

@MathieuBordere

Glad it worked, you will have lost a couple of minutes of data it seems.

@thaunghtike-share

@MathieuBordere I am facing an error where microk8s can't start. I tried all possible fixes from this issue, but nothing resolves it. Can you please check my microk8s inspect tarball?

@Vin678

Vin678 commented Sep 28, 2021

Hi, my local Kubernetes installation suddenly stopped working. I have the same problem where microk8s status shows
microk8s is not running. Use microk8s inspect for a deeper inspection.
and microk8s inspect shows no error. kubectl get pods shows a connection-refused error:
The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?
Does anyone have any clue why microk8s could suddenly stop working? I didn't make any configuration changes or system updates when this started happening.
inspection-report-20210927_113253.tar.gz

@thaunghtike-share

thaunghtike-share commented Sep 28, 2021 via email

@Vin678

Vin678 commented Sep 28, 2021

It doesn't look like updating to version 1.20 changed anything. I still get the same error.

inspection-report-20210928_131544.tar.gz

@balchua
Collaborator

balchua commented Sep 28, 2021

@Vin678 something is force killing the apiserver.

Sep 27 11:19:27 rc5l-laptop systemd[1]: snap.microk8s.daemon-apiserver.service: Main process exited, code=killed, status=9/KILL
Sep 27 11:19:27 rc5l-laptop systemd[1]: snap.microk8s.daemon-apiserver.service: Failed with result 'signal'.
Sep 27 11:19:27 rc5l-laptop systemd[1]: snap.microk8s.daemon-apiserver.service: Service hold-off time over, scheduling restart.
Sep 27 11:19:27 rc5l-laptop systemd[1]: snap.microk8s.daemon-apiserver.service: Scheduled restart job, restart counter is at 1.
Sep 27 11:19:27 rc5l-laptop 

I couldn't find anything in the logs.

@altrr2

altrr2 commented Jan 13, 2022

I ran into the same issue today.
microk8s start => started
microk8s status => microk8s is not running. Use microk8s inspect for a deeper inspection.
microk8s inspect => shows no errors
I have to say that I also had the same problem yesterday, and reinstalled both microk8s and kubectl, which worked for a while, but not this morning.

Apparently for me it came down to the x509 certificate, e.g.:
microk8s kubectl get ns
Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2022-01-13T13:02:19Z is before 2022-01-13T13:32:23Z

If I move the computer's clock 1 hour forward, everything works fine.

I run it on an Ubuntu 20.04 laptop, set to the Automatic Date/Time and GMT timezone. Not sure what caused this, but hope this is useful.

Update:
It seems to have been caused by the laptop's RTC being misconfigured for some reason. The following command fixed it:
timedatectl set-local-rtc 0
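What altrr2 hit is plain clock skew: a certificate is rejected when the current time falls outside its [notBefore, notAfter] validity window, so a clock running behind makes a freshly issued certificate "not yet valid" exactly as the error says. A toy illustration (the timestamps come from the error message above; cert_valid_at is a made-up helper, not a real API):

```python
# Illustrative sketch of x509 validity checking against a skewed clock.
from datetime import datetime, timedelta

def cert_valid_at(not_before, not_after, now):
    # A certificate is only accepted while notBefore <= now <= notAfter.
    return not_before <= now <= not_after

not_before = datetime(2022, 1, 13, 13, 32, 23)   # cert's notBefore, from the error
not_after = not_before + timedelta(days=365)     # assumed 1-year validity
clock = datetime(2022, 1, 13, 13, 2, 19)         # local clock ~30 min behind

print(cert_valid_at(not_before, not_after, clock))                        # False
print(cert_valid_at(not_before, not_after, clock + timedelta(hours=1)))  # True
```

This is why moving the clock one hour forward "fixed" it, and why correcting the RTC with timedatectl is the real fix.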

@codecoron

@balchua same issue here!

Environment:

root@ajinlong:/var/snap/microk8s/3597# uname -a
Linux ajinlong 5.15.0-47-generic #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@ajinlong:/var/snap/microk8s/3597# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.1 LTS
Release:	22.04
Codename:	jammy

After snap install microk8s --classic, I checked microk8s status, which reports:
microk8s is not running. Use microk8s inspect for a deeper inspection

Inspecting system
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-kubelite is running
  Service snap.microk8s.daemon-k8s-dqlite is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting dqlite
  Inspect dqlite

Building the report tarball
  Report tarball is at /var/snap/microk8s/3597/inspection-report-20220912_154441.tar.gz

But I found processes referring to microk8s:

root@ajinlong:/home/ajinlong# ps -ef | grep microk8s
root       13376       1 17 15:40 ?        00:01:16 /snap/microk8s/3597/kubelite --scheduler-args-file=/var/snap/microk8s/3597/args/kube-scheduler --controller-manager-args-file=/var/snap/microk8s/3597/args/kube-controller-manager --proxy-args-file=/var/snap/microk8s/3597/args/kube-proxy --kubelet-args-file=/var/snap/microk8s/3597/args/kubelet --apiserver-args-file=/var/snap/microk8s/3597/args/kube-apiserver --kubeconfig-file=/var/snap/microk8s/3597/credentials/client.config --start-control-plane=true
root       13397       1  0 15:40 ?        00:00:00 /bin/bash /snap/microk8s/3597/apiservice-kicker
root       13438       1  5 15:40 ?        00:00:23 /snap/microk8s/3597/bin/k8s-dqlite --storage-dir=/var/snap/microk8s/3597/var/kubernetes/backend/ --listen=unix:///var/snap/microk8s/3597/var/kubernetes/backend/kine.sock:12379
root       13544       1  0 15:40 ?        00:00:00 /bin/bash /snap/microk8s/3597/run-cluster-agent-with-args
root       13556       1  0 15:40 ?        00:00:04 /snap/microk8s/3597/bin/containerd --config /var/snap/microk8s/3597/args/containerd.toml --root /var/snap/microk8s/common/var/lib/containerd --state /var/snap/microk8s/common/run/containerd --address /var/snap/microk8s/common/run/containerd.sock
root       13745   13544  0 15:40 ?        00:00:01 /snap/microk8s/3597/bin/cluster-agent --bind 0.0.0.0:25000 --keyfile /var/snap/microk8s/3597/certs/server.key --certfile /var/snap/microk8s/3597/certs/server.crt --timeout 240
root       17820   15171  0 15:48 pts/2    00:00:00 grep --color=auto microk8s

inspection-report-20220912_154441.tar.gz

@neoaggelos
Contributor

Hi @codecoron

In the containerd logs, I see

9月 11 00:01:21 ajinlong microk8s.daemon-containerd[1407]: time="2022-09-11T00:01:21.026730341+08:00" level=info msg="trying next host" error="failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.1\": dial tcp 64.233.189.82:443: i/o timeout" host=k8s.gcr.io

Can you try to see if https://microk8s.io/docs/registry-private#configure-registry-mirrors-7 solves your issue?

@codecoron

codecoron commented Sep 13, 2022

It doesn't work:

root@ajinlong:/home# cat /var/snap/microk8s/current/args/certs.d/k8s.gcr.io/hosts.toml

server = "https://k8s.gcr.io"

[host."https://registry.cn-hangzhou.aliyuncs.com/google_containers"]
capabilities = ["pull", "resolve"]

Any ideas? @neoaggelos

More related info

root@ajinlong:/home# microk8s kubectl get nodes
NAME       STATUS     ROLES    AGE   VERSION
ajinlong   NotReady   <none>   31h   v1.24.4-2+2f38f78fa07274
root@ajinlong:/home# microk8s kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.152.183.1   <none>        443/TCP   31h

inspection-report-20220913_231501.tar.gz

@bangzhuzhu

Same error, but I'm running on a WSL system:
Inspecting system
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-k8s-dqlite is running
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy openSSL information to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy current linux distribution to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting dqlite
Inspect dqlite

Building the report tarball
Report tarball is at /var/snap/microk8s/3883/inspection-report-20220914_024117.tar.gz

inspection-report-20220914_024117.tar.gz

Thank you


@bioinformatist

For users in China, this may be caused by the GFW.

As /var/snap/microk8s/3xxx/inspection-report/snap.microk8s.daemon-kubelite/journal.log shows:

Sep 23 12:41:15 an microk8s.daemon-kubelite[1773]: E0923 12:41:15.844909    1773 remote_runtime.go:222] "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to get sandbox image \"k8s.gcr.io/pause:3.7\": failed to pull image \"k8s.gcr.io/pause:3.7\": failed to pull and unpack image \"k8s.gcr.io/pause:3.7\": failed to resolve reference \"k8s.gcr.io/pause:3.7\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.7\": dial tcp 142.251.8.82:443: i/o timeout"
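To see at a glance which image pulls are timing out, the failing reference can be grepped out of the journal. A minimal sketch using the sample line above (on a live node you would pipe `journalctl -u snap.microk8s.daemon-kubelite` into the same grep instead of the embedded sample):

```shell
# Extract the failing image reference from a kubelite journal line.
# The string below is a fragment of the log line quoted above, with the
# escaped quotes exactly as they appear in the journal output.
log='failed to pull and unpack image \"k8s.gcr.io/pause:3.7\": failed to resolve reference \"k8s.gcr.io/pause:3.7\"'
echo "$log" | grep -oE 'k8s[^\\]*pause:[0-9.]+' | head -n1
# -> k8s.gcr.io/pause:3.7
```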

@kmcgill88

I'm in the same boat. I have 3 boxes, all with the same hardware, and 1 will not run microk8s, even with a fresh install of Ubuntu 22.04.1 LTS. I've uninstalled, re-installed, reset microk8s, wiped the HD, and re-installed the OS. After all this, microk8s is still extremely unstable.

Running describe pods 3 times in a row:

dave@dave:~$ microk8s kubectl describe pods -A
No resources found
dave@dave:~$ microk8s kubectl describe pods -A
No resources found
dave@dave:~$ microk8s kubectl describe pods -A
The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?

dave@dave:~$ microk8s status
microk8s is not running. Use microk8s inspect for a deeper inspection.

dave@dave:~$ microk8s inspect
[sudo] password for dave: 
Inspecting system
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-kubelite is running
  Service snap.microk8s.daemon-k8s-dqlite is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy openSSL information to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster
Inspecting dqlite
  Inspect dqlite

Building the report tarball
  Report tarball is at /var/snap/microk8s/4390/inspection-report-20230117_195958.tar.gz

I'm not sure what to look for in the logs, but a few things look suspicious.

Jan 17 02:07:10 dave microk8s.daemon-k8s-dqlite[807]: I0117 02:07:10.488526     807 log.go:198] Failure domain set to 1
Jan 17 02:07:10 dave microk8s.daemon-k8s-dqlite[807]: I0117 02:07:10.488555     807 log.go:198] TLS enabled
Jan 17 02:07:18 dave microk8s.daemon-k8s-dqlite[807]: I0117 02:07:18.248940     807 log.go:198] Connecting to kine endpoint: dqlite://k8s?peer-file=/var/snap/microk8s/4390/var/kubernetes/backend/localnode.yaml&driver-name=dqlite-1
Jan 17 02:07:18 dave microk8s.daemon-k8s-dqlite[807]: time="2023-01-17T02:07:18Z" level=info msg="New kine for dqlite."
Jan 17 02:07:18 dave microk8s.daemon-k8s-dqlite[807]: time="2023-01-17T02:07:18Z" level=info msg="DriverName is dqlite-1."
Jan 17 02:07:18 dave microk8s.daemon-k8s-dqlite[807]: time="2023-01-17T02:07:18Z" level=info msg="Kine listening on unix:///var/snap/microk8s/4390/var/kubernetes/backend/kine.sock:12379"
Jan 17 04:02:42 dave microk8s.daemon-k8s-dqlite[807]: time="2023-01-17T04:02:42Z" level=error msg="error in txn: exec (try: 0): context canceled"
Jan 17 12:32:41 dave microk8s.daemon-k8s-dqlite[807]: time="2023-01-17T12:32:41Z" level=error msg="error in txn: exec (try: 0): context canceled"
Jan 17 16:27:48 dave microk8s.daemon-k8s-dqlite[807]: time="2023-01-17T16:27:48Z" level=error msg="error in txn: exec (try: 0): context canceled"
Jan 17 18:52:39 dave microk8s.daemon-k8s-dqlite[807]: time="2023-01-17T18:52:39Z" level=error msg="error in txn: exec (try: 0): context canceled"

and

Jan 17 02:06:32 dave systemd[1]: Started Service for snap application microk8s.daemon-apiserver-kicker.
Jan 17 02:06:53 dave microk8s.daemon-apiserver-kicker[1377]: chgrp: cannot access '/var/snap/microk8s/common/run/containerd.sock': No such file or directory
Jan 17 10:02:24 dave microk8s.daemon-apiserver-kicker[1011707]: chmod: cannot access '/var/snap/microk8s/4390/var/kubernetes/backend/open-260': No such file or directory
Jan 17 18:32:46 dave microk8s.daemon-apiserver-kicker[2099193]: chmod: cannot access '/var/snap/microk8s/4390/var/kubernetes/backend/open-539': No such file or directory

inspection-report-20230117_195958.tar.gz

@averri

averri commented Apr 23, 2023

I get the same issue if installing Microk8s version 1.27 and then downgrading to version 1.26.

@tiansiyuan

I have the same issue with MicroK8s v1.26.4 revision 5219.

microk8s stop

microk8s start

microk8s status

microk8s is not running. Use microk8s inspect for a deeper inspection.

microk8s inspect
Inspecting system
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-kubelite is running
Service snap.microk8s.daemon-k8s-dqlite is running
Service snap.microk8s.daemon-apiserver-kicker is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy openSSL information to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy current linux distribution to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Inspecting dqlite
Inspect dqlite

Building the report tarball
The report tarball inspection-report-20230522_161922.tar.gz is stored on the current directory

inspection-report-20230522_161922.tar.gz

@neoaggelos
Contributor

Hi @tiansiyuan

I see the following repeated in the containerd logs:

May 18 17:16:35 microk8s-vm microk8s.daemon-containerd[3211]: time="2023-05-18T17:16:35.379585244+08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:calico-node-j524j,Uid:90f09e1a-1094-48a6-afb5-26f1fe42645f,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"registry.k8s.io/pause:3.7\": failed to pull image \"registry.k8s.io/pause:3.7\": failed to pull and unpack image \"registry.k8s.io/pause:3.7\": failed to resolve reference \"registry.k8s.io/pause:3.7\": failed to do request: Head \"https://asia-east1-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.7\": dial tcp 108.177.97.82:443: i/o timeout"

Where is this node running? If a firewall is blocking access to registry.k8s.io, check whether the following helps: https://microk8s.io/docs/registry-private#configure-registry-mirrors-7
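For reference, a mirror configuration along those lines would look roughly like this (the mirror endpoint below is a placeholder; substitute one that is reachable from your network):

```toml
# /var/snap/microk8s/current/args/certs.d/registry.k8s.io/hosts.toml
# "https://my-mirror.example.com" is a placeholder mirror endpoint.
server = "https://registry.k8s.io"

[host."https://my-mirror.example.com"]
capabilities = ["pull", "resolve"]
```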

Thanks!

@dstkibrom

I ran into the same issue today:

microk8s start => started
microk8s status => microk8s is not running. Use microk8s inspect for a deeper inspection.
microk8s inspect => shows no errors

I have to say that I also had the same problem yesterday, and reinstalled both microk8s and kubectl, which worked for a while but not this morning.

Apparently for me it came down to the x509 certificate, e.g.:

microk8s kubectl get ns
Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2022-01-13T13:02:19Z is before 2022-01-13T13:32:23Z

If I move the computer's clock 1 hour forward, everything then works fine.

I run it on an Ubuntu 20.04 laptop, set to the Automatic Date/Time and GMT timezone. Not sure what caused this, but hope this is useful.

UPD: it seems to be caused by the laptop's RTC being misconfigured for some reason. The following command fixed it: timedatectl set-local-rtc 0

My microk8s also stopped working when I changed the time, and I was getting similar errors. It is now running perfectly after fixing the time.
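A quick way to check for this kind of clock/certificate skew is to compare a certificate's validity window against the system clock with openssl. The sketch below generates a throwaway self-signed cert to demonstrate the check; on an affected node you would point openssl at /var/snap/microk8s/current/certs/server.crt instead (that path assumes the standard snap layout):

```shell
# Generate a throwaway cert to demonstrate the check (on a real node, run
# the x509 check against /var/snap/microk8s/current/certs/server.crt).
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo-key.pem \
  -out /tmp/demo-cert.pem -days 1 -subj "/CN=demo" 2>/dev/null

# Print the validity window and the current time, both in UTC; if notBefore
# is in the future, the clock (or RTC, as noted above) is behind.
openssl x509 -in /tmp/demo-cert.pem -noout -dates
date -u
```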

@linonetwo

Can you try to see if https://microk8s.io/docs/registry-private#configure-registry-mirrors-7 solves your issue?

I confirm this solves it.

After applying a valid mirror, and sudo snap restart microk8s, status is good:

$ microk8s.status
microk8s is running

@MarkOWiesemann

I am just going to append to this issue: I seem to have the same problem, but have no idea how to read the tarball (is there documentation for that?).
My tarball:
inspection-report-20240217_183613.tar.gz


stale bot commented Jan 14, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the inactive label Jan 14, 2025