Nodes stuck on boot with Waiting for service "cri" to be registered #9732
Comments
There's something else going on here: the CRI doesn't start yet because the user disks haven't been mounted. I don't have the full picture (config + expected layout) to say why, but I can see messages like this in the log:
Nothing special in terms of layout.
It turns out that on the problematic servers the kernel seems to swap nvme0n1 and nvme1n1. I've checked the stuck servers, and the system disk is reported as … I'm unsure why this is happening. It affects roughly 10% of the whole setup of semi-identical servers.
You might have better luck with …
I don't see how …
Unfortunately, there's no other way at the moment; either way, you want to match on a disk that is not the system disk.
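The idea of matching "a disk which is not the system disk" instead of a fixed kernel name can be sketched in Go. This is a minimal illustration under stated assumptions: the `Disk` type and the `pickUserDisk` helper are hypothetical and are not Talos machine-config fields or APIs. It only shows why excluding the system disk keeps working when `nvme0n1` and `nvme1n1` swap between boots.

```go
package main

import (
	"errors"
	"fmt"
)

// Disk is a hypothetical description of a block device as seen on one boot.
// It is not a Talos type.
type Disk struct {
	Name     string // kernel name, e.g. "nvme1n1"; may differ between boots
	IsSystem bool   // true for the disk the OS is installed on
}

// pickUserDisk returns the single disk that is not the system disk. It fails
// when there are zero or several candidates, so an unexpected layout is
// reported instead of silently picking the wrong disk.
func pickUserDisk(disks []Disk) (Disk, error) {
	var candidates []Disk
	for _, d := range disks {
		if !d.IsSystem {
			candidates = append(candidates, d)
		}
	}
	switch len(candidates) {
	case 0:
		return Disk{}, errors.New("no non-system disk found")
	case 1:
		return candidates[0], nil
	default:
		return Disk{}, fmt.Errorf("ambiguous: %d non-system disks found", len(candidates))
	}
}

func main() {
	// The kernel swapped the names on this boot: the system disk is nvme1n1.
	disks := []Disk{
		{Name: "nvme0n1"},
		{Name: "nvme1n1", IsSystem: true},
	}
	d, err := pickUserDisk(disks)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("user disk:", d.Name) // prints "user disk: nvme0n1"
}
```

Failing on more than one candidate is a deliberate choice in this sketch: on a node with several data disks, "not the system disk" alone is not enough to pick one safely.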
I wonder why it swaps them anyway.
There's a bit of randomness in the way Linux enumerates devices in general, and it applies to network interfaces as well, for example. This is why those stable …
BTW, 1.7.x used to have this error:
It was more verbose and at least caused a reboot. 1.8.x seems to stay stuck endlessly instead.
I'm wondering why Talos doesn't mount by UUID like other distros do. Like, you know, you can specify …
Please see my response at the beginning of the issue:
I've posted all the information I have here. I'll link this issue to #8367, but otherwise there's nothing we can do at the moment.
Bug Report
After updating to the 1.8.x series (including fresh installs), some of our servers are unable to boot. The Talos API responds, but the kubelet does not start and the server stays stuck in the "Booting" stage forever.
The last message in the logs is:
Waiting for service "cri" to be registered
A server can boot properly after several reboot attempts.
Description
Nodes won't boot properly.
Logs
logs.zip
Environment