When upgrading Colima from v0.6.8 to v0.8.0, I ran into some odd, difficult-to-understand behaviour.
Tale of woe
The Kubernetes control plane pods would get stuck with status `ImageInspectError`, with messages like:
message: 'Failed to inspect image "rancher/mirrored-coredns-coredns:1.10.1":
Id or size of image "rancher/mirrored-coredns-coredns:1.10.1" is not set'
I couldn't find much on the internet for that message, save for a post on Stack Overflow. Apparently, I was using incompatible versions of Docker and Kubernetes? That seemed odd because I wasn't choosing values for those in my invocation of `colima start`.
Surely Colima wouldn't choose incompatible versions for the software it runs, right? Something else must be going on. Maybe if I found the breaking change, I'd understand better. So I bisected released versions and found that the first broken version was v0.6.9, then bisected local builds and found that #1032 was the breaking change. So it was specifically a versions problem.
So maybe we just need to find a version of k3s that will work with our recent Docker? Sure enough, when I ran with the latest k3s set (`--kubernetes-version v1.31.2+k3s1`), everything started working. So I set about bisecting on the k3s version, but lo and behold, it also worked at every version I chose! Even the starting version of `v1.30.0+k3s1`, which I'd pulled from the default of `--kubernetes-version` documented in the output of `colima help start`!? Wait, what? When I set the default explicitly, I get a working system, but when I pass no value everything is broken!? Maybe the default version is a lie?
❯ colima version
colima version v0.6.9
git commit: c3a31ed05f5fab8b2cdbae835198e8fb1717fd0f
runtime: docker
arch: aarch64
client: v26.1.3
server: v27.3.1
kubernetes
Client Version: v1.27.12
Kustomize Version: v5.0.1
Server Version: v1.28.3+k3s2
The default version is a lie! So I set about inspecting the Colima codebase to fix that bug. But I can't find this alternate value anywhere. What I do find is that the code can read values out of some kind of optional template. I don't think I have one of those, but I check, and sure enough there is a file located at `~/.colima/_templates/default.yaml` with the offending server version `v1.28.3+k3s2`.
I delete the default template, nuke and reinit my Colima VM, and everything comes up fine. Better than fine, this also clears the DNS issue I'd been experiencing for some time (#973).
The long and the short of it was that I had a default template present in my ~/.colima directory, but wasn't aware of it. Presumably I'd created it at some point, perhaps when just starting out with colima when I didn't know what I was doing. The settings in it worked for a while, but degraded over time as I upgraded versions. Looking at a backup of the broken template that I've kept, it looks like the default settings from some old version; I'm not sure I ever used this template to actually override anything.
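For anyone who wants to check whether they are in the same situation, here is a rough sketch of the check I ended up doing by hand. It only looks for the file at the path mentioned above and prints whatever Kubernetes version it pins. The function names are made up for illustration, and the parsing assumes the template has a top-level `kubernetes:` block with an indented `version:` key, which is how my copy looked; other layouts would need adjusting.

```shell
# Print the Kubernetes version pinned in a Colima template, if any.
# Assumption: a top-level `kubernetes:` block with an indented `version:` key.
template_k8s_version() {
  awk '
    /^kubernetes:/ { in_block = 1; next }   # entered the kubernetes block
    /^[^ \t#]/     { in_block = 0 }         # any other top-level key ends it
    in_block && $1 == "version:" { print $2; exit }
  ' "$1"
}

# Report whether a default template exists and what it pins.
# The default path is the one from this issue; pass another path to override.
report_default_template() {
  template="${1:-$HOME/.colima/_templates/default.yaml}"
  if [ -f "$template" ]; then
    echo "default template present: $template"
    echo "pinned kubernetes version: $(template_k8s_version "$template")"
  else
    echo "no default template at $template"
  fi
}
```

Running `report_default_template` with no arguments checks the default location. If the pinned version differs from the default shown by `colima help start`, the template is the likely source of the mismatch.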
While user error is partly at fault here, it seems to me that this behaviour of the templates system might be a bit of a footgun. I think this could be defused using only minor tweaks. Ideas I have that might help:
Warn the user when the version of Colima used to create a template does not match the version of Colima that is consuming the template. That might be identified by sniffing unexpectedly absent/present fields and/or old values. Or it could be done by adding an explicit colimaVersion field to the template to support this check.
Inform the user when an implicit template is in use. That is, when they haven't passed --profile but the default template is overriding default values.
If you don't like those concepts, I'm sure there are other ways to make this less likely to happen and/or easier to identify and fix.
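To make the first idea a bit more concrete, here is a minimal sketch of the check written as a shell function rather than as a change to Colima itself. The `colimaVersion` field is the one proposed above, not something I have seen in existing templates, and the function name and warning wording are placeholders.

```shell
# Warn when a template was written by a different Colima version.
# Assumption: templates carry a top-level `colimaVersion:` field (proposed, hypothetical).
# Usage: warn_on_template_mismatch <template-path> <running-colima-version>
warn_on_template_mismatch() {
  template="$1"
  running="$2"
  [ -f "$template" ] || return 0   # no template, nothing to warn about
  recorded=$(sed -n 's/^colimaVersion:[ ]*//p' "$template" | head -n 1)
  if [ -z "$recorded" ]; then
    echo "warning: $template has no colimaVersion; it may predate this release"
  elif [ "$recorded" != "$running" ]; then
    echo "warning: $template was created by colima $recorded, running $running"
  fi
}
```

A template without the field is treated as suspect rather than silently accepted, since that is exactly the case I hit: an old file with no indication of where it came from.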
Version
colima version v0.6.9
git commit: c3a31ed05f5fab8b2cdbae835198e8fb1717fd0f
runtime: docker
arch: aarch64
client: v26.1.3
server: v27.3.1
kubernetes
Client Version: v1.27.12
Kustomize Version: v5.0.1
Server Version: v1.28.3+k3s2
limactl version 1.0.1
qemu-img version 9.1.0
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers