Multiple hardware types and install disks in a serverclass, serverclass inheritance, or conditional serverclasses? #1101
I also noticed that the |
I was going through the documentation again and noticed this page, at the very bottom: https://www.sidero.dev/v0.5/resource-configuration/metadata/. How can we specify conditional patches, based on differences in hardware, for a ServerClass? I solved my original issue by patching the Server resource. However, this isn't ideal, because there should be a way to make this automatic instead of having to edit every Server object individually. I'm a strong believer in infrastructure as code; it is one of my primary professional objectives and part of my duties as an engineer where I work. If we were to adopt this and run it in production, this would be a deal breaker. For instance, say we have a mixture of ARM and Intel based worker nodes with NVMe and standard disks. We can use node selectors or affinities to direct workloads onto specific nodes easily enough, but without being able to set NVMe/regular disks conditionally on the serverclass, we wouldn't be able to do that. I would agree that it's probably a good idea to keep your control plane running on the same hardware, but this is an evaluation in my personal lab. However, from what I currently understand, all control plane and worker hardware must be similar, because the target serverclass can't handle a mixture of hardware-specific patches. It would be ideal if there were serverclass inheritance, or some other way to support a mixture of hardware that may require different, conflicting patch definitions (like install disks).
Also, outside of the basic setup, there seems to be no production-oriented deployment documentation. It would be useful for those of us who are evaluating this for professional reasons to have documentation oriented toward a production-ready deployment, with recommended hardening procedures and the like. The goal of my evaluation is to create a production-ready, HA deployment of Sidero/Talos.
I think the easiest solution that comes to mind is being able to select multiple ServerClasses when defining the MetalMachineTemplate, possibly using a label that exists on multiple ServerClasses, or something similar. That way you can have various ServerClasses, with a mixture of hardware and install disk configurations, and it could pull from any of the ServerClasses with the common label.
There are many questions in this issue, so I'm not sure whether the original problem is still relevant. The server class the server was picked from can be seen in As install disk is usually specific to the node, it might make sense to make it Also please keep in mind that Talos allows picking the system disk using disk selectors: https://www.talos.dev/v1.3/reference/configuration/#installconfig
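As a rough illustration of the disk selector suggestion, a ServerClass could carry a config patch that selects the install disk by attribute rather than by device path. This is only a sketch: the ServerClass name and the CPU qualifier are hypothetical placeholders, the `apiVersion` may differ between Sidero releases, and the `diskSelector` field with its `type` attribute is taken from the Talos `installconfig` reference linked above.

```yaml
# Hypothetical ServerClass; the name and qualifier values are placeholders.
apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: workers-nvme
spec:
  qualifiers:
    cpu:
      - manufacturer: Intel(R) Corporation
  configPatches:
    # Select the install disk by attribute instead of a fixed device path.
    - op: add
      path: /machine/install/diskSelector
      value:
        type: nvme
```

The Talos reference describes the selector as taking priority over `machine.install.disk`, so a patch like this should not need a per-server disk path.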
I wasn't aware of the disk selectors. That will be helpful regarding install disks. However, let me lay out a scenario, and maybe that will help explain what my issue is with serverclasses. In this scenario, the control plane is using similar hardware set up in the same way. So a single serverclass will work, all the disks are the same, and everything works automatically with no user interaction via automatic acceptance. The workload cluster has ARM and Intel machines. ARM nodes have additional iSCSI (/dev/sdb) volumes for persistent data for low I/O workloads. Intel nodes have additional NVMe (/dev/nvme0n1) drives for persistent workload data. These nodes may have any number of additional drives, but since this is a production POC, all hardware will be set up similarly. All Intel nodes will be similar, and all ARM nodes will be similar. However I have two In this scenario, on the workload cluster, I can only choose a single There are a couple of solutions I've been mulling over:
TLDR: |
Just in case: you can have multiple worker machine deployments for a single cluster. This way you could have multiple worker sets coming from multiple ServerClasses. You can also use that to, e.g., label them in different ways.
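A minimal sketch of that layout, with one MetalMachineTemplate and one MachineDeployment per hardware type, might look like the following. All resource names, the cluster name, the replica counts, and the Kubernetes version are hypothetical, and the `apiVersion` values depend on the installed CAPI and Sidero versions. The ARM set would be a copy with `workers-arm` substituted throughout.

```yaml
# Hypothetical names throughout; adjust apiVersions to the installed providers.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: MetalMachineTemplate
metadata:
  name: workers-intel
spec:
  template:
    spec:
      serverClassRef:
        apiVersion: metal.sidero.dev/v1alpha1
        kind: ServerClass
        name: workers-intel   # ServerClass carrying the Intel-specific patches
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workers-intel
spec:
  clusterName: workload-cluster
  replicas: 2
  selector:
    matchLabels: null
  template:
    spec:
      clusterName: workload-cluster
      version: v1.26.0   # placeholder Kubernetes version
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: TalosConfigTemplate
          name: workers   # may be shared between deployments
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: MetalMachineTemplate
        name: workers-intel
```

Each MachineDeployment draws servers only from the ServerClass named in its own MetalMachineTemplate, so hardware-specific patches stay on the matching ServerClass.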
|
Hmm, I think that will work, making another machine deployment. I should be able to duplicate the below resources for each set of hardware. Not exactly ideal, a lot of code duplication, but this may be a viable solution. This is just a thought so far, I haven't tested anything and won't be able to for a couple of days.
|
I'm still figuring out what can be updated on the fly and what can't. I recently patched node labels at the |
CAPI only natively supports upgrades by replacing nodes. Sidero will deliver config to the machine only once. |
Gotcha, not a big deal, Sidero/Talos makes that process easy and I like it because it's in line with ephemeral hardware practices. However, is there any way to get the This brings up the question though, how can I roll a node without deleting the Server resource? When I left the Server resource alone, deleting only the Machine, the Node/ServerBinding were removed, but it didn't seem to update the node's config, once it came back up. I believe this is because it wasn't caught by Sidero on the next boot and booted from disk. Once I deleted the Server resource, everything worked as expected, except the version mismatch of course. If rolling the nodes to update configs is the pattern Sidero/Talos has chosen, it would be great if |
The version of Talos you're installing is defined by the |
Ok, good to know. I will revisit that. I appreciate the clarification you have provided so far. This has been really helpful. For the rolling of a node to update its machine config, will this process work? Essentially |
Hmm, this document seems to state that the TalosControlPlane & the TalosConfigTemplate set the talosVersion. However it does say
This doc seems to corroborate this as well. And I see no mention of talosVersion in the environments documentation. |
By rolling update I mean something which is done by CAPI itself. You don't need to do anything for it except manage the resources. You can still change the Talos machine configuration outside of CAPI's control, but it might then not match the CAPI-level settings.
Environment is what gets booted up, and that (by default) defines the installed version of Talos.
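For illustration, an Environment that pins the booted Talos release might look roughly like the sketch below. The Environment name and the v1.3.0 release are assumptions chosen for the example, the `apiVersion` may differ between Sidero releases, and the kernel `args` list is omitted here; it would need to be copied from the default Environment.

```yaml
# Sketch only: kernel args are omitted and the version is a placeholder.
apiVersion: metal.sidero.dev/v1alpha1
kind: Environment
metadata:
  name: talos-v1.3.0   # hypothetical name
spec:
  kernel:
    url: https://github.com/siderolabs/talos/releases/download/v1.3.0/vmlinuz-amd64
  initrd:
    url: https://github.com/siderolabs/talos/releases/download/v1.3.0/initramfs-amd64.xz
```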
|
and
Got it, it is used to render the talosconfig, not to specify the installed Talos version.
It is used to generate proper base Talos machine configuration. It should be fixed at the moment of cluster creation (e.g. created using Talos 1.6.2, so |
I have specified the install disk as `/dev/nvme0n1` but I'm getting a disk does not exist error, on provisioning of a bare metal system. I have confirmed that during the initial boot, the dmesg displays `Fast wiped /dev/nvme0n1`, and `Wipe complete`. So `/dev/nvme0n1` should be the correct disk.

I believe the issue is that the serverclass `talos-masters` is the only one being applied as it's the only one defined in the `talos-cp` `MetalMachineTemplate` and referenced by the `TalosControlPlane`'s `infrastructureTemplate`.

How do ServerClasses get applied? Does any matching serverclass contribute to the configuration of the matched server? How can I have a `talos-masters` serverclass and supply install disks, or other configPatches, conditionally based on hardware type/version/etc?

As you can see below, I have defined a talos-masters serverclass which is referenced by the cluster. You can only set one infrastructureTemplate on the TalosControlPlane to my knowledge. If you look at my serverclasses, you can see that the `talos-lenovo-m710q` serverclass defines the install disk and is matched according to the describe serverclasses below.

Resources
kubectl get serverclasses
kubectl describe metalmachine talos-cp-hl729
ServerClasses
Talos Cluster