[occm] Add Openstack server hostId as k8s node label #2579
Comments
This seems like a valid request from my point of view. @mdbooth, what do you think? @chess-knight: Are you planning to contribute an implementation?
@gryf, @stephenfin: This sounds like low-hanging fruit you can grab to get up to speed with the CPO code.
Is hostId really available to normal users in OpenStack? I do not have access to OpenStack right now, but if I remember correctly, normal users cannot see it?
According to our research in SovereignCloudStack/issues#540, hostId should be available to all users. You can see, e.g. in the Nova code, that it is supported from API version 2.62: https://opendev.org/openstack/nova/commit/c2f7d6585818c04e626aa4b6c292e5c2660cb8b3.
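A minimal sketch of reading that field from a server response, using only the Go standard library rather than gophercloud. It decodes a trimmed, made-up `GET /servers/{id}` body and pulls out `hostId`. The `server` and `serverEnvelope` types and the `parseHostID` helper are hypothetical, and the sample payload is illustrative, not captured from a real cloud.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// server is a hypothetical, deliberately minimal view of a Nova
// "show server" response; only the fields needed here are decoded.
type server struct {
	ID     string `json:"id"`
	Name   string `json:"name"`
	HostID string `json:"hostId"` // obfuscated, per-project host identifier
}

// serverEnvelope mirrors the top-level {"server": {...}} wrapper.
type serverEnvelope struct {
	Server server `json:"server"`
}

// parseHostID extracts hostId from a raw server response body.
func parseHostID(body []byte) (string, error) {
	var env serverEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return "", fmt.Errorf("decoding server: %w", err)
	}
	return env.Server.HostID, nil
}

func main() {
	// Made-up sample payload; real responses carry many more fields.
	body := []byte(`{"server": {"id": "abc", "name": "worker-0", "hostId": "deadbeef"}}`)
	id, err := parseHostID(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // deadbeef
}
```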
Actually, I can see both.
Yes, exactly as I originally wrote in the issue, thanks.
We are not using live migration at all, so it is difficult to say how it works.
IMHO, the node controller should update node labels to reflect their current reality, i.e. a live migration will trigger relabelling of the node the next time it is reconciled. Most things currently running on the Node are unlikely to act on it, but:
We should also validate this decision with the cloud-provider folks in case there are caveats we're not aware of.
I'm very much in favour of adding a
Hi, could you (or anybody) please clarify? You say this HostID can't be used to determine anything about the host, but from the name I would assume it's some kind of unique, distinct value per host. So can't I at least infer that, when the HostID changes, the underlying host has changed? If the answer is "no" and you can't infer anything from this ID, I don't see what benefit adding it would bring, at least for our use case in the Sovereign Cloudstack project. Any clarification would be highly appreciated, thanks!
It's a SHA-224 hash of project_id and hostname: https://github.com/openstack/nova/blob/7dc4b1ea627d864a0ee2745cc9de4336fc0ba7b5/nova/utils.py#L1028-L1043. So hostID can't be compared between tenants.
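The derivation can be sketched in Go to show why the value is tenant-specific. This is our reading of the linked `nova/utils.py` (the hostname appended to the project_id, hashed with SHA-224, hex-encoded), not Nova code; `generateHostID` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// generateHostID mirrors, as we read nova/utils.py, how Nova derives hostId:
// SHA-224 over projectID+host, hex-encoded. An empty host gives "".
func generateHostID(host, projectID string) string {
	if host == "" {
		return ""
	}
	sum := sha256.Sum224([]byte(projectID + host))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Same hypervisor seen from two projects: the IDs differ, so hostId
	// cannot be compared across tenants.
	a := generateHostID("compute-01", "project-a")
	b := generateHostID("compute-01", "project-b")
	fmt.Println(a == b) // false
	fmt.Println(len(a)) // 56
}
```

Within a single project, though, the same hypervisor always yields the same value, which is what makes it usable as a topology label inside one cluster.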
@stephenfin may be able to confirm that hostID will change if a VM live-migrates, but I'm pretty sure that it would. In general, k8s isn't going to handle a live migration well, because we don't continuously reconcile the placement of things which have already been scheduled. I think the value of HostID to a Kubernetes cluster is the ability to schedule Pods on different underlying hypervisors. This means that an end user can ensure their workload survives a maintenance outage of a single hypervisor.
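To illustrate the property a hostId label enables, the sketch below checks whether a set of pods landed on distinct hypervisors. In a real cluster you would express this declaratively (pod anti-affinity or `topologySpreadConstraints` with the label as `topologyKey`) and let the scheduler enforce it. The label key `example.com/openstack-host-id` and the `distinctHypervisors` helper are placeholders, not what the CCM actually uses.

```go
package main

import "fmt"

// hostIDLabel is a placeholder label key for this sketch only.
const hostIDLabel = "example.com/openstack-host-id"

// distinctHypervisors reports whether every pod in podToNode runs on a
// different hypervisor, judged by its node's hostId label. A node without
// the label counts as unknown and makes the check fail.
func distinctHypervisors(podToNode map[string]string, nodeLabels map[string]map[string]string) bool {
	seen := make(map[string]bool)
	for _, node := range podToNode {
		hostID, ok := nodeLabels[node][hostIDLabel]
		if !ok || hostID == "" || seen[hostID] {
			return false
		}
		seen[hostID] = true
	}
	return true
}

func main() {
	nodes := map[string]map[string]string{
		"worker-0": {hostIDLabel: "aaa"},
		"worker-1": {hostIDLabel: "bbb"},
		"worker-2": {hostIDLabel: "aaa"}, // same hypervisor as worker-0
	}
	fmt.Println(distinctHypervisors(map[string]string{"web-0": "worker-0", "web-1": "worker-1"}, nodes)) // true
	fmt.Println(distinctHypervisors(map[string]string{"web-0": "worker-0", "web-1": "worker-2"}, nodes)) // false
}
```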
I am thinking of the following scenario:
I expect it to be updated. However, live-migrating k8s hosts already violates the scheduling constraints of everything that was running on them. Live-migrating k8s workers is not a good idea if it can be avoided. Simply draining the node and shutting it down during maintenance is preferable.
I agree with you that live-migrating a k8s node is not a good idea.
But which controller should be responsible for updating the label? Right now, I am not aware of any.
This PR is requesting that OpenStack CCM sets it, so OpenStack CCM would also be responsible for updating it. IIRC, there is now a mechanism for returning arbitrary node labels, but I don't recall what it is.
Do you mean
I created a PR so the discussion can move on. Can someone try it, please? You can use
#2628 is approved. Should we merge it immediately and close this issue, or wait in case someone wants to look at it? I am not able to test migrations where the host-id changes. I only tried deleting the label with a kubectl command, and this additional label was not reconciled back into place, so I assume live migration will have the same effect (a wrong host-id label afterwards).
@chess-knight I agree, the label must be reconciled and updated once the node is live-migrated.
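A minimal sketch of the reconciliation being asked for: given a Node's current labels and the hostId OpenStack reports now, compute the desired labels and whether an update is needed. This is hypothetical helper logic, not existing OCCM or node-controller code; the label key is a placeholder, and how the result would be applied (e.g. patching the Node during a periodic resync) is left out.

```go
package main

import "fmt"

// hostIDLabel is a placeholder label key for this sketch only.
const hostIDLabel = "example.com/openstack-host-id"

// reconcileHostIDLabel returns the desired Node labels for the hostId the
// cloud currently reports, plus whether they differ from current. The input
// map is never mutated; an empty cloudHostID removes the label.
func reconcileHostIDLabel(current map[string]string, cloudHostID string) (map[string]string, bool) {
	if current[hostIDLabel] == cloudHostID {
		return current, false
	}
	desired := make(map[string]string, len(current)+1)
	for k, v := range current {
		desired[k] = v
	}
	if cloudHostID == "" {
		delete(desired, hostIDLabel)
	} else {
		desired[hostIDLabel] = cloudHostID
	}
	return desired, true
}

func main() {
	labels := map[string]string{"kubernetes.io/hostname": "worker-0", hostIDLabel: "old"}
	// After a live migration, OpenStack reports a different hostId.
	updated, changed := reconcileHostIDLabel(labels, "new")
	fmt.Println(changed, updated[hostIDLabel]) // true new
	fmt.Println(labels[hostIDLabel])           // old (input untouched)
}
```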
I am not sure whether OCCM is capable of doing that. Maybe, after all, we should go back to the original issue and implement it in CAPO, where machine reconciliation happens (I hope so). CAPO can introduce e.g.
@chess-knight As well as reconciling new Nodes, the node controller resyncs nodes periodically: https://github.com/kubernetes/kubernetes/blob/03fe89c2339a1582733649faab5f5df471f65f09/staging/src/k8s.io/cloud-provider/controllers/node/node_controller.go#L191-L198 However, it looks like that job:
It sounds like, if we want to continuously reconcile zone information, that should be a discussion with the cloud-provider folks. Maybe @aojea can let us know whether this has been discussed before and, if not, the best place to start the discussion. My view: Kubernetes doesn't expect zone information to change, and in general will not respond to changes in zone information. We should advise users that there are alternatives which will give better behaviour. Despite that, zone information can change, and occasionally it will. An example is a managed cloud service where the user has no influence over the migration of workloads. By updating the zone information on the Node when it does change we:
For now, this is an edge case. Let's return HostID in the instance metadata, as is done by #2628. This is an immediate win for anybody wanting to schedule with hypervisor anti-affinity. The problem of continuous reconciliation is somewhat independent, as it covers more than just the HostID label.
Hi @mdbooth,
Interestingly, the comment suggests
I agree that updating labels needs to be discussed. Maybe it could be a configurable on/off behaviour. Do you think I should also write some docs about the HostID label, so users are aware of it?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened:
This issue follows a discussion in kubernetes-sigs/cluster-api-provider-openstack#1605, as a request to automatically label nodes with the underlying hostId, so that, e.g., workloads can be scheduled on different physical hosts. It can be used as a topology differentiator when all other topology labels are the same.
Anything else we need to know?:
Issue kubernetes/cloud-provider#67 is closed and potentially resolved by kubernetes/kubernetes#123223. Based on comment kubernetes/cloud-provider#67 (comment), AdditionalLabels with hostId information can now be added to InstanceMetadata:
cloud-provider-openstack/pkg/openstack/instancesv2.go (lines 136 to 142 in dab0f06)
AFAIK, this information should already be available in the server struct, so I think there should not be much work involved now.
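The AdditionalLabels approach can be sketched as follows. To stay stdlib-only, `instanceMetadata` is a simplified local stand-in for `cloudprovider.InstanceMetadata` from `k8s.io/cloud-provider` (the real struct also carries `NodeAddresses`). `openstackServer`, `buildMetadata`, the label key, and the providerID format used here are assumptions for illustration; the actual change is the one in #2628.

```go
package main

import "fmt"

// instanceMetadata is a simplified, local stand-in for
// cloudprovider.InstanceMetadata; NodeAddresses is omitted here.
type instanceMetadata struct {
	ProviderID       string
	InstanceType     string
	Zone             string
	Region           string
	AdditionalLabels map[string]string
}

// hostIDLabel is a placeholder, not necessarily the key #2628 uses.
const hostIDLabel = "example.com/openstack-host-id"

// openstackServer is a hypothetical minimal view of the Nova server fields.
type openstackServer struct {
	ID     string
	Flavor string
	AZ     string
	HostID string
}

// buildMetadata shows where hostId would slot into the instance metadata.
func buildMetadata(srv openstackServer, region string) instanceMetadata {
	md := instanceMetadata{
		ProviderID:   "openstack:///" + srv.ID, // assumed providerID format
		InstanceType: srv.Flavor,
		Zone:         srv.AZ,
		Region:       region,
	}
	if srv.HostID != "" {
		md.AdditionalLabels = map[string]string{hostIDLabel: srv.HostID}
	}
	return md
}

func main() {
	md := buildMetadata(openstackServer{ID: "1234", Flavor: "m1.large", AZ: "nova", HostID: "abcd"}, "RegionOne")
	fmt.Println(md.ProviderID, md.AdditionalLabels[hostIDLabel]) // openstack:///1234 abcd
}
```

Leaving `AdditionalLabels` nil when hostId is empty avoids writing a label with an empty value.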
One potential issue I see is live migration (see e.g. #1801), where OCCM will have to update the label because hostId will change.
Environment: