Hypershift cluster fails when combining OKD and FCOS #1767

Closed
itmwiw opened this issue Oct 22, 2023 · 5 comments

Comments

@itmwiw

itmwiw commented Oct 22, 2023

I am trying to use Hypershift to provision an OKD cluster. The OKD hosted cluster installs successfully up to a point: the control plane pods are 'Running', the node pool's VMs are provisioned, and the Nodes are marked 'Ready'. Furthermore, 'oc get co' shows every operator as 'AVAILABLE':

NAME                                       VERSION                          AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                    4.13.0-0.okd-2023-09-30-084937   True        False         False      16s
csi-snapshot-controller                    4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
dns                                        4.13.0-0.okd-2023-09-30-084937   True        False         False      2m12s
image-registry                             4.13.0-0.okd-2023-09-30-084937   True        False         False      2m25s
ingress                                    4.13.0-0.okd-2023-09-30-084937   True        False         False      2m7s
insights                                   4.13.0-0.okd-2023-09-30-084937   True        False         False      3m20s
kube-apiserver                             4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
kube-controller-manager                    4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
kube-scheduler                             4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
kube-storage-version-migrator              4.13.0-0.okd-2023-09-30-084937   True        False         False      2m43s
monitoring                                 4.13.0-0.okd-2023-09-30-084937   True        False         False      66s
network                                    4.13.0-0.okd-2023-09-30-084937   True        False         False      3m39s
node-tuning                                4.13.0-0.okd-2023-09-30-084937   True        False         False      6m48s
openshift-apiserver                        4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
openshift-controller-manager               4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
openshift-samples                          4.13.0-0.okd-2023-09-30-084937   True        False         False      111s
operator-lifecycle-manager                 4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
operator-lifecycle-manager-catalog         4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
operator-lifecycle-manager-packageserver   4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
service-ca                                 4.13.0-0.okd-2023-09-30-084937   True        False         False      3m18s
storage                                    4.13.0-0.okd-2023-09-30-084937   True        False         False      50m
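A table like this can be checked mechanically. The following is a minimal sketch, assuming the plain-text column layout shown above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED); the function name `unhealthy_operators` and the shortened sample are illustrative, not part of any OKD tooling. Note that a clean result here only reflects operator status and does not prove the cluster version operator has finished syncing the payload.

```python
def unhealthy_operators(table: str) -> list[str]:
    """Return operators that are not Available=True, Progressing=False, Degraded=False."""
    bad = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or truncated rows
        name, _version, available, progressing, degraded = fields[:5]
        if (available, progressing, degraded) != ("True", "False", "False"):
            bad.append(name)
    return bad


# Shortened sample in the same layout as the output above.
sample = """\
NAME      VERSION                          AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console   4.13.0-0.okd-2023-09-30-084937   True        False         False      16s
dns       4.13.0-0.okd-2023-09-30-084937   True        False         False      2m12s
"""

print(unhealthy_operators(sample))  # → []
```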

However, I still encounter an error related to the '99-okd-master-disable-mitigations' machineconfig. The exact error is as follows:

E1020 15:06:53.905739       1 sync_worker.go:652] unable to synchronize image (waiting 2m36.124787484s): Multiple errors are preventing progress:
* Could not update machineconfig "99-okd-master-disable-mitigations" (418 of 584): the server does not recognize this resource, check extension API servers
* Could not update machineconfig "99-okd-master-disable-mitigations" (451 of 584): the server does not recognize this resource, check extension API servers

This seems specific to OKD FCOS; it doesn't happen on OKD SCOS.
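To see at a glance which payload manifests are blocking the sync, the error lines above can be parsed. This is a rough sketch that assumes the message format shown in the log excerpt ('Could not update <kind> "<name>" (<n> of <total>): <reason>'); the regular expression and the name `failing_manifests` are illustrative and not taken from the cluster version operator.

```python
import re

# Matches lines such as:
#   * Could not update machineconfig "name" (418 of 584): reason text
PATTERN = re.compile(
    r'Could not update (?P<kind>\w+) "(?P<name>[^"]+)" '
    r'\((?P<index>\d+) of (?P<total>\d+)\): (?P<reason>.+)'
)


def failing_manifests(log_text: str) -> list[tuple[str, str, int, int]]:
    """Return (kind, name, manifest index, manifest total) for each failed update."""
    return [
        (m["kind"], m["name"], int(m["index"]), int(m["total"]))
        for m in PATTERN.finditer(log_text)
    ]


log = '''\
* Could not update machineconfig "99-okd-master-disable-mitigations" (418 of 584): the server does not recognize this resource, check extension API servers
* Could not update machineconfig "99-okd-master-disable-mitigations" (451 of 584): the server does not recognize this resource, check extension API servers
'''

for kind, name, index, total in failing_manifests(log):
    print(f"{kind}/{name} is manifest {index} of {total}")
```

In this log both failures point at the same resource kind, which suggests the problem lies with the kind being unknown to the hosted API server rather than with any one manifest.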

vrutkovs pinned this issue Oct 22, 2023
@keremceliker

keremceliker commented Dec 18, 2023

Hey there,

Sometimes, issues with machineconfigs can be related to kernel modules or configuration. Ensure that the necessary kernel modules are loaded on your FCOS nodes. Also check that the modules required for machineconfig synchronization (such as bridge-related modules) are available. Lastly, please review the kernel configuration to ensure it supports the required features.

There are many checks to do; please let us know if there is any update.

Kerem ÇELİKER
Head of Cloud Architecture
linkedin.com/in/keremceliker/

@vrutkovs
Member

It's not related to the machineconfig contents. The mere fact that the payload contains MachineConfigs breaks several Hypershift assumptions. OKD FCOS should move towards setting initial kargs via other means instead of "update kernel arguments on pivot".
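One way to check whether a given release payload is affected is to look for MachineConfig manifests in it. The sketch below assumes the payload's manifests have already been extracted to a local directory of YAML files (for example with 'oc adm release extract'); the function name and the simple 'kind:' line match are illustrative assumptions, and JSON manifests are not handled.

```python
import re
from pathlib import Path

# Matches a "kind: MachineConfig" line exactly, so MachineConfigPool and
# similar kinds are not reported.
KIND_RE = re.compile(r'^kind:\s*["\']?MachineConfig["\']?\s*$', re.MULTILINE)


def find_machineconfig_manifests(manifest_dir: str) -> list[str]:
    """Return the names of YAML files under manifest_dir that declare kind: MachineConfig."""
    hits = []
    for path in sorted(Path(manifest_dir).rglob("*")):
        if path.is_file() and path.suffix in {".yaml", ".yml"}:
            text = path.read_text(encoding="utf-8", errors="replace")
            if KIND_RE.search(text):
                hits.append(path.name)
    return hits
```

Calling `find_machineconfig_manifests("./release-manifests")` (a hypothetical path) returns an empty list when the payload ships no MachineConfigs; per the comment above, any hit is what breaks the Hypershift assumptions.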

@itmwiw
Author

itmwiw commented May 23, 2024

Hello,
Any news regarding this issue?
Thanks a lot.

@itmwiw
Author

itmwiw commented Jun 23, 2024

OKD-SCOS was a workaround to make OKD work with Hypershift. Unlike OKD-FCOS, its installation didn't require MachineConfigs and thus didn't break Hypershift's assumptions. With OKD-SCOS currently paused, it appears there is no way to use Hypershift with OKD at the moment.

@JaimeMagiera
Contributor

Hi,

We are currently working on SCOS builds of OKD. Please see these documents...

https://okd.io/blog/2024/06/01/okd-future-statement
https://okd.io/blog/2024/07/30/okd-pre-release-testing

Please test with the OKD SCOS nightlies and file a new issue as needed.

Many thanks,

Jaime
