
Conversion Webhook breaks deployment on clusters with separate control plane #208

Open
Argannor opened this issue Feb 26, 2024 · 4 comments · May be fixed by #301
Labels
bug Something isn't working

Comments

@Argannor

Argannor commented Feb 26, 2024

What happened?

After upgrading from 0.10.0 to 0.12.0, the provider fails to start, and its logs indicate that the CRDs cannot be watched or listed:

W0226 16:25:45.849143       1 reflector.go:539] k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1alpha2.Object: conversion webhook for kubernetes.crossplane.io/v1alpha1, Kind=Object failed: Post "https://provider-kubernetes.crossplane-system.svc:9443/convert?timeout=30s": Address is not allowed
E0226 16:25:45.849254       1 reflector.go:147] k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1alpha2.Object: failed to list *v1alpha2.Object: conversion webhook for kubernetes.crossplane.io/v1alpha1, Kind=Object failed: Post "https://provider-kubernetes.crossplane-system.svc:9443/convert?timeout=30s": Address is not allowed
crossplane-kubernetes-provider: error: Cannot start controller manager: failed to wait for providerconfig/providerconfig.kubernetes.crossplane.io caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.ProviderConfig
Stream closed EOF for crossplane-system/provider-kubernetes-6e1fd76ec9c7-65d577488d-jzgzs (package-runtime)
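The failing call in the log is the API server POSTing to the provider's Service, so that URL shows exactly what must be reachable from the control plane. A minimal Go sketch that breaks the logged URL into its parts, assuming the usual <service>.<namespace>.svc host form; the helper and type names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// webhookTarget describes where the API server sends conversion requests.
type webhookTarget struct {
	Service   string
	Namespace string
	Port      string
	Path      string
	Timeout   string
}

// parseWebhookURL splits an in-cluster Service URL of the form
// https://<service>.<namespace>.svc:<port>/<path>?timeout=<t>.
func parseWebhookURL(raw string) (webhookTarget, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return webhookTarget{}, err
	}
	host, port, err := net.SplitHostPort(u.Host)
	if err != nil {
		return webhookTarget{}, err
	}
	labels := strings.Split(host, ".")
	if len(labels) < 3 || labels[2] != "svc" {
		return webhookTarget{}, fmt.Errorf("not an in-cluster service host: %q", host)
	}
	return webhookTarget{
		Service:   labels[0],
		Namespace: labels[1],
		Port:      port,
		Path:      u.Path,
		Timeout:   u.Query().Get("timeout"),
	}, nil
}

func main() {
	// URL copied from the reflector error above.
	t, err := parseWebhookURL("https://provider-kubernetes.crossplane-system.svc:9443/convert?timeout=30s")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", t)
}
```

Everything in that result (Service provider-kubernetes in crossplane-system, port 9443, path /convert) has to be reachable from the control plane, which is where the network split described below gets in the way.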

Our Kubernetes cluster is an AWS EKS cluster with Calico as the CNI. Because the CNI cannot be applied to the AWS-managed control plane, the control plane and the pods running in the cluster are on different networks. As a consequence, every webhook has to run with hostNetwork: true. That in turn is also not possible (to my knowledge), since the pod is managed by Crossplane.
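The argument comes down to routing: the managed control plane can reach addresses in the VPC (which includes node IPs, and therefore hostNetwork pods), but not addresses from Calico's overlay pool. A minimal sketch of that check, assuming a hypothetical VPC CIDR of 10.0.0.0/16 and a hypothetical overlay pod IP of 192.168.12.34; substitute your own ranges.

```go
package main

import (
	"fmt"
	"net"
)

// routableFromControlPlane reports whether ip falls inside one of the
// networks the managed control plane can route to (here: the VPC).
func routableFromControlPlane(ip string, controlPlaneNets []string) (bool, error) {
	addr := net.ParseIP(ip)
	if addr == nil {
		return false, fmt.Errorf("invalid IP %q", ip)
	}
	for _, cidr := range controlPlaneNets {
		_, n, err := net.ParseCIDR(cidr)
		if err != nil {
			return false, err
		}
		if n.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Hypothetical example values: the VPC the EKS control plane can reach,
	// a webhook pod IP on the Calico overlay, and a node IP inside the VPC.
	vpc := []string{"10.0.0.0/16"}
	for _, ip := range []string{"192.168.12.34", "10.0.3.17"} {
		ok, err := routableFromControlPlane(ip, vpc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s reachable from control plane: %v\n", ip, ok)
	}
}
```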

Since the conversion from v1alpha1 to v1alpha2 is fairly straightforward and could be done manually instead, maybe a command-line option could be introduced to disable the webhook. However, this would also affect the CRDs and thus depend on the code generation used by Crossplane providers, but I might be wrong here.

How can we reproduce it?

Deploy the provider in version 1.11+ and block network access from the control plane to the webhook.

What environment did it happen in?

Crossplane version: v1.15.0
Cloud Provider: AWS
Distribution: EKS v1.29
Container Network Interface: Calico

@Argannor Argannor added the bug Something isn't working label Feb 26, 2024
@phisco
Collaborator

phisco commented Feb 26, 2024

You should be able to deploy it with hostNetwork set to true using a DeploymentRuntimeConfig.

@Argannor
Author

Thank you for pointing that out; that works, but only partially: it works only if the required ports are free on the host, which is unlikely since they include 8080 for the metrics endpoint (and in our case they are indeed not available).

So this approach would require making the ports configurable, which I think would involve changes to Crossplane as well, right?
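Making the ports configurable could look roughly like the following. This is a hypothetical sketch: the flag names --webhook-port and --metrics-port are illustrative assumptions, not the provider's actual flags; the defaults mirror the ports mentioned in this thread.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// portConfig holds the listen ports. The flag names below are
// hypothetical, not the provider's real command-line interface.
type portConfig struct {
	WebhookPort int
	MetricsPort int
}

func parsePorts(args []string) (portConfig, error) {
	fs := flag.NewFlagSet("provider", flag.ContinueOnError)
	cfg := portConfig{}
	fs.IntVar(&cfg.WebhookPort, "webhook-port", 9443, "port the conversion webhook listens on")
	fs.IntVar(&cfg.MetricsPort, "metrics-port", 8080, "port the metrics endpoint listens on")
	if err := fs.Parse(args); err != nil {
		return portConfig{}, err
	}
	for _, p := range []int{cfg.WebhookPort, cfg.MetricsPort} {
		if p < 1 || p > 65535 {
			return portConfig{}, fmt.Errorf("port %d out of range", p)
		}
	}
	// Both servers share the node's namespace under hostNetwork, so they must not clash.
	if cfg.WebhookPort == cfg.MetricsPort {
		return portConfig{}, fmt.Errorf("webhook and metrics ports must differ")
	}
	return cfg, nil
}

func main() {
	cfg, err := parsePorts(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("webhook on :%d, metrics on :%d\n", cfg.WebhookPort, cfg.MetricsPort)
}
```

Changing the listen port alone would not be enough: the Service that Crossplane creates for the provider would also need to target the new port, which is presumably why a matching Crossplane change is needed.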

@ravibagri4

Any update on this one? We're also running EKS with the Cilium CNI and don't want to enable hostNetwork. Besides, the port should be configurable. I'm not sure whether we can simply disable the conversion webhook via some args in the runtimeConfig.

@Argannor Argannor linked a pull request Oct 11, 2024 that will close this issue
@Argannor
Author

Sorry for the long silence. I just raised the PR to make the port configurable and updated my Crossplane PR. I hope this can get through soon.

I understand that you don't want to run these workloads on the host network, but I don't think there is another option when running EKS with a custom CNI. The issue is that the Kubernetes API server runs outside the overlay network built by the CNI, so it is difficult to get the two to talk to each other without either leaving the cluster boundary first or using the host network. That's why, according to the Calico documentation, the host network approach is the recommended way to expose webhooks.
