✨ feat: Add IRSA support for self-managed clusters (rebase) #5109
base: main
Conversation
Hi @sl1pm4t. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
Force-pushed from 184576c to 986add5.
/test pull-cluster-api-provider-aws-e2e
There is some investigation of the failing e2e test happening over here.
Force-pushed from 986add5 to d265b48.
/retest-required
Force-pushed from d265b48 to 909a464.
/test pull-cluster-api-provider-aws-e2e
Force-pushed from 909a464 to 8d30f71.
/retest
Force-pushed from 8d30f71 to 540420c.
/test pull-cluster-api-provider-aws-e2e-eks
@sl1pm4t do you need help with the rebase?
No, I will be able to rebase; I had just forgotten this was needed. I'll try to do it in the next day or two.
Fix s3 tests

Force-pushed from 540420c to 71a97d5.
[APPROVALNOTIFIER] This PR is NOT APPROVED.

This pull-request has been approved by: The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
I rebased and re-ran
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks
Both e2e tests have passed 🎉
```
@@ -37,12 +35,6 @@ func (b *S3Bucket) Validate() []*field.Error {
		errs = append(errs, field.Required(field.NewPath("spec", "s3Bucket", "name"), "can't be empty"))
	}

	// Feature gate is not enabled but ignition is enabled then send a forbidden error.
	if !feature.Gates.Enabled(feature.BootstrapFormatIgnition) {
```
Was this removed on purpose?
I'm not sure; this PR was based on work started by @luthermonson, so I don't have the full history.
I see the check still exists in the v1beta2 validation, so I'll revert this and do some tests.
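For reference, a minimal sketch of the kind of check being reverted, modeled on the v1beta2 validation (exact field path and message may differ):

```go
// Reject s3Bucket configuration when the Ignition feature gate is off,
// since the bucket exists to serve Ignition bootstrap data.
if !feature.Gates.Enabled(feature.BootstrapFormatIgnition) {
	errs = append(errs, field.Forbidden(
		field.NewPath("spec", "s3Bucket"),
		"can be set only if the BootstrapFormatIgnition feature gate is enabled"))
}
```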
```
@@ -67,6 +67,9 @@ const (
	// BootstrapFormatIgnition will allow an user to enable alternate machine bootstrap format, viz. Ignition.
	BootstrapFormatIgnition featuregate.Feature = "BootstrapFormatIgnition"

	// OIDCProviderSupport will allow a user to enable OIDC provider support for kubeadm clusters.
	OIDCProviderSupport featuregate.Feature = "OIDCProviderSupport"
```
As we have OIDC provider support in EKS, should the feature gate be renamed so that it's explicitly non-EKS (i.e. unmanaged)?
I appreciate the comment says it's for kubeadm clusters, but I think we may need to be clearer, as the gate name is used in the code.
Good point, I'll update this.
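A hedged sketch of what a more explicit name could look like; `UnmanagedOIDCProviderSupport` is only an illustration, not a settled name:

```go
// UnmanagedOIDCProviderSupport will allow a user to enable OIDC provider
// support for unmanaged (non-EKS, kubeadm-based) clusters.
UnmanagedOIDCProviderSupport featuregate.Feature = "UnmanagedOIDCProviderSupport"
```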
```
@@ -185,6 +185,11 @@ func (t Template) ControllersPolicy() *iamv1.PolicyDocument {
	"ec2:DeleteLaunchTemplateVersions",
	"ec2:DescribeKeyPairs",
	"ec2:ModifyInstanceMetadataOptions",
	"iam:CreateOpenIDConnectProvider",
```
As the OIDC feature flag is false by default, we may want to not include these extra permissions by default, and instead only add them if enabled via the config file.
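A minimal sketch of that idea, assuming a hypothetical `EnableOIDCProvider` flag surfaced through the template's config (not a field from this PR):

```go
// Grant the OIDC-related IAM actions only when the operator opts in,
// so the default controller policy is unchanged.
actions := []string{
	"ec2:DeleteLaunchTemplateVersions",
	"ec2:DescribeKeyPairs",
	"ec2:ModifyInstanceMetadataOptions",
}
if t.Spec.EnableOIDCProvider { // hypothetical config-file flag
	actions = append(actions,
		"iam:CreateOpenIDConnectProvider",
		"iam:DeleteOpenIDConnectProvider",
		"iam:TagOpenIDConnectProvider",
	)
}
```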
```
@@ -256,6 +267,14 @@ func (r *AWSClusterReconciler) reconcileDelete(ctx context.Context, clusterScope
	allErrs = append(allErrs, errors.Wrap(err, "error deleting network"))
}

if err := iamService.DeleteOIDCProvider(ctx); err != nil {
```
Does this need to be dependent on the feature flag?
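One way to make that explicit, sketched with the gate and service names used elsewhere in this diff:

```go
// Skip OIDC teardown when the feature gate is off.
if feature.Gates.Enabled(feature.OIDCProviderSupport) {
	if err := iamService.DeleteOIDCProvider(ctx); err != nil {
		allErrs = append(allErrs, errors.Wrap(err, "error deleting OIDC provider"))
	}
}
```

Though one could argue the cleanup should run regardless, in case the gate was enabled at creation time and disabled later.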
### 4 - Set Service Account Issuer URL in KubeadmControlPlane Configuration

The KubeadmControlPlane configuration should be updated to set the apiServer `service-account-issuer` argument with the S3 bucket's URL:
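An abbreviated sketch of such a configuration; the bucket name, region, and cluster name are placeholders, and the issuer path mirrors the `<cluster_name>/` prefix this PR uses in the bucket:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          # Must match the URL under which the discovery document and
          # JWKS are published in the S3 bucket.
          service-account-issuer: https://<bucket-name>.s3.<region>.amazonaws.com/<cluster-name>
```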
It might also be nice to create a new cluster flavor (i.e. template) in the templates directory.
```
@@ -352,6 +372,12 @@ func (r *AWSClusterReconciler) reconcileNormal(clusterScope *scope.ClusterScope)
	}
	conditions.MarkTrue(awsCluster, infrav1.S3BucketReadyCondition)

	if err := iamService.ReconcileOIDCProvider(context.TODO()); err != nil {
```
Should we wrap this with a test of the feature being enabled, to be explicit? I know the validating webhook shouldn't allow setting the associate field when the feature flag is disabled. Undecided about this, if I'm honest ;)
```
// 4. copy Service Account public signing JWKs to the s3 bucket.
func (s *Service) ReconcileOIDCProvider(ctx context.Context) error {
	if !s.scope.AssociateOIDCProvider() {
		return nil
```
Could be good to have a log entry, at a high log verbosity level, to say it's disabled, for debugging purposes.
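Something like this sketch; the logger accessor on the scope is an assumption:

```go
if !s.scope.AssociateOIDCProvider() {
	// High-verbosity breadcrumb so operators can tell why no OIDC
	// provider appears for this cluster.
	s.scope.V(4).Info("OIDC provider association disabled, skipping reconcile")
	return nil
}
```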
```
@@ -673,6 +673,50 @@ func (r *AWSMachineReconciler) reconcileOperationalState(ec2svc services.EC2Inte
	return err
}

// check if the remote kubeconfig works and annotate the cluster
if _, ok := machineScope.InfraCluster.InfraCluster().GetAnnotations()[scope.KubeconfigReadyAnnotation]; !ok && machineScope.IsControlPlane() {
```
I'm not sure about this as the mechanism for deciding whether we carry on with the OIDC reconciliation.
Added a suggestion in awscluster_controller.go.
```
if err := iamService.ReconcileOIDCProvider(context.TODO()); err != nil {
	conditions.MarkFalse(awsCluster, infrav1.OIDCProviderReadyCondition, infrav1.OIDCProviderReconciliationFailedReason, clusterv1.ConditionSeverityError, "%s", err.Error())
	clusterScope.Error(err, "failed to reconcile OIDC provider")
	return reconcile.Result{RequeueAfter: 15 * time.Second}, nil
```
You could move `if err := iamService.ReconcileOIDCProvider(context.TODO()); err != nil {` to after `awsCluster.Status.Ready = true`, and then within `iamService.ReconcileOIDCProvider` instead return a `ctrl.Result`, so that you can say `RequeueAfter` if the downstream cluster isn't available yet. This would allow provisioning to continue for the control plane; then, when the cluster is accessible, `iamService.ReconcileOIDCProvider` would do the second part, deploy the config map, and return an empty `ctrl.Result` so there is no requeue.

The benefit is that this removes the need for the AWSMachine controller to annotate AWSCluster, and keeps everything in the control of the AWSCluster controller.

Hopefully that all makes sense. If it doesn't, ping me on Slack and we can chat or arrange a call.
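A rough sketch of that shape; the signature change and the readiness probe are assumptions, not code from this PR:

```go
// ReconcileOIDCProvider returns a ctrl.Result so it can request a requeue
// while the workload cluster is still coming up, letting control plane
// provisioning continue in the meantime.
func (s *Service) ReconcileOIDCProvider(ctx context.Context) (ctrl.Result, error) {
	if !s.scope.AssociateOIDCProvider() {
		return ctrl.Result{}, nil
	}
	// First part: ensure the IAM OIDC provider and the S3 documents exist.
	// ...
	if !s.workloadClusterReachable(ctx) { // hypothetical readiness probe
		return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
	}
	// Second part: deploy the config map to the now-reachable cluster.
	return ctrl.Result{}, nil
}
```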
Co-authored-by: Richard Case <[email protected]>
@sl1pm4t: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds IRSA functionality to self-managed clusters. This will bring self-managed clusters in line, functionality-wise, with managed clusters, which provide most of this out of the box.

With this PR, the following new resources are created:

- `<cluster_name>/.well-known/openid-configuration` - OpenID Connect discovery document
- `<cluster_name>/openid/v1/jwks` - Service Account signing public key

This is a continuation of an old unmerged PR #4094, with some fixes and some functionality removed to reduce the scope of the PR. The functionality removed includes:

- Installation of the `amazon-pod-identity-webhook` addon to the workload cluster. I felt there are already many ways to manage cluster addons, including ClusterResourceSets or CAAPH, and that it was unnecessary to install the addon via the controller, which then becomes an ongoing maintenance burden. Instead, the requirement for the addon has been added to the documentation.
- Setting the `service-account-issuer` argument through kubeadm patches. This is easily covered in the documentation and only requires a single line of config to be added to the `AWSCluster` resource, but during testing I also experienced issues with this being applied inconsistently, resulting in different values across the control plane nodes.

Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #3560
Supersedes #4094
Special notes for your reviewer:
This PR adds a new `ReconcileOIDCProvider` step to the AWSCluster reconciliation loop.
Checklist:
Release note: