Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFE-1133: Watch Infrastructure and update AWS tags #137

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

chiragkyal
Copy link
Member

@chiragkyal chiragkyal commented Sep 24, 2024

This PR adds support for AWS user tags specified in the platform status. Tags can be managed centrally in the PlatformStatus and have them automatically applied to resources managed by the ALBC.

  • Adds a watch on the Infrastructure object to detect changes in the platformStatus.AWS.ResourceTags field.
  • Merges tags from both the AWSLoadBalancerController.Spec.AdditionalResourceTags and the PlatformStatus.AWS.ResourceTags into a single list, prioritizing tags from the operator spec.
  • Ensures that the combined tag list does not exceed the maximum allowed limit of 24 tags for the ALBC's --default-tags argument.

Implements CFE-1133

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 24, 2024

@chiragkyal: This pull request references CFE-1133 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 24, 2024
@chiragkyal chiragkyal force-pushed the aws-tags branch 3 times, most recently from 3d5641f to f413d74 Compare September 25, 2024 09:29
@chiragkyal
Copy link
Member Author

/test images

@chiragkyal
Copy link
Member Author

/retest-required

@chiragkyal
Copy link
Member Author

/test all

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 30, 2024

@chiragkyal: This pull request references CFE-1133 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This PR adds support for AWS user tags specified in the platform status. Tags can be managed centrally in the PlatformStatus and have them automatically applied to resources managed by the ALBC.

  • Adds a watch on the Infrastructure object to detect changes in the platformStatus.AWS.ResourceTags field.
  • Merges tags from both the AWSLoadBalancerController.Spec.AdditionalResourceTags and the PlatformStatus.AWS.ResourceTags into a single list, prioritizing tags from the operator spec.
  • Ensures that the combined tag list does not exceed the maximum allowed limit of 24 tags for the ALBC's --default-tags argument.

Implements CFE-1133

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@chiragkyal chiragkyal changed the title CFE-1133: [WIP] Watch infrastructure and update AWS tags CFE-1133: Watch Infrastructure and update AWS tags Sep 30, 2024
@chiragkyal chiragkyal force-pushed the aws-tags branch 2 times, most recently from c177a9c to 48c649a Compare October 1, 2024 06:34
@chiragkyal
Copy link
Member Author

/retest-required

- Added watch on Infrastructure object to detect changes in PlatformStatus.AWS.ResourceTags.
- Merged tags from AWSLoadBalancerController.Spec.AdditionalResourceTags and PlatformStatus.AWS.ResourceTags, prioritizing operator spec tags.

Signed-off-by: chiragkyal <[email protected]>
@chiragkyal
Copy link
Member Author

/retest

@chiragkyal
Copy link
Member Author

/assign @alebedev87

@chiragkyal
Copy link
Member Author

/retest

Copy link
Contributor

openshift-ci bot commented Nov 3, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from alebedev87. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chiragkyal
Copy link
Member Author

/retest-required

@chiragkyal chiragkyal force-pushed the aws-tags branch 2 times, most recently from 54edf38 to 650dd1b Compare November 5, 2024 08:19
@chiragkyal
Copy link
Member Author

/retest-required

@chiragkyal
Copy link
Member Author

/retest-required

1 similar comment
@chiragkyal
Copy link
Member Author

/retest-required

Signed-off-by: chiragkyal <[email protected]>
@chiragkyal
Copy link
Member Author

/retest-required

Comment on lines +1209 to +1215
managed, err := isManagedServiceCluster(context.TODO(), kubeClientSet)
if err != nil {
t.Fatalf("failed to check managed cluster %v", err)
}
if managed {
t.Skip("Infrastructure status cannot be directly updated on managed cluster, skipping...")
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Infrastructure status cannot be directly updated on ROSA clusters. Hence skipping the test.

Comment on lines +1460 to +1471
func isManagedServiceCluster(ctx context.Context, adminClient kubernetes.Interface) (bool, error) {
_, err := adminClient.CoreV1().Namespaces().Get(ctx, "openshift-backplane", v1.GetOptions{})
if err == nil {
return true, nil
}

if !errors.IsNotFound(err) {
return false, err
}

return false, nil
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add this link as a comment so that it would be easier to track this later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added the PR link as code comment

}

func desiredContainerArgs(controller *albo.AWSLoadBalancerController, clusterName, vpcID string) []string {
func desiredContainerArgs(controller *albo.AWSLoadBalancerController, platformStatus *configv1.PlatformStatus, clusterName, vpcID string) ([]string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cluster name and vpcID are still more fundamental inputs for the flags than the resource tags:

Suggested change
func desiredContainerArgs(controller *albo.AWSLoadBalancerController, platformStatus *configv1.PlatformStatus, clusterName, vpcID string) ([]string, error) {
func desiredContainerArgs(controller *albo.AWSLoadBalancerController, clusterName, vpcID string, platformStatus *configv1.PlatformStatus) ([]string, error) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Updated

Comment on lines +130 to +133
platformStatus := infraConfig.Status.PlatformStatus
if platformStatus == nil {
return ctrl.Result{}, fmt.Errorf("failed to determine infrastructure platform status: status.platformStatus is nil")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the platform status ever be nil? If it can, I don't think we have to block the reconciliation - we just don't add platform tags in that case.

Copy link
Member Author

@chiragkyal chiragkyal Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

platform status should not be nil for clusters deployed on AWS. We need the AWS region as well from the platform status for the operator to function correctly.

if infra.Status.PlatformStatus == nil || infra.Status.PlatformStatus.AWS == nil || infra.Status.PlatformStatus.AWS.Region == "" {
err = fmt.Errorf("could not get AWS region from Infrastructure %q status", clusterInfrastructureName)
return
}

So, I think we should block the reconciliation if it is nil

Copy link
Contributor

@alebedev87 alebedev87 Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the AWS region as well from the platform status for the operator to function correctly.

Right, that's what we ensure at the startup of the operator. The region is fundamental for the operator, without it the operator cannot function. Unlike resource tags (platform or additional).

I don't see why a cluster may not have any platform resource tags. So, I consider platformStatus==nil as the same use case as "customer didn't set any resource tags during installation".

Comment on lines +280 to +283
// `--default-tags` arg allows a maximum of 24 user tags, but the combination of
// controller.Spec.AdditionalResourceTags and platformStatus.AWS.ResourceTags can result in more.
// Ensure a maximum of 24 merged tags are added.
if len(tags) > 24 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a constant? And can you please add some links which confirm this number. AWS limitation is 50 tags per resource so I don't quite get where 24 is coming from.

Copy link
Member Author

@chiragkyal chiragkyal Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

24 is coming from the validation added for the AdditionalResourceTags field.

// additionalResourceTags are the AWS tags that will be applied to all AWS resources managed by this
// controller. The managed AWS resources don't include the cluster subnets which are tagged by the operator.
// The addition of new tags as well as the update or removal of any existing tags
// will be propagated to the AWS resources. The controller owns all the tags of the managed AWS resources,
// unsolicited tags are removed. The controller doesn't watch for changes on AWS, so the removal of the unsolicited
// tags can only be triggered by an event coming from OpenShift. AWS supports a maximum of 50 tags per resource.
// AWSLoadBalancerController reserves 3 tags for its use, the rest is split between the tag annotation
// which can be set on the ingress and this field: 23 and 24, respectively. Each tag key must be unique.
//
// +kubebuilder:validation:MaxItems=24
// +kubebuilder:validation:Optional
// +optional
// +listType=map
// +listMapKey=key
// +patchMergeKey=key
// +patchStrategy=merge
AdditionalResourceTags []AWSResourceTag `json:"additionalResourceTags,omitempty" patchStrategy:"merge" patchMergeKey:"key"`

Since we are now merging these tags with the Infrastructure tags, isn't this number still valid?

// Add tags from platformStatus.AWS.ResourceTags to the map, only if the key doesn't exist
if platformStatus.AWS != nil && len(platformStatus.AWS.ResourceTags) > 0 {
for _, t := range platformStatus.AWS.ResourceTags {
if len(t.Key) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

platformStatus.AWS.ResourceTags.Key has OpenAPI schema validations which exclude empty value:

    // +kubebuilder:validation:MinLength=1
    // +kubebuilder:validation:MaxLength=128
    // +kubebuilder:validation:Pattern=`^[0-9A-Za-z_.:/=+-@]+$`

So, this is redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. Removed the if block.

Comment on lines +242 to +244
if tc.platformStatus == nil {
tc.platformStatus = &configv1.PlatformStatus{}
}
Copy link
Contributor

@alebedev87 alebedev87 Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should handle the nil case inside the function. That is, if platformStatus is nil we just don't add platform tags.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to #137 (comment) reason, I think having nil platformStatus won't even start the operator.

@@ -854,11 +962,15 @@ func TestEnsureDeploymentEnvVars(t *testing.T) {
VPCID: "test-vpc",
AWSRegion: testAWSRegion,
}
_, err := r.ensureDeployment(context.Background(), tc.serviceAccount, "test-credentials", "test-serving", tc.controller, nil)
_, err := r.ensureDeployment(context.Background(), tc.serviceAccount, "test-credentials", "test-serving", tc.controller, &configv1.PlatformStatus{}, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about nil value being handled inside the function as other args of this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the same reason, I think we won't even reach upto ensureDeployment if platformStatus is nil.

Comment on lines 1228 to 1232
alb.Spec.AdditionalResourceTags = []albo.AWSResourceTag{
{Key: "op-key1", Value: "op-value1"},
{Key: "conflict-key1", Value: "op-value2"},
{Key: "conflict-key2", Value: "op-value3"},
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this to albcBuilder? Something like withTags(map[string]string).

Copy link
Member Author

@chiragkyal chiragkyal Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just now observed that there is already a similar function: withResourceTags(map[string]string). I've updated it to use this function instead. Thanks for the suggestion.


// Check `--default-tags` arg for tags present in alb instance and infra status (initialInfraTags)
expectedTagValue := "conflict-key1=op-value2,conflict-key2=op-value3,op-key1=op-value1,plat-key1=plat-value1"
assertContainerArgFromDeployment(t, dep, awsLoadBalancerControllerContainerName, "--default-tags", expectedTagValue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is not quite different from the unit tests for desired deployment. We already assure the deployment gets the right flag values. I think the e2e test should check the the result of ALBC's job - load balancers. You can use https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 to check the infrastructure tags are propagated to a load balancer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking differently. The task of the ALBO is to set the flags correctly, next it's the job of ALBC to consume those flags and take necessary actions (adding tags in this case). If we can validate that the --default-tags flag is getting populated correctly, I hope we can trust ALBC to propagate them to a load balancer.

I would expect ALBC's e2e to query the tags using https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2

In case of CIO also, we did a similar check. We're checking whether the operator has correctly set the annotations or not. https://github.com/openshift/cluster-ingress-operator/pull/1148/files#diff-cf4b4e5424070a666a364d1d1b04011478d888d6d3d65c943ca7783333b7a6e4R1356-R1358

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The task of the ALBO is to set the flags correctly, next it's the job of ALBC to consume those flags and take necessary actions (adding tags in this case). If we can validate that the --default-tags flag is getting populated correctly, I hope we can trust ALBC to propagate them to a load balancer.

We don't have much control over what's validated upstream. The e2e tests for our addon operators verify use cases we think are nominal for our customers. And the idea here is to go from one end (API) to the other end (load balancers). That's why all the other e2e tests send real traffic to check the load balancers are in the shape we expect them to be. The validation of the expected values for --default-tags flag is what a unit test can do even without envtest, a fake Kube client would be enough.

I would expect ALBC's e2e to query the tags using https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2

I would expect too. But I don't think we know this for sure.

In case of CIO also, we did a similar check. We're checking whether the operator has correctly set the annotations or not. https://github.com/openshift/cluster-ingress-operator/pull/1148/files#diff-cf4b4e5424070a666a364d1d1b04011478d888d6d3d65c943ca7783333b7a6e4R1356-R1358

C-I-O case is different. The creation of load balancers for Services is out of control of C-I-O, it's not done by anything C-I-O manages or being responsible for.

Frankly speaking, I would be ok if this feature would come with a set of unit tests only. However since you already into the e2e stage, we should be consistent with the rest of the tests and go to very end (AWS).

Comment on lines +1460 to +1471
func isManagedServiceCluster(ctx context.Context, adminClient kubernetes.Interface) (bool, error) {
_, err := adminClient.CoreV1().Namespaces().Get(ctx, "openshift-backplane", v1.GetOptions{})
if err == nil {
return true, nil
}

if !errors.IsNotFound(err) {
return false, err
}

return false, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add this link as a comment so that it would be easier to track this later?

Signed-off-by: chiragkyal <[email protected]>
Copy link
Contributor

openshift-ci bot commented Nov 15, 2024

@chiragkyal: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ci-index-aws-load-balancer-operator-bundle e9077da link true /test ci-index-aws-load-balancer-operator-bundle

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants