Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFE-1133: Watch Infrastructure and update AWS tags #137

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 13 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,9 @@ help: ## Display this help.

##@ Development

.PHONY: update
update: update-vendored-crds manifests generate

.PHONY: manifests
manifests: ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
$(CONTROLLER_GEN) rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
Expand All @@ -130,6 +133,11 @@ manifests: ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefin
generate: iamctl-gen iam-gen## Generate code containing DeepCopy, DeepCopyInto, DeepCopyObject method implementations and iamctl policies.
$(CONTROLLER_GEN) object:headerFile="hack/boilerplate.go.txt" paths="./..."

.PHONY: update-vendored-crds
update-vendored-crds:
## Copy infrastructure CRD from openshift/api
cp vendor/github.com/openshift/api/config/v1/0000_10_config-operator_01_infrastructure-Default.crd.yaml ./pkg/utils/test/crd/infrastructure-Default.yaml

.PHONY: fmt
fmt: ## Run go fmt against code.
go fmt -mod=vendor ./...
Expand Down Expand Up @@ -301,8 +309,12 @@ catalog-build: catalog
catalog-push:
$(MAKE) image-push IMG=$(CATALOG_IMG)

.PHONY: verify-vendored-crds
verify-vendored-crds:
diff vendor/github.com/openshift/api/config/v1/0000_10_config-operator_01_infrastructure-Default.crd.yaml ./pkg/utils/test/crd/infrastructure-Default.yaml

.PHONY: verify
verify:
verify: verify-vendored-crds
hack/verify-deps.sh
hack/verify-generated.sh
hack/verify-gofmt.sh
Expand Down
36 changes: 27 additions & 9 deletions pkg/controllers/awsloadbalancercontroller/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ import (

cco "github.com/openshift/cloud-credential-operator/pkg/apis/cloudcredential/v1"

configv1 "github.com/openshift/api/config/v1"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/builder"
"sigs.k8s.io/controller-runtime/pkg/client"
Expand All @@ -46,6 +47,8 @@ import (
const (
// the name of the AWSLoadBalancerController resource which will be reconciled
controllerName = "cluster"
// clusterInfrastructureName is the name of the 'cluster' infrastructure object.
clusterInfrastructureName = "cluster"
// the port on which controller metrics are served
controllerMetricsPort = 8080
// the port on which the controller webhook is served
Expand Down Expand Up @@ -120,6 +123,15 @@ func (r *AWSLoadBalancerControllerReconciler) Reconcile(ctx context.Context, req
}
}

infraConfig := &configv1.Infrastructure{}
if err := r.Client.Get(ctx, types.NamespacedName{Name: clusterInfrastructureName}, infraConfig); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to get infrastructure %q: %w", clusterInfrastructureName, err)
}
platformStatus := infraConfig.Status.PlatformStatus
if platformStatus == nil {
return ctrl.Result{}, fmt.Errorf("failed to determine infrastructure platform status: status.platformStatus is nil")
}
Comment on lines +130 to +133
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the platform status ever be nil? If it can, I don't think we have to block the reconciliation - we just don't add platform tags in that case.

Copy link
Member Author

@chiragkyal chiragkyal Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

platform status should not be nil for clusters deployed on AWS. We need the AWS region as well from the platform status for the operator to function correctly.

if infra.Status.PlatformStatus == nil || infra.Status.PlatformStatus.AWS == nil || infra.Status.PlatformStatus.AWS.Region == "" {
err = fmt.Errorf("could not get AWS region from Infrastructure %q status", clusterInfrastructureName)
return
}

So, I think we should block the reconciliation if it is nil

Copy link
Contributor

@alebedev87 alebedev87 Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the AWS region as well from the platform status for the operator to function correctly.

Right, that's what we ensure at the startup of the operator. The region is fundamental for the operator, without it the operator cannot function. Unlike resource tags (platform or additional).

I don't see why a cluster may not have any platform resource tags. So, I consider platformStatus==nil as the same use case as "customer didn't set any resource tags during installation".


if err := r.ensureIngressClass(ctx, lbController); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to ensure default IngressClass for AWSLoadBalancerController %q: %v", req.Name, err)
}
Expand Down Expand Up @@ -187,7 +199,7 @@ func (r *AWSLoadBalancerControllerReconciler) Reconcile(ctx context.Context, req
return ctrl.Result{}, fmt.Errorf("failed to ensure ClusterRole and Binding for AWSLoadBalancerController %q: %w", req.Name, err)
}

deployment, err := r.ensureDeployment(ctx, sa, credSecretNsName.Name, servingSecretName, lbController, trustCAConfigMap)
deployment, err := r.ensureDeployment(ctx, sa, credSecretNsName.Name, servingSecretName, lbController, platformStatus, trustCAConfigMap)
if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to ensure Deployment for AWSLoadbalancerController %q: %w", req.Name, err)
}
Expand Down Expand Up @@ -247,17 +259,17 @@ func (r *AWSLoadBalancerControllerReconciler) BuildManagedController(mgr ctrl.Ma
Owns(&arv1.ValidatingWebhookConfiguration{}).
Owns(&arv1.MutatingWebhookConfiguration{})

if r.TrustedCAConfigMapName != "" {
clusterALBCInstance := func(ctx context.Context, o client.Object) []reconcile.Request {
return []reconcile.Request{
{
NamespacedName: types.NamespacedName{
Name: controllerName,
},
clusterALBCInstance := func(ctx context.Context, o client.Object) []reconcile.Request {
return []reconcile.Request{
{
NamespacedName: types.NamespacedName{
Name: controllerName,
},
}
},
}
}

if r.TrustedCAConfigMapName != "" {
// Requeue the only (cluster) instance of AWSLoadBalancerController
// so that the main reconciliation loop can detect the changes in the trusted CA configmap's contents
// and redeploy the controller if needed.
Expand All @@ -270,6 +282,12 @@ func (r *AWSLoadBalancerControllerReconciler) BuildManagedController(mgr ctrl.Ma
predicate.NewPredicateFuncs(inNamespace(r.Namespace))),
predicate.NewPredicateFuncs(hasName(r.TrustedCAConfigMapName))))
}
// Watch Infrastructure object to detect changes in AWS user tags
bldr = bldr.Watches(&configv1.Infrastructure{},
handler.EnqueueRequestsFromMapFunc(clusterALBCInstance),
builder.WithPredicates(
predicate.NewPredicateFuncs(hasName(clusterInfrastructureName))))

return bldr
}

Expand Down
72 changes: 59 additions & 13 deletions pkg/controllers/awsloadbalancercontroller/deployment.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@ import (
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/log"

configv1 "github.com/openshift/api/config/v1"

albo "github.com/openshift/aws-load-balancer-operator/api/v1"
)

Expand Down Expand Up @@ -69,7 +71,7 @@ const (
allCapabilities = "ALL"
)

func (r *AWSLoadBalancerControllerReconciler) ensureDeployment(ctx context.Context, sa *corev1.ServiceAccount, crSecretName, servingSecretName string, controller *albo.AWSLoadBalancerController, trustCAConfigMap *corev1.ConfigMap) (*appsv1.Deployment, error) {
func (r *AWSLoadBalancerControllerReconciler) ensureDeployment(ctx context.Context, sa *corev1.ServiceAccount, crSecretName, servingSecretName string, controller *albo.AWSLoadBalancerController, platformStatus *configv1.PlatformStatus, trustCAConfigMap *corev1.ConfigMap) (*appsv1.Deployment, error) {
deploymentName := fmt.Sprintf("%s-%s", controllerResourcePrefix, controller.Name)

reqLogger := log.FromContext(ctx).WithValues("deployment", deploymentName)
Expand All @@ -90,7 +92,11 @@ func (r *AWSLoadBalancerControllerReconciler) ensureDeployment(ctx context.Conte
trustCAConfigMapHash = configMapHash
}

desired := r.desiredDeployment(deploymentName, crSecretName, servingSecretName, controller, sa, trustCAConfigMapName, trustCAConfigMapHash)
desired, err := r.desiredDeployment(deploymentName, crSecretName, servingSecretName, controller, platformStatus, sa, trustCAConfigMapName, trustCAConfigMapHash)
if err != nil {
return nil, fmt.Errorf("failed to get desired deployment %s: %w", deploymentName, err)
}

err = controllerutil.SetControllerReference(controller, desired, r.Scheme)
if err != nil {
return nil, fmt.Errorf("failed to set owner reference on deployment %s: %w", deploymentName, err)
Expand Down Expand Up @@ -120,7 +126,11 @@ func (r *AWSLoadBalancerControllerReconciler) ensureDeployment(ctx context.Conte
return current, nil
}

func (r *AWSLoadBalancerControllerReconciler) desiredDeployment(name, credentialsRequestSecretName, servingSecret string, controller *albo.AWSLoadBalancerController, sa *corev1.ServiceAccount, trustedCAConfigMapName, trustedCAConfigMapHash string) *appsv1.Deployment {
func (r *AWSLoadBalancerControllerReconciler) desiredDeployment(name, credentialsRequestSecretName, servingSecret string, controller *albo.AWSLoadBalancerController, platformStatus *configv1.PlatformStatus, sa *corev1.ServiceAccount, trustedCAConfigMapName, trustedCAConfigMapHash string) (*appsv1.Deployment, error) {
containerArgs, err := desiredContainerArgs(controller, platformStatus, r.ClusterName, r.VPCID)
if err != nil {
return nil, fmt.Errorf("failed to get container args: %w", err)
}
d := &appsv1.Deployment{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Expand All @@ -145,7 +155,7 @@ func (r *AWSLoadBalancerControllerReconciler) desiredDeployment(name, credential
{
Name: awsLoadBalancerControllerContainerName,
Image: r.Image,
Args: desiredContainerArgs(controller, r.ClusterName, r.VPCID),
Args: containerArgs,
Env: append([]corev1.EnvVar{
{
Name: awsRegionEnvVarName,
Expand Down Expand Up @@ -257,21 +267,23 @@ func (r *AWSLoadBalancerControllerReconciler) desiredDeployment(name, credential
})
}
}
return d
return d, nil
}

func desiredContainerArgs(controller *albo.AWSLoadBalancerController, clusterName, vpcID string) []string {
func desiredContainerArgs(controller *albo.AWSLoadBalancerController, platformStatus *configv1.PlatformStatus, clusterName, vpcID string) ([]string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cluster name and vpcID are still more fundamental inputs for the flags than the resource tags:

Suggested change
func desiredContainerArgs(controller *albo.AWSLoadBalancerController, platformStatus *configv1.PlatformStatus, clusterName, vpcID string) ([]string, error) {
func desiredContainerArgs(controller *albo.AWSLoadBalancerController, clusterName, vpcID string, platformStatus *configv1.PlatformStatus) ([]string, error) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Updated

var args []string
args = append(args, fmt.Sprintf("--webhook-cert-dir=%s", webhookTLSDir))
args = append(args, fmt.Sprintf("--aws-vpc-id=%s", vpcID))
args = append(args, fmt.Sprintf("--cluster-name=%s", clusterName))

// if additional keys are present then sort them and append it to the arguments
if controller.Spec.AdditionalResourceTags != nil {
var tags []string
for _, t := range controller.Spec.AdditionalResourceTags {
tags = append(tags, fmt.Sprintf("%s=%s", t.Key, t.Value))
}
tags := mergeTags(controller, platformStatus)
// `--default-tags` arg allows a maximum of 24 user tags, but the combination of
// controller.Spec.AdditionalResourceTags and platformStatus.AWS.ResourceTags can result in more.
// Ensure a maximum of 24 merged tags are added.
if len(tags) > 24 {
Comment on lines +280 to +283
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a constant? And can you please add some links which confirm this number. AWS limitation is 50 tags per resource so I don't quite get where 24 is coming from.

Copy link
Member Author

@chiragkyal chiragkyal Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

24 is coming from the validation added for the AdditionalResourceTags field.

// additionalResourceTags are the AWS tags that will be applied to all AWS resources managed by this
// controller. The managed AWS resources don't include the cluster subnets which are tagged by the operator.
// The addition of new tags as well as the update or removal of any existing tags
// will be propagated to the AWS resources. The controller owns all the tags of the managed AWS resources,
// unsolicited tags are removed. The controller doesn't watch for changes on AWS, so the removal of the unsolicited
// tags can only be triggered by an event coming from OpenShift. AWS supports a maximum of 50 tags per resource.
// AWSLoadBalancerController reserves 3 tags for its use, the rest is split between the tag annotation
// which can be set on the ingress and this field: 23 and 24, respectively. Each tag key must be unique.
//
// +kubebuilder:validation:MaxItems=24
// +kubebuilder:validation:Optional
// +optional
// +listType=map
// +listMapKey=key
// +patchMergeKey=key
// +patchStrategy=merge
AdditionalResourceTags []AWSResourceTag `json:"additionalResourceTags,omitempty" patchStrategy:"merge" patchMergeKey:"key"`

Since we are now merging these tags with the Infrastructure tags, isn't this number still valid?

return nil, fmt.Errorf("exceeded maximum of 24 allowed tags, got %d", len(tags))
}
if len(tags) > 0 {
sort.Strings(tags)
args = append(args, fmt.Sprintf(`--default-tags=%s`, strings.Join(tags, ",")))
}
Expand Down Expand Up @@ -302,7 +314,7 @@ func desiredContainerArgs(controller *albo.AWSLoadBalancerController, clusterNam
args = append(args, fmt.Sprintf("--ingress-class=%s", controller.Spec.IngressClass))
args = append(args, "--feature-gates=EnableIPTargetType=false")
sort.Strings(args)
return args
return args, nil
}

func (r *AWSLoadBalancerControllerReconciler) currentDeployment(ctx context.Context, name string, namespace string) (bool, *appsv1.Deployment, error) {
Expand Down Expand Up @@ -542,3 +554,37 @@ func buildMapHash(data map[string]string) (string, error) {
}
return hex.EncodeToString(hash.Sum(nil)), nil
}

// mergeTags merges tags from an AWSLoadBalancerController and PlatformStatus into a slice of strings.
// Tags from `controller.Spec.AdditionalResourceTags` take precedence over those in `platformStatus.AWS.ResourceTags`.
// If a tag key already exists, the value from AdditionalResourceTags will overwrite the one from ResourceTags.
func mergeTags(controller *albo.AWSLoadBalancerController, platformStatus *configv1.PlatformStatus) []string {
// tagMap holds tags with unique keys
tagMap := make(map[string]string)

// Add tags from controller.Spec.AdditionalResourceTags since operator tags has higher precedence
if controller.Spec.AdditionalResourceTags != nil {
for _, t := range controller.Spec.AdditionalResourceTags {
tagMap[t.Key] = t.Value
}
}

// Add tags from platformStatus.AWS.ResourceTags to the map, only if the key doesn't exist
if platformStatus.AWS != nil && len(platformStatus.AWS.ResourceTags) > 0 {
for _, t := range platformStatus.AWS.ResourceTags {
if len(t.Key) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

platformStatus.AWS.ResourceTags.Key has OpenAPI schema validations which exclude empty value:

    // +kubebuilder:validation:MinLength=1
    // +kubebuilder:validation:MaxLength=128
    // +kubebuilder:validation:Pattern=`^[0-9A-Za-z_.:/=+-@]+$`

So, this is redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. Removed the if block.

if _, exists := tagMap[t.Key]; !exists {
tagMap[t.Key] = t.Value
}
}
}
}

// Convert map back to a slice
var tags []string
for key, value := range tagMap {
tags = append(tags, fmt.Sprintf("%s=%s", key, value))
}

return tags
}
Loading