Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrent namespace reconciliation to ensure openshift rbac requisites #2468

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

anithapriyanatarajan
Copy link
Contributor

@anithapriyanatarajan anithapriyanatarajan commented Dec 18, 2024

Changes

During upgrade of pipelines in OpenShift cluster there are additional rbac related reconciliation to be done across all namespaces in the cluster. Right now the namespace reconciliation happens one at a time in a for loop. This results in longer upgrade time. This PR improvises this by

  • Introducing go routines to concurrently process the reconciliation along with a configurable concurrency limit defaulted to 25 using a semaphore which acts as counter to control the max number of go routines.
  • Refactored - Utilised Patch in place of Update
  • Improved code modularity

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Release Notes

NONE

@tekton-robot tekton-robot added the release-note-none Denotes a PR that doesnt merit a release note. label Dec 18, 2024
@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign concaf after the PR has been reviewed.
You can assign the PR to them by writing /assign @concaf in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 18, 2024
@tekton-robot
Copy link
Contributor

Hi @anithapriyanatarajan. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 18, 2024
@pratap0007
Copy link
Contributor

/ok-to-test

@tekton-robot tekton-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 19, 2024
withError = true
logger.Errorf("failed to ensure default SA in namespace %s, %v", ns.Name, err)
}
go func(ns corev1.Namespace) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you separate the worker logic for a single namespace in a function for readability and testing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a separate function processNamespace that includes the namespace scoped reconciliation steps.

pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
@@ -393,76 +394,116 @@ func (r *rbac) createResources(ctx context.Context) error {
return err
}

// Semaphore to limit concurrency (max 100 goroutines at a time)
sem := make(chan struct{}, 100)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will need a configurable thing here, with a default of 50 for example

Copy link
Contributor Author

@anithapriyanatarajan anithapriyanatarajan Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been made configurable by introducing a new param rbacConcurrencyLimit in TektonConfig. It is defaulted to 25. While testing in CI cluster 50 gave throttling warnings
Screenshot From 2024-12-20 21-04-30

pkg/reconciler/openshift/tektonconfig/rbac.go Outdated Show resolved Hide resolved
}

// Locking section to ensure clusterRoleBinding is updated by one goroutine at a time
mu.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the mutex only for clusterinterceptorbinding operation only

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed mutex since the code is refactored to update clusterrolebinding finally for all namespaces

updatedNS.SetLabels(nsLabels)

// Update the namespace with set labels
if _, err = r.kubeClientSet.CoreV1().Namespaces().Update(ctx, updatedNS, metav1.UpdateOptions{}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just update the Ns, no need for unreconciledNamespaces, especially we are using the same mutex for different operation, so better avoid this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed all references to unreconciledNamespaces and used Patch instead of Update

@anithapriyanatarajan anithapriyanatarajan force-pushed the openshift-rbac-concurrent-ns-reconcilation branch from 836f97a to 397b47a Compare December 19, 2024 12:50
Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive question (on this and in some other places in general), shouldn't we use PATCH more often ? It would reduce the possibility to get "this object ways already update" type of failures.


// Locking section to ensure clusterRoleBinding is updated by one goroutine at a time
mu.Lock()
if err := r.ensureClusterRoleBindings(ctx, sa); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this generate a lot of API calls (possibly, on get and one update or create) where we could gather the namespaces and update the ClusterRoleBinding at the end (no matter if some namespaces failed — we would only use the ones that didn't faiil)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised the code to address the comment by collecting all subjects and making a single call to update the ClusterRoleBinding, rather than performing a read/update for each subject on per namespace basis.
TOBE.pdf
AS_IS_Flow.pdf

@anithapriyanatarajan anithapriyanatarajan force-pushed the openshift-rbac-concurrent-ns-reconcilation branch from 397b47a to 1007e40 Compare December 20, 2024 15:26
@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 20, 2024
@tekton-robot
Copy link
Contributor

The following is the coverage report on the affected files.
Say /test pull-tekton-operator-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/common/utils.go 85.7% 83.3% -2.4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesnt merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants