Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Parallelize EnsureCleanState for e2e tests, adding timing information #20998

Conversation

andrii-korotkov-verkada
Copy link
Contributor

@andrii-korotkov-verkada andrii-korotkov-verkada commented Nov 29, 2024

Closes #20968
Closes #20967

A number of cleanup and create steps can be done in parallel, reducing the total runtime. Gather timing information as well and log it.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

…rmation

Closes argoproj#20968
Closes argoproj#20967

A number of cleanup and create steps can be done in parallel, reducing the total runtime. Gather timing information as well and log it.

Signed-off-by: Andrii Korotkov <[email protected]>
@andrii-korotkov-verkada andrii-korotkov-verkada requested a review from a team as a code owner November 29, 2024 04:58
Copy link

bunnyshell bot commented Nov 29, 2024

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

…n't fit in Github logs and gets truncated

Signed-off-by: Andrii Korotkov <[email protected]>
Signed-off-by: Andrii Korotkov <[email protected]>
…o server watcher seems to fail

Signed-off-by: Andrii Korotkov <[email protected]>
Copy link

codecov bot commented Nov 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.02%. Comparing base (5fc306e) to head (e2f7cdd).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20998      +/-   ##
==========================================
- Coverage   55.07%   55.02%   -0.05%     
==========================================
  Files         324      324              
  Lines       55472    55472              
==========================================
- Hits        30550    30524      -26     
- Misses      22308    22327      +19     
- Partials     2614     2621       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andrii-korotkov-verkada
Copy link
Contributor Author

Seems to save a couple of minutes. The savings aren't as big as expected, since login as admin and config maps updates are taking some time, and the later doesn't parallelize well (sequential time is similar to parallel time).

Screenshot 2024-11-29 at 7 52 22 AM
Screenshot 2024-11-29 at 7 52 31 AM

@crenshaw-dev
Copy link
Member

Can we remove the timing info code to simplify things now that you've done the initial speedup/analysis?

@andrii-korotkov-verkada
Copy link
Contributor Author

Can we remove the timing info code to simplify things now that you've done the initial speedup/analysis?

I'm not done yet :) I'm trying another commit to only do updates when necessary for slowest parts.
I'd rather keep them for me or other people who wanna optimize further, as it's not much added complexity. If you insist, I can remove though.

@crenshaw-dev
Copy link
Member

I'd only leave it in place if there's also documentation on how to use the logging for further optimization and if there's a strong likelihood that someone will actually use it to significantly improve performance. If the upper bound to improvement is ~30s, I say kill the code.

@andrii-korotkov-verkada
Copy link
Contributor Author

Okay, I'll clean it up when done with optimizations here.

@andrii-korotkov-verkada
Copy link
Contributor Author

andrii-korotkov-verkada commented Nov 29, 2024

Yay, best case time for login as admin was reduced to ~2ms instead of ~500-600ms due to skipping re-login if not necessary. Config map reset was reduced to ~20-200ms instead of ~800-1000ms due to skipping update if equivalent.
Screenshot 2024-11-29 at 11 36 40 AM

@andrii-korotkov-verkada
Copy link
Contributor Author

That's another ~4 min of savings, bringing total savings to ~6 min.

Screenshot 2024-11-29 at 12 05 01 PM
Screenshot 2024-11-29 at 12 05 26 PM

@andrii-korotkov-verkada
Copy link
Contributor Author

About the same times
Screenshot 2024-11-29 at 2 27 22 PM
Screenshot 2024-11-29 at 2 27 36 PM

@andrii-korotkov-verkada
Copy link
Contributor Author

@crenshaw-dev, I think this is ready to go now.

wg.Add(1)
go func() {
defer wg.Done()
function()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we relying on os.Exits in the parallel functions for error handling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's a good point, probably need to return the errors to the main function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that easy. I've checked and it's relying on os.Exit after logging Fatal. Is that okay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've figured out a way.

@andrii-korotkov-verkada
Copy link
Contributor Author

With the latest update, it seems to be only a tiny bit faster, maybe winning like 20 seconds.

Before:
Screenshot 2024-12-01 at 4 36 34 PM
Screenshot 2024-12-01 at 4 36 39 PM

After:
Screenshot 2024-12-01 at 6 21 16 PM
Screenshot 2024-12-01 at 6 21 22 PM

"delete_applications_resources_namespace": func() error {
// Delete the applicationset-e2e namespace, if it exists
err := fixtureClient.KubeClientset.CoreV1().Namespaces().Delete(context.Background(), ApplicationsResourcesNamespace, v1.DeleteOptions{PropagationPolicy: &policy})
if err != nil && !strings.Contains(err.Error(), "not found") { // 'not found' error is expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there's a k8s library function that checks whether an error is a "not found" error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use that

@crenshaw-dev
Copy link
Member

So total savings here is around 5min?

@andrii-korotkov-verkada
Copy link
Contributor Author

About 6-7 minutes.

@andrii-korotkov-verkada
Copy link
Contributor Author

I've merged master in and let's wait for the test run.

Signed-off-by: Andrii Korotkov <[email protected]>
@andrii-korotkov-verkada
Copy link
Contributor Author

@crenshaw-dev, new results are in, with this PR we are down to ~30 min to run e2e tests.

Screenshot 2024-12-02 at 7 57 24 AM
Screenshot 2024-12-02 at 7 57 29 AM

@andrii-korotkov-verkada
Copy link
Contributor Author

For some reason the tests for v1.29 and v1.28 were slower.

Copy link
Member

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last requested change :-)

test/e2e/fixture/util.go Outdated Show resolved Hide resolved
util/errors/errors.go Outdated Show resolved Hide resolved
@crenshaw-dev crenshaw-dev enabled auto-merge (squash) December 2, 2024 22:47
Signed-off-by: Andrii Korotkov <[email protected]>
@crenshaw-dev crenshaw-dev merged commit bd5d76f into argoproj:master Dec 2, 2024
26 checks passed
@andrii-korotkov-verkada andrii-korotkov-verkada deleted the 20968-parallelize-ensure-clean-state-for-e2e-tests branch December 2, 2024 23:29
adriananeci pushed a commit to adriananeci/argo-cd that referenced this pull request Dec 4, 2024
…rmation (argoproj#20998)

* chore: Parallelize EnsureCleanState for e2e tests, adding timing information

Closes argoproj#20968
Closes argoproj#20967

A number of cleanup and create steps can be done in parallel, reducing the total runtime. Gather timing information as well and log it.

Signed-off-by: Andrii Korotkov <[email protected]>

* Add underscores to function map keys

Signed-off-by: Andrii Korotkov <[email protected]>

* Have separate log statements for timing fields, since everything doesn't fit in Github logs and gets truncated

Signed-off-by: Andrii Korotkov <[email protected]>

* Re-run tests

Signed-off-by: Andrii Korotkov <[email protected]>

* Fix duplicate removal attempts, cleanup things in namespaces before removing namespaces

Signed-off-by: Andrii Korotkov <[email protected]>

* Combine config map cleanups, since they seem to have some kind of locking based on timings

Signed-off-by: Andrii Korotkov <[email protected]>

* Re-order some cleanups to not wait if something is already covered by no wait commands

Signed-off-by: Andrii Korotkov <[email protected]>

* Move rm temp dir to be together with re-creation, since otherwise repo server watcher seems to fail

Signed-off-by: Andrii Korotkov <[email protected]>

* Only update config maps and login as admin if there were any changes

Signed-off-by: Andrii Korotkov <[email protected]>

* Also check for user being logged in

Signed-off-by: Andrii Korotkov <[email protected]>

* Improve config maps equivalency check

Signed-off-by: Andrii Korotkov <[email protected]>

* Remove timing information

Signed-off-by: Andrii Korotkov <[email protected]>

* Improve loops with sleep

Signed-off-by: Andrii Korotkov <[email protected]>

* Properly propagate and handle errors for parallel functions

Signed-off-by: Andrii Korotkov <[email protected]>

* Add missing CheckError

Signed-off-by: Andrii Korotkov <[email protected]>

* Fix import

Signed-off-by: Andrii Korotkov <[email protected]>

* Fix imports linter error (try 2)

Signed-off-by: Andrii Korotkov <[email protected]>

* Parallelize EnsureCleanState for application sets

Signed-off-by: Andrii Korotkov <[email protected]>

* Fix errors.IsNotFound

Signed-off-by: Andrii Korotkov <[email protected]>

* Use errgroup to run functions in parallel

Signed-off-by: Andrii Korotkov <[email protected]>

* Remove name keys for functions as unused

Signed-off-by: Andrii Korotkov <[email protected]>

* Use apierr directly

Signed-off-by: Andrii Korotkov <[email protected]>

---------

Signed-off-by: Andrii Korotkov <[email protected]>
Signed-off-by: Adrian Aneci <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
2 participants