[JENKINS-73172] Reuse credentials object reference through scans to avoid frequent duplicated lookups #787

Merged
merged 16 commits into from
Oct 16, 2024

Conversation

Dohbedoh
Contributor

@Dohbedoh Dohbedoh commented May 15, 2024

Description

Keep a ThreadLocal cache of scan credentials to avoid a credentials lookup storm during branch indexing / org scan. See JENKINS-73172 for further information.
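
A minimal sketch of the per-thread cache idea described above. It is not the code in this PR: the class name `ScanCredentialsCache`, the `lookup` and `clear` methods, and the use of a plain `String` in place of a credentials object are all assumptions made for illustration. Each thread keeps its own map, so repeated lookups of the same ID within one scan thread call the expensive resolver only once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-thread cache; names are illustrative, not the plugin's API. */
final class ScanCredentialsCache {
    // One map per thread: a scan running on a single thread reuses its own entries.
    private static final ThreadLocal<Map<String, String>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    private ScanCredentialsCache() {}

    /** Returns this thread's cached value, calling the resolver only on a miss. */
    static String lookup(String credentialsId, Function<String, String> expensiveResolver) {
        return CACHE.get().computeIfAbsent(credentialsId, expensiveResolver);
    }

    /** Drops this thread's entries; pooled threads would otherwise keep stale values. */
    static void clear() {
        CACHE.remove();
    }
}
```

A caller would typically wrap the scan in `try { ... } finally { ScanCredentialsCache.clear(); }` so that a reused worker thread starts the next scan with an empty cache.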

Submitter checklist

  • Link to JIRA ticket in description, if appropriate.
  • Change is code complete and matches issue description
  • Automated tests have been added to exercise the changes
  • Reviewer's manual test instructions provided in PR description. See Reviewer's first task below.

Reviewer checklist

  • Run the changes and verify that the change matches the issue description
  • Reviewed the code
  • Verified that the appropriate tests have been written or valid explanation given

Documentation changes

  • Link to jenkins.io PR, or an explanation for why no doc changes are needed

Users/aliases to notify

@Dohbedoh Dohbedoh requested a review from a team as a code owner May 15, 2024 04:26
@Dohbedoh Dohbedoh changed the title [SECO-3724] Cache scan credentials in ThreadLocal to avoid frequent d… [JENKINS-73172] Cache scan credentials in ThreadLocal to avoid frequent d… May 15, 2024
@Dohbedoh Dohbedoh marked this pull request as draft May 15, 2024 04:26
@jglick jglick changed the title [JENKINS-73172] Cache scan credentials in ThreadLocal to avoid frequent d… [JENKINS-73172] Cache scan credentials in ThreadLocal to avoid frequent duplicated lookups May 15, 2024
Member

@jglick jglick left a comment

Does not smell like the right fix. Rather, scan credentials should be looked up once at the beginning of branch indexing (or some similar identifiable task associated with a single credentialsId) and the StandardCredentials object should be passed directly to all the methods that would need it.
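
The alternative suggested here can be sketched as follows. This is an illustration only: `ScanOnce`, its nested `Credentials` class, and the `store` function are hypothetical stand-ins, not Jenkins or plugin types. The point is that the lookup by ID happens exactly once, and every later step receives the resolved object instead of the ID.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: resolve the credentials once, then pass the object along. */
final class ScanOnce {
    /** Stand-in for a resolved credentials object (an assumption, not a Jenkins type). */
    static final class Credentials {
        private final String id;
        Credentials(String id) { this.id = id; }
        String getId() { return id; }
    }

    private ScanOnce() {}

    /** Performs a single lookup, then hands the same object to every repository visit. */
    static int scan(String credentialsId, List<String> repos,
                    Function<String, Credentials> store) {
        Credentials credentials = store.apply(credentialsId); // the only lookup
        int visited = 0;
        for (String repo : repos) {
            visitRepository(repo, credentials); // no lookup by ID in here
            visited++;
        }
        return visited;
    }

    private static void visitRepository(String repo, Credentials credentials) {
        if (credentials == null) {
            throw new IllegalStateException("No credentials for " + repo);
        }
        // A real implementation would call the remote API with the credentials here.
    }
}
```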

@Dohbedoh
Contributor Author

Dohbedoh commented May 16, 2024

Right, it's not the right fix. GitHubSCMNavigator / GitHubSCMSource should maybe carry this. The SCMNavigator is going to create a lot of SCMSources, though I think it might be doing so within the same thread context. I will need to see how efficient that would be.

The GitHubSCMBuilder internals also need a bit of rework.

As for the failing tests, it makes sense that they fail here. But the fact that the GitHubSCMBuilder constructor that accepts an SCMSource can look up SSH Username with Private Key credentials is wrong, and I don't think those tests are correct.

Those tests assume that getting a GitHubSCMBuilder from a GitHubSCMSource that has scan credentials of type SSH should give you a GitSCMBuilder with an SSH remote. That is not possible. The GitHubSCMSource only accepts StandardUsernamePasswordCredentials, as per Connector#githubScanCredentialsMatcher.

So you cannot actually select SSH Username with Private Key credentials for an SCMSource, and if you did set such an ID (through CasC, maybe), it would not be found through the connector and would simply not work.

All the source.setCredentialsId("user-key"); calls seem wrong. What the tests should do instead is set the credentials on the builder instance with GitSCMBuilder.withCredentials("user-key", null); or GitSCMBuilder.withCredentials("user-key", GitHubSCMBuilder.SSH);

The scenario, and probably the reason this code is here, is SSHCheckoutTrait, in which case we need to decorate the builder.

The GitHubSCMBuilder should have simpler methods, i.e. a withResolver, and it already inherits a withCredentials from GitSCMBuilder.

@Dohbedoh
Contributor Author

After discussing with @daniel-beck: we'd need to take care of expiration for GitHub App credentials, based on the stale time window of the cached token: https://github.com/jenkinsci/github-branch-source-plugin/blob/master/src/main/java/org/jenkinsci/plugins/github_branch_source/GitHubAppCredentials.java#L384-L435.

@Dohbedoh
Contributor Author

In the context of the OrganizationScan, this could be more efficient. The SCMNavigator creates SCMSources for each repo and we change context every time. So we are still doing a lot of lookups for the same credentials. As @jglick said, we need to pass along the credentials or an instance of some expiring credentials object.

@jglick
Member

jglick commented May 16, 2024

I do not think GitHubAppCredentials needs any special treatment? If you just look up the Credentials object at the start of an org scan or whatever, the App credentials should automatically refresh its token after 45m or whatever if the scan is still going on that long.
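
A rough sketch of why a long-lived credentials object can be safe to hold for a whole scan: the object itself can mint a new token once the old one is stale. This is not the GitHubAppCredentials implementation. `RefreshingToken`, the injected clock, and the `staleAfterMillis` parameter are assumptions for illustration, and the sketch makes no claim about the real stale window.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical self-refreshing token holder; not the plugin's implementation. */
final class RefreshingToken {
    private final Supplier<String> tokenSource; // e.g. a call that mints a new token
    private final LongSupplier clockMillis;     // injected so the sketch is testable
    private final long staleAfterMillis;
    private String token;
    private long issuedAtMillis;

    RefreshingToken(Supplier<String> tokenSource, LongSupplier clockMillis,
                    long staleAfterMillis) {
        this.tokenSource = tokenSource;
        this.clockMillis = clockMillis;
        this.staleAfterMillis = staleAfterMillis;
    }

    /** Returns the current token, minting a new one first if the old one is stale. */
    synchronized String get() {
        long now = clockMillis.getAsLong();
        if (token == null || now - issuedAtMillis >= staleAfterMillis) {
            token = tokenSource.get();
            issuedAtMillis = now;
        }
        return token;
    }
}
```

With this shape, code that holds one `RefreshingToken` for the duration of a scan never sees an expired token, because staleness is checked on every `get()`.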

@Dohbedoh
Contributor Author

Dohbedoh commented May 17, 2024

I see. It actually refreshes automatically through the refresh token.
@jglick Do you think that there is any scenario that would require an expiry if we are using this approach? Or using the same credentials for an entire Org scan is acceptable? Maybe we could add a feature flag to be able to disable that "cache", just in case?

@Dohbedoh
Contributor Author

Added a system property to disable it. Just in case.
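
For readers unfamiliar with this kind of escape hatch, a minimal sketch of a system-property switch follows. The property name `example.scanCredentialsCache.disabled` and the class `CacheSwitch` are made up for illustration; the actual property added in this PR is not named in this thread.

```java
/** Hypothetical kill switch; the property name is an assumption, not the one in the PR. */
final class CacheSwitch {
    static final String PROPERTY = "example.scanCredentialsCache.disabled";

    private CacheSwitch() {}

    /** Reads the flag on every call, so a change to the property takes effect immediately. */
    static boolean cacheEnabled() {
        // Boolean.getBoolean is true only when the property exists and equals "true".
        return !Boolean.getBoolean(PROPERTY);
    }
}
```

Under these assumptions the cache would be switched off by starting the JVM with `-Dexample.scanCredentialsCache.disabled=true`.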

@Dohbedoh Dohbedoh marked this pull request as ready for review May 17, 2024 02:20
@jglick
Member

jglick commented May 17, 2024

using the same credentials for an entire Org scan is acceptable

So far as I know that would be fine.

Member

@jglick jglick left a comment

Still -0 as per #787 (review): no cache system should be necessary, the credentials object should just be computed before the scan begins and passed along to whatever methods need it.

@Dohbedoh
Contributor Author

Dohbedoh commented May 20, 2024

The context of the scan is not really passed between SCMNavigator / SCMSource. The Org Scan and BranchIndexing both start with retrieving actions. So we should look up credentials every time we go through SCMNavigator#retrieveActions / SCMSource#retrieveActions.

As for scanning after an SCM event, it depends. Sometimes it does not go through SCMNavigator#retrieveActions / SCMSource#retrieveActions but goes straight to fetching the sources / branches.

I assume that we do want to do a lookup at the beginning of a scan, whether it is a branch indexing, an organization scan or a received event. If we make a change to a credentials object, the next "ignition" of a scan should pick it up, whatever the source. (That was one of the reasons for the ThreadLocal in the first place: whatever the origin, a different thread would run it from start to finish.) So we will need to force a lookup in SCMSource#retrieveActions / SCMSource#retrieve and SCMNavigator#retrieveActions / SCMNavigator#visitSources, even though in some cases we will go through both.

We need to at least be able to know, from within SCMSource#retrieve, whether we originate from an organization scan. It is not simple to tell from the Branch API and the SCM API which event the whole fetch originates from.
One way I found to do this is to check whether the SCMHeadObserver is of type SCMHeadObserver.Any, as per https://github.com/jenkinsci/branch-api-plugin/blob/af810c56e89576e40d69a3a98ce7961a5fbb75a8/src/main/java/jenkins/branch/MultiBranchProjectFactory.java#L262, or whether the event is null.

@jglick
Member

jglick commented May 20, 2024

The context of the scan is not really passed between…

Sounds like a fix would be more straightforward if scm-api and/or branch-api were improved to define some sort of abstract context (bag) which the impl could use to stash things like Credentials across an entire operation?

I do not really have time to review this in detail, so I am hoping someone (else) in @jenkinsci/github-branch-source-plugin-developers is able to review and merge.
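
To make the "context (bag)" suggestion concrete, here is one possible shape for it. This is purely a sketch: `ScanContextBag` and `getOrCompute` are hypothetical and are not an existing scm-api or branch-api type. The idea is that one bag instance lives for exactly one operation (one org scan, one branch indexing run, one event), so whatever is stashed in it is shared across that operation and discarded afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical operation-scoped context; an assumption, not an scm-api type. */
final class ScanContextBag {
    private final Map<String, Object> values = new HashMap<>();

    /** Returns the stashed value for the key, computing and storing it on first use. */
    <T> T getOrCompute(String key, Class<T> type, Supplier<T> supplier) {
        Object existing = values.get(key);
        if (existing == null) {
            existing = supplier.get();
            values.put(key, existing);
        }
        return type.cast(existing);
    }
}
```

Compared with a ThreadLocal, the lifetime here is explicit: a new scan gets a new bag, so there is nothing to clear and no dependence on which thread runs which step.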

@jglick jglick requested a review from a team May 20, 2024 12:02
@jglick jglick added the bug label May 20, 2024
@@ -598,8 +623,8 @@ public static void setCacheSize(int cacheSize) {
 /** {@inheritDoc} */
 @Override
 public String getRemote() {
-    return GitHubSCMBuilder.uriResolver(getOwner(), apiUri, credentialsId)
-            .getRepositoryUri(apiUri, repoOwner, repository);
+    // Only HTTPS is applicable to the source remote with Username / Password credentials
Member

I don't think you can be sure it is only username / password credentials?

Contributor Author

@rsandell Per my understanding, we can only have an HTTPS resolver from GitHubSCMSource. Only StandardUsernamePasswordCredentials can be selected and used to connect to the GitHub API.

I am not sure why it is not made more explicit in the typing. But I don't see a scenario where it could be different, and that makes sense since this SCM Source uses the GitHub REST API.

The only specific scenario where we would see SSH is with the SSHCheckoutTrait, and that is handled by the trait implementation.

Member

It could be an app credential couldn't it?

Contributor

It could be an app credential couldn't it?

Yes, it could, but GitHubAppCredentials implements StandardUsernamePasswordCredentials and is used for REST calls over HTTPS as well, as Alan mentioned. So "HTTPS resolver only" here looks good to me.
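
The type relationship this thread relies on can be sketched with stand-in types. Everything below is hypothetical: `UsernamePassword`, `Password`, `AppToken` and `SshKey` only mirror the relationship described in the comments (an app credential is a kind of username/password credential, an SSH key is not) and are not the Jenkins interfaces themselves.

```java
/** Hypothetical type sketch; these types only mirror the relationship described. */
final class CredentialTypes {
    interface Credentials { String id(); }

    /** Stand-in for the username/password interface accepted by the scan matcher. */
    interface UsernamePassword extends Credentials {}

    static final class Password implements UsernamePassword {
        public String id() { return "user-pass"; }
    }

    /** An app credential exposes a token through the username/password interface. */
    static final class AppToken implements UsernamePassword {
        public String id() { return "github-app"; }
    }

    /** An SSH key does not implement it, so a scan matcher would reject it. */
    static final class SshKey implements Credentials {
        public String id() { return "user-key"; }
    }

    private CredentialTypes() {}

    /** Accepts only credentials usable for HTTPS API calls. */
    static boolean usableForScan(Credentials credentials) {
        return credentials instanceof UsernamePassword;
    }
}
```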

 /** {@inheritDoc} */
 @NonNull
 @Override
 public GitHubSCMSource build() {
-    GitHubSCMSource result = new GitHubSCMSource(repoOwner, projectName());
+    GitHubSCMSource result = new GitHubSCMSource(repoOwner, projectName(), credentials);
Contributor

@jeromepochat jeromepochat Oct 11, 2024

It looks to me like credentials and credentialsId could diverge. I think that credentialsId() should return credentials.getId() when credentials is set. WDYT?

Contributor Author

Agreed. Would be more consistent to do this.

Contributor Author

@jeromepochat I applied your suggestion.
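
The suggestion in this thread can be illustrated with a small sketch. It is not the builder from this PR: `SourceBuilderSketch`, its nested `Credentials` class and the `with...` methods are hypothetical. It only shows the rule being proposed, namely that the ID is derived from the credentials object whenever one is set, so the two cannot disagree.

```java
/** Hypothetical builder sketch: derive the ID from the object when one is set. */
final class SourceBuilderSketch {
    /** Stand-in for a credentials object (an assumption, not a Jenkins type). */
    static final class Credentials {
        private final String id;
        Credentials(String id) { this.id = id; }
        String getId() { return id; }
    }

    private String credentialsId;
    private Credentials credentials;

    SourceBuilderSketch withCredentialsId(String id) {
        this.credentialsId = id;
        return this;
    }

    SourceBuilderSketch withCredentials(Credentials credentials) {
        this.credentials = credentials;
        return this;
    }

    /** Prefers the ID of the credentials object so the two values cannot diverge. */
    String credentialsId() {
        return credentials != null ? credentials.getId() : credentialsId;
    }
}
```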

@jeromepochat
Contributor

I manually tested with an org scan and a multibranch pipeline. I confirm that it works fine and that the caches in both GitHubSCMSource and GitHubSCMNavigator are effective, reducing the credentials lookups from GitHub.

(some comments left but no blocker)

@jglick
Member

jglick commented Oct 11, 2024

Recommend #787 (comment) to avoid technical debt, but a low priority compared to the bug itself. Other than that, I defer to @jeromepochat’s review; @rsandell or anyone else still reviewing this? I can physically merge, I am just not actively maintaining this.

@jglick jglick requested a review from a team October 11, 2024 17:15
@jtnord jtnord requested a review from rsandell October 11, 2024 18:15
@jglick jglick changed the title [JENKINS-73172] Reuse credentials object referencr through scans to avoid frequent duplicated lookups [JENKINS-73172] Reuse credentials object reference through scans to avoid frequent duplicated lookups Oct 16, 2024
@jglick jglick merged commit 98e3d8a into jenkinsci:master Oct 16, 2024
17 checks passed
@Dohbedoh Dohbedoh deleted the JENKINS-73172 branch October 16, 2024 20:02