[Bug]: when multiple (10-20 jobs) JTE jobs are running at the same time they are getting stuck in the JTE library load part for around 10m-15m #305

Open
boazetsec opened this issue Jul 24, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@boazetsec

Jenkins Version

Jenkins 2.346.1

JTE Version

2.4

Bug Description

We're seeing this issue too, mainly when multiple JTE jobs (10-20) run at the same time: when the jobs execute, they get stuck in the JTE library loading step for around 10-15 minutes, which is a major pain.
I'm trying to understand where the bottleneck is, but haven't found it yet. Disk utilization looks normal in terms of IOPS and throughput. Is there some common resource on the Jenkins server that all the jobs use, so that they end up waiting for each other to finish? I'd appreciate the help, as we are neck-deep in our JTE implementation and this is becoming a serious issue for us. Thanks!

image

Relevant log output

No response

Steps to Reproduce

Execute multiple JTE jobs at the same time

@boazetsec boazetsec added the bug Something isn't working label Jul 24, 2022
@steven-terrana

steven-terrana commented Jul 25, 2022

Thanks for submitting.
Where are you fetching the libraries from (GitHub, GitLab, etc.)?
If IOPS are looking good, then the only place this could be happening is the network.

I wonder if many concurrent requests to clone the same repository are resulting in rate limiting.

JTE uses Jenkins APIs to fetch from the remote repository so there's not a ton that can be done here unless JTE implements some custom caching.

An alternative approach that might help would be to package your libraries into a stand-alone Jenkins plugin so that you're not pulling from the remote repository every time.

A release of the Gradle JTE Plugin is pending
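
To make the "custom caching" idea above concrete, here is a minimal sketch of a single-flight cache: when many jobs ask for the same repository and revision at once, only the first one performs the fetch and the rest wait for its result. This is not how JTE or Jenkins works internally; `SingleFlightCache`, `get`, and the `fetch` callback are hypothetical names used only for illustration.

```python
import threading
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (repository URL, revision); an assumed cache key


class SingleFlightCache:
    """Runs fetch() at most once per key, even under concurrent callers."""

    def __init__(self) -> None:
        self._guard = threading.Lock()            # protects the two dicts below
        self._locks: Dict[Key, threading.Lock] = {}
        self._values: Dict[Key, str] = {}

    def get(self, repo: str, revision: str,
            fetch: Callable[[str, str], str]) -> str:
        key = (repo, revision)
        # Look up (or create) the lock dedicated to this key.
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        # Callers for the same key queue here; other keys are not blocked.
        with lock:
            if key not in self._values:
                # Only the first caller reaches the remote repository.
                self._values[key] = fetch(repo, revision)
            return self._values[key]
```

A cache like this is only safe if the revision in the key is immutable (for example a commit hash). Keyed on a branch name, it would keep serving whatever was fetched first.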

@boazetsec
Author

@steven-terrana Thanks for the reply. Regarding IOPS and throughput, we are pretty safe and are using only a small percentage of what is available.
We thought it might be related to rate limiting, as you suggested (we are using Bitbucket Cloud). If we do hit the rate limit, is there any indication of that? Is there a retry mechanism implemented?
Also, is it one API call per job to fetch the Git repo of the JTE library? If so, I don't think it would cause us to reach the rate limit in this case.

As for the network, we didn't see anything maxing out there but once it happens again I'll double check.
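
One way to look for an indication of throttling, independent of Jenkins, is to time the same remote Git request at increasing levels of concurrency and see whether latency grows sharply. The sketch below is only a rough probe under assumptions: `probe`, `timed`, and `ls_remote` are hypothetical helper names, and `git ls-remote` is much lighter than the full fetch a job performs, so a clean result does not rule out rate limiting on fetches.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def timed(task: Callable[[], None]) -> float:
    """Run task once and return its wall-clock duration in seconds."""
    start = time.monotonic()
    task()
    return time.monotonic() - start


def probe(task: Callable[[], None], concurrency: int) -> List[float]:
    """Run `concurrency` copies of task in parallel; return each duration."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed, task) for _ in range(concurrency)]
        return [f.result() for f in futures]


def ls_remote(url: str) -> Callable[[], None]:
    """Build a task that lists the remote branches of `url` via the git CLI."""
    def run() -> None:
        # check=True raises CalledProcessError if git exits with an error,
        # which is itself a useful signal when the remote rejects requests.
        subprocess.run(["git", "ls-remote", "--heads", url],
                       check=True, capture_output=True)
    return run
```

For example, comparing `max(probe(ls_remote(url), 1))` with `max(probe(ls_remote(url), 20))` for your library repository URL shows whether the slowest request gets disproportionately slower as concurrency rises.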

@ravindrakmr

I am also facing a similar issue, where loading the libraries and configuration takes around 2 minutes for a single job. I am using Bitbucket Cloud and loading around 3 configurations and 10 libraries. Any suggestions?

@asknet

asknet commented Dec 14, 2022

I also faced this issue. Occasionally the load part takes 10-20 mins.

@ravindrakmr

ravindrakmr commented Dec 22, 2022

Hi @steven-terrana, thanks for mentioning the Gradle plugin. The Gradle JTE plugin looks good, but it requires a Jenkins restart whenever we upgrade the libraries, because we have to update the plugin. It's a good option if your libraries are settled, but it may not be preferable if you are still maturing them to meet the demands of different projects. As a different option, could the Templating Engine plugin download all the libraries together in one go from the remote repository, instead of making multiple remote calls? Just throwing out an idea.
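
As a rough illustration of the "one go" idea, the sketch below groups the libraries a job needs by the repository and revision they come from, so that each repository would be fetched once rather than once per library. It assumes several libraries live in the same repository, and it does not describe how the Templating Engine plugin actually loads libraries; `LibraryRef` and `plan_fetches` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class LibraryRef(NamedTuple):
    repo: str      # remote URL of the library source (assumed)
    revision: str  # branch, tag, or commit to load from
    name: str      # library directory inside the repository


def plan_fetches(
    libraries: Iterable[LibraryRef],
) -> Dict[Tuple[str, str], List[str]]:
    """Map each (repo, revision) to the libraries to read after one fetch."""
    plan: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for lib in libraries:
        plan[(lib.repo, lib.revision)].append(lib.name)
    return dict(plan)
```

With 10 libraries spread over 2 repositories, the resulting plan has 2 entries, which means 2 remote calls instead of 10.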

@thenam153

thenam153 commented Dec 22, 2023

I'm getting the same issue. When I start builds for 1000 services concurrently, cloning the JTE libraries is very slow and leaves me stuck for 30-60 minutes, which costs me a lot of time.
When that happens, I see many steps fetching and cloning the repo from SCM (self-hosted GitLab):

git fetch --tags --force --progress --prune -- origin +refs/heads/stable:refs/remotes/origin/stable
/bin/sh /mnt/data-0/jenkins/caches/git-xxxxxxxxxxxxxxxx@tmp/jenkins-gitclient-xxxxxxx.sh-copy -o SendEnv=GIT_PROTOCOL [email protected] git-upload-pack 'xxxxx/template-jenkins-xxxx.git'
ssh -i /mnt/data-0/jenkins/caches/git-xxxxxxxxxxxxxxxx@tmp/jenkins-gitclient-xxxxxxxx.key -l git -o StrictHostKeyChecking=accept-new -o HashKnownHosts=yes -o SendEnv=GIT_PROTOCOL [email protected] git-upload-pack 'xxxxx/template-jenkins-xxxx.git'

Any ideas that could help me?

@rodridevops

Same issue here.

@arechavarria

Changes that improve this issue have been added via merge request #340.


7 participants