Jenkins Slave is not created / container is not started although it's created (stuck in "provisioning" forever) #268

steffen-wilke · 2019-07-18T10:22:24Z

We're using the yad-plugin to provide on-the-fly docker build containers on a single Docker Cloud (Windows Server 2019). In general, this works just fine but recently I've observed an issue that occurs mostly when multiple jobs are triggered at the same time. This sometimes happens for us when we trigger multiple (2) down-stream jobs after a successful run of a parent job but also when an SCM change triggers multiple jobs at once. The issue is that some containers for triggered jobs are created but never connected as Jenkins slaves.

What happens:

The plugin creates new containers according to the requirements of each triggered job.
- Executing docker container ls -a lists a bunch of new containers with status Created.
For most of the triggered jobs, an agent is created, the container is connected to it and the job performs just fine (the containers change their status to Running and are later properly terminated and removed after the build has been carried out).
For some of the triggered jobs, though, the Jenkins agents are not initialized. Instead, these jobs remain in the state Waiting for next available executor on '{LABEL}'.
- Looking at docker container ls -a on the Docker host system once again reveals that some of the newly created containers still have the Created status.
- These containers were never run and never connected to a Jenkins agent.
The jobs that weren't carried out now remain in the Waiting for next available executor on '{LABEL}' state until another job with that LABEL gets triggered. They will then "steal" the agent provisioned for that new job, and the new job will remain in the "Waiting" state instead. At some point the stuck jobs change their state to All nodes of label '{LABEL}' are offline. but don't trigger a container initialization again. Only after the configured timeout (10 mins in our case) does the plugin seem to request another container for the stuck job.
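For anyone who wants to check for leftover containers without scanning the full docker container ls -a output by hand, here is a minimal Python sketch. It relies on the Docker CLI's --filter status=created and --format options; the helper names parse_created and list_created_containers are made up for illustration and are not part of the plugin.

```python
import subprocess
from typing import Dict, List


def parse_created(output: str) -> List[Dict[str, str]]:
    """Parse tab-separated `ID<TAB>Names<TAB>Image` lines into dicts."""
    containers = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        container_id, name, image = line.split("\t")
        containers.append({"id": container_id, "name": name, "image": image})
    return containers


def list_created_containers() -> List[Dict[str, str]]:
    """Ask the local Docker CLI for containers that were created but never started."""
    result = subprocess.run(
        [
            "docker", "container", "ls", "-a",
            "--filter", "status=created",
            # The Go template below emits literal tab characters between fields.
            "--format", "{{.ID}}\t{{.Names}}\t{{.Image}}",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_created(result.stdout)
```

Calling list_created_containers() on the Docker host a few minutes after the jobs were triggered should return only the containers that never left the Created state, which can then be compared against the jobs still waiting in the Jenkins queue.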

I think there might be a general problem with multiple jobs requesting a new build container at (roughly) the same time. This only happens sporadically though. Most of the time triggering multiple jobs at the same time works just fine.


steffen-wilke · 2019-07-23T08:16:15Z

I am now able to reproduce this reliably:

Note that all jobs in the steps-to-reproduce request a node of the same label.

1. Trigger a job that requires multiple nodes, or trigger multiple jobs at the same time. In our case an SCM trigger starts 2 jobs at once (Job A and Job B); multiple downstream jobs have the same effect.
2. While the provisioning is in progress, start another job that requires a new node (Job C).
3. Job C will try to "steal" a node that was originally created for Job A or B.
4. Depending on which node was "stolen", either Job A or Job B gets stuck.
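The "stealing" described in these steps can be illustrated with a toy model. This is only a sketch under the assumption that waiting jobs are matched to online nodes by label alone, with no link between a node and the job that caused it to be provisioned; it is not the plugin's or Jenkins's actual scheduling code, and all job and node names are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_by_label(
    waiting_jobs: List[str], online_nodes: List[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Pair waiting jobs with online nodes in order, ignoring who requested which node."""
    assignments = dict(zip(waiting_jobs, online_nodes))
    still_waiting = waiting_jobs[len(online_nodes):]
    return assignments, still_waiting


# Three containers were requested, one per job (hypothetical names).
provisioned_for = {"node-1": "Job A", "node-2": "Job B", "node-3": "Job C"}

# Assumption: node-3's container stays in the Created state and never comes online.
online_nodes = ["node-1", "node-2"]

# Assumption: Job C happens to be picked before Job B.
assignments, still_waiting = assign_by_label(["Job A", "Job C", "Job B"], online_nodes)
# Job C ends up on node-2, which was provisioned for Job B, and Job B is left waiting.
```

In this model the job that stays in the queue is simply whichever one is picked last, which matches the observation that either Job A or Job B can end up stuck.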

steffen-wilke · 2019-08-05T13:38:24Z

This is somewhat related to #74 and jenkinsci/docker-plugin#427.

steffen-wilke · 2019-08-05T13:51:17Z

Additional Note: If such an incident occurs, it is tracked by the Cloud Statistics as "stuck" in the Provisioning phase.

Examples: (Note the entries below the 2nd)

Looking at the docker host system (via docker container ls -a), there is always a container in the Created state for these cases:

steffen-wilke · 2019-08-20T07:07:17Z

To me this issue sounds very similar to: jenkinsci/docker-plugin#594

@KostyaSha Do you have any thoughts on this? Would very much appreciate your opinion here since I'm currently a bit puzzled on what could be the solution to this.

KostyaSha · 2019-10-14T09:29:32Z

so they were created but didn't spin and connect?

steffen-wilke · 2019-10-15T11:35:11Z

> so they were created but didn't spin and connect?

Exactly.

rdevries · 2019-10-22T08:55:35Z

We are having the same problem. For us it started when we updated the ssh-slaves-plugin. At first we thought it was caused by https://issues.jenkins-ci.org/browse/JENKINS-58340, but it still doesn't work. Perhaps these issues are related?

steffen-wilke changed the title ~~Jenkins Slave is not created although the container is created~~ Jenkins Slave is not created although the container is created for multiple triggered jobs Jul 18, 2019

steffen-wilke changed the title ~~Jenkins Slave is not created although the container is created for multiple triggered jobs~~ Jenkins Slave is not created / container is not started although it's created (stuck in "provisioning" forever) Aug 20, 2019
