Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add TERM handling #375

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions client/src/main/java/hudson/plugins/swarm/Client.java
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

Expand Down Expand Up @@ -187,6 +188,34 @@ private static void validateOptions(Options options) {
*/
static void run(SwarmClient swarmClient, Options options, String... args)
throws InterruptedException {

// Gracefully handle TERM request
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
logger.info("Interrupting threads");
Set<Thread> runningThreads = Thread.getAllStackTraces().keySet();
for (Thread th : runningThreads) {
if (th != Thread.currentThread() && !th.isDaemon()
&& th.getClass().getName().startsWith("hudson.plugins.swarm")) {
logger.info("Interrupting '" + th.getClass() + "' termination");
th.interrupt();
Comment on lines +201 to +202
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you verified, when sending a TERM signal to a connected Swarm client, that this if statement is true and that the code inside the if statement is executed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, so I've confirmed that doing a kill <PID> results in the log line being output:

java -jar client/target/swarm-client.jar -description "fatmcgav swarm test" -disableClientsUniqueId -executors 1 -fsroot jenkins -retry 6 -maxRetryInterval 160 -retryBackOffStrategy exponential -mode exclusive -username local-swarm -password **** -deleteExistingClients -name fatmcgav-swarm-test -master https://*** -username local-swarm -webSocket -labels swarm -labels x86_64
Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client logArguments
INFO: Client invoked with: -deleteExistingClients true -description fatmcgav swarm test -disableClientsUniqueId true -executors 1 -fsroot jenkins -labels [swarm, x86_64] -maxRetryInterval 160 -mode exclusive -name fatmcgav-swarm-test -password ***** -retry 6 -retryBackOffStrategy EXPONENTIAL -url https://*** -username ***** -webSocket true
Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client run
INFO: Connecting to Jenkins controller
Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client run
INFO: Attempting to connect to https://***
...
Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client run
WARNING: Remaining retries: 4
Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client run
INFO: Retrying in 20 seconds
Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client$1 run
INFO: Interrupting threads

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, so I've confirmed that doing a kill <PID> results in the log line being output:

java -jar client/target/swarm-client.jar -description "fatmcgav swarm test" -disableClientsUniqueId -executors 1 -fsroot jenkins -retry 6 -maxRetryInterval 160 -retryBackOffStrategy exponential -mode exclusive -username local-swarm -password **** -deleteExistingClients -name fatmcgav-swarm-test -master https://*** -username local-swarm -webSocket -labels swarm -labels x86_64
Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client logArguments
INFO: Client invoked with: -deleteExistingClients true -description fatmcgav swarm test -disableClientsUniqueId true -executors 1 -fsroot jenkins -labels [swarm, x86_64] -maxRetryInterval 160 -mode exclusive -name fatmcgav-swarm-test -password ***** -retry 6 -retryBackOffStrategy EXPONENTIAL -url https://*** -username ***** -webSocket true
Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client run
INFO: Connecting to Jenkins controller
Oct 19, 2021 9:51:54 PM hudson.plugins.swarm.Client run
INFO: Attempting to connect to https://***
...
Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client run
WARNING: Remaining retries: 4
Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client run
INFO: Retrying in 20 seconds
Oct 19, 2021 9:52:05 PM hudson.plugins.swarm.Client$1 run
INFO: Interrupting threads

Do you see a line above that says "Interrupting '[…]' termination"? I can't see one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point... I'll do some debugging...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I ran some tests locally, and even with the swarm-client successfully connecting to a Jenkins controller, I don't see any threads being interrupted.

I've put a couple of thread dump here: https://gist.github.com/fatmcgav/f848c071bbf5f650408c3e7f0b45238a

The first one is just with the agent idle and connected to Jenkins, the 2nd with a simple sleep script running...

It looks like the runningThreads set never gets populated with anything based on adding some additional logging...

So is it safe just to remove the logic? Or should the threads be getting interrupted?

}
}
for (Thread th : runningThreads) {
try {
if (th != Thread.currentThread() && !th.isDaemon() && th.isInterrupted()) {
logger.info("Waiting '" + th.getName() + "' termination");
th.join();
}
} catch (InterruptedException ex) {
logger.info("Shutdown interrupted");
}
}
logger.info("Shutdown finished");
}
});

logger.info("Connecting to Jenkins controller");
URL url = swarmClient.getUrl();

Expand Down