Reduce polling times to prevent accidental DoS #33

abought · 2024-04-19T20:21:06Z

Purpose

Certain commands poll the server as often as every 5 seconds. During periods of high server load and/or long queues, this can effectively cause imputationbot to denial of service (DoS) the server.

Suggested changes

Reduce polling times on "download once job finished" and "retry after 500 error" to 60 seconds. This should give more time to resolve temporary server downtime issues due to slow queries, GC pauses, or connection pool saturation.

In some cases, it will reduce routine server load by > 90%.

Notes

Open questions

Proof of concept patch is below. Very limited testing: verify these are the correct constants for the intended behaviors.

In particular I'm not sure on the organizational level of project, and want to be sure that sleep timers aren't being double counted. Unclear whether these can be abstracted to a new shared constant, or if there's a reason for the different timers.

Future changes

The "download when job complete" check could be extended past 1 minute.

For 500 errors, a capped exponential backoff might be preferred, but would entail larger refactoring.

Proposed patch

diff --git a/src/main/java/genepi/imputationbot/client/CloudgeneClient.java b/src/main/java/genepi/imputationbot/client/CloudgeneClient.java
index d03dd80..1479016 100644
--- a/src/main/java/genepi/imputationbot/client/CloudgeneClient.java
+++ b/src/main/java/genepi/imputationbot/client/CloudgeneClient.java
@@ -30,7 +30,7 @@ public class CloudgeneClient {
 
 	public static int MAX_ATTEMPTS = 5;
 
-	public static int WAIT_BETWEEN_ATTEMPTS_SEC = 5;
+	public static int WAIT_BETWEEN_ATTEMPTS_SEC = 60;
 
 	public static final String USER_AGENT = "imputation-bot " + App.VERSION + " (OS: " + OperatingSystem.NAME
 			+ ", Java: " + System.getProperty("java.version") + ")";
@@ -134,7 +134,7 @@ public class CloudgeneClient {
 	}
 
 	public void waitForProject(Project project) throws CloudgeneException, InterruptedException {
-		waitForProject(project, 10000);
+		waitForProject(project, 60000);
 	}
 
 	public void waitForProject(Project project, int pollingTime) throws CloudgeneException, InterruptedException {
@@ -152,7 +152,7 @@ public class CloudgeneClient {
 
 	public void waitForJob(String id) throws CloudgeneException, InterruptedException {
 		// noinspection StatementWithEmptyBody
-		while (!waitForJob(id, 10000)) {
+		while (!waitForJob(id, 60000)) {
 		}
 	}
 
diff --git a/src/main/java/genepi/imputationbot/commands/DownloadResults.java b/src/main/java/genepi/imputationbot/commands/DownloadResults.java
index ece3a64..452de32 100644
--- a/src/main/java/genepi/imputationbot/commands/DownloadResults.java
+++ b/src/main/java/genepi/imputationbot/commands/DownloadResults.java
@@ -62,7 +62,7 @@ public class DownloadResults extends BaseCommand {
 			if (job.isRunning()) {
 				println("Job " + job.getId() + " is running. Download starts automatically when job is finished...");
 				// noinspection StatementWithEmptyBody
-				while(!client.waitForJob(job.getId(), 30 * 1000)){}
+				while(!client.waitForJob(job.getId(), 60 * 1000)){}
 
 				job = client.getJobDetails(job.getId());

The text was updated successfully, but these errors were encountered:

abought mentioned this issue Apr 19, 2024

Reduce job status page refresh rate genepi/cloudgene#149

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce polling times to prevent accidental DoS #33

Reduce polling times to prevent accidental DoS #33

abought commented Apr 19, 2024

Reduce polling times to prevent accidental DoS #33

Reduce polling times to prevent accidental DoS #33

Comments

abought commented Apr 19, 2024

Purpose

Suggested changes

Notes

Open questions

Future changes

Proposed patch