Commit

Sleep and retry on 503 when downloading files
centic9 committed Mar 28, 2024
1 parent ed9447d commit 429c9ad
Showing 1 changed file with 11 additions and 1 deletion.
src/main/java/org/dstadler/commoncrawl/Utils.java (11 additions, 1 deletion)
@@ -152,7 +152,17 @@ public static File downloadFileFromCommonCrawl(CloseableHttpClient httpClient, S
         } catch (IOException e) {
             // retry once for HTTP 500 that we see sometimes
             if(e.getMessage().contains("HTTP StatusCode 500")) {
-                downloadFileFromCommonCrawl(httpClient, url, header, useWARC, destFile);
+                downloadFileFromCommonCrawl(httpClient, url, header, useWARC, destFile);
+            } else if(e.getMessage().contains("HTTP StatusCode 503")) {
+                log.info("Sleeping 120 seconds before retrying to reduce request rate");
+
+                try {
+                    Thread.sleep(120_000);
+                } catch (InterruptedException ex) {
+                    throw new RuntimeException(ex);
+                }
+
+                downloadFileFromCommonCrawl(httpClient, url, header, useWARC, destFile);
             } else {
                 throw e;
             }
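In the diff, the retry calls `downloadFileFromCommonCrawl` again from inside what appears (going by the hunk header) to be the same method. If so, a server that keeps answering 500 or 503 is retried again on every failure with no upper bound, despite the "retry once" comment. The `InterruptedException` handler also drops the thread's interrupt status when it wraps the exception in a `RuntimeException`. Below is a minimal sketch of a bounded alternative, not the project's code. It assumes, as the diff does, that the status code can only be detected from the exception message text ("HTTP StatusCode 500" / "HTTP StatusCode 503"). The class `RetryOn503`, the interface `IOAction`, and the parameters `maxAttempts` and `sleepMillis` are hypothetical names introduced for illustration.

```java
import java.io.IOException;

// Hypothetical helper: bounded retry for an I/O action, sleeping only on 503.
public class RetryOn503 {

    // Hypothetical functional interface for an action that may throw IOException,
    // e.g. a lambda wrapping the actual download call.
    @FunctionalInterface
    public interface IOAction<T> {
        T run() throws IOException;
    }

    // Runs the action up to maxAttempts times. A 500 is retried immediately and
    // a 503 is retried after sleeping sleepMillis. Any other IOException, or a
    // failure on the last attempt, is rethrown to the caller.
    public static <T> T withRetry(IOAction<T> action, int maxAttempts, long sleepMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                // Assumption carried over from the diff: the status code only
                // shows up in the exception message, which may be null.
                String msg = e.getMessage() == null ? "" : e.getMessage();
                boolean is500 = msg.contains("HTTP StatusCode 500");
                boolean is503 = msg.contains("HTTP StatusCode 503");
                if ((!is500 && !is503) || attempt >= maxAttempts) {
                    throw e;
                }
                if (is503) {
                    try {
                        Thread.sleep(sleepMillis); // back off to reduce the request rate
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag so callers can still see it.
                        Thread.currentThread().interrupt();
                        throw new IOException("Interrupted while waiting to retry", ie);
                    }
                }
            }
        }
    }
}
```

A call site would wrap the download in a lambda, for example `RetryOn503.withRetry(() -> download(...), 3, 120_000L)`, where `download` is a placeholder for the real request code. With this structure, the number of attempts is capped in one place and the interrupt status survives the wait.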