As a user, I want the WRES to retry WRDS requests when they fail, even when the response body is in flight after the TLS handshake has occurred and the headers have been parsed. #73

Open
epag opened this issue Aug 20, 2024 · 16 comments

Comments

@epag
Collaborator

epag commented Aug 20, 2024


Author Name: Hank (Hank)
Original Redmine Issue: 101945, https://vlab.noaa.gov/redmine/issues/101945
Original Date: 2022-03-01


Per investigation by Jesse, the problem presented originally in this ticket occurred because of a WRDS hiccup and the WRES not retrying the request after that hiccup. The issue occurred while WRDS was returning the response body to the WRES, after the socket connection had been established, the TLS handshake had completed, and the headers had been received and parsed. In that case, no retry happened and the evaluation failed. This ticket can be resolved when a retry ability is implemented to cover that possibility.

Original Description is below. Thanks,

Hank

===========================================

The job:

5318596096827668180 (production)

The error (URL domain omitted):

2022-03-01T16:11:52.406+0000 ERROR IngestSaver Callable task failed
wres.io.reading.IngestException: Unrecoverable exception when getting data from https://nwcal-wrds.[domain]/api/nwm2.1/v2.0/ops/medium_range/streamflow/nwm_feature_id/22743145,22743619,22743759,22744303,22745373,22751957,2275923,2277053,2277309,2277543,2278259,2279137,2281171,22848113,22850065,22865239,2287321,2287397,22878293,22893139,22900496,22904467,22904813,22904899,22911982/?forecast_type=deterministic&reference_time=%2820220130T00Z%2C20220206T00Z%5D&validTime=all
	at wres.io.utilities.WebClient.tryRequest(WebClient.java:359)
	at wres.io.utilities.WebClient.getFromWeb(WebClient.java:228)
	at wres.io.utilities.WebClient.getFromWeb(WebClient.java:186)
	at wres.io.reading.WrdsNwmReader.call(WrdsNwmReader.java:279)
	at wres.io.reading.wrds.WRDSSource.save(WRDSSource.java:131)
	at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:502)
	at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:30)
	at wres.io.concurrency.WRESCallable.call(WRESCallable.java:18)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.InterruptedIOException: null
	at okhttp3.internal.http2.Http2Stream.waitForIo$okhttp(Http2Stream.kt:660)
	at okhttp3.internal.http2.Http2Stream.takeHeaders(Http2Stream.kt:140)
	at okhttp3.internal.http2.Http2ExchangeCodec.readResponseHeaders(Http2ExchangeCodec.kt:97)
	at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:110)
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
	at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)
	at wres.io.utilities.WebClient.tryRequest(WebClient.java:337)
	... 11 common frames omitted

Manual execution of the request works fine. What went wrong?

Hank

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-03-01T18:43:25Z


Is the production WRES still running? Yes, a smoke test evaluation ran successfully.

Hmmm...

Hank

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-03-01T18:47:29Z


We've run this evaluation previously several times without issue, and Alex many times more. Let me see if a rerun succeeds. It's a long evaluation, so I may not know for a couple of hours.

Thanks,

Hank

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-03-01T18:49:59Z


On the earlier run, I see this further down:

Caused by: java.io.InterruptedIOException: timeout
	at okhttp3.internal.connection.RealCall.timeoutExit(RealCall.kt:398)
	at okhttp3.internal.connection.RealCall.callDone(RealCall.kt:360)
	at okhttp3.internal.connection.RealCall.messageDone$okhttp(RealCall.kt:309)
	at okhttp3.internal.connection.Exchange.bodyComplete(Exchange.kt:198)
	at okhttp3.internal.connection.Exchange$ResponseBodySource.complete(Exchange.kt:329)
	at okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange.kt:305)
	at okio.RealBufferedSource$inputStream$1.read(RealBufferedSource.kt:158)
	at java.base/java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:243)
	at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:159)
	at java.base/java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
	at java.base/java.io.FilterInputStream.read(FilterInputStream.java:107)
	at org.apache.commons.compress.utils.IOUtils.copy(IOUtils.java:95)
	at org.apache.commons.compress.utils.IOUtils.copy(IOUtils.java:70)
	at org.apache.commons.compress.utils.IOUtils.toByteArray(IOUtils.java:255)
	at wres.io.reading.waterml.WaterMLBasicSource.deserializeInput(WaterMLBasicSource.java:206)
	at wres.io.reading.waterml.WaterMLBasicSource.ingest(WaterMLBasicSource.java:238)
	at wres.io.reading.waterml.WaterMLBasicSource.saveObservation(WaterMLBasicSource.java:133)
	at wres.io.reading.BasicSource.save(BasicSource.java:39)
	at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:502)
	at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:30)
	at wres.io.concurrency.WRESCallable.call(WRESCallable.java:18)
	... 4 common frames omitted
Caused by: java.net.SocketException: Socket closed
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:183)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
	at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478)
	at java.base/sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:461)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
	at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1429)
	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1396)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:985)
	at okio.InputStreamSource.read(JvmOkio.kt:91)
	at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:129)
	at okio.RealBufferedSource.request(RealBufferedSource.kt:206)
	at okio.RealBufferedSource.require(RealBufferedSource.kt:199)
	at okio.RealBufferedSource.readHexadecimalUnsignedLong(RealBufferedSource.kt:381)
	at okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.readChunkSize(Http1ExchangeCodec.kt:431)
	at okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.read(Http1ExchangeCodec.kt:410)
	at okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange.kt:281)
	... 19 common frames omitted

What is the timeout limit on our end for a NWM request?

Hank

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-03-01T18:55:14Z


The job id for my second attempt at the evaluation is 7173193725700182922 (production COWRES).

Again, if either of you know the timeout we apply for WRES NWM service requests, please let me know. Thanks!

Hank

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-03-01T18:57:29Z


If I recall correctly, that looks like a request timeout which is at a higher level, as opposed to a socket connect timeout or socket read timeout. Looking.

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-03-01T18:58:17Z


In @wres.io.utilities.WebClient@:
@ private static final Duration REQUEST_TIMEOUT = Duration.ofMinutes( 20 );@

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-03-01T19:02:07Z


And looking at @git blame@ where @REQUEST_TIMEOUT@ is added as args to the OkHttp client, I find #69947, and looking over that, I think the above is one of those timeouts, i.e. request timeout.

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-03-01T19:02:34Z


So 20 minutes. Gautam tells me they have a 30 minute timeout on their end.

I think the service hit a major hiccup in their response time.

Sadly, Gautam said in the WRES/WRDS meeting that they have a very hard time monitoring this, because they don't have Check_MK and, imho, have subpar logging. I asked him whether they log the time taken to fulfill requests, and he said no, but he'll look into adding something.

Anyway, sounds like a WRDS issue, but if you want to look into it further, please let me know.

Thanks!

Hank

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-03-01T19:05:22Z


I looked for a JUnit test that tests some of this but did not find one that would easily reproduce it. I think the quickest way to confirm for sure would be to set that request limit to 1000ms or 500ms in a local development version, then make the request of WRDS and see which stack trace comes up.
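
A unit-level alternative to shortening the request limit against the live service could look like the following. This is only a minimal sketch, assuming the body-reading code can be handed an arbitrary `InputStream`. The class `StallingBodyStream` is hypothetical and is not part of WRES or OkHttp. It serves a few bytes and then throws an `InterruptedIOException` with the message `timeout`, imitating a call timeout that fires while the body is in flight.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical test helper (not WRES or OkHttp code): serves a body prefix, then fails like a timed-out call. */
class StallingBodyStream extends InputStream {
    private final InputStream prefix;

    StallingBodyStream(byte[] bytesBeforeFailure) {
        this.prefix = new ByteArrayInputStream(bytesBeforeFailure);
    }

    @Override
    public int read() throws IOException {
        int next = prefix.read();
        if (next < 0) {
            // Prefix exhausted: simulate the timeout firing while the body is still in flight.
            throw new InterruptedIOException("timeout");
        }
        return next;
    }

    public static void main(String[] args) throws IOException {
        byte[] partialBody = "{\"partial\":".getBytes(StandardCharsets.UTF_8);
        try (InputStream body = new StallingBodyStream(partialBody)) {
            body.readAllBytes(); // the code under test would consume the body here
        } catch (InterruptedIOException e) {
            System.out.println("Body read failed mid-stream: " + e.getMessage());
        }
    }
}
```

A test built on this would pass the stream to whatever deserializes the response and then check whether a retry was attempted. It exercises only the reading side, so it does not confirm which OkHttp timeout produced the original trace.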

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-03-01T19:07:26Z


Yes, it's a WRDS issue, but it would be ideal if WRES could retry in this case. I don't think there's an easy way to get WRES to retry here, or it might already be retrying. Was this after several retries or was it with no retries? You would have to look at the full log to find out.

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-03-01T19:08:22Z


Looking,

Hank

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-03-01T19:09:56Z


From looking at the full stdout, it looks like more than one request at that same time failed, and it doesn't look like WRES retried. WRES does retry in almost all situations but there is this narrow band of situations where it is difficult to retry.

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-03-01T19:12:36Z


Thanks for looking. I was having a hard time following the chain of events.

Hank

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-03-01T19:13:44Z


I need to step away for a bit (meeting, then picking up kid from school), but if there is anything you'd like me to check out, let me know. My second attempt at the evaluation, 7173193725700182922, is still progressing as expected.

Thanks,

Hank

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Jesse (Jesse)
Original Date: 2022-03-01T19:20:38Z


You're welcome, and I think I see what happened now. The stack traces that show @InterruptedIOException: null@ come from threads that were interrupted because the original, real @InterruptedIOException: timeout@ occurred in another thread. So the trace below is the one that propagated and stopped the evaluation, and it probably happened first, chronologically:

2022-03-01T16:11:52.469+0000 ERROR Main Operation 'execute' completed unsuccessfully
wres.pipeline.InternalWresException: Could not complete project execution
	at wres.pipeline.Evaluator.evaluate(Evaluator.java:323)
	at wres.pipeline.Evaluator.evaluate(Evaluator.java:182)
	at wres.MainFunctions.execute(MainFunctions.java:134)
	at wres.MainFunctions.call(MainFunctions.java:96)
	at wres.Main.main(Main.java:113)
Caused by: wres.pipeline.WresProcessingException: Encountered an error while processing evaluation 'Zzeb1et8QZr6URvZjaZ1QipvOPk': 
	at wres.pipeline.ProcessorHelper.processEvaluation(ProcessorHelper.java:276)
	at wres.pipeline.Evaluator.evaluate(Evaluator.java:298)
	... 4 common frames omitted
Caused by: wres.pipeline.WresProcessingException: Project failed to complete with the following error: 
	at wres.pipeline.ProcessorHelper.processProjectConfig(ProcessorHelper.java:528)
	at wres.pipeline.ProcessorHelper.processEvaluation(ProcessorHelper.java:214)
	... 5 common frames omitted
Caused by: wres.io.reading.IngestException: An ingest task could not be completed.
	at wres.io.Operations.doIngestWork(Operations.java:425)
	at wres.io.Operations.ingest(Operations.java:337)
	at wres.pipeline.ProcessorHelper.processProjectConfig(ProcessorHelper.java:383)
	... 6 common frames omitted
Caused by: java.util.concurrent.CompletionException: wres.io.reading.IngestException: Failed to get web ingest results.
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: wres.io.reading.IngestException: Failed to get web ingest results.
	at wres.io.reading.WebSource.call(WebSource.java:499)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
	... 3 common frames omitted
Caused by: java.util.concurrent.ExecutionException: wres.io.concurrency.WRESRunnableException: Callable task failed
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at wres.io.reading.WebSource.call(WebSource.java:480)
	... 4 common frames omitted
Caused by: wres.io.concurrency.WRESRunnableException: Callable task failed
	at wres.io.concurrency.WRESCallable.call(WRESCallable.java:32)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	... 3 common frames omitted
Caused by: java.io.InterruptedIOException: timeout
	at okhttp3.internal.connection.RealCall.timeoutExit(RealCall.kt:398)
	at okhttp3.internal.connection.RealCall.callDone(RealCall.kt:360)
	at okhttp3.internal.connection.RealCall.messageDone$okhttp(RealCall.kt:309)
	at okhttp3.internal.connection.Exchange.bodyComplete(Exchange.kt:198)
	at okhttp3.internal.connection.Exchange$ResponseBodySource.complete(Exchange.kt:329)
	at okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange.kt:305)
	at okio.RealBufferedSource$inputStream$1.read(RealBufferedSource.kt:158)
	at java.base/java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:243)
	at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:159)
	at java.base/java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
	at java.base/java.io.FilterInputStream.read(FilterInputStream.java:107)
	at org.apache.commons.compress.utils.IOUtils.copy(IOUtils.java:95)
	at org.apache.commons.compress.utils.IOUtils.copy(IOUtils.java:70)
	at org.apache.commons.compress.utils.IOUtils.toByteArray(IOUtils.java:255)
	at wres.io.reading.waterml.WaterMLBasicSource.deserializeInput(WaterMLBasicSource.java:206)
	at wres.io.reading.waterml.WaterMLBasicSource.ingest(WaterMLBasicSource.java:238)
	at wres.io.reading.waterml.WaterMLBasicSource.saveObservation(WaterMLBasicSource.java:133)
	at wres.io.reading.BasicSource.save(BasicSource.java:39)
	at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:502)
	at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:30)
	at wres.io.concurrency.WRESCallable.call(WRESCallable.java:18)
	... 4 common frames omitted
Caused by: java.net.SocketException: Socket closed
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:183)
	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
	at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478)
	at java.base/sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:461)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243)
	at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
	at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1429)
	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1396)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:985)
	at okio.InputStreamSource.read(JvmOkio.kt:91)
	at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:129)
	at okio.RealBufferedSource.request(RealBufferedSource.kt:206)
	at okio.RealBufferedSource.require(RealBufferedSource.kt:199)
	at okio.RealBufferedSource.readHexadecimalUnsignedLong(RealBufferedSource.kt:381)
	at okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.readChunkSize(Http1ExchangeCodec.kt:431)
	at okhttp3.internal.http1.Http1ExchangeCodec$ChunkedSource.read(Http1ExchangeCodec.kt:410)
	at okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange.kt:281)
	... 19 common frames omitted

The failure above falls in the narrow case that is difficult to retry. While it was propagating, the other threads were notified and printed traces like this one:

2022-03-01T16:11:52.406+0000 ERROR IngestSaver Callable task failed
wres.io.reading.IngestException: Unrecoverable exception when getting data from https://nwcal-wrds.[host]/api/nwm2.1/v2.0/ops/medium_range/streamflow/nwm_feature_id/22743145,22743619,22743759,22744303,22745373,22751957,2275923,2277053,2277309,2277543,2278259,2279137,2281171,22848113,22850065,22865239,2287321,2287397,22878293,22893139,22900496,22904467,22904813,22904899,22911982/?forecast_type=deterministic&reference_time=%2820220130T00Z%2C20220206T00Z%5D&validTime=all
	at wres.io.utilities.WebClient.tryRequest(WebClient.java:359)
	at wres.io.utilities.WebClient.getFromWeb(WebClient.java:228)
	at wres.io.utilities.WebClient.getFromWeb(WebClient.java:186)
	at wres.io.reading.WrdsNwmReader.call(WrdsNwmReader.java:279)
	at wres.io.reading.wrds.WRDSSource.save(WRDSSource.java:131)
	at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:502)
	at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:30)
	at wres.io.concurrency.WRESCallable.call(WRESCallable.java:18)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.InterruptedIOException: null
	at okhttp3.internal.http2.Http2Stream.waitForIo$okhttp(Http2Stream.kt:660)
	at okhttp3.internal.http2.Http2Stream.takeHeaders(Http2Stream.kt:140)
	at okhttp3.internal.http2.Http2ExchangeCodec.readResponseHeaders(Http2ExchangeCodec.kt:97)
	at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:110)
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
	at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)
	at wres.io.utilities.WebClient.tryRequest(WebClient.java:337)
	... 11 common frames omitted
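
The sequence described above, one task failing with the real timeout and its siblings being interrupted as a result, can be sketched in plain Java. This is an illustration only: the class `SiblingInterruptDemo`, the two-thread pool, and the use of `shutdownNow()` are assumptions made for the example, and the way WRES actually notifies its other ingest threads may differ.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Illustrative only: names and structure are assumptions, not the WRES ingest code. */
final class SiblingInterruptDemo {
    private SiblingInterruptDemo() {}

    /** Returns what the sibling task observed after the other task failed. */
    static String run() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch siblingStarted = new CountDownLatch(1);

        // Stands in for a request that is still waiting on I/O.
        Callable<String> waitingTask = () -> {
            siblingStarted.countDown();
            try {
                new CountDownLatch(1).await(); // never released; ends only by interrupt
                return "completed";
            } catch (InterruptedException e) {
                return "interrupted, message=" + e.getMessage();
            }
        };
        // Stands in for the request whose body read hit the real timeout.
        Callable<String> failingTask = () -> {
            siblingStarted.await();
            throw new InterruptedIOException("timeout");
        };

        Future<String> waiting = pool.submit(waitingTask);
        Future<String> failing = pool.submit(failingTask);
        try {
            failing.get();
        } catch (ExecutionException e) {
            // The root failure propagates; shutting down interrupts the remaining worker.
            System.out.println("Root cause: " + e.getCause());
            pool.shutdownNow();
        }
        String observed = waiting.get(5, TimeUnit.SECONDS);
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return observed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Sibling: " + run());
    }
}
```

The sibling's exception says nothing about the timeout that triggered it, which is why the interrupted traces are a symptom rather than the cause and the trace carrying the `timeout` message is the one to read first.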

On the WRDS side: improve reliability. A reverse proxy (if not already in place), a Varnish cache, or some combination might help, as might better tech underneath.
On the WRES side: figure out a way to retry even in these situations, where the socket connect has succeeded, the TLS handshake has happened, the headers have been received and parsed, and the response body is now in flight.
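
One possible shape for such a retry is sketched below. It is not the WRES `WebClient` and makes several assumptions: the helper `BodyRetry.fetchWithRetry` is hypothetical, the caller supplies a `Callable` that both issues the request and reads the entire body before returning, and the request is an idempotent GET that is safe to repeat. Because the body read happens inside the retried unit, an `IOException` raised mid-body (including `InterruptedIOException: timeout`) discards the partial result and triggers a fresh attempt.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper, not the WRES WebClient: retries a whole fetch, body read included. */
final class BodyRetry {
    private BodyRetry() {}

    static <T> T fetchWithRetry(Callable<T> fetchAndReadBody, int maxAttempts, Duration pause)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The callable must consume the full body, so mid-body failures surface here.
                return fetchAndReadBody.call();
            } catch (IOException e) {
                // Assumption: a genuine thread interrupt leaves the interrupt flag set.
                // In that case the evaluation is being stopped, so do not retry.
                if (Thread.currentThread().isInterrupted()) {
                    throw e;
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pause.toMillis());
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String body = fetchWithRetry(() -> {
            if (++calls[0] == 1) {
                throw new InterruptedIOException("timeout"); // first attempt fails mid-body
            }
            return "{\"ok\":true}";
        }, 3, Duration.ofMillis(10));
        System.out.println(body + " after " + calls[0] + " attempts");
    }
}
```

The trade-off is that the whole body has to be buffered (or written somewhere it can be discarded) before any of it is used, since a retry cannot undo data that has already been ingested from a partial response.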

@epag
Collaborator Author

epag commented Aug 20, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-03-02T12:35:08Z


I've updated the subject and description to focus this ticket on implementing the retry mechanism.

The second evaluation attempt, 7173193725700182922, succeeded with output generated. I'm going to move up the schedule for the evaluation today to see if it succeeds this morning.

Leaving this ticket as New, lowering to High priority, and leaving it in the Backlog. Thanks,

Hank
