
[Experiment] Fetch failed error when use fetch_v2 #48

Closed
raulgzm opened this issue Jul 15, 2024 · 9 comments

Comments

@raulgzm

raulgzm commented Jul 15, 2024

We are receiving different kinds of errors when we fetch experiments:

  • [Experiment] Fetch failed: Remote end closed connection without response
  • [Experiment] Fetch failed: EOF occurred in violation of protocol (_ssl.c:2427)
  • [Experiment] Fetch failed: Request-sent
  • [Experiment] Fetch failed: The read operation timed out
  • [Experiment] Fetch failed: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1006)

What do those errors mean?

Expected Behavior

The fetch_v2() method works reliably.

Current Behavior

We receive different kinds of errors, but not on every request.

Possible Solution

We don't know the root cause.

Steps to Reproduce

We don't know; the failures are intermittent.

I can share Sentry traces and error information with you if you prefer.

Environment

  • SDK Version: amplitude-experiment = "==1.3.1"
  • Language Version: 3.11
@zhukaihan
Collaborator

Hey Raul,

Thanks for submitting the issue.

These all look like issues related to network calls.
When did this start to happen, and does it continue to happen?
Are there any special configurations of server_url (e.g. proxies or the EU datacenter)?

Thanks.

@raulgzm
Author

raulgzm commented Jul 18, 2024 via email

@zhukaihan
Collaborator

zhukaihan commented Jul 18, 2024

Hi Raul,

Thanks for the response and confirming that it's a persistent error.

We have just identified an issue with one of our CDN vendors, which should be the root cause of the above errors, and we have stopped routing requests to that CDN. The 502 is a new one; it's possible that it's also related to the CDN. Please let us know if you still see any errors.

May I get a bit more info:
When (date/time) did you start using the library, and when did the errors start?
And what percentage of the request volume resulted in one of the above errors?

Thanks!

@raulgzm
Author

raulgzm commented Jul 22, 2024

Hi Peter,

Thank you for your support. Do you know why we saw the most recent of these errors 7 hours ago? Maybe the change has not been applied yet?

"[Experiment] Fetch failed: Remote end closed connection without response"

We started to use the library 2 months ago.
The first error we saw was on 4th June 2024.
What percentage of the request volume resulted in one of the above errors? I can't give an exact number, but we have seen 80 events in Sentry against a high number of requests in the service. So I would say the percentage is probably low, but the impact is high because we can't use that feature flag in our service properly.

@zhukaihan
Collaborator

The CDN change was applied within minutes.

To ensure these evaluations are served properly, I would suggest tweaking the retry parameters so that failed requests are retried.
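
Something like the following could be a starting point. This is only a rough sketch: the config class and parameter names are assumed from the SDK's remote evaluation configuration, so please verify them against the 1.3.1 release you have installed.

```python
# Sketch only: parameter names are assumed from the remote evaluation config
# and should be verified against amplitude-experiment 1.3.1.
from amplitude_experiment import Experiment, RemoteEvaluationConfig

experiment = Experiment.initialize_remote(
    'server-deployment-key',                   # your deployment's server key
    RemoteEvaluationConfig(
        fetch_timeout_millis=1000,             # per-attempt timeout
        fetch_retries=3,                       # retry transient network failures
        fetch_retry_backoff_min_millis=500,    # initial retry delay
        fetch_retry_backoff_max_millis=5000,   # cap on the retry delay
        fetch_retry_backoff_scalar=2.0,        # exponential backoff factor
        fetch_retry_timeout_millis=1000,       # timeout for each retry attempt
    ),
)
```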

It's quite unusual that the endpoint simply closes the connection without any status code or triggering a timeout. The following questions can help us understand patterns and potential causes.
How frequently does this error occur?
Does the error happen in bursts, or at a consistent rate over the course of a day?
How long does it take for requests to fail, i.e. what is the average latency of the failures?

Thanks.

@raulgzm
Author

raulgzm commented Jul 23, 2024

Hi Peter,

Let me try to answer those questions with helpful information:

How frequently does this error occur?
If you mean the last error, "Remote end closed connection without response", we have had 157 errors in 2 months. The last one was on Jul 23, 2:04 AM, and the first one was on Jun 4, 12:21 PM. The error seems to happen every day:

[Sentry screenshot: daily error counts between Jun 4 and Jul 23]

Does the error happen in bursts, or at a consistent rate over the course of a day?
At a consistent rate over the course of a day, as you can see in the previous image, and the hour when the error occurs is always different. It does not seem to follow any pattern.

How long does it take for requests to fail, the average latency of failures?
Looking at the traces, it seems the requests fail immediately. Or, for some reason, Sentry does not show us the total duration of each trace, so we cannot know the average latency.

How else can I help out?

@zhukaihan
Collaborator

Thanks for the info!
Looks like there are around 2 errors per day at different times, and more on some days.
What would be the timezone of the time you mentioned? I'm trying to match numbers up with any of our metrics.
Thanks!

@raulgzm
Author

raulgzm commented Jul 24, 2024 via email

@zhukaihan
Collaborator

The errors happened after we received a huge spike in traffic, so the timing is indeed unpredictable. We have been continuously improving the performance of our service during large spikes, and we have plans to keep doing so. On the SDK side, my suggestion would be to configure retries; some tweaking may be needed to achieve the optimal results.
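
If retries alone don't cover every failure, a small fallback wrapper around fetch_v2 can keep the feature flag usable when a fetch ultimately fails. This is only a sketch: the helper name and default value are made up for illustration, and depending on the SDK version an exhausted fetch may raise or may just log "[Experiment] Fetch failed: ..." and return an empty result, so both paths fall back to the default here.

```python
# Sketch of an application-level fallback around fetch_v2; the helper name and
# default value are hypothetical and only illustrate the pattern.
from amplitude_experiment import User

DEFAULT_VARIANT_VALUE = "control"  # hypothetical default used for illustration

def get_variant_value(experiment, user_id: str, flag_key: str) -> str:
    try:
        # SDK-level retries (if configured) run inside fetch_v2; this except
        # only fires if the SDK raises after exhausting them.
        variants = experiment.fetch_v2(User(user_id=user_id))
    except Exception:
        variants = {}
    variant = variants.get(flag_key)
    # Fall back to the default when the fetch failed or the flag is missing,
    # so the service keeps working during network blips.
    return variant.value if variant and variant.value else DEFAULT_VARIANT_VALUE
```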
Thanks for raising this to us!
