gRPC gets stuck in SSL handshake exception when running documentation tests from Linux developer systems #257
Comments
The problem also occurred on locale - with
What is weird is that it also sometimes fails on demo.evitadb.io, which has a valid LE certificate.
There are a lot of similar bugs reported on the web; some of them are summarised in this issue: reactor/reactor-netty#907. It looks like the exception itself is misleading: it can occur whenever the server doesn't respond within a 10s interval, and may not be related to SSL problems at all. You recommend increasing the number of I've also checked whether the clients are closed properly in the documentation tests, in case too many open clients were keeping their connections to the server and thus blocking other clients, but this doesn't seem to be the case. So, unfortunately, I have no solution, even after closer examination.
Yeah, I've come across some of the mentioned issues in the past, but also couldn't come up with a solution. Maybe we will have to find a way to set the
It seems the problem was caused partly by IPv6 and partly by a too-low limit on our HAProxy server. @Khertys, can you elaborate on this? We could then close the issue, since it has ceased to happen in recent documentation test runs.
The changes on the HAProxy side were related to the following setting: https://docs.haproxy.org/2.6/configuration.html#4.2-maxconn We initially set it to 30. TCP mode probably plays a role here (rather than HTTP mode, as with Tomcat): once those connections are exhausted, it probably cuts off. We increased it to
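For readers unfamiliar with the setting, here is a minimal sketch of where a per-frontend `maxconn` of 30 would sit in a TCP-mode HAProxy configuration. The section names, ports, addresses, and timeouts are hypothetical and not taken from the actual server; only `maxconn 30` and the use of TCP mode come from this thread, and the raised value is not reproduced here. According to the HAProxy documentation, connections beyond this limit are not accepted and wait in the socket backlog, which would be consistent with a client timing out during the TLS handshake.

```
# Hypothetical sketch: names, ports, and timeouts are assumptions.
defaults
    mode tcp                  # TCP passthrough, as described in the comment above
    timeout connect 5s
    timeout client  1m
    timeout server  1m

frontend grpc_in              # hypothetical frontend name
    bind :5555                # hypothetical port
    maxconn 30                # the initial limit mentioned above; later raised
    default_backend grpc_servers

backend grpc_servers          # hypothetical backend name
    server app1 127.0.0.1:5556   # hypothetical upstream address
```

With long-lived gRPC connections held open in TCP mode, a low frontend limit can be used up by a handful of clients, so a test run that opens many clients in sequence could plausibly reach it.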
The documentation tests now work reliably over both IPv4 and IPv6. No changes in code were necessary, just the settings of the HAProxy server.
If we run all the documentation tests locally, or single documents with a large number of examples (such as fetching.md), on a Linux environment, the Java / gRPC tests start to fail at a certain point. No other Java / gRPC test succeeds after that. The problem doesn't occur on GitHub CI, nor on a Windows developer machine. Other protocols (REST/GraphQL) on the same server keep working. When the test is restarted, gRPC works again for a while, so the server can recover from the problem. The problem is probably server-side, as HAProxy sees the requests and logs that the server didn't respond.
We should investigate this issue to avoid potential problems on production systems in the future. First, we need to enable more logging on the server side, or add logging at some early stage of request processing in gRPC, and observe what happens where.