Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Log errors from ranch:handshake" #12304

Merged
merged 1 commit into from
Sep 13, 2024
Merged

Revert "Log errors from ranch:handshake" #12304

merged 1 commit into from
Sep 13, 2024

Conversation

mkuratczyk
Copy link
Contributor

This reverts commit 620fff2 (PR #11176).

It introduced a regression in another area - a TCP health check, such as the default (with cluster-operator) readinessProbe, on a TLS-enabled instance would log a rabbit_reader crash every few seconds:

tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>   crasher:
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     initial call: rabbit_reader:init/3
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     pid: <0.999.0>
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     registered_name: []
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     exception error: no match of right hand side value {error, handshake_failed}
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>       in function  rabbit_reader:init/3 (rabbit_reader.erl, line 171)

We can revisit the original issue and re-introduce a fix similar to #11176 once ranch returns more detailed errors.

This reverts commit 620fff2.

It intoduced a regression in another area - a TCP health check,
such as the default (with cluster-operator) readinessProbe,
on a TLS-enabled instance would log a `rabbit_reader` crash
every few seconds:
```
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>   crasher:
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     initial call: rabbit_reader:init/3
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     pid: <0.999.0>
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     registered_name: []
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     exception error: no match of right hand side value {error, handshake_failed}
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>       in function  rabbit_reader:init/3 (rabbit_reader.erl, line 171)
```
@mkuratczyk mkuratczyk merged commit f0f7500 into main Sep 13, 2024
240 checks passed
@mkuratczyk mkuratczyk deleted the revert-pr-11176 branch September 13, 2024 15:07
mergify bot pushed a commit that referenced this pull request Sep 13, 2024
This reverts commit 620fff2.

It intoduced a regression in another area - a TCP health check,
such as the default (with cluster-operator) readinessProbe,
on a TLS-enabled instance would log a `rabbit_reader` crash
every few seconds:
```
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>   crasher:
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     initial call: rabbit_reader:init/3
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     pid: <0.999.0>
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     registered_name: []
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     exception error: no match of right hand side value {error, handshake_failed}
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>       in function  rabbit_reader:init/3 (rabbit_reader.erl, line 171)
```

(cherry picked from commit f0f7500)
@lukebakken
Copy link
Collaborator

It seems like we could handle {error, _} here - https://github.com/rabbitmq/rabbitmq-server/pull/12304/files#diff-c6da5f893cf5567c62329bbe62145d8dfd282a3f8ad4f00295459ed4e89c0de0L171-L172

Too risky at this point in time?

@mkuratczyk
Copy link
Contributor Author

I'm not sure how to handle it until ranch exposes the lower-level error. Just ignoring all TLS errors would be wrong in my opinion - I think we should try to distinguish between worrying errors and just TCP connections without any further data.

@lukebakken
Copy link
Collaborator

In this case we know it's a handshake timeout, though. At any rate, this issue is extremely rare so we can revisit it once ninenines/ranch#336 is merged!

@mkuratczyk
Copy link
Contributor Author

Right. Well, as you wish. It should be a simple bugfix to a rare issue. It can go into 4.0.1 or we can wait for a ranch release

michaelklishin added a commit that referenced this pull request Sep 13, 2024
Revert "Log errors from `ranch:handshake`" (backport #12304)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants