Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed return text for ETIMEDOUT/ WSAETIMEDOUT #148

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hjelmeland
Copy link

Changed return text for ETIMEDOUT/ WSAETIMEDOUT to “connection timeout”.

This is needed for the application to be able tell to the difference between timeout of TCP connection (ETIMEDOUT/ WSAETIMEDOUT) and a normal return from a non-blocking socket (error codes EAGAIN/WSAEWOULDBLOCK). Both situations returned the text “timeout”.

Changed return text for ETIMEDOUT/ WSAETIMEDOUT to “connection timeout”.

This is needed for the application to be able tell to the difference between timeout of TCP connection (ETIMEDOUT/ WSAETIMEDOUT) and a normal return from a non-blocking socket (error codes EAGAIN/WSAEWOULDBLOCK). Both situations returned the text “timeout”.
@diegonehab
Copy link
Contributor

Can you clarify in which scenario this difference would be relevant?

@hjelmeland
Copy link
Author

My scenario is a copas.lua based server that sends responses to requests. If the remote device is physically disconnected/powered off before the response is sent, then I want the socket:send() to eventually time out, so the error can be logged, connection closed and cleaned up.

After further testing I have found out that select() is behaving different on the device I am working on compared to current linux systems. On the linux 2.6 based device the problem is that after a ETIMEDOUT event on send(), select() will not return write-readyness for that socket. The result is that the connection is stuck forever in the copas.lua system.

On my PC with current linux, I found out that after a ETIMEDOUT event on send(), select() does return write-readyness for the socket, and when copas calls :send() again, error "closed" is returned, allowing the application to clean up the connection.

Even though this is less problem on recent linux, I still think timeout of the TCP connection itself is fundamentally different from a EAGAIN/WSAEWOULDBLOCK response from the socket API, and should give different responses to the application.

In order to preserve compatiblity with copas.lua I would change the return text on ETIMEDOUT/ WSAETIMEDOUT instead of on EAGAIN/WSAEWOULDBLOCK.

If you want to experiment with TCP timeouts on send on linux, you can do

sudo sysctl -w net.ipv4.tcp_retries2=3

to get timeouts in seconds instead of hours.

@ewestbrook
Copy link
Contributor

Is this still desired and/or necessary?

Copy link
Member

@alerque alerque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, but...

  1. What might this break downstream?
  2. Our internal tests need updating to match.

@diegonehab
Copy link
Contributor

Can’t this be a 2.6 specific fix to solve just the hanging problem? Leaking a change out to every other system seems overkill.

@alerque
Copy link
Member

alerque commented Nov 10, 2023

Returning a different error per platform for the same condition doesn't sound right. And the platform being used on end of the socket may not be the same platform as on the other end. If we do differentiate the error messages for the two different error cases I think it should probably be done universally with no platform or version gating logic.

@diegonehab
Copy link
Contributor

I meant solving select hang. Not returning a different error code in different platforms. I think this is the cleanest solution.

@alerque
Copy link
Member

alerque commented Nov 10, 2023

Ah, I suppose that would make sense if it is possible.

I'm not in a position to test and root this out though so if somebody still has this issue and/or wants to jump in I'll help facilitate a contribution.

@diegonehab
Copy link
Contributor

Even if we needed special code inside an ifdef and an additional flag in the socket structure just to work around this bug, I think it would be preferable to adding a new return that could break a lot of code out there.

@alerque alerque marked this pull request as draft November 10, 2023 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants