-
Notifications
You must be signed in to change notification settings - Fork 633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changed return text for ETIMEDOUT/ WSAETIMEDOUT #148
base: master
Are you sure you want to change the base?
Conversation
Changed return text for ETIMEDOUT/ WSAETIMEDOUT to “connection timeout”. This is needed for the application to be able tell to the difference between timeout of TCP connection (ETIMEDOUT/ WSAETIMEDOUT) and a normal return from a non-blocking socket (error codes EAGAIN/WSAEWOULDBLOCK). Both situations returned the text “timeout”.
Can you clarify in which scenario this difference would be relevant? |
My scenario is a copas.lua based server that sends responses to requests. If the remote device is physically disconnected/powered off before the response is sent, then I want the socket:send() to eventually time out, so the error can be logged, connection closed and cleaned up. After further testing I have found out that select() is behaving different on the device I am working on compared to current linux systems. On the linux 2.6 based device the problem is that after a ETIMEDOUT event on send(), select() will not return write-readyness for that socket. The result is that the connection is stuck forever in the copas.lua system. On my PC with current linux, I found out that after a ETIMEDOUT event on send(), select() does return write-readyness for the socket, and when copas calls :send() again, error "closed" is returned, allowing the application to clean up the connection. Even though this is less problem on recent linux, I still think timeout of the TCP connection itself is fundamentally different from a EAGAIN/WSAEWOULDBLOCK response from the socket API, and should give different responses to the application. In order to preserve compatiblity with copas.lua I would change the return text on ETIMEDOUT/ WSAETIMEDOUT instead of on EAGAIN/WSAEWOULDBLOCK. If you want to experiment with TCP timeouts on send on linux, you can do sudo sysctl -w net.ipv4.tcp_retries2=3 to get timeouts in seconds instead of hours. |
Is this still desired and/or necessary? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, but...
- What might this break downstream?
- Our internal tests need updating to match.
Can’t this be a 2.6 specific fix to solve just the hanging problem? Leaking a change out to every other system seems overkill. |
Returning a different error per platform for the same condition doesn't sound right. And the platform being used on end of the socket may not be the same platform as on the other end. If we do differentiate the error messages for the two different error cases I think it should probably be done universally with no platform or version gating logic. |
I meant solving select hang. Not returning a different error code in different platforms. I think this is the cleanest solution. |
Ah, I suppose that would make sense if it is possible. I'm not in a position to test and root this out though so if somebody still has this issue and/or wants to jump in I'll help facilitate a contribution. |
Even if we needed special code inside an ifdef and an additional flag in the socket structure just to work around this bug, I think it would be preferable to adding a new return that could break a lot of code out there. |
Changed return text for ETIMEDOUT/ WSAETIMEDOUT to “connection timeout”.
This is needed for the application to be able tell to the difference between timeout of TCP connection (ETIMEDOUT/ WSAETIMEDOUT) and a normal return from a non-blocking socket (error codes EAGAIN/WSAEWOULDBLOCK). Both situations returned the text “timeout”.