Signal handler can interrupt IO system call and cause errors #2825
Comments
Just as a comment: if I remember correctly, the signal is issued in order to terminate the old connection. To keep DisconnectFromMds a fast call, it does not wait for the old connection to terminate but only initiates its termination. I think the best fix would be to harden recv against interrupts by handling the interrupt error code. It should check the active state of the connection and retry as long as the connection is claimed to be active. DisconnectFromMds would reset the active state of a connection before issuing the signal. |
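A minimal sketch of the retry idea described in this comment, not the MDSplus implementation. The names demo_conn_t and recv_retry are hypothetical, and the active flag is an assumed stand-in for whatever state DisconnectFromMds would clear before issuing the signal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for the per-connection state; the real MDSplus
 * Connection struct is not reproduced here. */
typedef struct
{
  int sock;
  volatile bool active; /* assumed to be cleared before the signal is sent */
} demo_conn_t;

/* recv() that retries on EINTR for as long as the connection is active. */
static ssize_t recv_retry(demo_conn_t *c, void *buf, size_t len, int flags)
{
  assert(c != NULL && buf != NULL);
  for (;;)
  {
    ssize_t n = recv(c->sock, buf, len, flags);
    if (n >= 0 || errno != EINTR)
      return n; /* data, end of stream, or a real error */
    if (!c->active)
      return -1; /* this connection is being closed; errno is still EINTR */
    /* interrupted by a signal meant for something else: try again */
  }
}
```

With this shape, a signal raised while closing one connection would only abort a recv on that same connection; any other connection would simply retry.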
I think in the case of a standard disconnect I agree that the best fix is to make the system calls more robust against signal interrupts. I noticed that the |
The restart just re-installs the handler once it has been triggered, no? I think the handler is meant to catch SIGPIPE after closing the connection closes the fid used for it. I think a good way to check whether a connection is still in use would be to use Connection::state with CON_INLIST, as it is cleared in PopConnection mdsplus/mdstcpip/mdsipshr/Connections.c Line 70 in 80eced4
|
That said, maybe this is why it does not retry: during the connection task the state is not yet INLIST, so one would have to add a dedicated flag, such as CON_INIT, that would tell recv to retry when connecting. |
The issue does not arise when trying to close a connection that is inactive or dead; it is an interrupted system call on any other active connection. |
The following patch solves the issue for the program above (it only addresses the one syscall that was affected and should be generalized):
diff --git a/mdstcpip/io_routines/ioroutinesx.h b/mdstcpip/io_routines/ioroutinesx.h
index a01403f32..e04520860 100644
--- a/mdstcpip/io_routines/ioroutinesx.h
+++ b/mdstcpip/io_routines/ioroutinesx.h
@@ -419,7 +419,8 @@ static ssize_t io_recv_to(Connection *c, void *bptr, size_t num, int to_msec)
{
to = timeout;
rf = readfds;
- recved = select(sock + 1, &rf, NULL, NULL, &to);
+ while ((recved = select(sock + 1, &rf, NULL, NULL, &to)) == -1 && errno == EINTR)
+ continue;
#else
struct pollfd fd;
fd.fd = sock; |
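One way the patch could be generalized, as a sketch only: wait_readable and recv_with_timeout are hypothetical helpers, not functions from ioroutinesx.h. The Linux select(2) man page states that the timeout becomes undefined on error, so this version recomputes the remaining time from a monotonic clock before each retry instead of reusing the same struct timeval.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

static long long now_msec(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wait until sock is readable or to_msec elapses, retrying on EINTR.
 * Returns 1 if readable, 0 on timeout, -1 on a real error. */
static int wait_readable(int sock, int to_msec)
{
  assert(sock >= 0 && sock < FD_SETSIZE);
  const long long deadline = now_msec() + to_msec;
  for (;;)
  {
    long long left = deadline - now_msec();
    if (left < 0)
      left = 0;
    struct timeval to;
    to.tv_sec = (time_t)(left / 1000);
    to.tv_usec = (suseconds_t)((left % 1000) * 1000);
    fd_set rf;
    FD_ZERO(&rf);
    FD_SET(sock, &rf);
    int ready = select(sock + 1, &rf, NULL, NULL, &to);
    if (ready != -1 || errno != EINTR)
      return ready;
    /* interrupted by a signal handler: loop with the remaining time */
  }
}

/* Hypothetical analogue of io_recv_to(): wait, then receive. */
static ssize_t recv_with_timeout(int sock, void *buf, size_t num, int to_msec)
{
  int ready = wait_readable(sock, to_msec);
  if (ready <= 0)
    return ready; /* 0: timed out, -1: real error */
  ssize_t n;
  while ((n = recv(sock, buf, num, 0)) == -1 && errno == EINTR)
    continue;
  return n;
}
```

The same loop shape would apply to the poll() branch of the original function, with the remaining time passed as the poll timeout.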
Not sure how this affects SIGINT from a user keyboard interrupt. What I take from your link
The handler should mark the connection as closed, which it more or less does in CloseConnection (removing it from the list, but we need an extra flag for connect, or to set INLIST before connect). recv etc. should then only retry if the specific connection was not affected. |
The issue is not in what the handler does. Again, the connection is properly closed and this is all fine. The problem is that, because a handler is registered, a caught signal interrupts the current system call (for another connection), which then fails and is not retried. |
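The mechanism can be reproduced outside MDSplus with a small sketch. Everything here is illustrative: demo_handler and select_errno_when_signalled are hypothetical names, and SIGALRM from a one-shot timer stands in for the signal caught by ChildSignalHandler. On Linux, signal(7) documents that select and poll are never restarted after a handler runs, even when the handler is installed with SA_RESTART, which is why the flag is set here without effect.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void demo_handler(int sig)
{
  (void)sig;
  got_signal = 1; /* the handler itself does nothing harmful */
}

/* Returns the errno left by select() when a signal arrives mid-call,
 * 0 if select() was not interrupted, or -1 if the setup failed. */
static int select_errno_when_signalled(void)
{
  int fds[2];
  if (pipe(fds) != 0)
    return -1;

  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = demo_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART; /* does not help for select() on Linux */
  sigaction(SIGALRM, &sa, NULL);

  /* One-shot timer: deliver SIGALRM after 100 ms. */
  struct itimerval it = {.it_interval = {0, 0}, .it_value = {0, 100000}};
  setitimer(ITIMER_REAL, &it, NULL);

  /* Nothing is ever written to the pipe, so select() blocks until the
   * signal arrives or the 2 s timeout expires. */
  fd_set rf;
  FD_ZERO(&rf);
  FD_SET(fds[0], &rf);
  struct timeval to = {2, 0};
  int rc = select(fds[0] + 1, &rf, NULL, NULL, &to);
  int err = (rc == -1) ? errno : 0;

  close(fds[0]);
  close(fds[1]);
  return err;
}
```

On Linux the function is expected to return EINTR: the file descriptor being waited on has nothing to do with the signal, yet the call fails. This is why each select or poll call site needs its own retry loop.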
Actually it seems that some protocols do correctly handle interrupts. |
Affiliation
SPC-EPFL
Version(s) Affected
The MDSplus Version(s) affected, if any.
e.g. Client Version: Alpha 7.148.0, Server Version: Stable 7.142.81
Platform(s)
Ubuntu 24.04
Installation Method(s)
Official MDSplus DEB repository
Describe the bug
When starting a new connection just after disconnecting one using one of the "Tunnel" protocols, the login function fails due to one system call being interrupted by the ChildSignalHandler from IoRoutinesTunnel.c.
To Reproduce
Expected behavior
The new connection should complete successfully.
Screenshots
Additional context
Uncommenting the usleep commands after the disconnect step in the program above allows the connection to complete.