Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signal handler can interrupt IO system call and cause errors #2825

Open
merlea opened this issue Nov 12, 2024 · 9 comments
Open

Signal handler can interrupt IO system call and cause errors #2825

merlea opened this issue Nov 12, 2024 · 9 comments
Labels
bug An unexpected problem or unintended behavior tool/mdsip Relates to one of the MDSplus over IP tools (mdsipd, mdsipsd, mdsip_server)

Comments

@merlea
Copy link
Contributor

merlea commented Nov 12, 2024

Affiliation
SPC-EPFL

Version(s) Affected
The MDSplus Version(s) affected, if any.
e.g. Client Version: Alpha 7.148.0, Server Version: Stable 7.142.81

Platform(s)
Ubuntu 24.04

Installation Method(s)
Official MDSplus DEB repository

Describe the bug
When starting a new connection just after disconnecting one using one of the "Tunnel" protocols, the login function fails due to one system call being interrupted by the ChildSignalHandler from IoRoutinesTunnel.c.

To Reproduce

  1. Here is a minimal C program:
#include <stdio.h>
#include <unistd.h>

#include "ipdesc.h"

int main(void)
{
  char *TCPuri = "tcvdata.epfl.ch";
  char *SSHuri = "ssh://tcvdata.epfl.ch";

  int TCPnsock;
  int SSHnsock;

  /* Connect to SSH server */
  printf("Opening connection to SSH MDS server\n");
  SSHnsock = ConnectToMds(SSHuri);
  if (SSHnsock == -1)
    printf("Could not open connection to SSH MDS server\n");

  /* Disconnect SSH server */
  printf("Closing connection to SSH MDS server\n");
  DisconnectFromMds(SSHnsock);

/*
  usleep(10000);
  usleep(10000);
*/

  /* Connect to TCP server */
  printf("Opening connection to TCP MDS server\n");
  TCPnsock = ConnectToMds(TCPuri);
  if (TCPnsock == -1)
    printf("Could not open connection to TCP MDS server\n");
}
  1. Compile using
gcc test.c -o test -I $MDSPLUS_DIR/include -L $MDSPLUS_DIR/lib -l MdsIpShr
  1. Run it
./test
  1. See error below:
Opening connection to SSH MDS server
Closing connection to SSH MDS server
Opening connection to TCP MDS server
E, 2759129:2759129, 1731410141.350875194: mdstcpip/mdsipshr/GetMdsMsg.c:63 get_bytes_to()           Connection(id=-1, state=0x00, protocol='tcp', info_name='tcp', version=0, user='(null)') error 0/48: Interrupted system call
Error during login: recv: Interrupted system call
Could not open connection to TCP MDS server

Expected behavior
The new connection should complete successfully.

Screenshots

Additional context
Uncommenting the usleep commands after the disconnect step in the program above allows the connection to complete.

@merlea merlea added the bug An unexpected problem or unintended behavior label Nov 12, 2024
@zack-vii
Copy link
Contributor

Just as a comment: if i remember correctly. the Signal is issued in order to terminate the old connection. In order to keep DisconnectFromMds as fast call it is not waiting for the old connection to terminate but rather issues its termination. I think best to fix this would be to harden recv against interrupts by handling the interrupt error code. It should check the active state of teh connection and retry as long as the connection is claimed to be active. DisconnectFromMds would reset the active state of a connection before issuing the signal.

@merlea
Copy link
Contributor Author

merlea commented Nov 12, 2024

I think in the case of a standard disconnect ChildSignalHandler actually doesn't do much since the destruction of the connection has already started in CloseConnection which is called by DisconnectFromMds. It is truly useful when the tunnel is lost due to an unforeseen event either on the client or server side.

I agree that the best fix is to make the system calls more robust against signal interrupts. I noticed that the SA_RESTART flag was already used when registering ChildSignalHandler but apparently some systems call can't be restarted. This Unix StackExchange answer offers some explanation and possible handling scheme.

@zack-vii
Copy link
Contributor

zack-vii commented Nov 12, 2024

The restart just re-installs the handler once it was triggered, no? .. i think the handler is meant to catch SIGPIPE after connection close closes the fid used for the connection.

i think a good way to check if connection is still in use would be to use Connection::state with CON_INLIST as it is cleared in PopConnection
c->state &= CON_ACTIVITY; // clears INLIST

c->state &= CON_ACTIVITY; // clears INLIST

@zack-vii
Copy link
Contributor

that said .. maybe this is why it does not retry.. during teh connection task the state is not yet INLIST .. so one would have to add a dedicated flag like. CON_INIT that would tell recv to retry when connecting

@merlea
Copy link
Contributor Author

merlea commented Nov 12, 2024

SA_RESTART is meant to restart the system call, not reinstall the handler.

The issue is not when trying to close a connection that is inactive or dead, it's an interrupt issue on a system call for any other active connection.

@merlea
Copy link
Contributor Author

merlea commented Nov 12, 2024

The following patch solves the issue for the program above (but only addresses the one syscall that was affected, it should be generalized)

diff --git a/mdstcpip/io_routines/ioroutinesx.h b/mdstcpip/io_routines/ioroutinesx.h
index a01403f32..e04520860 100644
--- a/mdstcpip/io_routines/ioroutinesx.h
+++ b/mdstcpip/io_routines/ioroutinesx.h
@@ -419,7 +419,8 @@ static ssize_t io_recv_to(Connection *c, void *bptr, size_t num, int to_msec)
     {
       to = timeout;
       rf = readfds;
-      recved = select(sock + 1, &rf, NULL, NULL, &to);
+      while ((recved = select(sock + 1, &rf, NULL, NULL, &to)) == -1 && errno == EINTR)
+        continue;
 #else
     struct pollfd fd;
     fd.fd = sock;

@zack-vii
Copy link
Contributor

zack-vii commented Nov 12, 2024

.. not sure how this affects SIGINT through user keyboard interrupt. .. What I take from your link

Because you may want to do something upon receiving a signal, and you cannot do much from a signal handler; anything using malloc() or stdio (even printf()) is out of the question. So you have to handle the interrupts in the main loop of the program, and to be able to do that, you should break somehow from a blocking read() (a read() can block even after a poll() has returned a fd as ready for reading).

The handler should mark the connection as closed which it does kind of in CloseConncetion (removing it from the list, but we need an extra flag for connect or set the INLIST before connect) recv etc should the only retry if the specific connection was not affected.
if we need to distinguish between SIGINT and SIGPIPE/SIGCHLD here i think shoudl at least be tested . e.g. in tditest. connecting to a dummy server and Ctrl+C to see it the connect is still properly interrupted.

@merlea
Copy link
Contributor Author

merlea commented Nov 12, 2024

The issue is not in what the handler does. Again the connection is properly closed and this is all fine. The problem is because a handler is registered, when the signal is caught it interrupts the current system call (for another connection) which will fail and not be retried.

@merlea
Copy link
Contributor Author

merlea commented Nov 12, 2024

Actually it seems that some protocol do correctly handle interrupts. EINTR is correctly caught and retried in io_recv_to from ioroutines_pipes.h but not in io_recv_to from ioroutinestcp.h.

@merlea merlea changed the title ChildSignalHandler can interrupt IO system call and cause errors Signal handler can interrupt IO system call and cause errors Nov 18, 2024
@WhoBrokeTheBuild WhoBrokeTheBuild added the tool/mdsip Relates to one of the MDSplus over IP tools (mdsipd, mdsipsd, mdsip_server) label Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior tool/mdsip Relates to one of the MDSplus over IP tools (mdsipd, mdsipsd, mdsip_server)
Projects
None yet
Development

No branches or pull requests

3 participants