Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux: TCP error 0x2714 after Apache reload #1242

Open
Snowknight26 opened this issue Mar 29, 2021 · 24 comments
Open

Linux: TCP error 0x2714 after Apache reload #1242

Snowknight26 opened this issue Mar 29, 2021 · 24 comments

Comments

@Snowknight26
Copy link

Snowknight26 commented Mar 29, 2021

PHP Driver version or file name

5.9.0

SQL Server version

Microsoft SQL Server 2019 (RTM-CU8-GDR) (KB4583459) - 15.0.4083.2 (X64)

Client operating system

RHEL 7.8
Kernel: Linux 3.10.0-1127.18.2.el7.x86_64 x86_64

PHP version

7.4.16 NTS (7.4.16-1.el7.remi)

Microsoft ODBC Driver version

Microsoft ODBC Driver for SQL Server 17.7.1.1 (msodbcsql17.x86_64 17.7.1.1-1)
unixODBC 2.3.7 (unixODBC.x86_64 2.3.7-1.rh)

Table schema

N/A

Problem description

Reloading Apache during a long-running query causes the next query to be executed after child processes come back to generate the following error:
SQLSTATE[08S01]: [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x2714

Occasionally this error occurs instead:
SQLSTATE[08S01]: [Microsoft][ODBC Driver 17 for SQL Server]Communication link failure

Expected behavior and actual behavior

The next query should succeed.

Repro code or steps to reproduce

Create a script with the following:

$pdo = new PDO( 'sqlsrv:Server=server; LoginTimeout=15; MultipleActiveResultSets=false', '', '', [ PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION ] );

$pdo->query( "WAITFOR DELAY '00:00:05'" );

var_dump( $pdo->query( 'SELECT 1 AS Row1' )->fetchAll( PDO::FETCH_ASSOC ) );

Save the script in a location that can be served up by Apache (httpd24-httpd.x86_64 2.4.34-18.el7 in my case) and executed by PHP.
For example: http://localhost/test.php

Continually refresh the page after after it has been served (will take a minimum of 5 seconds due to the WAITFOR).

Reload the Apache service. My RHEL installation required /bin/systemctl reload httpd24-httpd.service.

Once child processes are back, reloading the page (executing the script) will generate the error stated above.

The issue only seems to happen when using PDO_SQLSRV and not any other PDO driver.
For example, I've tried the following and was unable to reproduce:

  • new PDO( 'odbc:server', '', '' ) - using PDO_ODBC instead
  • new PDO( 'dblib:server', '', '' ) - using FreeTDS

Tried the following but the issue still occurred:

  • Disabling pooling
  • Restarting Apache service instead of reloading (/bin/systemctl restart httpd24-httpd.service)
  • ConnectRetryCount > 1 in the DSN (I believe this is Windows only though so no surprise there)

odbcinst -j output:

unixODBC 2.3.7
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /home/user/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8

Relevant section of odbcinst.ini:

[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.7.so.1.1
UsageCount=1
CPTimeout=120

Let me know if anything else is needed.

@yitam
Copy link
Contributor

yitam commented Mar 29, 2021

Thank you @Snowknight26 for bringing this to our attention. We will investigate and get back to you on this.

@yitam
Copy link
Contributor

yitam commented Mar 29, 2021

Hi @Snowknight26 I can't seem to be able to reproduce this issue. I've tried modifying the pdo script slightly and ran different tests. With connection pooling, I refreshed the browser while or after reloading Apache service. I did not see the aforementioned connection error but sometimes the SELECT query failed after the delay had elapsed.

Could you please provide an ODBC trace and Apache error logs? Make sure you add TraceOn=1; to the connection string and also modify /etc/odbcinst.ini like this, as an example:

[ODBC]
Trace = yes
Trace File = /tmp/sql.log

@Snowknight26
Copy link
Author

Snowknight26 commented Mar 30, 2021

@yitam
My mistake, through my trials and errors I mistyped the test script. The PDO error mode should be set to exception. I've updated my initial comment.

The failed SELECT query that you saw I believe is the exact issue, but without the error mode set to exceptions it wasn't possible to see the underlying reason (just "Call to a member function fetchAll() on boolean" or something like that).

I'm still working on providing logs but I believe setting the PDO::ATTR_ERRMODE attribute to PDO::ERRMODE_EXCEPTION should let you reproduce it.

@Snowknight26
Copy link
Author

@yitam
I've attached some log files reproducing the issue. The Apache service was restarted a few times before so you might see some output from prior reproduction attempts, but the issue happened at Tue Mar 30 13:26:52.885195 2021 CST (Unix timestamp of 1617128812) per the Apache logs. Should be able to match up the Unix timestamp in the ODBC trace..

odbc.log - ODBC trace
php_output.log - browser output
httpd_error.log - Apache error log

@yitam
Copy link
Contributor

yitam commented Mar 30, 2021

Thanks @Snowknight26 I can reproduce it now, and I'll also examine the logs you provided. Will get back to you on this after some investigation.

@yitam
Copy link
Contributor

yitam commented Mar 31, 2021

Hi @Snowknight26 I did try pdo_odbc and could also reproduce the same problem:

image

@Snowknight26
Copy link
Author

@yitam

When I was testing PDO_ODBC I believe FreeTDS was being used to provide the ODBC functionality. I was able to configure my environment to work with PDO_ODBC using Microsoft ODBC Driver 17 for SQL Server.

To elaborate further, I changed the test script to this:

<?php
$pdo = new PDO( 'sqlsrv:Server=server; LoginTimeout=15; MultipleActiveResultSets=false', '', '', [ PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION ] );

try {
  var_dump( $pdo->query( 'SELECT 1 AS Row1' )->fetchAll( PDO::FETCH_ASSOC )[0]['Row1'] );
} catch( Throwable $e ) {
  var_dump( $e );
}

try {
  $pdo->query( "WAITFOR DELAY '00:00:05'" );
} catch( Throwable $e ) {
  var_dump( $e );
}

try {
  var_dump( $pdo->query( 'SELECT 2 AS Row1' )->fetchAll( PDO::FETCH_ASSOC )[0]['Row1'] );
} catch( Throwable $e ) {
  var_dump( $e );
}

Then I ran the same test with 5 different DSNs.

Expand each section to see the output.

sqlsrv:Server=server; LoginTimeout=15; MultipleActiveResultSets=false
string(1) "1"
object(PDOException)#3 (8) {
  ["message":protected]=>
  string(90) "SQLSTATE[08S01]: [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x2714"
  ["string":"Exception":private]=>
  string(0) ""
  ["code":protected]=>
  string(5) "08S01"
  ["file":protected]=>
  string(47) "/usr/local/apache/verify/htdocs2/common/test.php"
  ["line":protected]=>
  int(11)
  ["trace":"Exception":private]=>
  array(1) {
    [0]=>
    array(6) {
      ["file"]=>
      string(47) "/usr/local/apache/verify/htdocs2/common/test.php"
      ["line"]=>
      int(11)
      ["function"]=>
      string(5) "query"
      ["class"]=>
      string(3) "PDO"
      ["type"]=>
      string(2) "->"
      ["args"]=>
      array(1) {
        [0]=>
        string(24) "WAITFOR DELAY '00:00:05'"
      }
    }
  }
  ["previous":"Exception":private]=>
  NULL
  ["errorInfo"]=>
  array(6) {
    [0]=>
    string(5) "08S01"
    [1]=>
    int(10004)
    [2]=>
    string(73) "[Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x2714"
    [3]=>
    string(5) "08S01"
    [4]=>
    int(10004)
    [5]=>
    string(68) "[Microsoft][ODBC Driver 17 for SQL Server]Communication link failure"
  }
}
object(PDOException)#4 (8) {
  ["message":protected]=>
  string(85) "SQLSTATE[08S01]: [Microsoft][ODBC Driver 17 for SQL Server]Communication link failure"
  ["string":"Exception":private]=>
  string(0) ""
  ["code":protected]=>
  string(5) "08S01"
  ["file":protected]=>
  string(47) "/usr/local/apache/verify/htdocs2/common/test.php"
  ["line":protected]=>
  int(17)
  ["trace":"Exception":private]=>
  array(1) {
    [0]=>
    array(6) {
      ["file"]=>
      string(47) "/usr/local/apache/verify/htdocs2/common/test.php"
      ["line"]=>
      int(17)
      ["function"]=>
      string(5) "query"
      ["class"]=>
      string(3) "PDO"
      ["type"]=>
      string(2) "->"
      ["args"]=>
      array(1) {
        [0]=>
        string(16) "SELECT 2 AS Row1"
      }
    }
  }
  ["previous":"Exception":private]=>
  NULL
  ["errorInfo"]=>
  array(3) {
    [0]=>
    string(5) "08S01"
    [1]=>
    int(0)
    [2]=>
    string(68) "[Microsoft][ODBC Driver 17 for SQL Server]Communication link failure"
  }
}
odbc:driver={ODBC Driver 17 for SQL Server}; server=server; Trusted_Connection=yes
string(1) "1"
object(PDOException)#3 (8) {
  ["message":protected]=>
  string(209) "SQLSTATE[08S01]: Communication link failure: 10004 [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x2714 (SQLExecute[10004] at /builddir/build/BUILD/php-7.4.16/ext/pdo_odbc/odbc_stmt.c:259)"
  ["string":"Exception":private]=>
  string(0) ""
  ["code":protected]=>
  string(5) "08S01"
  ["file":protected]=>
  string(67) "/usr/local/apache/verify/htdocs2/common/test.php"
  ["line":protected]=>
  int(11)
  ["trace":"Exception":private]=>
  array(1) {
    [0]=>
    array(6) {
      ["file"]=>
      string(67) "/usr/local/apache/verify/htdocs2/common/test.php"
      ["line"]=>
      int(11)
      ["function"]=>
      string(5) "query"
      ["class"]=>
      string(3) "PDO"
      ["type"]=>
      string(2) "->"
      ["args"]=>
      array(1) {
        [0]=>
        string(24) "WAITFOR DELAY '00:00:05'"
      }
    }
  }
  ["previous":"Exception":private]=>
  NULL
  ["errorInfo"]=>
  array(4) {
    [0]=>
    string(5) "08S01"
    [1]=>
    int(10004)
    [2]=>
    string(158) "[Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x2714 (SQLExecute[10004] at /builddir/build/BUILD/php-7.4.16/ext/pdo_odbc/odbc_stmt.c:259)"
    [3]=>
    string(5) "08S01"
  }
}
object(PDOException)#4 (8) {
  ["message":protected]=>
  string(196) "SQLSTATE[08S01]: Communication link failure: 0 [Microsoft][ODBC Driver 17 for SQL Server]Communication link failure (SQLExecute[0] at /builddir/build/BUILD/php-7.4.16/ext/pdo_odbc/odbc_stmt.c:259)"
  ["string":"Exception":private]=>
  string(0) ""
  ["code":protected]=>
  string(5) "08S01"
  ["file":protected]=>
  string(67) "/usr/local/apache/verify/htdocs2/common/test.php"
  ["line":protected]=>
  int(17)
  ["trace":"Exception":private]=>
  array(1) {
    [0]=>
    array(6) {
      ["file"]=>
      string(67) "/usr/local/apache/verify/htdocs2/common/test.php"
      ["line"]=>
      int(17)
      ["function"]=>
      string(5) "query"
      ["class"]=>
      string(3) "PDO"
      ["type"]=>
      string(2) "->"
      ["args"]=>
      array(1) {
        [0]=>
        string(16) "SELECT 2 AS Row1"
      }
    }
  }
  ["previous":"Exception":private]=>
  NULL
  ["errorInfo"]=>
  array(4) {
    [0]=>
    string(5) "08S01"
    [1]=>
    int(0)
    [2]=>
    string(149) "[Microsoft][ODBC Driver 17 for SQL Server]Communication link failure (SQLExecute[0] at /builddir/build/BUILD/php-7.4.16/ext/pdo_odbc/odbc_stmt.c:259)"
    [3]=>
    string(5) "08S01"
  }
}
odbc:driver=FreeTDSODBC;servername=server
string(1) "1"
string(1) "2"
odbc:server
string(1) "1"
string(1) "2"
dblib:host=server
int(1)
int(2)

From the results, we can see the following:

  • PDO_SQLSRV fails after the first query.
  • PDO_ODBC using ODBC Driver 17 fails after the first query.
  • In my environment at least, specifying odbc:server or odbc:driver=FreeTDSODBC;servername=server seem to be equivalent (based on what I can tell from /etc/odbc.ini, /etc/odbcinst.ini and /etc/freetds.conf). I was not able to replicate the issue using PDO_ODBC with FreeTDS.
  • I was not able to replicate the issue using PDO_DBLIB with FreeTDS.

It's puzzling that in the case of the first two DSNs, only the first query succeeds, as reloading Apache is a graceful operation and in theory should wait for the script to finish executing.

My takeaway is that there's an issue with the Microsoft ODBC Driver 17 for SQL Server. Would that be a fair assessment to make?

Do you know if the queries that failed due to TCP errors are fully executed on SQL Server's end, or is it impossible to tell? I'm not sure when the TCP connection is broken.

@yitam
Copy link
Contributor

yitam commented Mar 31, 2021

Thanks @Snowknight26 for your detailed report and analysis. When re-reading your original description, you mentioned:

Once child processes are back, reloading the page (executing the script) will generate the error stated above.

Can you elaborate further what exactly you meant by "once child processes are back"?

In my case, the reload command takes time to complete, but if I wait for it to finish before refreshing the web browser, then I am unable to reproduce this problem.

That is, so far I can only reproduce this if I run the reload command when the long query is being executed. Because of the delay, the query result takes a long time to show after refreshing the web browser.

@Snowknight26
Copy link
Author

My Apache instance is set up in is a high-traffic environment that receives millions of requests a day. As such, a cron job reloads Apache daily during off-peak hours to both prevent memory leaks and rotate logs.

This is done using /bin/systemctl reload httpd24-httpd.service. Based on the httpd24-httpd.service file, that command translates to /opt/rh/httpd24/root/usr/sbin/httpd-scl-wrapper $OPTIONS -k graceful.

Per Apache's documentation, passing the graceful option causes Apache to gracefully restart.

The USR1 or graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.

With that in mind, what I meant by "once child processes are back" was the last sentence in that quote.

Further Apache documentation states that 'graceful' does the following:

Gracefully restarts the Apache httpd daemon. If the daemon is not running, it is started. This differs from a normal restart in that currently open connections are not aborted.

My interpretation is that the httpd daemon keeps running but child processes are stopped after requests have been processed and are started again.

The reason for the WAITFOR in the test script is to more easily trigger the issue, simulating a typical load.

With FreeTDS (either using PDO_ODBC or PDO_DBLIB), regardless of when the reload is issued in relation to which line of the script is currently being executed, the above test script keeps the connection open until the script has finished, all 3 queries are executed successfully, the script finishes, PHP flushes its buffers, Apache sends the data to the open HTTP connection and all is well. Effectively the page runs successfully and a person making the request sees the success-state output.

In contrast, with (I assume) Microsoft ODBC Driver 17 for SQL Server, when the reload is issued, it's possible that PHP's connection to the SQL Server instance is abruptly closed, causing the 2nd (longest) and subsequent queries in the test script to fail. The script finishes executing after having thrown/handled the exceptions/errors, PHP flushes its buffers, and Apache sends the data to the open HTTP connection. Effectively the page errors and a person making the HTTP request sees the error-state output.

In the case of my environment, with millions of hits per day, numerous PDOExceptions are thrown during the process of reloading the child processes/threads/workers (whatever they are called).

@yitam
Copy link
Contributor

yitam commented Mar 31, 2021

Thanks again, @Snowknight26
We suspect this might be related to #885 but we will do more investigation or see if there is any workaround.

@Snowknight26
Copy link
Author

@yitam

I saw that bug report as I was submitting this one and considered they might be related - maybe something along the lines of the ODBC driver improperly handling SIGUSR1 or whatever other signal that's being sent.

I did a little more testing to try to figure out what the behavior is on SQL Server's end when the error occurs.

To start, I created a table and a stored procedure:

USE [Contoso]
DROP TABLE IF EXISTS dbo.ConnectionTest
CREATE TABLE dbo.ConnectionTest
(
  [Date]    DATETIME2 NOT NULL,
  Iteration INT NOT NULL,
  Process   VARCHAR(8) NOT NULL
)
GO

GRANT INSERT ON dbo.ConnectionTest TO [public]
GO

CREATE OR ALTER PROCEDURE dbo.ConnectionTestINS
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @Iteration TINYINT = 0;

  WHILE(@Iteration < 100)
  BEGIN
    SET @Iteration = @Iteration + 1;
    INSERT INTO dbo.ConnectionTest([Date], Iteration, Process) VALUES (GETDATE(), @Iteration, 'Database');
    WAITFOR DELAY '00:00:00.050';
  END

  SELECT @Iteration AS Iteration
END
GO

GRANT EXECUTE ON dbo.ConnectionTestINS TO [public]
GO

Next, I modified the test script to the following:

<?php

$pdo = new PDO( 'sqlsrv:Server=server; LoginTimeout=15; MultipleActiveResultSets=false', '', '', [ PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION ] );
$i = 0;
$process = 'Apache';

try {
    $stmt = $pdo->prepare( 'INSERT INTO Contoso.dbo.ConnectionTest([Date], Iteration, Process) VALUES (GETDATE(), ?, ?)' );
    $stmt->execute( [ $i++, $process ] );
}
catch( Throwable $e ) {}

try {
    $stmt->execute( [ $i++, $process ] );
    $pdo->query( 'EXEC Contoso.dbo.ConnectionTestINS' );
    $stmt->execute( [ $i++, $process ] );
}
catch( Throwable $e ) {}

try {
    $stmt->execute( [ $i++, $process ] );
}
catch( Throwable $e ) {}

Next, I ran the test before, running reload several times and hitting the page several times simultaneously.

What I found out is that it's possible for the error to occur during the stored procedure execution. Without implicit or explicit transactions, that means that instead of the SP inserting 100 rows, it's possible it inserts anywhere from 0 to 100.

I'm sure you can see why this is alarming.

Doing the same test with FreeTDS's ODBC showed that the SP always finished executing (thus inserting all 100 rows). I was hoping that the behavior would be the same regardless of which driver is used but that appears to not be the case.

All this makes trying to make the permanent switch in my environment from FreeTDS to this driver a bit of a challenge.

@yitam
Copy link
Contributor

yitam commented Apr 1, 2021

Hi @Snowknight26, thanks again for providing more details.

After some investigation with the ODBC team, we conclude that this is actually by design. The process has been interrupted by a signal (from Apache), which breaks the connection and thus exceptions are thrown. On the other hand, FreeTDS does not handle the signal as it should have, which might get the process stuck in a state that might be problematic. In other words, signaling it may not get it unstuck.

@Snowknight26
Copy link
Author

@yitam
I'm by no means an expert on signaling nor do I know how the ODBC driver behaves in contexts outside of my use case (PHP running as an Apache module) but I'm a little confused by the design decision.

Based on Apache's graceful signal (emphasis mine)

The USR1 or graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything).

it seems that the connection to the database should not exit immediately, but rather only after the current request, which happens anyway during the PDO object destruction.

I'm just going to naively assume the SIGUSR1 signal is passed as follows:
System -> Apache -> PHP module -> PHP extensions/drivers -> [...]

Under normal conditions, I assume the following happens when SIGUSR1 is sent.

  1. The system notifies the Apache daemon
  2. Apache daemon notifies child processes
  3. Child processes stop accepting new connections
  4. Child processes finish processing their current requests (if there are any)
    1. PHP continues executing any running script
    2. PHP finishes executing running scripts
      1. Objects are destroyed
      2. Shutdown handler is called
  5. Child processes exit
  6. Apache daemon re-reads/parses/validates configuration files
  7. Apache daemon starts child processes

If any currently executing PHP scripts are not immediately halted (basically having the equivalent of an 'exit' statement) in 4., why should the database driver immediately close the connection?

I can see how the behavior would make sense in the case of SIGHUP/SIGTERM where the Apache daemon tries to immediately kill off all children, but not in the SIGUSR1 scenario.

Unless I'm mistaken, there seems to be no way to safely wait and close the database connection via a signal? Better yet, is there a way to have the driver ignore SIGUSR1?

On the other hand, FreeTDS does not handle the signal as it should have, which might get the process stuck in a state that might be problematic. In other words, signaling it may not get it unstuck.

That's why timeouts not only in both FreeTDS and the ODBC driver are configurable, but also script execution time in PHP, no? Both of those mechanisms ensure that eventually (after their current request) Apache child processes restart.

@v-chojas
Copy link

v-chojas commented Apr 1, 2021

We see from strace that Apache does not set SA_RESTART when it sets up the SIGUSR1 handler.
This does not match the description to "exit after their current request", rather the intention seems to be the opposite.
In other words this seems a bug on the Apache side, either in their documentation or implementation.

@Snowknight26
Copy link
Author

@v-chojas
Would you mind elaborating on this a little more? I only just started looking through the apr/httpd source code (specifically the prefork module) but don't see anything out of the ordinary as far as handling signals go.

https://github.com/apache/httpd/blob/21f16155c38e406e0a0daaa60a539d66128cf044/server/mpm/prefork/prefork.c#L1065
https://github.com/apache/httpd/blob/21f16155c38e406e0a0daaa60a539d66128cf044/server/mpm_unix.c#L327

prefork_run() calls ap_wait_or_timeout() (which I assume waits for child processes to finish), then calls ap_mpm_safe_kill() with AP_SIG_GRACEFUL (SIGUSR1), which passes the signal to kill(), with the PID of the child process.

What do you mean by SA_RESTART?

It seems odd that only Apache behaves this way. I haven't heard of issues with how its signal handling works.
Why is it that the PHP module continues executing but doesn't immediately halt with SIGUSR1, or any other module for that matter?

@yitam
Copy link
Contributor

yitam commented Apr 2, 2021

To answer your questions, @Snowknight26 , here are the snippets from strace:

6982  recvfrom(10, 0x55f7d6b57f40, 4112, 0, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
6982  --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=6980, si_uid=0} ---
6982  close(4)                          = 0
6982  rt_sigreturn({mask=[]})           = -1 EINTR (Interrupted system call)
6982  shutdown(10, SHUT_WR)             = 0
6982  close(10)                         = 0

As you can see, no SA_RESTART is used. Please see my reply to issue 885 to understand what this implies.

6982  rt_sigaction(SIGUSR1, NULL, {sa_handler=0x7f96be7ed110, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7f96c768b630}, 8) = 0
6982  rt_sigaction(SIGUSR1, {sa_handler=0x7f96baeb9f50, sa_mask=~[ILL TRAP ABRT BUS FPE KILL SEGV CONT STOP TSTP TTIN TTOU SYS RTMIN RT_1], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f96c768b630}, NULL, 8) = 0

ODBC driver, upon getting SIGUSR1, breaks the connection and returns. FYI, you might want to read Proper way of handling EINTR in libraries , which explains why it may not be a good idea to "swallow" the signal.

@Snowknight26
Copy link
Author

@yitam
I appreciate the insight. I'm going to try to confirm your findings with strace in my particular environment just for the sake of completeness.

Hopefully I'll be able to determine what is calling sigaction() without the SA_RESTART flag instead of signal() (effectively the equivalent of sigaction() with the SA_RESTART flag) and potentially find a viable workaround.

I have a hunch switching from the prefork to the event Apache MPM will do the trick but I'll follow up.

@Snowknight26
Copy link
Author

@yitam

I was able to confirm your findings about the signal handlers being setup without that flag.

8221  10:05:25 rt_sigaction(SIGUSR1, {sa_handler=0x7f3e53623290, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7f3e599a2630}, {sa_handler=0x55c6a78d46f0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f3e599a2630}, 8) = 0
8222  10:05:25 rt_sigaction(SIGUSR1, {sa_handler=0x7f3e53623290, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7f3e599a2630},  <unfinished ...>
8223  10:05:25 rt_sigaction(SIGUSR1, {sa_handler=0x7f3e53623290, sa_mask=[], sa_flags=SA_RESTORER|SA_INTERRUPT, sa_restorer=0x7f3e599a2630},  <unfinished ...>
[...]

Having said that though, I also dug deeper into the Apache Portable Runtime source code (the runtime that drives Apache) and found the following:

https://svn.apache.org/repos/asf/apr/apr/trunk/threadproc/unix/signals.c

/*
 * Replace standard signal() with the more reliable sigaction equivalent
 * from W. Richard Stevens' "Advanced Programming in the UNIX Environment"
 * (the version that does not automatically restart system calls).
 */
APR_DECLARE(apr_sigfunc_t *) apr_signal(int signo, apr_sigfunc_t * func)
{
    struct sigaction act, oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
#ifdef SA_INTERRUPT             /* SunOS */
    act.sa_flags |= SA_INTERRUPT;
#endif
#if defined(__osf__) && defined(__alpha)
    /* XXX jeff thinks this should be enabled whenever SA_NOCLDWAIT is defined */

    /* this is required on Tru64 to cause child processes to
     * disappear gracefully - XPG4 compatible 
     */
    if ((signo == SIGCHLD) && (func == SIG_IGN)) {
        act.sa_flags |= SA_NOCLDWAIT;
    }
#endif
#if defined(__NetBSD__) || defined(DARWIN)
    /* ignoring SIGCHLD or leaving the default disposition doesn't avoid zombies,
     * and there is no SA_NOCLDWAIT flag, so catch the signal and reap status in 
     * the handler to avoid zombies
     */
    if ((signo == SIGCHLD) && (func == SIG_IGN)) {
        act.sa_handler = avoid_zombies;
    }
#endif
    if (sigaction(signo, &act, &oact) < 0)
        return SIG_ERR;
    return oact.sa_handler;
}

Note the last line of the comment:

(the version that does not automatically restart system calls)

Seems like this was intentional. All thread handling functionality in Apache uses apr_signal(), the function above - this means my hunch of switching MPMs would not work.

I also checked nginx's source and found the following:

https://github.com/nginx/nginx/blob/1e92a0a4cef98902aed35d7b402a6a402951aba4/src/os/unix/ngx_process.c#L285

ngx_int_t
ngx_init_signals(ngx_log_t *log)
{
    ngx_signal_t      *sig;
    struct sigaction   sa;

    for (sig = signals; sig->signo != 0; sig++) {
        ngx_memzero(&sa, sizeof(struct sigaction));

        if (sig->handler) {
            sa.sa_sigaction = sig->handler;
            sa.sa_flags = SA_SIGINFO;

        } else {
            sa.sa_handler = SIG_IGN;
        }

        sigemptyset(&sa.sa_mask);
        if (sigaction(sig->signo, &sa, NULL) == -1) {
#if (NGX_VALGRIND)
            ngx_log_error(NGX_LOG_ALERT, log, ngx_errno,
                          "sigaction(%s) failed, ignored", sig->signame);
#else
            ngx_log_error(NGX_LOG_EMERG, log, ngx_errno,
                          "sigaction(%s) failed", sig->signame);
            return NGX_ERROR;
#endif
        }
    }

    return NGX_OK;
}

Based on the code, we also see that the SA_RESTART flag is not passed (only SA_SIGINFO).

I was curious how other languages handled signals and stumbled upon a recent commit for Go (Golang):

https://go-review.googlesource.com/c/go/+/232862/

Historically we've assumed that we can install all signal handlers
with the SA_RESTART flag set, and let the system restart slow functions
if a signal is received. Therefore, we don't have to worry about EINTR.

This is only partially true, and we've added EINTR checks already for
connect, and open/read on Darwin, and sendfile on Solaris.

Other cases have turned up in #36644, #38033, and #38836.

Also, #20400 points out that when Go code is included in a C program,
the C program may install its own signal handlers without SA_RESTART.
In that case, Go code will see EINTR no matter what it does.

So, go ahead and check for EINTR. We don't check in the syscall package;
people using syscalls directly may want to check for EINTR themselves.
But we do check for EINTR in the higher level APIs in os and net,
and retry the system call if we see it.

It seems pretty clear that SA_RESTART isn't used frequently and shouldn't be relied on, even in modern code.

So that brings us back to the original issue: at present, there is no way to advise the web server to exit safely while allowing the ODBC connection to remain open to finish the query.

This holds true for both Apache and nginx which account for 70% of all web server usage. If PHP doesn't stop executing when SIGUSR1 is sent, it should be possible to configure either PDO_SQLSRV and/or the ODBC driver to not stop either.

Is it possible to add a configuration option to prevent this behavior?

@v-chojas
Copy link

v-chojas commented Apr 5, 2021

It seems pretty clear that SA_RESTART isn't used frequently and shouldn't be relied on, even in modern code.
No, that is only because they want to be able to use a signal to interrupt the current operation.

@yitam
Copy link
Contributor

yitam commented Apr 5, 2021

Hi @Snowknight26,

SA_RESTART is used to control whether the library function will resume, or it will return failure with error code EINTR. About what you have observed in the other scenarios, as @v-chojas mentioned, the applications want to use a signal to interrupt the current operation.

Is it possible to add a configuration option to prevent this behavior?

The configuration option is to use SA_RESTART, which indicates that you wish for the system to transparently do the retry without throwing the EINTR failure, as already explained in my reply and this message.

As far as signal handling is concerned, this is by design. Microsoft ODBC driver is not going to change its behavior as the other applications that rely on this might be affected.

@Snowknight26
Copy link
Author

@yitam

Just to confirm we're all on the same page, since it's not possible to control the signaling behavior via PHP, one would have to control the behavior via Apache. As it currently stands, Apache (and nginx) is written such that signals are sent without the SA_RESTART flag, thus there being no way of controlling the ODBC driver's behavior, from a PHP developer's perspective.

In the scenario outlined in this issue, the only way to change the ODBC driver's behavior is to modify Apache's (or nginx's) source code, potentially affecting every other module is loaded.

Does that seem correct?

@yitam
Copy link
Contributor

yitam commented Apr 5, 2021

@Snowknight26 yes, it's up to the applications what to do when using signals. That being said, you may want to discuss your question in Apache or nginx forums?

@Snowknight26
Copy link
Author

@yitam
I posted on the Apache mailing list last week but haven't received a single reply. In the mean time I've been researching signals, interruptions, I/O operations, best practices, etc. and have pretty much concluded that the ODBC driver is not behaving correctly.

This will be a long comment so bear with me.

What do various database drivers do?

ODBC Driver current behavior

When the ODBC driver receives an interrupt signal without SA_RESTART set, recvfrom(), the function used to retrieve data from a socket, returns ERESTARTSYS (To be restarted if SA_RESTART is set), which I believe is translated to EINTR in user code, and isn't restarted. With SA_RESTART, according to what you've said, recvfrom() would be restarted transparently.

Why is this happening?

That's how it was programmed. The driver is written to simply ignore EINTR and treat it as a normal error, exiting the socket read operation.

Per the strace, the 4th parameter in recvfrom() is flags, but since it's set to 0, it is blocking.

clock_gettime(CLOCK_MONOTONIC, {tv_sec=39911752, tv_nsec=943805152}) = 0
sendto(28, "\3\1\0\227\0\0\1\0\26\0\0\0\22\0\0\0\2\0\0\0\0\0\0\0\0\0\1\0\0\0\377\377"..., 151, MSG_NOSIGNAL, NULL, 0) = 151
clock_gettime(CLOCK_MONOTONIC, {tv_sec=39911752, tv_nsec=943937027}) = 0
recvfrom(28, 0x558043a7aa90, 4096, 0, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3961, si_uid=0} ---
close(3)                                = 0
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
futex(0x7f7fca1d2190, FUTEX_WAKE_PRIVATE, 2147483647) = 0
chdir("/")                              = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=39911759, tv_nsec=904611039}) = 0
setitimer(ITIMER_PROF, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}, NULL) = 0
fcntl(25, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
writev(27, [{iov_base="HTTP/1.1 200 OK\r\nDate: Wed, 14 A"..., iov_len=428}, {iov_base="\37\213\10\0\0\0\0\0\4\3\245\214\261\16\2020\24\0w\277\342\215\272\265\4\203\262ia\323\210i"..., iov_len=141}], 2) = 569

Why is this an issue?

The ODBC driver, and subsequently PDO_SQLSRV do not provide PHP with a mechanism of knowing how many bytes of data have been transferred during the query. You receive either all the data (success) or none (error). This will be important later.

FreeTDS

When FreeTDS receives an interrupt signal in poll(), the kernel automatically restarts the call (SA_RESTART is ignored, so poll() is always restarted with an updated timeout struct), and recvfrom() continues to receive data.

[...]
poll([{fd=29, events=POLLIN|POLLOUT}, {fd=28, events=POLLIN}], 2, 300000) = 1 ([{fd=29, revents=POLLOUT}])
sendto(29, "\3\1\0\215\0\0\1\0\26\0\0\0\22\0\0\0\2\0\0\0\0\0\0\0\0\0\1\0\0\0\377\377"..., 141, MSG_NOSIGNAL, NULL, 0) = 141
setsockopt(29, SOL_TCP, TCP_CORK, [0], 4) = 0
setsockopt(29, SOL_TCP, TCP_CORK, [1], 4) = 0
poll([{fd=29, events=POLLIN}, {fd=28, events=POLLIN}], 2, 300000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3961, si_uid=0} ---
close(3)                                = 0
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
poll([{fd=29, events=POLLIN}, {fd=28, events=POLLIN}], 2, 300000) = 1 ([{fd=29, revents=POLLIN}])
recvfrom(29, "\4\1\0g\0\217\1\0", 8, MSG_NOSIGNAL, NULL, NULL) = 8
poll([{fd=29, events=POLLIN}, {fd=28, events=POLLIN}], 2, 300000) = 1 ([{fd=29, revents=POLLIN}])
recvfrom(29, "\201\1\0\0\0\0\0!\0&\1\tI\0t\0e\0r\0a\0t\0i\0o\0n\0\321\1"..., 95, MSG_NOSIGNAL, NULL, NULL) = 95
[...]

What does this mean?

When FreeTDS receives the interrupt signal, it keeps retrieving data from the socket until either all the data has been transferred or the configurable timeout has been reached. Once again, you receive either all the data (success) or none (error). The database integrity is maintained.

MySQL Connector/ODBC Driver

I don't have a test environment set up so we'll just look at the source code. The connector uses the underlying MySQL server common network functions.

https://github.com/mysql/mysql-server/blob/8.0/sql-common/net_serv.cc#L305

#ifndef MYSQL_SERVER
  /*
    In the  client library, interrupted I/O operations are always retried.
    Otherwise, it's either a timeout or an unrecoverable error.
  */
  retry = vio_should_retry(net->vio);
#else
  /*
    In the server, interrupted I/O operations are retried up to a limit.
    In this scenario, pthread_kill can be used to wake up
    (interrupt) threads waiting for I/O.
  */
  retry = vio_should_retry(net->vio) && ((*retry_count)++ < net->retry_count);
#endif

What does this mean?

When MySQL Connector/ODBC Driver receives the interrupt signal, it keeps retrieving data from the socket until either all the data has been transferred. Once again, you receive either all the data (success) or none (error).

psqlODBC - PostgreSQL ODBC driver

No test environment again so source code it is. psqlODBC uses libpq for network connectivity.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/pqcomm.c;hb=f5fc2f5b23d1b1dff60f8ca5dc211161df47eda4#l931

/* --------------------------------
 *      pq_recvbuf - load some bytes into the input buffer
 *
 *      returns 0 if OK, EOF if trouble
 * --------------------------------
 */
static int
pq_recvbuf(void)
{
    /* removed for brevity */

    /* Ensure that we're in blocking mode */
    socket_set_nonblocking(false);

    /* Can fill buffer from PqRecvLength and upwards */
    for (;;)
    {
        int         r;

        r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
                        PQ_RECV_BUFFER_SIZE - PqRecvLength);

        if (r < 0)
        {
            if (errno == EINTR)
                continue;       /* Ok if interrupted */

            /*
             * Careful: an ereport() that tries to write to the client would
             * cause recursion to here, leading to stack overflow and core
             * dump!  This message must go *only* to the postmaster log.
             */
            ereport(COMMERROR,
                    (errcode_for_socket_access(),
                     errmsg("could not receive data from client: %m")));
            return EOF;
        }
        if (r == 0)
        {
            /*
             * EOF detected.  We used to write a log message here, but it's
             * better to expect the ultimate caller to do that.
             */
            return EOF;
        }
        /* r contains number of bytes read, so just incr length */
        PqRecvLength += r;
        return 0;
    }
}

What does this mean?

psqlODBC keeps reading data when interrupted (continue statement in the for loop).

Simba Teradata ODBC Driver with SQL Connector

This one isn't open source so let's check their documentation.

https://www.simba.com/products/Teradata/doc/ODBC_InstallGuide/win/content/odbc/td/options/ui/retryoneintr.htm

Retry System Calls (EINTR)
Description
This option specifies whether the driver retries the socket system calls or returns a SQL_ERROR when an EINTR error occurs.

  • Enabled (1): The driver retries the socket system calls.
  • Disabled (0): The driver returns a SQL_ERROR, and the ODBC application becomes responsible for recovering from the interrupted socket system calls.

Default value is 1.

What does this mean?

Simba Teradata ODBC Driver retries the socket system calls, like every other driver so far (but the Microsoft one). The beauty of exposing a configuration option like this is that developers can choose what to do instead of being forced into an error state with no recovery like the Microsoft ODBC driver.

What do non-database driver libraries do?

PHP

Let's check the source for PHP's socket wrappers that are accessible to userland code via fread, fwrite, etc.
https://github.com/php/php-src/blob/cbcfd860264025b50ed1a1f247d0a2481728fe89/main/streams/plain_wrapper.c#L339

Writes:

static ssize_t php_stdiop_write(php_stream *stream, const char *buf, size_t count)
{
	php_stdio_stream_data *data = (php_stdio_stream_data*)stream->abstract;

	assert(data != NULL);

	if (data->fd >= 0) {
                /* snip */
		ssize_t bytes_written = write(data->fd, buf, count);
                /* snip */
		if (bytes_written < 0) {
			if (PHP_IS_TRANSIENT_ERROR(errno)) {
				return 0;
			}
			if (errno == EINTR) {
				/* TODO: Should this be treated as a proper error or not? */
				return bytes_written;
			}
                       /* snip */
		}
		return bytes_written;
	} else {

                /* snip */

		return (ssize_t) fwrite(buf, 1, count, data->file);
	}
}

Reads:

static ssize_t php_stdiop_read(php_stream *stream, char *buf, size_t count)
{
	php_stdio_stream_data *data = (php_stdio_stream_data*)stream->abstract;
	ssize_t ret;

	assert(data != NULL);

	if (data->fd >= 0) {
        /* snip */
		ret = read(data->fd, buf,  PLAIN_WRAP_BUF_SIZE(count));

		if (ret == (size_t)-1 && errno == EINTR) {
			/* Read was interrupted, retry once,
			   If read still fails, give up with feof==0
			   so script can retry if desired */
			ret = read(data->fd, buf,  PLAIN_WRAP_BUF_SIZE(count));
		}

        /* snip */

		ret = fread(buf, 1, count, data->file);

		stream->eof = feof(data->file);
	}
	return ret;
}

What does this mean?

Write operations return the number of bytes written on EINTR. Since you know the length of what you're try to write and you just received the number of bytes actually written, you can try again, as one would expect.
Read operations retry once automatically, transparently to userland code, but userland code can always try again as an appropriate value/error is returned.

When I said "This will be important later." under the Microsoft ODBC driver section, this is what I was referring to. PHP gives the developer the control and mechanism to retry socket operations. The Microsoft ODBC driver does not expose any way for a developer to control the socket operations. It's fine that it doesn't, but because the developer expects 'all or nothing', the query should either fail instantly or succeed fully. Not somewhere inbetween.

libwww-perl

https://rt.cpan.org/Public/Bug/Display.html?id=32356

In order to avoid errors, LWP should check for the reason of a sysread
failure (using $!) and retry reading
if the failure was due to on of the errors EAGAIN or EINTR (Unix error
codes returned by read).

The patch goes on to show code that applies a looping mechanism in the case that EINTR is returned.

What does this mean?

Their sysread/can_read functions automatically retry when interrupted. This is a network library so it makes sense.

Haskell

To the source code we go.
https://hackage.haskell.org/package/socket-0.2.0.0/docs/src/System-Socket.html
Just goign to quote relevant sections.

accept()

                    else if e == eINTR
                      -- On EINTR it is good practice to just retry.
                      then retry

close()

      closeFdWith
        -- The c_close operation may (according to Posix documentation) fails with EINTR or EBADF or EIO.
        -- EBADF: Should be ruled out by the library's design.
        -- EINTR: It is best practice to just retry the operation what we do here.
        -- EIO: Only occurs when filesystem is involved (?).
        -- Conclusion: Our close should never fail. If it does, something is horribly wrong.
        ( const $ fix $ \retry-> do
            i <- c_close fd
            if i < 0 then do
              e <- getErrno
              if e == eINTR 
                then retry
                else throwIO (SocketException e)
            else return ()
        ) fd

recvFrom()

-- | Receive a message on a socket and additionally yield the peer address.
--
--   - Calling `recvFrom` on a `close`d socket throws @EBADF@ even if the former file descriptor has been reassigned.
--   - The operation takes a buffer size in bytes a first parameter which
--     limits the maximum length of the returned `Data.ByteString.ByteString`.
--   - @EAGAIN@, @EWOULDBLOCK@ and @EINTR@ and handled internally and won't be thrown.

What does this mean?

Again, operations are retried.

What is the common consensus for handling EINTR/SA_RESTART?

This depends heavily on the operating system implementation but the general censuses is that you can't rely on SA_RESTART.
When using SA_RESTART, specific system calls are automatically restarted, but that doesn't mean that when SA_RESTART isn't set that functions returning EINTR shouldn't be restarted. Quite the opposite actually.

http://linux-kernel.2935.n7.nabble.com/strace-accept-ERESTARTSYS-and-EINTR-td227496.html

Sorry for being dense, but what is the implication of your comment "Since
it isn't restarted" ? Are you saying that the kernel isn't going to
restart it and will have converted it to EINTR and returned that to
user-space, and that this modified return value is not reported by strace?

Yes, assuming there was a signal handler and it wasn't registered with
SA_RESTART.

https://beej.us/guide/bgnet/html/

Signals tend to cause blocked system calls to return -1 with errno set to EINTR. When you set up a signal handler with sigaction(), you can set the flag SA_RESTART, which is supposed to restart the system call after it was interrupted.

Naturally, this doesn’t always work.

https://stackoverflow.com/a/51220021/1695130

A return value of EINTR means that the function was interrupted by a signal before the function could finish its normal job.

(emphasis mine)

https://www2.eecs.berkeley.edu/Pubs/TechRpts/1999/CSD-99-1056.pdf

Janus: an approach for confinement of untrusted applications
by David A. Wagner
Research Project
Submitted to the Department of Electrical Engineering and Computer Sciences,
University of California at Berkeley, in partial satisfaction of the requirements for the degree
of Master of Science, Plan II.

The fact that an aborted system call returns EINTR to the application presents a potential problem. Some applications are coded in such a way that, if they receive an EINTR error from a system call, they will retry the system call. Thus, if such a application tries to execute a system call that is denied by the security policy, it will get stuck in a retry loop. We detect this problem by noticing when a large number (currently 100) of the same system call with the same arguments are consecutively denied. If this occurs, we assume the traced application is not going to make any further progress, and just kill the application entirely, giving an explanatory message to the user.

The EINTR return code was originally intended for system calls that are interrupted by a signal and should be restarted by the application.

(emphasis mine)

https://en.wikipedia.org/wiki/Unix_philosophy

In these cases Ken Thompson and Dennis Ritchie favored simplicity over perfection. The Unix system would occasionally return early from a system call with an error stating that it had done nothing—the "Interrupted System Call", or an error number 4 (EINTR) in today's systems. Of course the call had been aborted in order to call the signal handler. This could only happen for a handful of long-running system calls such as read(), write(), open(), and select(). On the plus side, this made the I/O system many times simpler to design and understand. The vast majority of user programs were never affected because they did not handle or experience signals other than SIGINT and would die right away if one was raised. For the few other programs—things like shells or text editors that respond to job control key presses—small wrappers could be added to system calls so as to retry the call right away if this EINTR error was raised. Thus, the problem was solved in a simple manner.

(emphasis mine)

https://250bpm.com/blog:12/

Instead of remembering these intricacies you can just remember a simple rule of thumb: When handling EINTR error, check any conditions that may have been altered by signal handlers. Then restart the blocking function.

Additionally, If you are implementing a blocking function yourself, take care to return EINTR when you encounter a signal.

(emphasis mine)

https://www.linuxprogrammingblog.com/all-about-linux-signals?page=show

  • When read(2) waits for data or write(2) waits for stdout to put some data and no data were yet transfered in the call and SIGUSR1 arrives those functions exit with return value of -1. You can distinguish this situation from other errors by reading the value of the errno variable. If it's EINTR it means that the function was interrupted without any data transfered and we can call the function again with the same parameters.
  • Another case is that some data were transfered but the function was interrupted before it finished. In this case the functions don't return an error but a value less that the supplied data size (or buffer size). Neither the return value nor the errno variable tells us that the function was interrupted by a signal, if we want to distinguish this case we need to set some flag in the signal handler (as we do in this example). To continue after interruption we need to call the function again keeping in mind that some data were consumed or read adn we must restart from the right point. In our example only the write(2) must be properly restarted, we use the written variable to track how many bytes were actually written and properly call write(2) again if there are data left in the buffer.
    Remember that not all system calls behave exactly the same way, consult their manual page to make sure.

Reading the sigaction(2) manual page you can think that setting the SA_RESTART flag is simpler that handling system call interruption. The documentation says that setting it will make certain system calls automatically restartable across signals. It's not specified which calls are restarted. This flag is mainly used for compatibility with older systems, don't use it.

(emphasis mine)

https://kirste.userpage.fu-berlin.de/chemnet/use/info/libc/libc_21.html

Typically, POSIX applications that use signal handlers must check for EINTR after each library function that can return it, in order to try the call again. Often programmers forget to check, which is a common source of error.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568149

zlib: please allow resuming after EINTR

Hi,

zlib's gzFile interface provides no way I can see to recover from
EINTR.

Context:

When library code makes syscalls, I live in constant danger that the
application may have installed a signal handler with SA_RESTART
disabled.

(emphasis mine)

Fenner, B., Stevens, W. R., Rudoff, A. M. (2004). UNIX Network Programming: The sockets networking API. Spain: Addison-Wesley.

Handling Interrupted System Calls
We used the term "slow system call" to describe accept, and we use this term for any system call that can block forever. That is, the system call need never return. Most networking functions fall into this category. For example, there is no guarantee that a server's call to accept will ever return, if there are no clients that will connect to the server. Similarly, our server's call to read in Figure 5.3 will never return if the client never sends a line for the server to echo. Other examples of slow system calls are reads and writes of pipes and terminal devices. A notable exception is disk I/O, which usually returns to the caller (assuming no catastrophic hardware failure).

The basic rule that applies here is that when a process is blocked in a slow system call and the process catches a signal and the signal handler returns, the system call can return an error of EINTR. Some kernels automatically restart some interrupted system calls. For portability, when we write a program that catches signals (most concurrent servers catch SIGCHLD), we must be prepared for slow system calls to return EINTR. Portability problems are caused by the qualifiers "can" and "some," which were used earlier, and the fact that support for the POSIX SA_RESTART flag is optional. Even if an implementation supports the SA_RESTART flag, not all interrupted system calls may automatically be restarted. Most Berkeley-derived implementations, for example, never automatically restart select, and some of these implementations never restart accept or recvfrom.

To handle an interrupted accept, we change the call to accept in Figure 5.2, the beginning of the for loop, to the following:

     for ( ; ; ) {
         clilen = sizeof (cliaddr);
         if ( (connfd = accept (listenfd, (SA *) &cliaddr, &clilen)) < 0) {
             if (errno == EINTR)
                 continue;         /* back to for () */
             else
                 err_sys ("accept error");
        }

Notice that we call accept and not our wrapper function Accept, since we must handle the failure of the function ourselves.

What we are doing in this piece of code is restarting the interrupted system call. This is fine for accept, along with functions such as read, write, select, and open.

(emphasis mine)

https://cis.temple.edu/~ingargio/cis307/readings/signals.html

To use signals well is very tricky. Signals can interfere with system services. For example, what happens if we receive an exception while executing a system call? The answer depends on the specific call. Usually time consuming operations such as IO operations are interrupted by a signal. For example if we are reading from a file we may be interrupted. We find out about it by checking errno for the value EINTR. For example:

      bytesread = read(input_fd, buf, BLKSIZE);
      if (bytesread < 0) 
         if (errno == EINTR)
            printf("The read was interrupted. You can try to read again\n");
         else 
            printf("Error in read. Don't know what to do about it\n");

http://www.microhowto.info/howto/reap_zombie_processes_using_a_sigchld_handler.html

When an operating system function is interrupted by a signal the default behaviour is to return immediately (either with the error EINTR, or reporting partial completion if that is possible). This creates a need for such functions to be wrapped in a loop for the purpose of handling EINTR,

https://programmer.help/blogs/linux-interrupted-system-calls.html

Portable code must explicitly handle the error return of key functions. When a function fails and errno equals EINTR, it can be handled according to actual needs, such as restarting the function.

(emphasis mine)

https://perldoc.perl.org/perlipc#Restartable-system-calls

On systems that supported it, older versions of Perl used the SA_RESTART flag when installing %SIG handlers. This meant that restartable system calls would continue rather than returning when a signal arrived. In order to deliver deferred signals promptly, Perl 5.8.0 and later do not use SA_RESTART. Consequently, restartable system calls can fail (with $! set to EINTR) in places where they previously would have succeeded.

The default :perlio layer retries read, write and close as described above; interrupted wait and waitpid calls will always be retried.

(emphasis mine)

https://news.ycombinator.com/item?id=5089014

The right thing is almost always to retry the syscall. Syscalls on Unix return EINTR because it makes the kernel simpler, which was a key design goal in Unix[1]. If you need to do something when a signal fires, you do it in a signal handler (relying on EINTR instead of a signal handler is error-prone because if the signal fires between syscalls you lose it).

(emphasis mine)

https://blog.reverberate.org/2011/04/eintr-and-pc-loser-ing-is-better-case.html

if you’re implementing a library you can never know if the main application will use signals or not, so any library that wants to be robust will have to wrap these system calls in a retry loop.
[...]
Conclusion
The whole thing did leave a rather large wart, though -- all Unix programs have to wrap these system calls in an EINTR retry loop unless they can be absolutely sure the process will never catch signals that don't have SA_RESTART set. So there is a price to pay for this incremental evolution.

(emphasis mine)

https://lwn.net/Articles/414618/

Only lightly touched on here is the horrible-ness of signals interrupting system calls. The ability to set SA_RESTART is better than nothing but it has significant problems, in particular that it's not implemented appropriately everywhere (Linux is pretty good but Solaris, for example, is pretty bad).
This means that if, for example, you set a signal handler for SIGCHLD you have major problems since SA_RESTART can't be considered reliable (portably).

dotnet/runtime#48663

It appears that on MacOS Big Sur, open can sometimes return EINTR if it is interrupted by a signal even if the signal has a handler installed with SA_RESTART.

  • MonoVM thread suspend/resume sometimes (when using posix instead of Mach) will use suspend/resume signals with the SA_RESTART flags. (although this is not the default - we prefer to use Mach suspend/resume operations)
  • The System.Native PAL installs some signal handlers with SA_RESTART

We should make sure that all our calls to open handle EINTR.

(the irony, considering this is .NET)

And a follow-up:
https://lore.kernel.org/git/YDiRywyld%[email protected]/

We can work around it by retrying open() calls that get EINTR, just as
we do for read(), etc.
Since we don't ever want to interrupt an open()
call, we can get away with just redefining open, rather than insisting
all callsites use xopen().

(emphasis mine)

https://softwareengineering.stackexchange.com/a/409060

Always leave the decision of how to handle EINTR to the user, and make it easy to resume the operation as appropriate.

It's not possible for the user (aka PHP developer using PDO_SQLSRV) to resume.

https://bugzilla.redhat.com/show_bug.cgi?id=1427500

Bug 1427500 - util-linux: script does not retry on EINTR when logging command output

This should be fixed by switching to a direct write system call with an EINTR retry loop, or installing the signal handler with SA_RESTART.

(emphasis mine)

https://people.gnome.org/~federico/blog/rust-libstd-syscalls-and-errors.html

Many system calls can return EINTR, which means "interrupted system call", which means that something interrupted the kernel while it was doing your system call and it returned control to userspace, with the syscall unfinished. For example, your process may have received a Unix signal (e.g. you send it SIGSTOP by pressing Ctrl-Z on a terminal, or you resized the terminal and your process got a SIGWINCH). Most of the time EINTR means simply that you must retry the operation: if you Control-Z a program to suspend it, and then fg to continue it again; and if the program was in the middle of open()ing a file, you would expect it to continue at that exact point and to actually open the file. Software that doesn't check for EINTR can fail in very subtle ways!

(emphasis mine)

https://lists.x.org/archives/xorg-devel/2011-April/021242.html

Retrying the ioctl as long as it fails with errno == EINTR fixes the
problem and allows server regenerations to trigger VT switches that
actually succeed.

(Signed off by developer at nvidia.com)

https://golang.org/doc/go1.14#runtime

A consequence of the implementation of preemption is that on Unix systems, including Linux and macOS systems, programs built with Go 1.14 will receive more signals than programs built with earlier releases. This means that programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors. Those programs will have to handle those errors in some way, most likely looping to try the system call again.

(emphasis mine)

python-ldap/python-ldap#242

python-ldap doesn't appear to retry ldap calls that fail with EINTR.

https://www.winehq.org/pipermail/wine-devel/2014-February/103112.html

The callers of WS2_recv() and WS2_send() already have tests for EINTR and EAGAIN. The strategy for dealing with those should probably exist in just one place. For my part, I think that all interruptible syscalls should be in a loop that immediately retries for EINTR, but EAGAIN should usually go back to some select(), poll(), kevent() or similar wait function.

(emphasis mine)

Linux kernel automatically retrying in various libraries:
lockd: https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/fs/lockd/svc.c#L174
CIFS: https://github.com/torvalds/linux/blob/17e7124aad766b3f158943acb51467f86220afe9/fs/cifs/connect.c#L551
NFS: https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/fs/nfs/callback.c#L91
Intel DRM: https://github.com/torvalds/linux/blob/fab0fca1da5cdc48be051715cd9787df04fdce3a/include/drm/drm_mode_config.h#L196
RamFS: https://github.com/torvalds/linux/blob/8b83369ddcb3fb9cab5c1088987ce477565bb630/init/initramfs.c#L28
https://github.com/torvalds/linux/blob/144c79ef33536b4ecb4951e07dbc1f2b7fa99d32/tools/perf/util/evlist.c#L1930

Conclusion

I think it's pretty fair to say that code should generally retry interrupted operations regardless of what signal handler flags are set, solely because those can't be relied on.

Because a non-transactional database query can be interrupted with the current ODBC driver, the database can be left in an invalid state, with no ability to recover from PHP's end. PHP has no control over Apache or any system it is running on as far as signals go (disregarding CLI), so it only makes sense to at minimum give the developer control over the driver's behavior. A perfect example of this is the Simba Teradata ODBC Driver. They expose a flag that userland code can change, giving developers more control over connections, ensuring database integrity.

At minimum, the Microsoft ODBC driver should do the same. At best, it should always retry system calls. As quoted above, your colleagues working on the .NET framework for Linux would agree.

@David-Engel
Copy link
Contributor

@Snowknight26 Thanks for the extensive research on this. I'm taking a bit of a deeper dive into signal handling in *nix and seeing if this behavior is something we want to change in the ODBC driver or at least make configurable. I can't make any promises, but from what I've read so far, it does seem like something we should seriously consider for the ODBC driver.

Regards,
David

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants