Hung TCPConnection can interfere with Pony runtime shutdown #166

slfritchie · 2018-08-16T22:40:38Z

While fiddling with Pony 0.24.4, I've discovered that a TCP peer that is suspended, hung, or on a crashed machine can interfere with the Pony runtime's shutdown. "Interfere" means blocking the runtime's timely shutdown: network sockets remain partially open, which causes the ASIO subsystem to continue being noisy.

General steps to reproduce:

1. Run a Pony program that opens a TCP socket to a peer.
2. Suspend the peer's OS process with Control-z/SIGSTOP.
3. The local Pony program attempts to shut down by calling dispose() on the socket (and any others), stopping all remaining timers from firing, etc.
4. The Pony program will not exit until the peer OS process is resumed (e.g., by SIGCONT) or killed, or the peer's host crashes.

A demo program is at https://gist.github.com/slfritchie/558f44bcef5a29ad4ae9eaf208723bbc. Use as follows:

1. Use a program like netcat to listen on TCP port 8888 on the localhost interface, e.g., nc -l 8888
2. Compile and run hang-bug.pony on the same machine as netcat.
3. Within 5 seconds of starting hang-bug, press Control-z to suspend the netcat process.
4. The hang-bug program will not exit until the netcat process is resumed or killed.

The last message printed is Ticker, dispose socket

The hang-bug program will exit 5 seconds after starting if the netcat process's execution is not interfered with.

AFAICT, this delay is a feature of the runtime. TCP sockets are implemented by actors, so reads, writes, and dispose() requests on sockets involve async messaging, as with any other Pony actor. Compare the synchronous case: when something written in C for a POSIX OS issues a quick sequence of several writes followed by a close, and the TCP socket isn't closed prematurely, we expect all bytes written to be sent prior to the close. Any bytes not written due to flow control would be signalled by the return value of the write/writev/send/etc. system calls.
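To make the synchronous half of that comparison concrete, here is a minimal sketch in Python (not Pony, and not runtime code). It uses a local `socket.socketpair()` as a stand-in for a TCP connection whose peer has stopped reading, and the helper name `fill_until_blocked` is made up for illustration. The point is only that a synchronous caller learns about flow control directly from the system call: `send` reports how many bytes were accepted, and a non-blocking socket raises `BlockingIOError` once the kernel buffer is full.

```python
import socket


def fill_until_blocked(chunk_size: int = 65536, max_chunks: int = 10_000):
    """Write to a peer that never reads.

    Returns (bytes_accepted, blocked). `blocked` is True once the kernel
    refuses more data, which is the direct feedback a synchronous caller gets.
    """
    # Assumption: a local socket pair stands in for a TCP connection
    # to a suspended peer. The reader end simply never calls recv().
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    chunk = b"x" * chunk_size
    accepted = 0
    blocked = False
    try:
        for _ in range(max_chunks):
            try:
                # send() may accept fewer bytes than requested.
                accepted += writer.send(chunk)
            except BlockingIOError:
                # Buffer full: flow control is now visible to the caller.
                blocked = True
                break
    finally:
        writer.close()
        reader.close()
    return accepted, blocked


if __name__ == "__main__":
    print(fill_until_blocked())
```

An actor that only sends messages to a TCPConnection never sees this return value or exception, which is why the buffering has to live inside the connection actor.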

Pony's async messaging doesn't give the sending actor direct feedback of the system call return status; the TCPConnection actor is responsible for buffering not-yet-sent data and managing yet-to-be-read bytes from the socket.

1. If any data remains buffered by the TCPConnection actor in the _pending_writev array, the socket will not be closed, and the ASIO subsystem will remain noisy.
2. TCPConnection needs to observe a read of 0 bytes from the socket to trigger its final closing logic. If the remote peer is suspended/hung/crashed, that event is delayed for an unknown period of time.
3. On the related question of regular vs. hard socket close: TCPConnection will use the hard close path if dispose() is called while the actor is in the muted state. However, if the actor is in the throttled state, the hard close path is not taken. I think there's a good argument to make that a hard close is appropriate when in the throttled state.
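The second point, waiting for a 0-byte read, can be seen with plain sockets. The sketch below is Python rather than Pony, uses a local socket pair as a stand-in for TCP, and the function name `observe_eof` is hypothetical. A peer that is merely silent (as a suspended process would be) never produces the 0-byte read; only an actual close on the peer side does.

```python
import socket


def observe_eof(peer_closes: bool, timeout: float = 0.2) -> str:
    """Return "eof" if recv() yields 0 bytes, "no-eof" if it times out."""
    # Assumption: a socket pair models the connection; a peer that is
    # open but idle models a suspended/hung remote process.
    local, peer = socket.socketpair()
    local.settimeout(timeout)
    try:
        if peer_closes:
            peer.close()
        try:
            data = local.recv(1024)
        except socket.timeout:
            # Peer is still open and silent: no end-of-stream is delivered.
            return "no-eof"
        return "eof" if data == b"" else "data"
    finally:
        local.close()
        peer.close()  # closing twice is harmless in Python


if __name__ == "__main__":
    print(observe_eof(peer_closes=False))
    print(observe_eof(peer_closes=True))
```

With a silent peer the only way out is a timeout chosen by the local side, which is the gap the remedies are meant to fill.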

Possible remedies might include:

a. Add a hard_close() behavior to give a "close the socket NOW" option to socket users.
b. Add an optional timer, controlled by a per-socket configurable setting, that starts when dispose() is called. If the timer fires and the socket isn't yet fully closed, the socket takes the hard close path.
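As a rough illustration of remedy (b), here is a toy state model in Python. It is not Pony's TCPConnection and not a proposed API: `ConnectionModel`, `linger_seconds`, `on_progress`, and `tick` are all hypothetical names, and time is passed in explicitly instead of using a real timer. It only captures the rule "graceful close if the buffer drains and the peer's EOF arrives, otherwise hard close when the deadline passes".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConnectionModel:
    """Toy model of dispose-with-deadline; all names are hypothetical."""

    linger_seconds: Optional[float] = None  # None means no timer (today's behavior)
    pending: List[bytes] = field(default_factory=list)  # unsent writes
    peer_eof_seen: bool = False
    closed: bool = False
    hard_closed: bool = False
    _disposed: bool = field(default=False, init=False)
    _deadline: Optional[float] = field(default=None, init=False)

    def dispose(self, now: float) -> None:
        """Request a close; start the optional deadline."""
        self._disposed = True
        if self.linger_seconds is not None:
            self._deadline = now + self.linger_seconds
        self._try_graceful_close()

    def on_progress(self, flushed: bool, peer_eof: bool) -> None:
        """Record that writes drained and/or a 0-byte read was observed."""
        if flushed:
            self.pending.clear()
        if peer_eof:
            self.peer_eof_seen = True
        if self._disposed:
            self._try_graceful_close()

    def tick(self, now: float) -> None:
        """Timer callback: fall back to a hard close once the deadline passes."""
        if self._disposed and not self.closed and self._deadline is not None:
            if now >= self._deadline:
                self.pending.clear()  # unsent data is discarded
                self.closed = True
                self.hard_closed = True

    def _try_graceful_close(self) -> None:
        # Graceful close needs an empty write buffer and the peer's EOF.
        if not self.closed and not self.pending and self.peer_eof_seen:
            self.closed = True
```

With `linger_seconds=None` the model never closes while the peer stays silent, which matches the hang described above. With a deadline set, the worst-case shutdown delay is bounded, at the cost of dropping whatever is still buffered.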

slfritchie · 2018-08-22T20:53:26Z

Notes from today's dev sync meeting, which was quite small and so might not have input from Those Who Have An Opinion.

Joe would like to see an RFC developed.
Joe suggested splitting the buffered writes and pending reads into separate remedies.
Perhaps not enough consensus yet on what to do about dispose() when the socket is throttled.

SeanTAllen transferred this issue from ponylang/ponyc May 12, 2020
