While fiddling with Pony 0.24.4, I've discovered that a suspended/hung/machine crashed TCP peer can interfere with the Pony runtime's shutdown. "Interfere" means blocking the runtime's timely shutdown due to network sockets remaining partially open and thus causing the ASIO subsystem to continue being noisy.
General steps to reproduce:
Run a Pony program that opens a TCP socket to a peer.
Suspend the peer's OS process with Control-z/SIGSTOP.
The local Pony program attempts to shut down by dispose()ing of the socket (and any others), stopping all remaining timers from firing, etc.
The Pony program will not exit until the peer OS process is unsuspended (e.g., by SIGCONT), killed, or the peer's host crashes.
Steps using the hang-bug.pony demo program:
Use a program like netcat to listen on TCP port 8888 on the localhost interface, e.g., nc -l 8888
Compile and run hang-bug.pony on the same machine as netcat.
Within 5 seconds of starting hang-bug, press Control-z to suspend the netcat process.
The hang-bug program will not exit until the netcat process is resumed or killed.
The last message printed is: Ticker, dispose socket
The hang-bug program will exit 5 seconds after starting if the netcat process's execution is not interfered with.
AFAICT, this delay is a feature of the runtime. TCP sockets are implemented by actors, and reads, writes, and dispose() requests on sockets use async messaging like any other Pony actor. In keeping with the synchronous socket behavior of a C program on a POSIX OS that issues a quick sequence of several writes followed by a close, we expect all bytes written to be sent prior to the close, provided the TCP socket isn't closed prematurely. In that C program, any bytes not written due to flow control would be signalled by the return value of the write/writev/send/etc. system calls.
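For comparison, here is a minimal C sketch of the synchronous feedback described above. It is not taken from the Pony runtime or from the demo program. The function names write_until_blocked and demo_stalled_peer are made up for illustration, and a socketpair whose second end never reads is assumed as a stand-in for a suspended TCP peer.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write to a non-blocking fd until the kernel refuses more data.
 * Returns the number of bytes the kernel accepted, or -1 on an
 * unexpected error. (Hypothetical helper, for illustration only.) */
long write_until_blocked(int fd)
{
    char chunk[4096];
    long total = 0;
    memset(chunk, 'x', sizeof chunk);

    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;        /* possibly a short write */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return total;      /* flow control, reported to the caller at once */
        } else if (n < 0 && errno == EINTR) {
            continue;          /* interrupted by a signal, retry */
        } else {
            return -1;
        }
    }
}

/* Assumption: the second end of a socketpair that never reads behaves,
 * for this purpose, like a suspended peer. */
long demo_stalled_peer(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int flags = fcntl(sv[0], F_GETFL, 0);
    long accepted = -1;
    if (flags >= 0 && fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == 0)
        accepted = write_until_blocked(sv[0]);

    close(sv[0]);
    close(sv[1]);
    return accepted;
}
```

Calling demo_stalled_peer() returns how many bytes the kernel buffered before it pushed back. The point is only that the writer finds out directly from the system call's return value.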
Pony's async messaging doesn't give the sending actor direct feedback of the system call return status; the TCPConnection actor is responsible for buffering not-yet-sent data and managing yet-to-be-read bytes from the socket.
If any data remains buffered by the TCPConnection actor in the _pending_writev array, the socket will not be closed, and the ASIO subsystem will remain noisy.
TCPConnection needs to observe a read of 0 bytes from the socket to trigger its final closing logic. If the remote peer is suspended/hung/crashed, that event is delayed for an unknown period of time.
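The "read of 0 bytes" condition can be illustrated at the POSIX level with a small C sketch. This is an analogy under stated assumptions, not the TCPConnection implementation. probe_peer, demo_probe, and the peer_state enum are hypothetical names, and a socketpair stands in for a TCP connection.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum peer_state { PEER_CLOSED, PEER_SILENT, PEER_HAS_DATA, PEER_ERROR };

/* Peek at the socket without blocking and without consuming data.
 * A return of 0 from recv() is the end-of-stream event; EAGAIN only
 * says that nothing has arrived yet. */
enum peer_state probe_peer(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return PEER_CLOSED;
    if (n > 0)
        return PEER_HAS_DATA;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return PEER_SILENT;    /* idle, hung, or suspended: no way to tell */
    return PEER_ERROR;
}

/* Hypothetical demo: peer_closes != 0 closes the far end first. */
int demo_probe(int peer_closes)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return PEER_ERROR;

    if (peer_closes)
        close(sv[1]);

    enum peer_state st = probe_peer(sv[0]);

    close(sv[0]);
    if (!peer_closes)
        close(sv[1]);
    return st;
}
```

In this sketch a peer that has closed yields PEER_CLOSED, while a peer that is merely not running yields PEER_SILENT for as long as it stays that way. That matches the unbounded delay described above.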
In a related area of regular vs. hard socket close, TCPConnection will use the hard close path if dispose() is called and the actor is in muted state. However, if the actor is in throttled state, the hard close path is not taken. I think there's a good argument to make that a hard close is appropriate when in throttled state.
Possible remedies might include:
a. Add a hard_close() behavior to give a "close the socket NOW" option to socket users.
b. Add an optional timer, plus a per-socket configuration setting, that starts when dispose() is called. If the timer fires and the socket isn't yet fully closed, the socket takes the hard close path.
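As a rough illustration of remedy (b), the following C sketch shows the same idea at the POSIX level: attempt a graceful close, but fall back to an immediate close once a deadline passes. It is an analogy only, not proposed runtime code. close_with_deadline and demo_close are hypothetical names, and the 200 ms deadline is an arbitrary value chosen for the demo.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Graceful close with a deadline.
 * Returns 0 if the peer's end-of-stream was seen before closing,
 * 1 if the deadline expired (or an error occurred) and the socket was
 * closed anyway. Simplification: the timeout restarts whenever the
 * peer sends more data. */
int close_with_deadline(int fd, int timeout_ms)
{
    char buf[512];
    int hard = 1;
    struct pollfd p = { .fd = fd, .events = POLLIN };

    shutdown(fd, SHUT_WR);              /* tell the peer we are done writing */

    for (;;) {
        int r = poll(&p, 1, timeout_ms);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            break;                      /* deadline expired or poll failed */

        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0) {
            hard = 0;                   /* peer closed: graceful path */
            break;
        }
        if (n < 0 && errno != EINTR)
            break;
        /* n > 0: discard late data and keep waiting for end-of-stream */
    }

    close(fd);                          /* close either way */
    return hard;
}

/* Hypothetical demo: peer_closes == 0 models a suspended peer. */
int demo_close(int peer_closes)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (peer_closes)
        close(sv[1]);

    int hard = close_with_deadline(sv[0], 200);

    if (!peer_closes)
        close(sv[1]);
    return hard;
}
```

With a responsive peer the function returns 0 almost immediately. With a silent peer it returns 1 after the deadline, so shutdown is delayed by a bounded amount rather than indefinitely.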
A demo program, hang-bug.pony, is at https://gist.github.com/slfritchie/558f44bcef5a29ad4ae9eaf208723bbc.