Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSH: incorrect channel cache update when closing the channel client-side #9102

Open
yarisx opened this issue Nov 22, 2024 · 0 comments
Open
Assignees
Labels
bug Issue is reported as a bug team:PS Assigned to OTP team PS waiting waiting for changes/input from author

Comments

@yarisx
Copy link
Contributor

yarisx commented Nov 22, 2024

Describe the bug
When there are multiple channels open on one connection, and one of the channels gets closed one of the issues can happen:

  • if the user channel handler process closes the channel and immediately exits then it is possible that SSH connection handler (upon receiving 'DOWN' message) removes the corresponding channel entry from the cache before the proper close message exchange completes. In this case if anything arrives from the peer (in our case it is either 'exit-status' or 'exit-signal' message) the crash ensues because of failed cache lookup. This was partially fixed by PR ssh: avoid crash upon exit-signal #8995, but the root cause of the erroneous cache eviction is still there
  • when the previous issue is mitigated by the fix from ssh: avoid crash upon exit-signal #8995 then there is a possibility to send the 'channel-close' message twice: once upon receiving 'exit-signal' from the peer and then - when the peer replies with its own 'channel-close', our side again sends 'channel-close', thus breaking the protocol. With some bad luck this may lead to the peer closing the connection with any channels that are still alive on that connection.

To Reproduce
Create an SSH connection with two channels, run some traffic through both channels, then while one channel is still in active use, make the user channel handler to close the channel and exit immediately. See #8929 for the description of the scenario when the first issue happens. The second issue is quite easy to reproduce - just use a custom OpenSSH subsystem on the server side, then kill -PIPE the subsystem (which can be a shell script), on my machine every second test of this scenario leads to the "protocol error" from OpenSSH because the second 'channel-close' is received for a channel that does not exist on the server.

Expected behavior
When a channel is closed the correct procedure is implemented, that is - each side sends 'channel-close' once. The 'exit-signal' or 'exit-status' are not channel-closing messages.

Affected versions
The OTP versions that are affected by this bug.

Additional context
The example of server-side OpenSSH trace for the second issue:

debug1: session_exit_message: session 0 channel 0 pid 19880 signal 13
debug1: session_exit_message: release channel 0
debug2: channel 0: write failed
debug2: chan_shutdown_write: channel 0: (i3 o0 sock -1 wfd 11 efd -1 [closed])
debug2: channel 0: send eow
debug2: channel 0: output open -> closed
debug2: channel 0: send close
debug3: send packet: type 97
debug3: channel 0: will not send data after close
debug3: receive packet: type 97
debug2: channel 0: rcvd close
debug3: receive packet: type 97
debug2: channel 0: rcvd close
channel 0: protocol error: close rcvd twice
debug3: channel 0: will not send data after close
debug2: channel 0: is dead
debug2: channel 0: gc: notify user
debug1: session_by_channel: session 0 channel 0
debug1: session_close_by_channel: channel 0 child 0

Note the "received packet: type 97" repeated.

@yarisx yarisx added the bug Issue is reported as a bug label Nov 22, 2024
@IngelaAndin IngelaAndin added the team:PS Assigned to OTP team PS label Nov 25, 2024
@u3s u3s self-assigned this Nov 26, 2024
@u3s u3s added the waiting waiting for changes/input from author label Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:PS Assigned to OTP team PS waiting waiting for changes/input from author
Projects
None yet
Development

No branches or pull requests

3 participants