Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "sshdriver: Prevent timeout from deadlock" #1288

Conversation

Bastian-Krause
Copy link
Member

Description
This reverts commit b8f059f.

OpenSSH < 8.5 called with -f waits for the stderr pipe to be closed before forking. [1] fixes this behavior.
The labgrid commit to be reverted calls communicate(), relying on a timely fork. Affected OpenSSH versions now run into a TimeoutExpired exception on communicate() leading to an ExecutionError being raised in labgrid.

A wait() with timeout was used since the initial implementation [2]. We wanted to make sure that we don't depend on the state of the pipes, so use of wait() was intentional, as it directly covers the interesting cases:

  • immediate abort (due to a config error or similar)
  • normal startup (parent process exits after fork)
  • hang (via wait() timeout)

Reverting the problematic commit avoids the complexity of having to maintain two different ways to start SSH as suggested in [3].

If timeouts still occur after the revert, we should implement this suggestion [4].

Discussion that lead to this patch can be found in the PR introducing the problematic commit [5] as well as in [3].

The other commit [6] in the PR [5] has a similar approach for labgrid.util.SSHConnection._start_own_master(). It shouldn't be problematic though: The ssh process is not expected to fork as it is not called with -f.

[1] openssh/openssh-portable@396d32f ("upstream: There are lots of place where we want to redirect stdin,")
[2] e4862fa ("SSHDriver: ControlMaster & Driver")
[3] #1278
[4] #1265 (comment)
[5] #1265
[6] f9ca024 ("ssh: Prevent timeout from deadlock")

Checklist

  • PR has been tested

Fixes #1265
Closes #1278

This reverts commit b8f059f.

OpenSSH < 8.5 called with -f waits for the stderr pipe to be closed
before forking. [1] fixes this behavior.
The labgrid commit to be reverted calls communicate(), relying on a
timely fork. Affected OpenSSH versions now run into a TimeoutExpired
exception on communicate() leading to an ExecutionError being raised in
labgrid.

A wait() with timeout was used since the initial implementation [2].
We wanted to make sure that we don't depend on the state of the pipes, so
use of wait() was intentional, as it directly covers the interesting cases:

- immediate abort (due to a config error or similar)
- normal startup (parent process exits after fork)
- hang (via wait() timeout)

Reverting the problematic commit avoids the complexity of having to
maintain two different ways to start SSH as suggested in [3].

If timeouts still occur after the revert, we should implement this
suggestion [4].

Discussion that lead to this patch can be found in the PR introducing
the problematic commit [5] as well as in [3].

The other commit [6] in the PR [5] has a similar approach for
`labgrid.util.SSHConnection._start_own_master()`. It shouldn't
be problematic though: The ssh process is not expected to fork as it is
not called with -f.

[1] OpenSSH: 396d32f3a ("upstream: There are lots of place where we want to redirect stdin,")
[2] e4862fa ("SSHDriver: ControlMaster & Driver")
[3] labgrid-project#1278
[4] labgrid-project#1265 (comment)
[5] labgrid-project#1265
[6] f9ca024 ("ssh: Prevent timeout from deadlock")

Signed-off-by: Bastian Krause <[email protected]>
@codecov
Copy link

codecov bot commented Oct 20, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (0657977) 63.0% compared to head (d425be4) 63.0%.
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1288     +/-   ##
========================================
- Coverage    63.0%   63.0%   -0.1%     
========================================
  Files         160     160             
  Lines       11902   11901      -1     
========================================
- Hits         7502    7501      -1     
  Misses       4400    4400             
Files Coverage Δ
labgrid/driver/sshdriver.py 55.6% <66.6%> (+0.1%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Emantor Emantor merged commit 422abf3 into labgrid-project:master Oct 21, 2023
8 of 9 checks passed
@Bastian-Krause Bastian-Krause deleted the bst/revert-sshdriver-wait-on-pipe-closure branch February 6, 2024 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants