added ssh keepalive and pseudo terminal support to orchestrator #804
Summary
This change adds two parameters to the `[ssh]` section in `medusa.ini`:

- `keepalive_seconds`: number of seconds between SSH keepalive messages sent to the SSH server. Defaults to 60 seconds. Due to a limitation in parallel-ssh, if `cert_file` is defined, `keepalive_seconds` is ignored and no keepalive messages are sent.
- `use_pty`: boolean; allocates a pseudo-terminal. Defaults to `False`. Useful if sudo settings require a tty.

details about the change for keepalive_seconds
`parallel-ssh` includes two implementations of a parallel SSH client:

1. `ssh2-python` (`libssh2`)
2. `ssh-python` (`libssh`)

Medusa uses implementation (2), which has an important limitation when it comes to SSH keepalive messages: the `libssh` client does not implement them. `libssh` understands these messages in its server implementation, but its client side (which is what parallel-ssh uses) simply has no code to generate keepalive messages. The library is also documented as reading `~/.ssh/config` (an OpenSSH config file), but that only applies to a small subset of parameters and, crucially, excludes the `ServerAlive*` parameters that control how a client sends those messages. The keepalive implementation in parallel-ssh when using `libssh` is explicitly a dummy (it does nothing).

In implementation (1), the `libssh2` client code has the machinery to send client-generated keepalive messages, and it can only be controlled programmatically (`libssh2` ignores `~/.ssh/config`). `parallel-ssh` uses it, and the `ParallelSSHClient` constructor has a `keepalive_seconds=60` parameter.

When debug logging is enabled (`LogLevel DEBUG3` in `/etc/ssh/sshd_config`), sshd records these `libssh2` keepalive messages in its log; for comparison, the keepalive messages coming from an OpenSSH `ssh` client are logged differently.
The downside of using `libssh2` is that it does not implement certificate-based SSH authentication, which Medusa uses (parameter `ssh.cert_file`). Therefore, this change switches to implementation (1) unless `ssh.cert_file` has a value, in which case it uses implementation (2) instead and no keepalive messages are generated.

testing keepalive
All the `sshd` servers need to be configured (`/etc/ssh/sshd_config`) with values that close an SSH connection after 60s if sshd detects no "activity" on it, where activity appears to be either a keystroke or a keepalive message coming from the SSH client. If you run a cluster backup that takes several minutes to complete, the command will fail after about `ClientAliveInterval` seconds.

If you then redo the backup with a Medusa that includes this patch and set `keepalive_seconds` to 20, the backup will go through, and you should also see the `libssh2` keepalive messages in the sshd logs (often `/var/log/secure`), spaced 20s apart.

This change should take care of Medusa issue 689, "SSH Timeout (Error on first backup-cluster or restore)".
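The two `[ssh]` options from the summary and the client-selection rule described above can be sketched in Python as follows. This is a minimal illustration, not Medusa's actual code: `read_ssh_settings` and `choose_client` are hypothetical helpers, and the module paths assume parallel-ssh 2.x, where `pssh.clients.native` is the `libssh2`-based client (1) and `pssh.clients.ssh` is the `libssh`-based client (2). The sketch only decides which client and arguments would be used; it does not import parallel-ssh.

```python
import configparser

# Defaults stated for the new [ssh] options in medusa.ini.
DEFAULT_KEEPALIVE_SECONDS = 60
DEFAULT_USE_PTY = False


def read_ssh_settings(ini_text):
    """Hypothetical helper: read the [ssh] options relevant to this change."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # An empty cert_file is treated the same as an absent one.
    cert_file = parser.get("ssh", "cert_file", fallback="").strip() or None
    return {
        "keepalive_seconds": parser.getint(
            "ssh", "keepalive_seconds", fallback=DEFAULT_KEEPALIVE_SECONDS
        ),
        "use_pty": parser.getboolean("ssh", "use_pty", fallback=DEFAULT_USE_PTY),
        "cert_file": cert_file,
    }


def choose_client(settings):
    """Hypothetical helper: return (client module, constructor kwargs, run_command kwargs)."""
    # use_pty is passed to run_command(), which both clients accept.
    run_kwargs = {"use_pty": settings["use_pty"]}
    if settings["cert_file"]:
        # libssh-based client (2): supports certificate auth but sends no
        # keepalives, so keepalive_seconds is deliberately not passed.
        return "pssh.clients.ssh", {"cert_file": settings["cert_file"]}, run_kwargs
    # libssh2-based client (1): sends a keepalive every keepalive_seconds.
    return (
        "pssh.clients.native",
        {"keepalive_seconds": settings["keepalive_seconds"]},
        run_kwargs,
    )


if __name__ == "__main__":
    # The testing setup above: keepalive every 20s, no certificate.
    settings = read_ssh_settings("[ssh]\nkeepalive_seconds = 20\n")
    print(choose_client(settings))
```

Keeping the decision in one place makes the trade-off explicit: setting `cert_file` silently gives up keepalives, which is exactly the parallel-ssh limitation noted in the summary.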
details about the change for use-pty
In restrictive environments, sudo can be configured to require a terminal for the commands it runs. This is the `Defaults requiretty` parameter in `/etc/sudoers*`. If this setting cannot be changed for the user running the Medusa command, then executing a cluster backup with sudo (or any distributed operation requiring sudo) will fail because sudo refuses to run without a terminal. Setting `use_pty=true` avoids this problem. Both implementations (1) and (2) accept the parameter.
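Whether `use_pty` is needed depends on the `requiretty` setting. As a rough aid, the hypothetical helper `requiretty_enabled` below scans sudoers text for an effective `Defaults requiretty`. It is a simplified sketch, not a sudoers parser: it treats scoped `Defaults:user` / `Defaults@host` lines as if they applied globally, does not follow include directives, and does not handle backslash line continuations, so the real sudo configuration remains authoritative.

```python
import re

# "Defaults", optionally scoped with ":user", "@host", ">runas" or "!cmnd",
# followed by a comma-separated list of settings.
_DEFAULTS_LINE = re.compile(r"^Defaults(?:[:@>!]\S+)?\s+(.*)$")


def requiretty_enabled(sudoers_text):
    """Hypothetical helper: True if the last requiretty setting seen enables it.

    Simplified: scoped Defaults lines are treated as global; include
    directives and backslash line continuations are not handled.
    """
    enabled = False
    for raw_line in sudoers_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (this also skips #include directives).
        if not line or line.startswith("#"):
            continue
        # Drop a trailing comment such as "Defaults requiretty  # policy".
        line = re.sub(r"\s+#.*$", "", line)
        match = _DEFAULTS_LINE.match(line)
        if not match:
            continue
        for setting in match.group(1).split(","):
            setting = setting.strip()
            if setting == "requiretty":
                enabled = True
            elif setting == "!requiretty":
                enabled = False
    return enabled
```

If this reports `True` for the relevant sudoers files and the setting cannot be changed, setting `use_pty` to true in the `[ssh]` section is the workaround described above.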