added ssh keepalive and pseudo terminal support to orchestrator #804

Merged
merged 1 commit into from
Oct 14, 2024


alberto-bortolan
Contributor

Summary

This change adds two parameters to the [ssh] section in medusa.ini:

  • keepalive_seconds: seconds between SSH keepalive messages sent to the SSH server. Defaults to 60. Due to a limitation in parallel-ssh, if cert_file is defined, keepalive_seconds is ignored and no keepalive messages are sent.
  • use_pty: Boolean; allocates a pseudo-terminal. Defaults to False. Useful if the sudo settings require a tty.
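As an illustration of the two options and their defaults, here is a minimal sketch that reads an [ssh] section with Python's standard configparser. The sample values are made up, and Medusa's own configuration loading may work differently; only the option names and defaults come from the description above.

```python
import configparser

# Hypothetical medusa.ini fragment; only the option names come from this PR.
SAMPLE_INI = """
[ssh]
keepalive_seconds = 20
use_pty = True
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_INI)
ssh = parser["ssh"]

# Fall back to the defaults described above when an option is absent.
keepalive_seconds = ssh.getint("keepalive_seconds", fallback=60)
use_pty = ssh.getboolean("use_pty", fallback=False)
cert_file = ssh.get("cert_file", fallback=None)

print(keepalive_seconds, use_pty, cert_file)
```

With cert_file left unset, as here, the keepalive setting would take effect.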

details about the change for keepalive_seconds

parallel-ssh includes two implementations of a parallel ssh client:

  1. Native Parallel Client - Based on ssh2-python (libssh2) .
  2. ssh-python Parallel Client - Based on ssh-python (libssh)

Medusa uses implementation (2), which has an important limitation when it comes to SSH keepalive messages: the libssh client does not implement them. libssh understands these messages in its server implementation, but its client side (what parallel-ssh uses) simply does not contain the code to generate them. The library is also documented as using ~/.ssh/config (an OpenSSH config file), but that applies only to a small subset of parameters and crucially excludes the ServerAlive* parameters that control how a client sends those messages. The keepalive implementation in parallel-ssh when using libssh is explicitly a dummy.


In implementation (1), the libssh2 client code has the machinery to send client-generated keepalive messages, which can be controlled only programmatically (libssh2 does not read ~/.ssh/config). parallel-ssh uses it, and the ParallelSSHClient constructor has a keepalive_seconds=60 parameter.

With debug logging enabled (LogLevel DEBUG3) in /etc/ssh/sshd_config, the libssh2 keepalive messages are recorded as:

Sep  4 12:40:01 b07 sshd[1961781]: debug3: receive packet: type 80
Sep  4 12:40:01 b07 sshd[1961781]: debug1: server_input_global_request: rtype keepalive@libssh2.org want_reply 0

For comparison, the keepalive messages coming from an OpenSSH client are recorded as:

Sep  1 20:18:13 b07 sshd[468119]: debug3: receive packet: type 80
Sep  1 20:18:13 b07 sshd[468119]: debug1: server_input_global_request: rtype keepalive@openssh.com want_reply 1
Sep  1 20:18:13 b07 sshd[468119]: debug3: send packet: type 82

The downside of using libssh2 is that it does not implement certificate-based ssh authentication, which Medusa uses (parameter ssh.cert_file).

Therefore, this change switches Medusa to implementation (1) unless ssh.cert_file has a value, in which case it keeps using (2) and no keepalive messages are generated.
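The selection rule can be sketched as follows. This is not the code from the patch: select_client is a hypothetical helper that only returns which parallel-ssh module to use and which constructor arguments are relevant, so the sketch runs without parallel-ssh installed. The module paths pssh.clients.native and pssh.clients.ssh are the two parallel-ssh client packages described above.

```python
# Module paths of the two parallel-ssh client implementations.
NATIVE_CLIENT_MODULE = "pssh.clients.native"  # (1) ssh2-python / libssh2
LIBSSH_CLIENT_MODULE = "pssh.clients.ssh"     # (2) ssh-python / libssh


def select_client(cert_file, keepalive_seconds=60):
    """Hypothetical helper: pick a client module and its relevant kwargs.

    Returns a (module_path, kwargs) tuple. An empty or missing cert_file
    is treated as "no certificate configured".
    """
    if cert_file:
        # Certificate authentication needs libssh; keepalive is not available.
        return LIBSSH_CLIENT_MODULE, {"cert_file": cert_file}
    # No certificate: libssh2 can send client-side keepalive messages.
    return NATIVE_CLIENT_MODULE, {"keepalive_seconds": keepalive_seconds}


print(select_client(None, keepalive_seconds=20))
print(select_client("/path/to/id-cert.pub"))
```

The certificate path in the second call is a placeholder for illustration only.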

testing keepalive

All the sshd servers need to be configured (/etc/ssh/sshd_config) with values like:

ClientAliveInterval 60
ClientAliveCountMax 0

These settings close an SSH connection after 60s if sshd detects no "activity" on it, where activity appears to be either a keystroke or a keepalive message coming from the SSH client. If you run a cluster backup that takes several minutes to complete, the command fails after roughly ClientAliveInterval seconds with

ssh.exceptions.SSHError: (-1, b'Socket error: disconnected')

If you then redo the backup with a Medusa that includes this patch and set keepalive_seconds to 20, the backup goes through, and you should also see the libssh2 keepalive messages in the sshd logs (often /var/log/secure) spaced 20s apart.
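One way to check the spacing is to extract the timestamps of the "receive packet: type 80" lines from the sshd log and compute the gaps between them. The sketch below assumes the syslog line format shown earlier. The sample lines are synthetic (modelled on the log excerpt above, not taken from a real run), and because syslog timestamps carry no year, the year argument is an assumption.

```python
import re
from datetime import datetime

# Matches the syslog timestamp of lines reporting a received global request
# (SSH packet type 80), which is how keepalive messages show up at DEBUG3.
PACKET_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*receive packet: type 80")


def keepalive_gaps(lines, year=2024):
    """Return the gaps in seconds between consecutive type 80 packets."""
    stamps = []
    for line in lines:
        match = PACKET_RE.match(line)
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            stamps.append(stamp)
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Synthetic sample lines for illustration only.
sample = [
    "Sep  4 12:40:01 b07 sshd[1961781]: debug3: receive packet: type 80",
    "Sep  4 12:40:21 b07 sshd[1961781]: debug3: receive packet: type 80",
    "Sep  4 12:40:21 b07 sshd[1961781]: debug3: send packet: type 82",
    "Sep  4 12:40:41 b07 sshd[1961781]: debug3: receive packet: type 80",
]
print(keepalive_gaps(sample))  # [20, 20]
```

Note that type 80 covers every global request, so on a busy server you may want to filter by the sshd process ID as well.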

This change should take care of Medusa issue 689, "SSH Timeout Error on first backup-cluster or restore".

details about the change for use-pty

In restrictive environments, sudo can be configured to require a terminal for the commands it runs. This is the Defaults requiretty setting in /etc/sudoers*. If this setting cannot be changed for the user running the Medusa command, then a cluster backup executed with sudo (or any distributed operation requiring sudo) will fail with

sudo: sorry, you must have a tty to run sudo

Setting use_pty=true avoids this problem.
Both implementations (1) and (2) accept the parameter.
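In parallel-ssh, pseudo-terminal allocation is requested per command through the use_pty keyword of run_command, alongside sudo. The sketch below shows how a configured use_pty value could be forwarded. RecordingClient and run_with_sudo are hypothetical names used only to keep the example runnable without SSH hosts; this is not the code from the patch.

```python
class RecordingClient:
    """Hypothetical stand-in for a parallel-ssh client; it only records calls."""

    def __init__(self):
        self.calls = []

    def run_command(self, command, sudo=False, use_pty=False):
        # A real ParallelSSHClient would execute the command on every host.
        self.calls.append({"command": command, "sudo": sudo, "use_pty": use_pty})
        return []


def run_with_sudo(client, command, use_pty=False):
    """Forward the configured use_pty value to the client's run_command."""
    return client.run_command(command, sudo=True, use_pty=use_pty)


client = RecordingClient()
run_with_sudo(client, "whoami", use_pty=True)
print(client.calls)
```

With a real client in place of the stand-in, use_pty=True asks the server to allocate a pseudo-terminal, which is what a requiretty setting in sudoers expects.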


sonarcloud bot commented Sep 17, 2024

@rzvoncek rzvoncek linked an issue Oct 10, 2024 that may be closed by this pull request
@rzvoncek rzvoncek merged commit dc19e41 into thelastpickle:master Oct 14, 2024
27 of 29 checks passed
Successfully merging this pull request may close these issues.

SSH Timeout Error on first backup-cluster or restore
2 participants