Long running SSH connections causing exceptions/failures #38

Open
CpuID opened this issue Sep 17, 2013 · 5 comments

@CpuID

CpuID commented Sep 17, 2013

I have some long-running scripts that hold SSH connections open, because Rye boxes keep the SSH connection open from first use through to script completion (unless you call disconnect in between).

After a few minutes I get the following:

Failure on thread threadname: connection closed by remote host
/var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/transport/packet_stream.rb:87:in `next_packet': connection closed by remote host (Net::SSH::Disconnect)
    from /var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/transport/session.rb:172:in `poll_message'
    from /var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/transport/session.rb:167:in `loop'
    from /var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/transport/session.rb:167:in `poll_message'
    from /var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/connection/session.rb:454:in `dispatch_incoming_packets'
    from /var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/connection/session.rb:216:in `preprocess'
    from /var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/connection/session.rb:200:in `process'
    from /var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/connection/session.rb:164:in `loop'
    from /var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/connection/session.rb:164:in `loop_forever'
    from /var/lib/gems/1.8/gems/net-ssh-2.6.8/lib/net/ssh/connection/session.rb:164:in `loop'
    from /var/lib/gems/1.8/gems/rye-0.9.8/lib/rye/box.rb:717:in `disconnect'
    from /usr/lib/ruby/1.8/timeout.rb:67:in `timeout'
    from /var/lib/gems/1.8/gems/rye-0.9.8/lib/rye/box.rb:716:in `disconnect'
    from /var/lib/gems/1.8/gems/rye-0.9.8/lib/rye/box.rb:142:in `initialize'

I have checked the server side, and the auth.log contains:

Sep 17 03:26:54 XXX sshd[31216]: Timeout, client not responding.
Sep 17 03:26:54 XXX sshd[31197]: pam_unix(sshd:session): session closed for user XXX

@delano
Owner

delano commented Sep 17, 2013

If no output is being sent or received by the client, it's possible that the SSH server, or some point along the network, is closing the connection.

Is the script executing a bunch of commands or just calling a couple of really slow ones?

@CpuID
Author

CpuID commented Sep 17, 2013

Couple of really slow ones, in this case a tar xjf on a 20-30GB file.

One thing to note: I am calling the Rye box from within a Thread.new { } block, mainly because I perform an rsync and then an untar within the same thread (3-4 boxes at a time; each one waits for its own rsync to finish, then Rye is called to perform the untar straight after).

So it is possible that one box is running a command and waiting, while another box in a different thread has a connection open but is doing absolutely nothing.

Ideally if I can enable some kind of keep-alive that would be the preferred option so I can move forward... :)
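
For reference, a minimal sketch of what client-side keep-alive could look like at the Net::SSH level. Recent net-ssh releases document `:keepalive` and `:keepalive_interval` options for `Net::SSH.start`; whether the 2.6.8 release in the trace above supports them, and whether Rye forwards extra options to Net::SSH, are assumptions that would need checking. The helper names, host, and user are hypothetical.

```ruby
# Builds Net::SSH.start options that enable client-side keepalives.
# :keepalive / :keepalive_interval are documented by recent net-ssh
# releases; support in net-ssh 2.6.8 is NOT confirmed here.
def keepalive_options(interval = 30)
  { :keepalive => true, :keepalive_interval => Integer(interval) }
end

# Hypothetical usage; host, user and command are placeholders.
# The require lives inside the method so the option helper above
# can be used without the net-ssh gem installed.
def run_slow_command(host, user, command)
  require 'net/ssh'
  Net::SSH.start(host, user, keepalive_options) do |ssh|
    ssh.exec!(command)
  end
end

p keepalive_options
```

As far as I understand, net-ssh only sends keepalives while the session's event loop is being processed, so a box that sits idle in another thread would probably still time out even with these options set.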

@delano
Owner

delano commented Sep 17, 2013

Two things:

  • you can use Rye::Set to run commands in parallel instead of running your own threads. That'll make your code a little cleaner.
  • instead of calling the command directly, you could write a short script that outputs a period every few seconds (one example: http://bash.cyberciti.biz/guide/Putting_functions_in_background).
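
A rough sketch of the second suggestion, assuming a POSIX `sh` on the remote side. `with_heartbeat` is a hypothetical helper, not part of Rye: it wraps a command in a small shell script that prints a period every few seconds while the command runs, then returns the command's own exit status. The demonstration runs locally with `sleep 2` standing in for the slow `tar`.

```ruby
require 'open3'

# Hypothetical helper (not part of Rye): returns a shell script that runs
# `command` while a background loop prints a period every `interval` seconds.
def with_heartbeat(command, interval = 5)
  [
    # sleep's output is redirected so a leftover sleep cannot hold stdout open
    "( while :; do printf .; sleep #{Integer(interval)} >/dev/null 2>&1; done ) &",
    'hb=$!',
    command,
    'rc=$?',
    'kill "$hb" 2>/dev/null',
    'wait "$hb" 2>/dev/null',
    'exit $rc'            # report the wrapped command's exit status
  ].join("\n")
end

# Local demonstration with a stand-in for the slow command.
output, status = Open3.capture2('sh', '-c', with_heartbeat('sleep 2', 1))
puts output
puts status.exitstatus
```

The wrapped command must not call `exit` itself, otherwise the background loop is never killed. How the resulting script would be handed to a Rye box (for example as an uploaded file) is left open here.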

@CpuID
Author

CpuID commented Sep 17, 2013

Yea I use Rye sets already actually :) The main thing here is I need to use ruby-rsync within the thread in addition to Rye calls to boxes, hence the use of a separate wrapper Thread.new call.

I see what you mean, to force some output to be generated. Ideally though having SSH keep-alive would be nicer :)

@CpuID
Author

CpuID commented Sep 17, 2013

In the end I think I solved it. I did a disconnect on all boxes that I am not using during the long running segment. I think I had some boxes sitting there idle, and they were causing havoc with the ones that were in use.
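
The workaround can be sketched roughly as follows. `FakeBox` is a stand-in so the example runs without a network; only its `disconnect` method mirrors `Rye::Box` (the method that appears in the trace above). `disconnect_idle` and the host names are hypothetical.

```ruby
# Stand-in for Rye::Box so the sketch runs without a network.
# Only #disconnect mirrors the real class; the rest is illustrative.
class FakeBox
  attr_reader :host

  def initialize(host)
    @host = host
    @connected = true
  end

  def connected?
    @connected
  end

  def disconnect
    @connected = false
  end
end

# Disconnects every box that is not needed for the upcoming long-running
# step and returns the boxes that were disconnected.
def disconnect_idle(all_boxes, busy_boxes)
  idle = all_boxes.reject { |box| busy_boxes.include?(box) }
  idle.each(&:disconnect)
end

boxes = %w[a.example b.example c.example].map { |h| FakeBox.new(h) }
busy  = [boxes[0]]
disconnect_idle(boxes, busy)
boxes.each { |b| puts "#{b.host}: #{b.connected? ? 'open' : 'closed'}" }
```

Calling something like this just before the slow segment means no connection is left open with nothing to do while the long commands run.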
