-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add plaintext client as alternative to SSH. #198
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,10 +23,12 @@ | |
|
||
from rally.common import db | ||
|
||
import socket | ||
import selectors | ||
import time | ||
|
||
cidr_incr = utils.RAMInt() | ||
|
||
|
||
''' | ||
Find credential resource from DB by deployment uuid, and return | ||
info as a dict. | ||
|
@@ -147,9 +149,87 @@ def get_sandboxes(deploy_uuid, farm="", tag=""): | |
return sandboxes | ||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
class NCatError(Exception): | ||
def __init__(self, details): | ||
self.details = details | ||
|
||
|
||
class NCatClient(object): | ||
def __init__(self, server): | ||
self.server = server | ||
self.sock = socket.create_connection((server, 8000)) | ||
self.sel = selectors.DefaultSelector() | ||
self.sel.register(self.sock, selectors.EVENT_READ) | ||
|
||
def run(self, cmd, stdin=None, stdout=None, stderr=None, | ||
raise_on_error=True, timeout=3600): | ||
start = time.clock_gettime(time.CLOCK_MONOTONIC) | ||
end = time.clock_gettime(time.CLOCK_MONOTONIC) + timeout | ||
to = end - start | ||
# We have to doctor the command a bit for three reasons: | ||
# 1. We need to add a newline to ensure that the command | ||
# gets sent to the server and doesn't just get put in | ||
# the socket's write buffer. | ||
# 2. We need to pipe stderr to stdout so that stderr gets | ||
# returned over the client connection. | ||
# 3. We need to add some marker text so our client knows | ||
# that it has received all output from the command. This | ||
# marker text let's us know if the command completed | ||
# successfully or not. | ||
good = "SUCCESS" | ||
bad = "FAIL" | ||
result = f"&& echo -n {good} || echo -n {bad}" | ||
self.sock.send(f"({cmd}) 2>&1 {result}\n".encode('utf-8')) | ||
out = "" | ||
stream = None | ||
error = False | ||
while True: | ||
events = self.sel.select(to) | ||
for key, mask in events: | ||
buf = key.fileobj.recv(4096).decode('utf-8') | ||
if buf.endswith(good): | ||
out += buf[:-len(good)] | ||
stream = stdout | ||
break | ||
elif buf.endswith(bad): | ||
out += buf[:-len(bad)] | ||
# We assume that if the command errored, then everything | ||
# that was output was stderr. This isn't necessarily | ||
# accurate but it hopefully won't ruffle too many feathers. | ||
stream = stderr | ||
error = True | ||
break | ||
else: | ||
out += buf | ||
to = end - time.clock_gettime(time.CLOCK_MONOTONIC) | ||
|
||
if stream is not None: | ||
stream.write(out) | ||
|
||
if error and raise_on_error: | ||
details = (f"Error running command {cmd}\n" | ||
f"Last stderr output is {out}\n") | ||
raise NCatError(details) | ||
|
||
def close(self): | ||
# Test scenarios call close after every operation because with SSH, | ||
# this is necessary to ensure that we do not open too many | ||
# connections. Our ncat client cache is keyed on hostname rather than | ||
# on the controller node. This means that we open far fewer connections | ||
# than SSH does. Therefore, there is no reason to close connections | ||
# as frequently as we do with SSH. We can afford to leave the | ||
# connection open and reuse the clients instead. This is why we "pass" | ||
# in this method. | ||
pass | ||
|
||
|
||
NCAT_CLIENT_CACHE = {} | ||
|
||
|
||
def get_client_connection(cred): | ||
try: | ||
global NCAT_CLIENT_CACHE | ||
server = cred["host"] | ||
return NCAT_CLIENT_CACHE.setdefault(server, NCatClient(server)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you please add a option for selecting client? It's better to keep ssh client as default instead of a fallback. There could be some security concerns about using plain text client in corp network. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Absolutely, I can do that. |
||
except socket.error: | ||
return get_ssh_from_credential(cred) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding this is not true for SSH. We don't close connections for every operation. SSH connection is cached for every farm node, which models a physical server. For each new operation, it doesn't recreate new SSH connection, but just reusing existing connection to the node. Connections are closed when the test is done and the test program exits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I have an explanation for this :)
This patch was originally made against https://github.com/dceara/ovn-scale-test/tree/ovn-switch-per-node-devel . When I discussed with Dumitru about pushing this change, we agreed to open a PR here first and then we could port it to that particular branch after.
On that branch, we have added close() calls in a lot of places. We ran into issues where tests would fail after a while. We could no longer open new SSH connections because the servers would reject incoming connections due to having too many open. We added close() calls after basically every SSH command. My comment here makes sense in the context of how that branch operates, but here it's just wrong. I'll correct it.
By the way, if you're wondering why we are running into this problem of too many SSH connections, it likely has to do with our different understanding of farm nodes. As I said in another comment, we configure our tests to have multiple farm nodes per physical machine. So the number of SSH connections can pile up more quickly than if each farm node maps to a single physical machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@putnopvut sorry for the confusion with the branches in my fork. Branch ovn-switch-per-node-devel is stale. What we use downstream is ovn-switch-per-node and in that branch I removed all the
close()
calls because we addressed the issues we had, i.e., use runner typeserial
instead of a single runner that would connect to all farm nodes. Like this only the connections used in one of the iteration of the runner will be open (and will get cleaned up when the iteration ends).I'll try to open PRs for all the stuff that's only in my fork as soon as I get the chance. I hope that will clear things out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explain @putnopvut and @dceara