Convert ovn-tester to use asyncio #66

Open — wants to merge 19 commits into base: main
Conversation

@putnopvut (Collaborator):

ovn-tester currently works in a synchronous and serialized manner. If we are going to provision, say, 50 ports, the operation will go something like:

  • Issue SSH command to create a port, and wait for that to return
  • Issue SSH commands to update load balancers, and wait for them to return
  • Issue SSH commands to ping ports, and wait for them to return
    Repeat 50 times.

With this series, we use python's asyncio framework to create tasks that are each responsible for performing the above operations. Each task looks something like:

  • Issue SSH command to create a port, then yield control to the event loop
  • Collect result of SSH command, then issue SSH commands to update load balancers. Yield control to the event loop.
  • Collect result of SSH command, then issue SSH commands to ping ports. Yield control to the event loop.
  • Collect result of SSH command.
With asyncio, we can run many of these tasks concurrently, letting the event loop hand control to individual tasks as their data becomes available. The tests have been updated to take a "queries_per_second" option that configures how many tasks are created per second; by default, all the tests create 20 tasks per second.
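
For illustration, here is a simplified sketch of that pattern (not the actual ovn-tester code; provision_port stands in for the whole create/update/ping sequence):

import asyncio

async def provision_port(i):
    # Stand-in for the create/update/ping sequence described above.
    await asyncio.sleep(0.1)

async def run_test(n_ports, queries_per_second=20):
    tasks = []
    for i in range(n_ports):
        tasks.append(asyncio.create_task(provision_port(i)))
        # Pace task creation at queries_per_second tasks per second.
        await asyncio.sleep(1 / queries_per_second)
    await asyncio.gather(*tasks)

asyncio.run(run_test(50))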

In order to support asyncio, this also changes the SSH library from paramiko to asyncssh.
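
For reference, basic asyncssh usage looks roughly like this (a minimal sketch; ovn-tester wraps this in its own connection classes):

import asyncssh

async def run_remote(host, cmd):
    # asyncssh connections are async context managers; conn.run() awaits
    # completion of the remote command and returns its output.
    async with asyncssh.connect(host) as conn:
        result = await conn.run(cmd, check=True)
        return result.stdout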

See the individual commits in this PR for explanations of what is happening in the code.

Some notes:

  • This change makes use of asyncio functions that are only available in Python 3.7+. ovn-heater should probably be updated to ensure that a suitable Python version is used in the virtualenv when running tests.
  • When testing this change in a cluster of RHEL8 servers, I found that the central container receiving the ovn-nbctl commands would hang. It appeared that issuing multiple docker exec calls at once caused the container to deadlock. I fixed this by upgrading the docker version to 20.10.8 on the server in question. ovn-heater should probably be updated to ensure that a new enough docker version is installed on the physical nodes.
  • Stats are accurate, but may potentially be misleading.
    • Since test iterations run concurrently, adding together multiple iterations' times doesn't make much sense. As an example, I ran the ovn-20.yml scenario on a cluster, and the entire test took 20 minutes to complete. However, looking at the netpol_multitenant report, the sum of all iterations is 70 hours.
    • Measurements include idle time while the event loop runs other tasks. As an example, let's say we issue an SSH command to ping a port. 5 seconds later, the SSH command completes. However, because the event loop is running other tasks, the result of that SSH command is not collected until 30 seconds after the ping was initially issued. In this case, the stats would show the ping took 30 seconds. It may be worth performing our measurements on the server instead (or in addition to our client-side timing).

@dceara (Collaborator) left a comment:

This looks good overall, thanks! I only left a few minor comments and questions. Also, sorry, but you have to rebase.

Resolved review threads: ovn-tester/ovn_load_balancer.py, ovn-tester/ovn_stats.py (outdated), ovn-tester/ovn_workload.py, ovn-tester/ovn_context.py (outdated), ovn-tester/ovn_workload.py (outdated)
n_vips = len(self.load_balancer.vips.keys())
vip_ip = vip_net.ip.__add__(n_vips + 1)
async def provision_vips_to_load_balancers(self, backend_lists):
async with self.lock:
@dceara (Collaborator) · Sep 7, 2021:

I think we can avoid this lock altogether if we split the add_vips API into, e.g., reserve_vips() (synchronous, just adding the items to the load balancer vips dict and returning them) and an add_vips() coroutine to actually set the vips. If you agree, I'll leave it up to you whether we should make the change in this PR or in a separate one.
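
A rough sketch of the proposed split (method bodies and the nbctl helper are hypothetical):

class LoadBalancer:
    def reserve_vips(self, new_vips):
        # Synchronous: claim the VIPs in the local dict so that no other
        # task can reserve the same ones, then return them.
        self.vips.update(new_vips)
        return new_vips

    async def add_vips(self, new_vips):
        # Coroutine: actually configure the reserved VIPs in the NB DB.
        # (nbctl.lb_set_vips is a hypothetical helper.)
        await self.nbctl.lb_set_vips(self.uuid, new_vips)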

@putnopvut (Collaborator, Author):

I was messing with this just now, and removing the locking is going to be non-trivial.

Adding VIPs is not an additive operation. Under the hood it calls ovn-nbctl set Load_Balancer vips .... That means each individual task is re-setting the entire VIP list instead of just adding their own VIPs to the configured load balancer. This means that each task needs to have exclusive access to the load balancer.

The VIPs are added in batches, meaning that each time we add new VIPs, we might end up making multiple ovn-nbctl calls per load balancer. Since all of those ovn-nbctl calls are asynchronous, it gives other tasks the opportunity to foul up the configured VIPs. So even if we could make the locking finer-grained, I think we'd still need locking as long as we're batching the VIPs.

Having said that, I actually noticed that there's a bug in the current add_vips() method due to the batching code. Since we are setting the VIPs on the load balancer instead of adding to them, it means that each batch undoes the previous batch's operation. Effectively it means that a load balancer will never have more than MAX_VIPS_IN_BATCH (currently 500) vips configured. This may not be as big an issue with load balancer per service, but it's still a bug. Since the operation is not additive, I actually have no idea how batching can work here.

If we could make adding VIPs additive, then this would be easy since each task would be performing independent non-overlapping operations. It would also allow for proper batching.

@dceara (Collaborator) · Sep 8, 2021:

> I was messing with this just now, and removing the locking is going to be non-trivial.
>
> Adding VIPs is not an additive operation. Under the hood it calls ovn-nbctl set Load_Balancer vips .... That means each individual task is re-setting the entire VIP list instead of just adding their own VIPs to the configured load balancer. This means that each task needs to have exclusive access to the load balancer.

Actually, I think adding VIPs is an additive operation, it translates to something equivalent to:

$ ovn-nbctl lb-add lb-test 42.42.42.42:4242 21.21.21.21:2121
$ ovn-nbctl list load_balancer
_uuid               : 00874eb0-2266-43da-870c-3e8f9cb55617
name                : lb-test
vips                : {"42.42.42.42:4242"="21.21.21.21:2121"}
$ ovn-nbctl set load_balancer lb-test vips:\"43.43.43.43:4343\"=\"22.22.22.22:222\"
$ ovn-nbctl list load_balancer
_uuid               : 00874eb0-2266-43da-870c-3e8f9cb55617
name                : lb-test
vips                : {"42.42.42.42:4242"="21.21.21.21:2121", "43.43.43.43:4343"="22.22.22.22:222"}

> The VIPs are added in batches, meaning that each time we add new VIPs, we might end up making multiple ovn-nbctl calls per load balancer. Since all of those ovn-nbctl calls are asynchronous, it gives other tasks the opportunity to foul up the configured VIPs. So even if we could make the locking finer-grained, I think we'd still need locking as long as we're batching the VIPs.
>
> Having said that, I actually noticed that there's a bug in the current add_vips() method due to the batching code. Since we are setting the VIPs on the load balancer instead of adding to them, it means that each batch undoes the previous batch's operation. Effectively it means that a load balancer will never have more than MAX_VIPS_IN_BATCH (currently 500) vips configured. This may not be as big an issue with load balancer per service, but it's still a bug. Since the operation is not additive, I actually have no idea how batching can work here.

As mentioned above, unless I'm missing something, the operation is already additive.

> If we could make adding VIPs additive, then this would be easy since each task would be performing independent non-overlapping operations. It would also allow for proper batching.

@putnopvut (Collaborator, Author):

> Actually, I think adding VIPs is an additive operation, it translates to something equivalent to:
>
> $ ovn-nbctl lb-add lb-test 42.42.42.42:4242 21.21.21.21:2121
> $ ovn-nbctl list load_balancer
> _uuid               : 00874eb0-2266-43da-870c-3e8f9cb55617
> name                : lb-test
> vips                : {"42.42.42.42:4242"="21.21.21.21:2121"}
> $ ovn-nbctl set load_balancer lb-test vips:\"43.43.43.43:4343\"=\"22.22.22.22:222\"
> $ ovn-nbctl list load_balancer
> _uuid               : 00874eb0-2266-43da-870c-3e8f9cb55617
> name                : lb-test
> vips                : {"42.42.42.42:4242"="21.21.21.21:2121", "43.43.43.43:4343"="22.22.22.22:222"}

That works as long as the key for the VIP is different each time. It appears our tests do not attempt to add backends to existing VIPs, so you're right, this should work for our purposes.

If we can guarantee that the set of keys in the dictionary of VIPs passed to add_vips() does not intersect with the set of keys already configured on the load balancer, we could also simplify some of the code in add_vips().

@dceara (Collaborator):

Shall we raise an exception then if a test tries to add backends to an already configured VIP?
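
That guard could be as simple as (hypothetical sketch):

async def add_vips(self, new_vips):
    overlap = new_vips.keys() & self.vips.keys()
    if overlap:
        # A test tried to add backends to an already configured VIP.
        raise RuntimeError(f'VIPs already configured: {overlap}')
    self.vips.update(new_vips)
    # ... then issue the (additive) ovn-nbctl calls for new_vips only.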

Resolved review threads: ovn-tester/ovn_workload.py, ovn-tester/ovn_context.py
@@ -649,12 +649,14 @@ async def create_cluster_join_switch(self, sw_name):
)

async def provision_ports(self, n_ports, passive=False):
ret_list = []
tasks = []
ctx = ovn_context.active_context
@mohammadheib · Sep 13, 2021:

Hi,
I think we can save a few seconds here by looping over the workers instead of the ports, creating a bundle of ports per worker in each loop iteration. I'm not sure whether this really improves test timing, but I saw some improvement in my tests:


@@ -651,10 +659,15 @@ class Cluster(object):
     async def provision_ports(self, n_ports, passive=False):
         tasks = []
         ctx = ovn_context.active_context
-        for _ in range(n_ports):
+        
+        for _ in range(len(self.worker_nodes)):
             worker = self.select_worker_for_port()
-            tasks.append(ctx.create_task(worker.provision_ports(self, 1,
+            tasks.append(ctx.create_task(worker.provision_ports(self, (n_ports//len(self.worker_nodes)),
                                                                 passive)))
+        for _ in range(n_ports%len(self.worker_nodes)):
+            worker = self.select_worker_for_port()
+            tasks.append(ctx.create_task(worker.provision_ports(self, 1, passive)))
+
         results = await asyncio.gather(*tasks)
         return [ports[0] for ports in results]
@@ -319,10 +320,17 @@ class WorkerNode(Node):
         await vsctl.del_port(port)
 
     async def provision_ports(self, cluster, n_ports, passive=False):
-        ports = [await self.provision_port(cluster, passive)
+        pcreate_tasks = []
+        pbind_tasks = []
+        ctx = ovn_context.active_context
+        
+        pcreate_tasks = [ctx.create_task(self.provision_port(cluster, passive))
                  for i in range(n_ports)]
+        ports = await asyncio.gather(*pcreate_tasks)
+
         for port in ports:
-            await self.bind_port(port)
+            pbind_tasks.append(ctx.create_task(self.bind_port(port)))
+        await asyncio.gather(*pbind_tasks)
         return ports
 

@putnopvut (Collaborator, Author):

Hi, just leaving a comment to explain a few things. I have a couple more commits I want to add to this series and then I'll update the PR. I'm testing them in my async_sandbox branch right now. The first is one that changes how we wait for ports to be up. The second adds the connection pool as has been discussed in this PR.

When reviewing, I suspect you'll be able to find things that could be optimized, or you could find places where we can use asyncio features in ways that will be more efficient. But I think those sorts of improvements can wait until after the initial asyncio support is merged.

I also opened two issues: #85 and #86. These issues exist without asyncio, but with asyncio they're even worse. I don't think those issues should impede us from getting the initial asyncio code in, though. These can be addressed afterwards.

It's currently Friday afternoon where I am so I've run out of time for the week. However, I plan on getting this PR updated early next week so that this can be reviewed again.

This commit converts the code to use asyncio. At the heart of the
change, the paramiko SSH library is replaced with asyncssh. From there,
the necessary async/await keywords are added to make the code work
properly with this library.

This initial commit is done in a pretty brainless way. That is, we
simply add the necessary "async" and "await" calls throughout the code
to make this work in an asynchronous manner. However, there is nothing
that actually takes advantage of the asynchronicity at this point, so
the code still behaves exactly the same as it did prior to this change.
Upcoming changes will take advantage of the asyncio and optimize some
code sections.

__init__() methods on a class cannot be made asynchronous. In cases
where we were using SSH in __init__() methods, a factory function has
been created to call instead.
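
A minimal sketch of the factory pattern (assumed names, not the actual ovn-tester classes):

import asyncssh

class Node:
    def __init__(self, conn):
        self.conn = conn

    @classmethod
    async def create(cls, host):
        # __init__() cannot await, so the SSH connection is established
        # in an async factory and handed to the normal constructor.
        conn = await asyncssh.connect(host)
        return cls(conn)

# Usage: node = await Node.create('central-node')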

Note: Running external commands is broken by this patch. This is because
running the external command is asynchronous, but it's called from the
synchronous __next__() method of Context. I could change Context to use
__aiter__() and __anext__() but future commits will completely change
how Contexts are structured, and it's easier to just leave this one
aspect broken temporarily.

Signed-off-by: Mark Michelson <[email protected]>
Using the typical iteration method for contexts does not work with
asynchronous code. Different iterations can overlap, so you cannot just
track the state of a single "current" iteration.

To fix this, we need to be able to keep up with multiple iterations at
once within a context. Therefore iterations have been made a first class
object and are kept in a dictionary in the Context. A worker can
determine the current iteration by calling the new
get_current_iteration() function.
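
A rough sketch of the idea (simplified; the real keys and structures may differ):

import asyncio

class Iteration:
    def __init__(self, num):
        self.num = num

class Context:
    def __init__(self):
        # Multiple iterations can be in flight at once; key them by task.
        self.iterations = {}

    def create_task(self, coro, num):
        task = asyncio.create_task(coro)
        self.iterations[task] = Iteration(num)
        return task

active_context = Context()

def get_current_iteration():
    # The currently running task identifies which iteration we are in.
    return active_context.iterations[asyncio.current_task()]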

Signed-off-by: Mark Michelson <[email protected]>
The current_ext_cmds logic has some flaws with asynchronous code:
* Since all commands are run at the beginning of the specified
  iteration, and since iterations can be started in a batch, it means
  that commands might run in quicker succession than anticipated.
* Iterations typically start in the expected order, but sometimes you
  may see iterations run out of order. This could lead to strangeness if
  you assume that iterations always run in order.
* Running background commands by appending '&' to the end of the command
  doesn't work as expected with asyncssh. Control is not returned
  immediately, and so the code blocks forever.

Presumably, the only reason why it's important to start a process in the
background explicitly is so that the process can then be terminated in a
later iteration. Since asyncssh can keep track of the remote processes
it is running, it's easier to just explicitly define the number of
iterations that an external command should run for before terminating.

This commit adds a "duration" configuration option to external commands.
If the duration is 0, then the command runs immediately and we wait for
the command to complete before continuing. Otherwise, we start a remote
process. As each iteration completes, we decrease the remaining duration
until it reaches 0. At that point, the command is terminated.
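
Sketched roughly (hypothetical names; asyncssh's create_process() starts a remote process without waiting for it to exit):

class ExtCmd:
    def __init__(self, node, cmd, duration):
        self.node = node
        self.cmd = cmd
        self.duration = duration
        self.process = None

    async def start(self):
        if self.duration == 0:
            # Run immediately and wait for completion.
            await self.node.conn.run(self.cmd)
        else:
            self.process = await self.node.conn.create_process(self.cmd)

    def iteration_completed(self):
        if self.process:
            self.duration -= 1
            if self.duration == 0:
                self.process.terminate()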

This also fixes a minor bug in the ext_cmd code. The ext_cmd code was
returning the output of the command, but the only caller to execute
commands did not inspect the returned value. We no longer attempt to
return the output of the command.

Signed-off-by: Mark Michelson <[email protected]>
When running multiple provisioners, we need to ensure that each one is
allocating a uniquely named port. This change rearranges the code to
allow for this.
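
One common way to guarantee uniqueness across concurrent provisioners is a shared counter (illustrative only, not necessarily what this commit does):

import itertools

_port_ids = itertools.count()

def next_port_name():
    # Safe on a single event loop: tasks never interleave mid-call, so
    # each caller gets a distinct id.
    return f'lp-{next(_port_ids)}'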

Signed-off-by: Mark Michelson <[email protected]>
Without the lock, multiple tasks could request the same VIPs to be added
to the load balancer. The lock ensures that each task adds a distinct
set of load balancer VIPs.

Signed-off-by: Mark Michelson <[email protected]>
When pinging multiple ports, we should initiate the pings all at once
rather than issuing them in series.
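
That is, something along these lines (sketch; ping_port is a hypothetical per-port helper):

async def ping_ports(self, ports):
    # Launch all pings concurrently and wait for all of them to finish.
    await asyncio.gather(*(self.ping_port(port) for port in ports))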

Signed-off-by: Mark Michelson <[email protected]>
This will be used heavily by individual tests in upcoming commits.

Signed-off-by: Mark Michelson <[email protected]>
With the switch to asynchronous code, these methods are no longer used
and they're no longer useful. Remove.

Signed-off-by: Mark Michelson <[email protected]>
If we need to provision ports on multiple workers, we may as well do it
all at once.

Suggested-by: Mohammad Heib <[email protected]>
Signed-off-by: Mark Michelson <[email protected]>
Pings are used for two reasons in ovn-tester:
1) To test that a network policy is being enforced.
2) To determine that a newly-created port is up.

The first reason is a good use of pings since it ensures that the
packets are able (or unable) to make a specific roundtrip.

The second reason is useful, but it's also used to determine the time
that it takes for the port in question to go from down to up. This is
where it falls short.

First, it's vital that the ping attempts begin as soon as possible after
provisioning the new port. If we do not attempt to ping for a long time,
the pings are likely to succeed immediately, and they won't accurately
reflect how long it actually took for the port to come up. For synchronous code,
this isn't much of a concern, since only a single worker is ever
"active" at once. It's easy to ensure that the ping is attempted
immediately after binding the port. For asynchronous code, though, this
guarantee is harder to make. If there are enough active tasks, it may be
several seconds (or even minutes) between when the port is bound and
when we start pings.

Second, multiple pings may be necessary before we can determine that the
port is up. As with the problem in the above
paragraph, it's hard for asynchronous code to ensure that it can attempt
the follow-up pings in a timely manner. So even if we were to fix the
first problem and immediately start pinging after binding a port, we
also have to ensure that all follow-up pings happen ASAP.

A much better means of measuring the time it takes for a port to come up
is to use the external_ids:ovn-installed-ts value in the OVS Interface
table. This records the timestamp when the ovn-controller instance
claimed the binding. If we record when we provisioned the port, then we
can subtract this time from the ovn-installed-ts to find out the amount
of time that passed between when we inserted the port into the
northbound database and when the port was claimed by an ovn-controller.

Another nice thing is that we can perform this calculation long after
the port has been claimed and the result will be the same. Therefore, we can
retrospectively find the wait times for all ports after a test has
completed, rather than performing mid-test pings.
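
A sketch of that calculation (assuming ovn-installed-ts is a millisecond timestamp, that self.run() returns the command output, and that we recorded port.provision_time when the port was added):

async def port_up_time(self, port):
    # ovn-installed-ts is set by ovn-controller when it claims the binding.
    out = await self.run(cmd=f'ovs-vsctl get Interface {port.name} '
                             f'external_ids:ovn-installed-ts')
    installed_ts = int(out.strip().strip('"')) / 1000.0
    return installed_ts - port.provision_time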

This commit adds some new methods to WorkerNode and Cluster to wait and
measure the time it took for a port to be claimed. It then changes the
base_cluster_bringup, density_heavy, density_light, and cluster_density
tests to use this method for determining the amount of time the port
took to come up.

One thing this commit "breaks" a bit is the logging for whether an
iteration succeeded or failed. Since we wait until after the test is
completed to calculate the wait times, it means that iterations will
complete and potentially log that they were successful, but our
calculation afterward can end up failing the iteration.

One final note: with this commit, the ping_ports() members of WorkerNode
and Cluster are no longer used. They could be removed altogether, but
they could prove useful at some point, so I've kept them in the code.

Signed-off-by: Mark Michelson <[email protected]>
asyncio has the potential to create many tasks, each of which may be
issuing SSH commands. When using a single SSH connection, asyncssh has
to multiplex the commands over it, which can slow tests down.

This commit adds a configuration option for the central node that allows
for multiple SSH connections to be used instead. It also sets a default
of 30 connections to the central node. Doing this greatly increases the
speed at which tests run.

The worker nodes have a hard-coded 5 connections per node. This could
potentially be configurable too, but worker nodes tend not to have as
much traffic sent to them, so there is typically not a need to have a
lot of connections.
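
Conceptually, the pool can be as simple as a round-robin over pre-opened connections (sketch):

import itertools
import asyncssh

class SSHPool:
    def __init__(self, conns):
        self._conns = itertools.cycle(conns)

    @classmethod
    async def create(cls, host, n_conns):
        conns = [await asyncssh.connect(host) for _ in range(n_conns)]
        return cls(conns)

    async def run(self, cmd):
        # Round-robin so that one busy channel doesn't serialize every
        # command sent to this node.
        return await next(self._conns).run(cmd)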

Signed-off-by: Mark Michelson <[email protected]>
Until this point, when running a qps_test, the ovn_context code would
automatically take care of marking an iteration as started or ended.

This is a problem, though, for tests that wait until all iterations have
completed to determine if ports came up within the configured time
limit. The issue is that the ovn_context will mark the iteration as
completed, potentially logging the iteration as successful. However,
after this has been logged, the test could mark the iteration as failed
if it turns out the port did not come up within the configured time
limit.

The solution here is to allow for tests to override the default behavior
by letting them mark the iteration as completed. To do this, the
qps_test() function now accepts a kwargs parameter, and setting
end_iteration=False will allow for the individual test to mark
iterations as complete instead of having it done automatically by the
ovn_context code.

For convenience, a new all_iterations_completed() method is added to
ovn_context so the tests don't have to keep track of iterations to mark
them as complete.
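
In sketch form (simplified; the real qps_test signature and context API may differ):

async def qps_test(self, qps, coro_fn, end_iteration=True):
    tasks = []
    for i in range(self.n_iterations):
        iteration = self.start_iteration(i)
        tasks.append(asyncio.create_task(
            self.run_iteration(coro_fn, iteration, end_iteration)))
        await asyncio.sleep(1 / qps)
    await asyncio.gather(*tasks)

async def run_iteration(self, coro_fn, iteration, end_iteration):
    await coro_fn(iteration)
    if end_iteration:
        # Default: mark the iteration complete as soon as it finishes.
        # Tests that validate results afterwards pass end_iteration=False
        # and call all_iterations_completed() themselves later.
        iteration.end()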

Signed-off-by: Mark Michelson <[email protected]>
@putnopvut (Collaborator, Author):

Hello again. I've committed the last set of patches I planned to commit for this PR. It is stable and tests are behaving as expected. As mentioned in the previous comment, there is room for some improvement, and those recognized improvements have been opened as issues.

@dceara (Collaborator) left a comment:

If I understand correctly, with this, the only (mostly) reliable measurement is the wait_for_port_up value, because we use ovn-installed-ts to determine when a port is up (I'm saying "mostly" because the start timestamp is inaccurate; it's taken after lsp-add and lsp-get were executed).

Should we emphasize this in the report somehow? Or shall we remove the other measurements?

Also, do you have some test run results on larger scale deployments (more physical machines)?

await self.run(cmd=f'ovs-vsctl --timeout={timeout} wait-until '
f'Interface {port.name} '
f'external_ids:ovn-installed=true')
log.debug(f'Port {port.name} is up')
@dceara (Collaborator):

What if ovs-vsctl --timeout=... wait-until ... times out?
