Convert ovn-tester to use asyncio #66
Conversation
This looks good overall, thanks! I only left a few minor comments and questions. Also, sorry, but you have to rebase.
n_vips = len(self.load_balancer.vips.keys())
vip_ip = vip_net.ip.__add__(n_vips + 1)
async def provision_vips_to_load_balancers(self, backend_lists):
    async with self.lock:
I think we can avoid this lock altogether if we split the add_vips API into, e.g., reserve_vips() (synchronous, just adding the items to the load balancer vips dict and returning them) and an add_vips() coroutine to actually set the vips. If you agree, I'll leave it up to you whether we should make the change in this PR or in a separate one.
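For illustration, a minimal sketch of what that split could look like (these would be methods on the class that owns the load balancer; the attribute and helper names, such as self.load_balancer and self.nbctl.lb_set_vips, are assumptions rather than the project's actual API):

def reserve_vips(self, new_vips):
    # Synchronous: claim the VIP slots in the in-memory dict so no other
    # task can reserve the same keys, and return what was claimed.
    reserved = {}
    for vip, backends in new_vips.items():
        if vip not in self.load_balancer.vips:
            self.load_balancer.vips[vip] = backends
            reserved[vip] = backends
    return reserved

async def add_vips(self, reserved):
    # Coroutine: push the previously reserved VIPs to the northbound DB.
    await self.nbctl.lb_set_vips(self.load_balancer.uuid, reserved)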
I was messing with this just now, and removing the locking is going to be non-trivial.

Adding VIPs is not an additive operation. Under the hood it calls ovn-nbctl set Load_Balancer vips .... That means each individual task is re-setting the entire VIP list instead of just adding their own VIPs to the configured load balancer. This means that each task needs to have exclusive access to the load balancer.

The VIPs are added in batches, meaning that each time we add new VIPs, we might end up making multiple ovn-nbctl calls per load balancer. Since all of those ovn-nbctl calls are asynchronous, it gives other tasks the opportunity to foul up the configured VIPs. So even if we could make the locking finer-grained, I think we'd still need locking as long as we're batching the VIPs.

Having said that, I actually noticed that there's a bug in the current add_vips() method due to the batching code. Since we are setting the VIPs on the load balancer instead of adding to them, it means that each batch undoes the previous batch's operation. Effectively it means that a load balancer will never have more than MAX_VIPS_IN_BATCH (currently 500) VIPs configured. This may not be as big an issue with load balancer per service, but it's still a bug. Since the operation is not additive, I actually have no idea how batching can work here.

If we could make adding VIPs additive, then this would be easy since each task would be performing independent non-overlapping operations. It would also allow for proper batching.
I was messing with this just now, and removing the locking is going to be non-trivial.
Adding VIPs is not an additive operation. Under the hood it calls ovn-nbctl set Load_Balancer vips .... That means each individual task is re-setting the entire VIP list instead of just adding their own VIPs to the configured load balancer. This means that each task needs to have exclusive access to the load balancer.
Actually, I think adding VIPs is an additive operation; it translates to something equivalent to:
$ ovn-nbctl lb-add lb-test 42.42.42.42:4242 21.21.21.21:2121
$ ovn-nbctl list load_balancer
_uuid : 00874eb0-2266-43da-870c-3e8f9cb55617
name : lb-test
vips : {"42.42.42.42:4242"="21.21.21.21:2121"}
$ ovn-nbctl set load_balancer lb-test vips:\"43.43.43.43:4343\"=\"22.22.22.22:222\"
$ ovn-nbctl list load_balancer
_uuid : 00874eb0-2266-43da-870c-3e8f9cb55617
name : lb-test
vips : {"42.42.42.42:4242"="21.21.21.21:2121", "43.43.43.43:4343"="22.22.22.22:222"
The VIPs are added in batches, meaning that each time we add new VIPs, we might end up making multiple ovn-nbctl calls per load balancer. Since all of those ovn-nbctl calls are asynchronous, it gives other tasks the opportunity to foul up the configured VIPs. So even if we could make the locking finer-grained, I think we'd still need locking as long as we're batching the VIPs.
Having said that, I actually noticed that there's a bug in the current add_vips() method due to the batching code. Since we are setting the VIPs on the load balancer instead of adding to them, it means that each batch undoes the previous batch's operation. Effectively it means that a load balancer will never have more than MAX_VIPS_IN_BATCH (currently 500) vips configured. This may not be as big an issue with load balancer per service, but it's still a bug. Since the operation is not additive, I actually have no idea how batching can work here.
As mentioned above, unless I'm missing something, the operation is already additive.
If we could make adding VIPs additive, then this would be easy since each task would be performing independent non-overlapping operations. It would also allow for proper batching.
Actually, I think adding VIPs is an additive operation; it translates to something equivalent to:
$ ovn-nbctl lb-add lb-test 42.42.42.42:4242 21.21.21.21:2121
$ ovn-nbctl list load_balancer
_uuid : 00874eb0-2266-43da-870c-3e8f9cb55617
name : lb-test
vips : {"42.42.42.42:4242"="21.21.21.21:2121"}
$ ovn-nbctl set load_balancer lb-test vips:\"43.43.43.43:4343\"=\"22.22.22.22:222\"
$ ovn-nbctl list load_balancer
_uuid : 00874eb0-2266-43da-870c-3e8f9cb55617
name : lb-test
vips : {"42.42.42.42:4242"="21.21.21.21:2121", "43.43.43.43:4343"="22.22.22.22:222"}
That works as long as the key for the VIP is different each time. It appears our tests do not attempt to add backends to existing VIPs, so you're right, this should work for our purposes.

If we can guarantee that the set of keys in the dictionary of VIPs passed to add_vips() does not intersect with the set of keys already configured on the load balancer, we could also simplify some of the code in add_vips().
Shall we raise an exception then if a test tries to add backends to an already configured VIP?
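If we did go the exception route, a minimal sketch of the check could look like this (again, the load balancer attribute and the nbctl helper name are assumptions, not the project's actual code):

class VipAlreadyConfiguredError(Exception):
    pass

async def add_vips(self, new_vips):
    # Refuse to touch VIP keys that are already configured, so every call
    # is a purely additive, non-overlapping update.
    duplicates = set(new_vips) & set(self.load_balancer.vips)
    if duplicates:
        raise VipAlreadyConfiguredError(
            f'VIPs already configured: {", ".join(sorted(duplicates))}'
        )
    self.load_balancer.vips.update(new_vips)
    await self.nbctl.lb_add_vips(self.load_balancer.uuid, new_vips)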
@@ -649,12 +649,14 @@ async def create_cluster_join_switch(self, sw_name):
    )

async def provision_ports(self, n_ports, passive=False):
    ret_list = []
    tasks = []
    ctx = ovn_context.active_context
Hi,
I think we can save a few seconds here by looping over workers instead of ports and creating a bundle of ports per worker in each loop iteration. I'm not sure whether that really improves the timing of the tests, but I saw some improvement in my tests:
@@ -651,10 +659,15 @@ class Cluster(object):
async def provision_ports(self, n_ports, passive=False):
tasks = []
ctx = ovn_context.active_context
- for _ in range(n_ports):
+
+ for _ in range(len(self.worker_nodes)):
worker = self.select_worker_for_port()
- tasks.append(ctx.create_task(worker.provision_ports(self, 1,
+ tasks.append(ctx.create_task(worker.provision_ports(self, (n_ports//len(self.worker_nodes)),
passive)))
+ for _ in range(n_ports%len(self.worker_nodes)):
+ worker = self.select_worker_for_port()
+ tasks.append(ctx.create_task(worker.provision_ports(self, 1, passive)))
+
results = await asyncio.gather(*tasks)
return [ports[0] for ports in results]
@@ -319,10 +320,17 @@ class WorkerNode(Node):
await vsctl.del_port(port)
async def provision_ports(self, cluster, n_ports, passive=False):
- ports = [await self.provision_port(cluster, passive)
+ pcreate_tasks = []
+ pbind_tasks = []
+ ctx = ovn_context.active_context
+
+ pcreate_tasks = [ctx.create_task(self.provision_port(cluster, passive))
for i in range(n_ports)]
+ ports = await asyncio.gather(*pcreate_tasks)
+
for port in ports:
- await self.bind_port(port)
+ pbind_tasks.append(ctx.create_task(self.bind_port(port)))
+ await asyncio.gather(*pbind_tasks)
return ports
Hi, just leaving a comment to explain a few things. I have a couple more commits I want to add to this series and then I'll update the PR. I'm testing them in my environment now.

When reviewing, I suspect you'll be able to find things that could be optimized, or you could find places where we can use asyncio features in ways that will be more efficient. But I think those sorts of improvements can wait until after the initial asyncio support is merged.

I also opened two issues: #85 and #86. These issues exist without asyncio, but with asyncio they're even worse. I don't think those issues should impede us from getting the initial asyncio code in, though. These can be addressed afterwards.

It's currently Friday afternoon where I am, so I've run out of time for the week. However, I plan on getting this PR updated early next week so that this can be reviewed again.
This commit converts the code to use asyncio. At the heart of the change, the paramiko SSH library is replaced with asyncssh. From there, the necessary async/await keywords are added to make the code work properly with this library.

This initial commit is done in a pretty brainless way. That is, we simply add the necessary "async" and "await" calls throughout the code to make this work in an asynchronous manner. However, there is nothing that actually takes advantage of the asynchronicity at this point, so the code still behaves exactly the same as it did prior to this change. Upcoming changes will take advantage of asyncio and optimize some code sections.

__init__() methods on a class cannot be made asynchronous. In cases where we were using SSH in __init__() methods, a factory function has been created to call instead.

Note: Running external commands is broken by this patch. This is because running the external command is asynchronous, but it's called from the synchronous __next__() method of Context. I could change Context to use __aiter__() and __anext__(), but future commits will completely change how Contexts are structured, and it's easier to just leave this one aspect broken temporarily.

Signed-off-by: Mark Michelson <[email protected]>
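The factory-function pattern mentioned in that commit message generally looks like the sketch below (a generic illustration rather than the PR's exact code; asyncssh.connect() is the real asyncssh entry point, while the class and attribute names are placeholders):

import asyncssh

class Node:
    def __init__(self, hostname):
        # __init__ cannot await anything, so the connection is filled in later.
        self.hostname = hostname
        self.conn = None

    @classmethod
    async def create(cls, hostname):
        # Asynchronous factory: build the object, then await the SSH
        # connection before returning the instance to the caller.
        node = cls(hostname)
        node.conn = await asyncssh.connect(hostname)
        return node

# Callers then use: node = await Node.create('some-host')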
Using the typical iteration method for contexts with asynchronous code doesn't work. Different iterations can overlap, so you can't just keep state for a single "current" iteration. To fix this, we need to be able to keep track of multiple iterations at once within a context. Therefore, iterations have been made a first-class object and are kept in a dictionary in the Context. A worker can determine the current iteration by calling the new get_current_iteration() function.

Signed-off-by: Mark Michelson <[email protected]>
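Conceptually the bookkeeping might look something like this (purely illustrative; in particular, keying iterations by the running asyncio task is an assumption, not necessarily what the PR's Context does):

import asyncio

class Iteration:
    def __init__(self, index):
        self.index = index
        self.failed = False

class Context:
    def __init__(self):
        # Several iterations can be in flight at once, one per running task.
        self.iterations = {}

    def start_iteration(self, index):
        task = asyncio.current_task()
        self.iterations[task] = Iteration(index)
        return self.iterations[task]

active_context = Context()

def get_current_iteration():
    # A worker asks for the iteration owned by the task it is running inside.
    return active_context.iterations[asyncio.current_task()]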
The current_ext_cmds logic has some flaws with asynchronous code:

* Since all commands are run at the beginning of the specified iteration, and since iterations can be started in a batch, it means that commands might run in quicker succession than anticipated.
* Iterations typically start in the expected order, but sometimes you may see iterations run out of order. This could lead to strangeness if you assume that iterations always run in order.
* Running background commands by appending '&' to the end of the command doesn't work as expected with asyncssh. Control is not returned immediately, and so the code blocks forever.

Presumably, the only reason why it's important to start a process in the background explicitly is so that the process can then be terminated in a later iteration. Since asyncssh can keep track of the remote processes it is running, it's easier to just explicitly define the number of iterations that an external command should run for before terminating.

This commit adds a "duration" configuration option to external commands. If the duration is 0, then the command runs immediately and we wait for the command to complete before continuing. Otherwise, we start a remote process. As each iteration completes, we decrease the remaining duration until it reaches 0. At that point, the command is terminated.

This also fixes a minor bug in the ext_cmd code. The ext_cmd code was returning the output of the command, but the only caller to execute commands did not inspect the returned value. We no longer attempt to return the output of the command.

Signed-off-by: Mark Michelson <[email protected]>
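The duration countdown described in that commit message could be sketched roughly like this (asyncssh's run(), create_process() and terminate() are real APIs; the class shape and where the per-iteration hook gets called from are assumptions):

class ExtCmd:
    def __init__(self, conn, cmd, duration):
        self.conn = conn          # an established asyncssh connection
        self.cmd = cmd
        self.duration = duration  # iterations the command should keep running
        self.process = None

    async def start(self):
        if self.duration == 0:
            # Run to completion right away; the output is not returned.
            await self.conn.run(self.cmd)
        else:
            # Start a remote process that asyncssh keeps track of for us.
            self.process = await self.conn.create_process(self.cmd)

    async def on_iteration_completed(self):
        # Called once per finished iteration; stop the command at zero.
        if self.process is None:
            return
        self.duration -= 1
        if self.duration <= 0:
            self.process.terminate()
            self.process = None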
When running multiple provisioners, we need to ensure that each one is allocating a uniquely named port. This change rearranges the code to allow for this. Signed-off-by: Mark Michelson <[email protected]>
Without the lock, multiple tasks could request the same VIPs to be added to the load balancer. The lock ensures that each task adds a distinct set of load balancer VIPs. Signed-off-by: Mark Michelson <[email protected]>
When pinging multiple ports, we should initiate the pings all at once rather than issuing them in series. Signed-off-by: Mark Michelson <[email protected]>
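In asyncio terms this is essentially the difference between awaiting in a loop and gathering, roughly as in the sketch below (ping_port() stands in for the real per-port coroutine):

import asyncio

async def ping_ports(worker, ports):
    # Before: for port in ports: await worker.ping_port(port)   (serial)
    # After: start every ping first, then wait for all of them together.
    await asyncio.gather(*(worker.ping_port(port) for port in ports))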
This will be used heavily by individual tests in upcoming commits. Signed-off-by: Mark Michelson <[email protected]>
Signed-off-by: Mark Michelson <[email protected]>
Signed-off-by: Mark Michelson <[email protected]>
Signed-off-by: Mark Michelson <[email protected]>
Signed-off-by: Mark Michelson <[email protected]>
Signed-off-by: Mark Michelson <[email protected]>
Signed-off-by: Mark Michelson <[email protected]>
Signed-off-by: Mark Michelson <[email protected]>
With the switch to asynchronous code, these methods are no longer used and they're no longer useful. Remove. Signed-off-by: Mark Michelson <[email protected]>
If we need to provision ports on multiple workers, we may as well do it all at once. Suggested-by: Mohammad Heib <[email protected]> Signed-off-by: Mark Michelson <[email protected]>
Pings are used for two reasons in ovn-tester:

1) To test that a network policy is being enforced.
2) To determine that a newly-created port is up.

The first reason is a good use of pings since it ensures that the packets are able (or unable) to make a specific roundtrip.

The second reason is useful, but it's also used to determine the time that it takes for the port in question to go from down to up. This is where it falls short.

First, it's vital that the ping attempts begin ASAP after provisioning the new port. If we do not attempt to ping for a long time, then the pings are likely to succeed immediately, and they don't reflect well on how long it actually took for the port to be up. For synchronous code, this isn't much of a concern, since only a single worker is ever "active" at once. It's easy to ensure that the ping is attempted immediately after binding the port. For asynchronous code, though, this guarantee is harder to make. If there are enough active tasks, it may be several seconds (or even minutes) between when the port is bound and when we start pings.

Second, multiple pings may be necessary before we can make the determination the port is up. Like with the problem in the above paragraph, it's hard for asynchronous code to ensure that it can attempt the follow-up pings in a timely manner. So even if we were to fix the first problem and immediately start pinging after binding a port, we also have to ensure that all follow-up pings happen ASAP.

A much better means of measuring the time it takes for a port to come up is to use the external_ids:ovn-installed-ts value in the OVS Interface table. This records the timestamp when the ovn-controller instance claimed the binding. If we record when we provisioned the port, then we can subtract this time from the ovn-installed-ts to find out the amount of time that passed between when we inserted the port into the northbound database and when the port was claimed by an ovn-controller. Another nice thing is that we can perform this calculation long after the port has been claimed and it will be the same. Therefore, we can retrospectively find the wait times for all ports after a test has completed, rather than performing mid-test pings.

This commit adds some new methods to WorkerNode and Cluster to wait and measure the time it took for a port to be claimed. It then changes the base_cluster_bringup, density_heavy, density_light, and cluster_density tests to use this method for determining the amount of time the port took to come up.

One thing this commit "breaks" a bit is the logging for whether an iteration succeeded or failed. Since we wait until after the test is completed to calculate the wait times, it means that iterations will complete and potentially log that they were successful, but our calculation afterward can end up failing the iteration.

One final note: with this commit, the ping_ports() members of WorkerNode and Cluster are no longer used. They could be removed altogether, but they could prove useful at some point, so I've kept them in the code.

Signed-off-by: Mark Michelson <[email protected]>
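The measurement described there boils down to something like the following sketch (the external_ids:ovn-installed-ts column is real OVS/OVN metadata, but the run() helper returning stdout, the provision_ts attribute, and the assumption that the value is milliseconds since the epoch are all illustrative):

async def measure_port_up_latency(node, port):
    # Assumes port.provision_ts was recorded (e.g. time.time()) right before
    # the port was inserted into the northbound database, and that node.run()
    # returns the command's stdout.
    out = await node.run(
        cmd=f'ovs-vsctl get Interface {port.name} '
            f'external_ids:ovn-installed-ts'
    )
    installed_ts = int(out.strip().strip('"')) / 1000.0  # assumed ms -> s
    return installed_ts - port.provision_ts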
asyncio has the potential to create many tasks that each may be issuing SSH commands. When using a single SSH connection, asyncssh has to multiplex the commands, and it can lead to slowdown of tests.

This commit adds a configuration option for the central node that allows for multiple SSH connections to be used instead. It also sets a default of 30 connections to the central node. Doing this greatly increases the speed at which tests run.

The worker nodes have a hard-coded 5 connections per node. This could potentially be configurable too, but worker nodes tend not to have as much traffic sent to them, so there is typically not a need to have a lot of connections.

Signed-off-by: Mark Michelson <[email protected]>
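One way to build such a pool with asyncssh is sketched below (asyncssh.connect() and connection.run() are real asyncssh APIs; the round-robin policy and the class shape are assumptions for illustration):

import itertools
import asyncssh

class SSHPool:
    def __init__(self, connections):
        # Cycle through the open connections so commands are spread out
        # instead of all multiplexing over one SSH session.
        self._conns = itertools.cycle(connections)

    @classmethod
    async def create(cls, hostname, n_connections):
        conns = [await asyncssh.connect(hostname)
                 for _ in range(n_connections)]
        return cls(conns)

    async def run(self, cmd):
        return await next(self._conns).run(cmd)

# e.g. central = await SSHPool.create('central-node', 30)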
Until this point, when running a qps_test, the ovn_context code would automatically take care of marking an iteration as started or ended. This is a problem, though, for tests that wait until all iterations have completed to determine if ports came up within the configured time limit. The issue is that the ovn_context will mark the iteration as completed, potentially logging the iteration as successful. However, after this has been logged, the test could mark the iteration as failed if it turns out the port did not come up within the configured time limit.

The solution here is to allow for tests to override the default behavior by letting them mark the iteration as completed. To do this, the qps_test() function now accepts a kwargs parameter, and setting end_iteration=False will allow for the individual test to mark iterations as complete instead of having it done automatically by the ovn_context code.

For convenience, a new all_iterations_completed() method is added to ovn_context so the tests don't have to keep track of iterations to mark them as complete.

Signed-off-by: Mark Michelson <[email protected]>
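Inferring from that description (the exact argument list of qps_test() and the provisioning coroutine named here are not copied from the code), the intended usage pattern looks roughly like:

# The test opts out of automatic iteration completion...
await qps_test(ctx, provision_and_bind_port, end_iteration=False)

# ...then retrospectively checks port-up times, failing iterations as needed,
# and finally closes out every iteration in one call.
ovn_context.all_iterations_completed()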
Hello again. I've committed the last set of patches I planned to commit for this PR. It is stable and tests are behaving as expected. As mentioned in the previous comment, there is room for some improvement, and those recognized improvements have been opened as issues.
If I understand correctly, with this, the only (mostly) reliable measurement is the wait_for_port_up value, because we use ovn-installed-ts to determine when a port is up (I'm saying "mostly" because the start timestamp is inaccurate; it's taken after lsp-add and lsp-get were executed).

Should we emphasize this in the report somehow? Or shall we remove the other measurements?
Also, do you have some test run results on larger scale deployments (more physical machines)?
await self.run(cmd=f'ovs-vsctl --timeout={timeout} wait-until '
                   f'Interface {port.name} '
                   f'external_ids:ovn-installed=true')
log.debug(f'Port {port.name} is up')
What if ovs-vsctl --timeout=... wait-until ... times out?
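One possible way to handle that case is sketched below (whether the run() helper surfaces the remote exit status, and the raise_on_error flag, are assumptions; what is known is that ovs-vsctl does not exit successfully when --timeout expires before the condition is met):

import logging

log = logging.getLogger(__name__)

async def wait_port_up(node, port, timeout):
    result = await node.run(
        cmd=f'ovs-vsctl --timeout={timeout} wait-until '
            f'Interface {port.name} '
            f'external_ids:ovn-installed=true',
        raise_on_error=False,  # hypothetical flag on the run() helper
    )
    if result.exit_status != 0:
        # The wait-until condition was not met before --timeout expired.
        raise TimeoutError(
            f'port {port.name} did not come up within {timeout}s'
        )
    log.debug(f'Port {port.name} is up')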
ovn-tester currently works in a synchronous and serialized manner. If we are going to provision, say, 50 ports, the operation will go something like:
Repeat 50 times.
With this series, we use Python's asyncio framework to create tasks that are each responsible for performing the above operations. Each task looks something like:
With asyncio, we can deploy multiple of these tasks at once, allowing the event loop to give control to the individual tasks as they have data to work with. The tests have been updated to take a "queries_per_second" option that allows for the number of tasks to create per second to be configured. By default, all the tests create 20 tasks per second.
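The rate-limited task creation can be pictured roughly like this (an illustrative sketch, not the PR's actual scheduler; the batch-per-second policy and the coro_factory argument are assumptions):

import asyncio

async def spawn_tasks(coro_factory, n_tasks, queries_per_second=20):
    # Start up to queries_per_second tasks, sleep a second, repeat, then let
    # the event loop drive all of them to completion.
    tasks = []
    for start in range(0, n_tasks, queries_per_second):
        batch = min(queries_per_second, n_tasks - start)
        tasks.extend(asyncio.create_task(coro_factory())
                     for _ in range(batch))
        await asyncio.sleep(1)
    return await asyncio.gather(*tasks)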
In order to support asyncio, this also changes the SSH library from paramiko to asyncssh.
See the individual commits in this PR for explanations of all of what is happening in the code.
Some notes:
- Issuing many docker exec calls at once caused the container to deadlock. I fixed this by upgrading the docker version to 20.10.8 on the server in question. ovn-heater should probably be updated to ensure that a new enough docker version is installed on the physical nodes.
- I ran the ovn-20.yml scenario on a cluster, and the entire test took 20 minutes to complete. However, looking at the netpol_multitenant report, the sum of all iterations is 70 hours.