Add DMC tests and extra Windows guest tool tests #348
base: master
Conversation
Force-pushed from 0cf55ea to 338b0b1.
Force-pushed from 295f52d to 629017e.
Added a rework of …

Review thread on:
```python
snapshot.revert()

@pytest.mark.small_vm
```
Would it be useful to also test it with a variety of VMs?
Good point, I can mark it multi_vms. But I'm not sure if all of these VMs properly support ballooning.
RHEL 8 & 9 ones don't, if I remember correctly. Can we detect that and skip for uncooperative VMs?
Added check for other/feature-balloon. However, this check won't work correctly on Linux VMs and current XCP-ng WinPV VMs, since in both cases the guest agent insists on setting feature-balloon regardless of driver support. I've added a fix in the WinPV guest agent, but due to this issue, I'll leave it marked small_vm for now.
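For readers, a minimal sketch of what such a guest capability check could look like, assuming a xenstore probe from dom0 (the helper names and signatures here follow the test library's general style but are assumptions, not this PR's exact code):

```python
def vm_reports_feature_balloon(vm) -> bool:
    # Sketch only: read the guest's control/feature-balloon flag from
    # xenstore in dom0 (the same flag xapi mirrors into guest-metrics).
    # As noted above, Linux and current WinPV guest agents set it even
    # without real balloon driver support, so it can't be fully trusted.
    domid = vm.param_get("dom-id")
    out = vm.host.ssh(
        ["xenstore-read", f"/local/domain/{domid}/control/feature-balloon"],
        check=False,  # assumed kwarg: missing key means no balloon support
    )
    return out.strip() == "1"
```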
How often does the guest agent currently set feature-balloon despite drivers not being there? Is this a realistic scenario?
Isn't that the case with the "recent" RHEL guests mentioned above?
Yes, unfortunately this is a problem with all Linux guests. The Linux balloon driver doesn't set feature-balloon, so it's up to the guest agent to do that. I don't know if there's a way to check if the balloon driver is enabled, but at least the Rust agent doesn't do any such checks.
On this, the Rust agent just mimicked what the XS one does. Worth a Plane (+ GitLab?) ticket?
Plane card created. OTOH, a GitLab ticket can't really be created (IMO) until the current refactor situation is sorted out.
Review thread on:
```python
vm.suspend()
wait_for_vm_running_and_ssh_up_without_tools(vm)

def test_toggle_device_id(self, running_unsealed_windows_vm: VM, guest_tools_iso: dict[str, Any]):
```
What's the objective of this test? I understand we want to make sure the VM still boots after changing the device ID, but why?
It's a test of our driver, which after the unplug rework must remain activated even if the device ID changes. It also serves as a proxy for the device ID change that happens when the Windows Update option is toggled. It's not an exact reproduction of that situation, but since we don't yet support the C200 device, it's good enough.
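As a rough illustration of the scenario (a sketch under assumptions; the map-parameter accessor and the exact flow are not this PR's code), toggling between the two XenServer platform device IDs might look like:

```python
def toggle_device_id(vm):
    # Sketch: flip platform:device_id between the two XenServer PCI
    # device IDs (0001 and 0002) while the VM is halted, then verify
    # the driver stays active after boot.
    vm.shutdown(verify=True)
    old_id = vm.param_get("platform", "device_id")  # assumed map accessor
    new_id = "0002" if old_id == "0001" else "0001"
    vm.host.xe("vm-param-set", {"uuid": vm.uuid, "platform:device_id": new_id})
    vm.start()
    vm.wait_for_vm_running_and_ssh_up()
    assert vm.are_windows_tools_working()
```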
Can you add a comment above the test function?
Done. Also moved the device ID assert up one line.
Force-pushed from b1670f7 to e58b6c2.
Great work - both with the new tests and fixing up old tests to be more reliable. Looks good to me from the xapi point of view as a starting point for DMC testing.
Review thread on:
```python
def test_dmc_suspend(self, vm_with_memory_limits: VM):
    """Suspend a VM with DMC enabled."""
    vm = vm_with_memory_limits
    self.start_dmc_vm(vm)
    vm.set_memory_target(MEMORY_TARGET_LOW)
    wait_for_vm_balloon_finished(vm)
    vm.suspend(verify=True)
    vm.resume()
    vm.wait_for_vm_running_and_ssh_up()
```
All of the tests here set dynamic-min and dynamic-max to the same value (oscillating between LOW and HIGH); that's what set_memory_target does. Do we plan on having tests with dynamic-min set lower than dynamic-max (not in this PR, but in the future)? It would be great to test how squeezed redistributes memory between VMs dynamically, and how VMs are ballooned down to dynamic-min on migrations (but not on "localhost migrations" anymore).
I don't know how such scenarios will behave (i.e. what should we test?) so I'll need your input on that.
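For reference, a future test with a real dynamic range might start from a setup like this (a sketch only; xe vm-memory-limits-set is standard xapi CLI, but none of this is in the PR):

```python
def set_dynamic_range(vm, dyn_min: int, dyn_max: int):
    # Unlike set_memory_target (which pins dynamic-min == dynamic-max),
    # this leaves squeezed room to redistribute memory between VMs.
    vm.host.xe("vm-memory-limits-set", {
        "uuid": vm.uuid,
        "static-min": str(dyn_min),
        "dynamic-min": str(dyn_min),
        "dynamic-max": str(dyn_max),
        "static-max": str(dyn_max),
    })
```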
Force-pushed from e58b6c2 to 588f3ae.
Force-pushed from 588f3ae to 0195214.
Backed out the …

Review thread on:
```python
@pytest.fixture(scope="module")
def imported_vm_and_snapshot(imported_vm: VM):
```
non-obvious fixture needs docstring
Done.
Review thread on:
```python
def wait_for_vm_balloon_finished(vm: VM):
    memory_target = int(vm.param_get("memory-target"))
```
This seems possibly subject to a race condition: nothing ensures that the param cannot get changed behind the test's back and that we indeed get the expected value here. It looks like the target should rather be passed as a parameter to the function.
Why would the parameter change behind the test's back?
Well, my comment was not 100% on the spot. But this parameter is RO, so it's likely derived from the dynamic ranges, and the race is rather: how can we be sure the parameter has been set to the value we should be expecting?
Intuitively, I would expect the target to be set by squeezed to ensure the dynamic aspect of things - if that's right, it would even be expected that it changes behind our back.
But then, the existence of vm-memory-target-set raises doubts about my interpretation above.
On a different note, vm-memory-target-wait looks like a candidate for replacing DmcMemoryTracker?
I was not able to locate a dedicated doc for the DMC feature, so maybe we need one at some point; in the meantime, I guess more explanations about what we expect and test would help in understanding this PR :)
Indeed, that's why vm-memory-target-set was used, and why I wasn't sure how to test the situation where dynamic-min and dynamic-max are different. (vm-memory-target-set, despite the name, sets both dynamic-min and dynamic-max.)
vm-memory-target-wait looks interesting, but it doesn't have a way to bail out. I'm not sure how it reports failure either. Could you give me a quick explanation of how it works, @last-genius? I can either use it directly or replicate its logic here.
It waits for abs(memory_actual - memory_target) <= tolerance for up to 256 seconds, where tolerance is 1 MB. Sadly it doesn't have a way to provide the timeout or tolerance parameters, but I can add that if you want.
The errors it reports are VM_MEMORY_TARGET_WAIT_TIMEOUT and TASK_CANCELLED (which is how you can cancel any task, with xe task-cancel; it's pretty awkward with xe, much easier with the API directly).
I also wonder why vm-memory-target-wait is hidden from the CLI help (so it's not autocompleted 🤔).
Thanks. I've opted to replicate the logic you described in DmcMemoryTracker.
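For readers, the replicated wait loop boils down to something like this (a sketch of the logic described above, not the PR's exact DmcMemoryTracker):

```python
import time

def wait_for_memory_target(vm, target: int, tolerance: int = 1 << 20,
                           timeout: float = 256):
    # Mirror xapi's vm-memory-target-wait defaults: succeed once
    # memory-actual is within 1 MiB of the target, give up after 256 s.
    deadline = time.time() + timeout
    while time.time() < deadline:
        actual = int(vm.param_get("memory-actual"))
        if abs(actual - target) <= tolerance:
            return
        time.sleep(1)
    raise TimeoutError(f"memory-actual did not converge on {target}")
```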
Review thread on:
```python
def test_drivers_detected(self, vm_install_test_tools_per_test_class: VM):
...
def test_vif_replug(self, vm_install_test_tools_per_test_class: VM):
    vm = vm_install_test_tools_per_test_class
    assert vm.are_windows_tools_working()
```
Wouldn't it make sense to have that assert systematically inside the vm_install_test_tools_per_test_class fixture? I think it would help by putting the test in ERROR if that happens, without even starting it, instead of going FAIL later when the problem is not really with what the test checks.
Then maybe for test_drivers_detected an _unchecked version of the fixture would help, so that one test does go FAIL.
It sounds like a very roundabout solution for little gain.
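For context, the layered-fixture pattern suggested above would look roughly like this (hypothetical names: running_windows_vm and install_windows_tools are placeholders, not this repo's fixtures):

```python
import pytest

@pytest.fixture(scope="class")
def vm_with_tools_unchecked(running_windows_vm):
    # Install the tools without asserting they work, so a dedicated
    # test (e.g. test_drivers_detected) can FAIL on its own assert.
    running_windows_vm.install_windows_tools()  # assumed helper
    yield running_windows_vm

@pytest.fixture(scope="class")
def vm_with_tools(vm_with_tools_unchecked):
    # Checked layer: a broken install surfaces as an ERROR during
    # setup instead of a misleading FAIL inside unrelated tests.
    assert vm_with_tools_unchecked.are_windows_tools_working()
    yield vm_with_tools_unchecked
```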
Review thread on:
```python
vifs = vm.vifs()
for vif in vifs:
    vif.unplug()
# HACK: Allow some time for the unplug to settle. If not, Windows guests have a tendency to explode.
```
Do we have a ticket for that explosion?
No, there isn't one, only a problem revealed during debugging. It's already being tracked internally.
Force-pushed from 0195214 to 24926b1.
If CACHE_IMPORTED_VM is specified, the source VM is unconditionally cloned, even if it was referred to by UUID. Clean that up during teardown.
Signed-off-by: Tu Dinh <[email protected]>
Otherwise you can't pass a dict[str, str] to host.xe, as mypy complained here:
    lib/vm.py:875: error: Argument 2 to "xe" of "Host" has incompatible type "dict[str, str]"; expected "dict[str, str | bool]" [arg-type]
    lib/vm.py:875: note: "dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    lib/vm.py:875: note: Consider using "Mapping" instead, which is covariant in the value type
Signed-off-by: Tu Dinh <[email protected]>
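The variance rule behind that error, reduced to a standalone example (illustrative signatures, not the repo's actual host.xe):

```python
from typing import Mapping

def xe_strict(args: dict[str, str | bool]) -> None: ...
def xe_relaxed(args: Mapping[str, str | bool]) -> None: ...

params: dict[str, str] = {"uuid": "some-uuid"}
xe_strict(params)   # mypy error: dict is invariant in its value type
xe_relaxed(params)  # OK: Mapping is covariant in its value type
```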
Signed-off-by: Tu Dinh <[email protected]>
Force-pushed from 60323b0 to 7271c97.
These tests verify a VM's responsiveness to memory target changes, and check for several suspend bugs when DMC is enabled.
Signed-off-by: Tu Dinh <[email protected]>
Signed-off-by: Tu Dinh <[email protected]>
Signed-off-by: Tu Dinh <[email protected]>
Remove duplicate test_tools_after_reboot which was no longer used. Reenable upgrade tests. Add suspend test with emulated NVMe. Add device ID toggle test. Add VIF replug test.
Signed-off-by: Tu Dinh <[email protected]>
Force-pushed from 7271c97 to 8027905.
Add "Clean up cached VM even if specified from UUID", which changes how VMs are cleaned up if specified by UUID.
Aside from the new DMC tests, the Windows tests were also enhanced with some tests that were previously failure-prone (upgrades, suspend with emulated NVMe, device ID changes, VIF unplug)
Requires WinPV 9.0.9135 or later.