
xenopsd: Don't balloon down memory on same-host migration #6437


Open · wants to merge 2 commits into master from asv/xenopsd-vdi-migration-ballooning

Conversation

last-genius (Contributor)

When the VM (and its memory) isn't actually going to be moved anywhere (as in VDI migration to another SR), there's no point in ballooning down; it's actually likely to make the VDI migration take longer if swap is engaged.

reduces unnecessary memory copying. *)
( try B.VM.wait_ballooning t vm
  with Xenopsd_error Ballooning_timeout_before_migration -> ()
(* CA-78365: set the memory dynamic range to a single value to stop ballooning.
Contributor

I think we still need this part: even if you want to avoid ballooning down, you still need to stop ballooning. A VM attempting to balloon during migration won't be very healthy.

Instead we probably need to look at how much memory it is currently using, and set the balloon target to that so that it stops changing.
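
A minimal sketch of that suggestion, assuming the memory_actual field of the backend's Vm.state and the existing VM_set_memory_dynamic_range atomic (the helper name is hypothetical):

let pin_balloon_target vm id =
  (* Read what the guest currently holds and pin both ends of the
     dynamic range to it, so the balloon driver has nothing to change. *)
  let state = B.VM.get_state vm in
  VM_set_memory_dynamic_range
    (id, state.Vm.memory_actual, state.Vm.memory_actual)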

Contributor

Also we should probably skip all this when static_min == dynamic_min == dynamic_max == static_max, which is the usual setting (I'd hope that the code here is mostly a noop in that case, but I'm not sure).

Contributor Author

Both of these should be done now

When the VM (and its memory) isn't actually going to be moved anywhere (as in
VDI migration to another SR), there's no point in ballooning down; it's actually
likely to make VDI migration take longer if swap is engaged. Instead, change the
ballooning target to memory_actual and wait for any in-progress ballooning to
stop.

If no ballooning could have been happening in the first place (dynamic_min =
dynamic_max = static_max), then skip the ballooning manipulations entirely.

Signed-off-by: Andrii Sultanov <[email protected]>
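
Taken together, the flow this commit message describes would look roughly like the following sketch, using the identifiers visible in the diff hunks below and perform_atomics / B.VM.get_state as they appear elsewhere in xenops_server.ml; the actual patch may differ:

if
  not
    (vm.memory_dynamic_min = vm.memory_dynamic_max
    && vm.memory_dynamic_max = vm.memory_static_max
    )
then (
  (* Pin the dynamic range to the guest's current usage rather than
     ballooning down to dynamic_min. *)
  let state = B.VM.get_state vm in
  let actual = state.Vm.memory_actual in
  perform_atomics [VM_set_memory_dynamic_range (id, actual, actual)] t ;
  (* Wait for any in-flight ballooning to settle; a timeout here is
     tolerated rather than fatal. *)
  try B.VM.wait_ballooning t vm
  with Xenopsd_error Ballooning_timeout_before_migration -> ()
)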
@last-genius force-pushed the asv/xenopsd-vdi-migration-ballooning branch from 25e4a88 to 64a5a0a on April 24, 2025 at 13:46
@robhoes (Member) left a comment

A few comments, but generally I think this is fine.

( if
    not
      (vm.memory_dynamic_min = vm.memory_dynamic_max
      && vm.memory_dynamic_max = vm.memory_static_max
Member

This second condition is not needed: if the dynamic range is fixed, then there will not be any ballooning (the atomic VM_set_memory_dynamic_range (id, vm.Vm.memory_dynamic_min, vm.Vm.memory_dynamic_min) will not do anything).
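
A sketch of the reduced guard this suggests (hypothetical predicate name; a fixed dynamic range already makes the set-range atomic a no-op):

let needs_balloon_pinning vm =
  vm.memory_dynamic_min <> vm.memory_dynamic_max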

  then
    (* There's no need to balloon down when doing localhost migration -
       we're not copying any memory in the first place. This would
       likely increase VDI migration time as swap would be engaged.
Member

Is this just a guess or do you have evidence from tests?

Contributor Author

This part comes from a report on the xcp-ng Discord.


I made the observation on Windows VMs whilst performing VDI migrations in a production environment.

For example, on a VM with a 32 GB dynamic max and a 16 GB dynamic min, with 20 GB in use, ballooning would mean waiting for 4 GB to be pushed into the page file (my assumption being that those changed blocks would then also have to be sent to the new SR if we are migrating the disc backing the page file). The free 12 GB may also have been used by the guest OS read cache and would be ejected, meaning subsequent reads from disc that might otherwise have been cache hits.
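For concreteness, a worked version of those numbers (illustrative values only, not from the patch):

(* Ballooning drives the guest toward dynamic_min, so anything in use
   beyond that must be paged out, and the headroom above current usage
   is cache that gets evicted. *)
let dynamic_min_gib = 16 and dynamic_max_gib = 32 and in_use_gib = 20
let paged_out_gib = in_use_gib - dynamic_min_gib (* 4 GiB into the page file *)
let evicted_cache_gib = dynamic_max_gib - in_use_gib (* up to 12 GiB of read cache *)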
