Setting VMM Reservoir Takes a While - How do we cope? #5121

Closed
smklein opened this issue Feb 22, 2024 · 2 comments
Labels
Sled Agent Related to the Per-Sled Configuration and Management virtualization Propolis Integration & VM Management

Comments

@smklein
Collaborator

smklein commented Feb 22, 2024

See: #5116 , #5111 for context

Here are the facts

(At the time of writing this issue)

  • Setting the memory reservoir size on our production configuration, on production hardware (using 80% of usable physical DRAM) takes ~2 minutes
  • Setting the memory reservoir is currently blocking the Sled Agent from coming "fully online" (as a part of the /add-sled and RSS pathways), and being able to provision VMMs
  • VMMs do want to use memory from this reservoir, so we expect provisions on a sled to fail until the reservoir has sufficient space for them
  • Setting the reservoir is currently an "all or nothing" operation -- it does not expose any reservoir space as usable until the ioctl, requesting reservoir space, fully completes.

(Source: @jordanhendricks , thank you!)

As an aside, I just did a small experiment on a commodity box. In one terminal, I added ~100 GiB of memory to the VMM reservoir. This took about 13 seconds. In another pane, I queried the reservoir with rsrvrctl -q every second, and it didn't report anything reserved until the add ioctl completed. So there may be more exploration to do on whether the reservoir can be queried while the add ioctl is in progress.

Here are the opinions

There are a few different axes to consider here: who gets blocked, when do we adjust the reservoir, and how long should it take?

  • Who gets blocked? Right now, reservoir creation is performed synchronously as a part of Sled Agent bring-up. This ensures that "when VMMs are requested", they can use RAM from the reservoir, but this isn't a necessary prerequisite for the sled agent to come up and expose an HTTP server. It would be possible to run the reservoir allocation work in a background task, to allow sled agent to initialize before VMMs actually get placed on sleds. Even in this case, although other services could run while the reservoir re-allocation is in progress, VMM provisions would fail until this operation completes successfully.
  • When do we actually do the reservoir allocation? (and a related question: does reservoir re-sizing need to take that long?) This needs additional profiling, but the current suspicion is that reservoir allocation requires a fair amount of memory re-shuffling, as it is done post-boot, when other services are up and running and using storage. We could look into optimizing the reservoir allocation, or issuing the re-allocation request at a different time, when the system is less loaded. As one example, the OS could set a default reservoir size on boot, before userspace is fully initialized, which would hopefully make the system memory allocator responsible for re-shuffling fewer pages.
  • Whatever we do, does it make sense in a world where Nexus can adjust the reservoir later? Regardless of the decision we end up making, it must be valid in a scenario where Nexus can decide to change the amount of reservoir space allocated to individual sleds after the boot process has completed, and the Sled Agent HTTP server has come up.
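
To make the "background task" option from the first bullet concrete, here is a minimal sketch using only the Rust standard library. Everything in it is hypothetical and for illustration only: `ReservoirState`, `Reservoir`, `spawn`, and `try_provision` are not Sled Agent's actual types, and the `set_size` closure merely stands in for the blocking ioctl. It also ignores how much of the reservoir is already in use. The point is just the shape: allocation runs off the bring-up path, and provisioning is refused until the all-or-nothing call returns.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical reservoir states (not Sled Agent's real types).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ReservoirState {
    /// The blocking "set size" call has not returned yet.
    Allocating,
    /// The call returned; `bytes` of reservoir are now usable.
    Ready { bytes: u64 },
}

/// Handle shared by the allocator thread and anything answering requests.
#[derive(Clone)]
struct Reservoir {
    state: Arc<Mutex<ReservoirState>>,
}

impl Reservoir {
    /// Start the slow allocation on a background thread and return at once.
    /// `set_size` is a stand-in for the blocking ioctl.
    fn spawn<F>(target_bytes: u64, set_size: F) -> (Self, thread::JoinHandle<()>)
    where
        F: FnOnce(u64) + Send + 'static,
    {
        let reservoir = Reservoir {
            state: Arc::new(Mutex::new(ReservoirState::Allocating)),
        };
        let shared = reservoir.clone();
        let worker = thread::spawn(move || {
            // All or nothing: no space is usable until this call returns.
            set_size(target_bytes);
            *shared.state.lock().unwrap() = ReservoirState::Ready { bytes: target_bytes };
        });
        (reservoir, worker)
    }

    /// Inspect the current state without blocking on the allocation.
    fn state(&self) -> ReservoirState {
        *self.state.lock().unwrap()
    }

    /// Refuse VMM provisions until the reservoir is ready and large enough.
    fn try_provision(&self, vmm_bytes: u64) -> Result<(), String> {
        match self.state() {
            ReservoirState::Allocating => {
                Err("reservoir allocation still in progress".to_string())
            }
            ReservoirState::Ready { bytes } if vmm_bytes > bytes => {
                Err(format!("VMM wants {vmm_bytes} bytes, reservoir has {bytes}"))
            }
            ReservoirState::Ready { .. } => Ok(()),
        }
    }
}

fn main() {
    // A channel lets this demo decide when the fake "ioctl" finishes.
    let (finish_tx, finish_rx) = mpsc::channel::<()>();
    let (reservoir, worker) = Reservoir::spawn(100u64 << 30, move |_bytes| {
        finish_rx.recv().unwrap();
    });

    // Bring-up can continue here, but provisions fail while allocating.
    assert!(reservoir.try_provision(4u64 << 30).is_err());

    finish_tx.send(()).unwrap();
    worker.join().unwrap();
    assert!(reservoir.try_provision(4u64 << 30).is_ok());
}
```

The same state check is what a placement decision would need to consult: a sled whose reservoir is still `Allocating` can serve other requests but should not be handed VMMs.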
@smklein changed the title from "Setting Reservoir Takes a While - How do we cope?" to "Setting VMM Reservoir Takes a While - How do we cope?" on Feb 22, 2024
@smklein added the labels "Sled Agent" (Related to the Per-Sled Configuration and Management) and "virtualization" (Propolis Integration & VM Management) on Feb 22, 2024
@andrewjstone
Contributor

FYI: I'm working on putting the reservoir creation in a background task right now, and providing an API to modify and inspect it as needed.

@andrewjstone
Contributor

I think we can probably close this out now that #5124 is in. We no longer block sled-agent boot and VMs are not eligible for deployment until the reservoir is allocated. We have not lost the ability to change the reservoir size from Nexus, although that has never been implemented.
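
For the "change the reservoir size later" case, which as noted has never been implemented, one hypothetical shape is a single worker that owns all resize calls and takes requests over a channel. The sketch below is an assumption for illustration, not what #5124 does: `run_resizer`, the `current` cell, and the `set_size` closure (a stand-in for the blocking ioctl) are all made-up names. Because each resize is slow, requests that queue up while one is in flight are collapsed to the newest target.

```rust
use std::sync::mpsc::{self, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Apply resize requests one at a time until every sender is dropped.
/// `current` is readable by other threads to inspect the last applied size.
fn run_resizer<F: FnMut(u64)>(
    requests: Receiver<u64>,
    current: Arc<Mutex<Option<u64>>>,
    mut set_size: F,
) {
    // `recv` blocks for the next request and fails once all senders are gone.
    while let Ok(mut target) = requests.recv() {
        // Drain anything already queued; only the newest target matters.
        while let Ok(newer) = requests.try_recv() {
            target = newer;
        }
        set_size(target); // stand-in for the slow, blocking ioctl
        *current.lock().unwrap() = Some(target);
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let current = Arc::new(Mutex::new(None));
    let view = Arc::clone(&current);

    let worker = thread::spawn(move || {
        run_resizer(rx, current, |bytes| {
            println!("resizing reservoir to {bytes} bytes");
        })
    });

    // Two requests arrive after boot; the later one always wins.
    tx.send(64u64 << 30).unwrap();
    tx.send(96u64 << 30).unwrap();
    drop(tx); // no more requests: lets the worker exit
    worker.join().unwrap();

    assert_eq!(*view.lock().unwrap(), Some(96u64 << 30));
}
```

Funneling every resize through one worker keeps two slow calls from overlapping, and the shared `current` value gives callers something to inspect without waiting on the ioctl.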
