Setup the VMM reservoir in a background task #5124

Merged: 35 commits (ajs/vmm-reservoir-bg-thread into main), Mar 7, 2024

Conversation

@andrewjstone (Contributor):

Fixes the first part of #5121

@andrewjstone (author):

We almost certainly want to test this on real hardware and try provisioning a few instances.

@gjcolombo (Contributor) left a comment:

I read the Matrix discussion that led up to this change, but I'm not sure I understand where it all landed, so forgive me if I missed something: does this change need some extra synchronization in the InstanceManager to ensure that the reservoir is actually allocated before trying to start any instances on this sled? ISTM that absent any other synchronization, you could hit the following race:

  1. Sled agent starts up, creates the reservoir manager
  2. Reservoir manager spawns its background task thread, but for some reason this doesn't get scheduled right away
  3. Sled agent notifies Nexus that this sled is ready to rock and roll
  4. Nexus allocates a new VM to this sled
  5. VM startup fails because the new Propolis sees that there's no reservoir space allocated (because the background thread never got a chance to run)

Assuming this is a real issue, I think one easy way to work around this would be to have the manager include a tokio::sync::watch that indicates whether the reservoir thread has completed its initial reservoir setup; anyone who wants to create an instance can use watch::Receiver::wait_for to make sure the reservoir is properly established before continuing.

Again, forgive my ignorance if there's another plan of record here or if I've overlooked something--just wanted to make sure this scenario was accounted for :)
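
A minimal sketch of the watch-based gating suggested above (assuming a recent tokio with `watch::Receiver::wait_for`); `VmmReservoirManager` and its methods are hypothetical names for illustration, not the PR's actual code:

```rust
use tokio::sync::watch;

/// Hypothetical manager that owns the reservoir background thread.
struct VmmReservoirManager {
    ready_tx: watch::Sender<bool>,
}

impl VmmReservoirManager {
    fn new() -> (Self, watch::Receiver<bool>) {
        // Start as "not ready"; the background thread flips this to true
        // once the initial reservoir allocation completes.
        let (ready_tx, ready_rx) = watch::channel(false);
        (Self { ready_tx }, ready_rx)
    }

    /// Called from the background thread after the initial setup finishes.
    fn mark_ready(&self) {
        let _ = self.ready_tx.send(true);
    }
}

/// Anyone about to start an instance waits here first.
async fn wait_for_reservoir(mut ready_rx: watch::Receiver<bool>) {
    // `wait_for` returns immediately if the value already satisfies the
    // predicate, otherwise it waits for the next change that does.
    let _ = ready_rx.wait_for(|ready| *ready).await;
}
```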

(Five resolved review threads on sled-agent/src/vmm_reservoir.rs.)
@andrewjstone (author):

> I read the Matrix discussion that led up to this change, but I'm not sure I understand where it all landed, so forgive me if I missed something: does this change need some extra synchronization in the InstanceManager to ensure that the reservoir is actually allocated before trying to start any instances on this sled? ISTM that absent any other synchronization, you could hit the following race:
>
>   1. Sled agent starts up, creates the reservoir manager
>   2. Reservoir manager spawns its background task thread, but for some reason this doesn't get scheduled right away
>   3. Sled agent notifies Nexus that this sled is ready to rock and roll
>   4. Nexus allocates a new VM to this sled
>   5. VM startup fails because the new Propolis sees that there's no reservoir space allocated (because the background thread never got a chance to run)
>
> Assuming this is a real issue, I think one easy way to work around this would be to have the manager include a tokio::sync::watch that indicates whether the reservoir thread has completed its initial reservoir setup; anyone who wants to create an instance can use watch::Receiver::wait_for to make sure the reservoir is properly established before continuing.
>
> Again, forgive my ignorance if there's another plan of record here or if I've overlooked something--just wanted to make sure this scenario was accounted for :)

I was actually planning on doing exactly this before opening the code but forgot. I wasn't sure where to use it exactly, but presumably it should be done for all instance launches to keep the semantics we have today, right?

@andrewjstone (author):

Thanks so much for the great reviews @jgallagher @gjcolombo

@gjcolombo (Contributor):

> I wasn't sure where to use it exactly

The reservoir needs to be ready to go no later than the Propolis instance_ensure call in sled_agent::InstanceRunner::propolis_ensure_inner. You can do the check anywhere along the path that leads to that function, but I would at least try to get it into the InstanceRunner logic somewhere if possible, since that's where all the other interesting decisions about when and how to talk to Propolis happen.

> but presumably it should be done for all instance launches to keep the semantics we have today, right?

Yeah, as things currently stand this needs to happen for all attempts to ask Propolis to start a new VM.

For background, today the sled allocation flow works like this:

  • Every sled has a reservoir_ram column in the sled table that gives the total size of its reservoir
  • Every running VMM has a row in sled_resource that identifies what sled it belongs to and how much reservoir space it's using
  • When you ask to start an instance, the start saga atomically selects a sled that can fit the instance and writes a new sled_resource entry for the new VMM. (IIRC, more specifically the query is a CTE that finds all the sleds for which "sled's current reservoir usage" + "new instance's reservoir usage" <= "sled's total reservoir size", picks one at random, and then uses that as the sled ID in a new row in sled_resource.)

So, if you're starting up an instance on a sled, it's because Nexus has already determined that there's going to be space there--there's not really a back-and-forth between Nexus and the sleds to determine who might currently have enough space, whose reservoir is ready, etc.

We could change this over time if we wanted (e.g. have the sleds periodically post their current reservoir sizes so that Nexus will only try to allocate to a sled once it has room; but then you have to make sure that sleds won't shrink the reservoir without explicitly being asked, etc.). But for now it's simplest just to wait for the reservoir to reach the correct size.
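
The selection itself is a single CockroachDB CTE in Nexus, but the predicate it evaluates can be sketched in plain Rust. Only `reservoir_ram` and `sled_resource` are real names from the description above; `SledRow`, `pick_sled`, and the in-memory approach are illustrative only:

```rust
use rand::seq::SliceRandom;

/// Illustrative stand-in for the `sled` table joined with the per-sled sum
/// of reservoir usage recorded in `sled_resource`.
struct SledRow {
    id: uuid::Uuid,
    reservoir_ram: u64,    // total reservoir size recorded for the sled
    reservoir_in_use: u64, // reservoir space already claimed by VMMs on it
}

/// Pick a random sled that can fit `requested` bytes of reservoir space,
/// mirroring "current usage + new usage <= total reservoir size".
fn pick_sled(sleds: &[SledRow], requested: u64) -> Option<uuid::Uuid> {
    let candidates: Vec<&SledRow> = sleds
        .iter()
        .filter(|s| s.reservoir_in_use + requested <= s.reservoir_ram)
        .collect();
    candidates.choose(&mut rand::thread_rng()).map(|s| s.id)
}
```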


One additional case to consider here is what to do if a sled fails to reach the expected reservoir size. If the reservoir is just smaller than expected, then starting the VM might work anyway, so maybe we should try it? But eventually it's probably best to tell Nexus what the actual reservoir size ended up being, so that it doesn't try to send VMs to a sled where we know they're not going to be able to run.

@andrewjstone (author):

> So, if you're starting up an instance on a sled, it's because Nexus has already determined that there's going to be space there--there's not really a back-and-forth between Nexus and the sleds to determine who might currently have enough space, whose reservoir is ready, etc.

This makes sense, and follows how our system works elsewhere. I don't think there's a real need to change it unless I'm missing something else.

> One additional case to consider here is what to do if a sled fails to reach the expected reservoir size. If the reservoir is just smaller than expected, then starting the VM might work anyway, so maybe we should try it? But eventually it's probably best to tell Nexus what the actual reservoir size ended up being, so that it doesn't try to send VMs to a sled where we know they're not going to be able to run.

I definitely thought about this, and then... unceremoniously punted. In the current main code there is only the one-time allocation, and if it fails it returns an error and sled-agent fails to start. With this change we log it, but don't fail to start. Presumably we could panic, but there should be a way to alert Nexus instead. I don't think we are much worse off this way.

The new API that allows changing the size of the reservoir is synchronous, and failures are returned, although it's not currently in use anywhere.

@gjcolombo (Contributor) commented Feb 24, 2024:

> This makes sense, and follows how our system works elsewhere. I don't think there's a real need to change it unless I'm missing something else.

Yep, I think you're right. I was spitballing a bit based on some of the discussion I saw in chat (but didn't clearly say so). We can (and should, I think) keep this behavior for this PR.

> I definitely thought about this, and then... unceremoniously punted. In the current main code there is only the one-time allocation, and if it fails it returns an error and sled-agent fails to start. With this change we log it, but don't fail to start. Presumably we could panic, but there should be a way to alert Nexus instead. I don't think we are much worse off this way.

If I'm reading things right, if you try to start an instance on a sled where this has happened, the start sequence will fail in the propolis_ensure_inner step (Propolis will get ENOSPC back from bhyve when trying to allocate memory for the guest and will return an error to sled agent, and it'll bubble up from there). I'm 90% sure that will cause your start saga to unwind and your start attempt to fail, though with what external error code I'm not at all certain.

To be honest, I think this is materially worse (ETA: in degree, though maybe not in overall pain caused, see below) than just having the sled agent panic. If a sled never registers itself with Nexus, Nexus will never try to schedule an instance there. But if you have a sled that's in this state, and it's a viable allocation target, users might find that their instances randomly fail to start sometimes, that trying to start them again works sometimes (if the Wheel of Sleds lands on a different, non-broken sled), and that at other times the instance never starts (if the only available sleds are broken ones).

That said, if the probability of this is very low, it's probably not worth holding this change for it. But I think we should at least put a TODO(#5121) comment on the "failed to allocate reservoir" code path as a reminder to revisit this (and we should make a note in that issue of how we plan to address this once we have that plan).
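
A hedged sketch of that "log but keep running" path with the suggested TODO(#5121); `set_reservoir_size`, `ReservoirError`, and `size_reservoir` are stand-ins rather than the actual sled-agent API:

```rust
use slog::{error, info, Logger};

/// Stand-in for the real reservoir error type.
#[derive(Debug, thiserror::Error)]
#[error("reservoir allocation failed: {0}")]
struct ReservoirError(String);

/// Stand-in for the call that actually asks bhyve to grow the reservoir.
fn set_reservoir_size(_bytes: u64) -> Result<(), ReservoirError> {
    Ok(())
}

fn size_reservoir(log: &Logger, target_bytes: u64) {
    match set_reservoir_size(target_bytes) {
        Ok(()) => info!(log, "VMM reservoir sized"; "bytes" => target_bytes),
        Err(e) => {
            // TODO(#5121): report the failure (and the actual reservoir
            // size) to Nexus so it stops placing instances on this sled.
            error!(log, "failed to set VMM reservoir size"; "error" => %e);
        }
    }
}
```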

@andrewjstone (author) commented Feb 26, 2024:

> To be honest, I think this is materially worse (ETA: in degree, though maybe not in overall pain caused, see below) than just having the sled agent panic. If a sled never registers itself with Nexus, Nexus will never try to schedule an instance there. But if you have a sled that's in this state, and it's a viable allocation target, users might find that their instances randomly fail to start sometimes, that trying to start them again works sometimes (if the Wheel of Sleds lands on a different, non-broken sled), and that at other times the instance never starts (if the only available sleds are broken ones).

@gjcolombo I thought more about this over the weekend, and it really feels to me like we should separate the desired state of the reservoir from the actual state of the reservoir. There is inherently a TOCTOU between setting it from Nexus and having it reflected in the sled-agent. If we switch to the strategy you mentioned of having sled-agent notify Nexus when the size changes and only allow Nexus to change it, then we would get rid of this issue altogether. It feels like we have to do this anyway if we want to allow changing the reservoir at runtime from Nexus. Otherwise we'd run into the same issue here, where the reservoir allocation could fail and cause the instance start to fail.

If we did things this way, Nexus would be notified at startup that sled-agent has a zero-size reservoir but a default desired state of 80%. When allocation completed, sled-agent would notify Nexus of that fact, or of a failure to allocate. I think it would be useful to encode the failure in CRDB, for lack of a better place right now, but I'm not sure if that would be universally agreed upon. I think this would effectively keep the external client behavior around instance allocation the same as it is today, at the cost of more complexity. I'm more than happy to make this change in a follow-up if we agree.
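
A rough sketch of the desired/actual split being proposed here; the types and fields are illustrative and not what the PR (or any follow-up) actually defines:

```rust
/// What Nexus asks for: a target expressed against the sled's physical RAM.
struct DesiredReservoir {
    percent_of_ram: u8, // e.g. 80
}

/// What sled-agent last reported actually allocating, plus a generation so
/// Nexus can ignore stale or out-of-order reports.
struct ReportedReservoir {
    actual_bytes: u64,
    generation: u64,
}

/// Placement decisions would consult only the reported (actual) state.
fn can_fit(reported: &ReportedReservoir, in_use_bytes: u64, requested: u64) -> bool {
    in_use_bytes + requested <= reported.actual_bytes
}
```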

@andrewjstone (author):

> To be honest, I think this is materially worse (ETA: in degree, though maybe not in overall pain caused, see below) than just having the sled agent panic. If a sled never registers itself with Nexus, Nexus will never try to schedule an instance there. But if you have a sled that's in this state, and it's a viable allocation target, users might find that their instances randomly fail to start sometimes, that trying to start them again works sometimes (if the Wheel of Sleds lands on a different, non-broken sled), and that at other times the instance never starts (if the only available sleds are broken ones).
>
> @gjcolombo I thought more about this over the weekend, and it really feels to me like we should separate the desired state of the reservoir from the actual state of the reservoir. There is inherently a TOCTOU between setting it from Nexus and having it reflected in the sled-agent. If we switch to the strategy you mentioned of having sled-agent notify Nexus when the size changes and only allow Nexus to change it, then we would get rid of this issue altogether. It feels like we have to do this anyway if we want to allow changing the reservoir at runtime from Nexus. Otherwise we'd run into the same issue here, where the reservoir allocation could fail and cause the instance start to fail.
>
> If we did things this way, Nexus would be notified at startup that sled-agent has a zero-size reservoir but a default desired state of 80%. When allocation completed, sled-agent would notify Nexus of that fact, or of a failure to allocate. I think it would be useful to encode the failure in CRDB, for lack of a better place right now, but I'm not sure if that would be universally agreed upon. I think this would effectively keep the external client behavior around instance allocation the same as it is today, at the cost of more complexity. I'm more than happy to make this change in a follow-up if we agree.

Hmm, literally right after I wrote this comment out, I realized the reservoir has to be reallocated on each sled-agent startup. This is a particular problem on sled reboot. Don't we already have the problem after RSS where Nexus knows of the sled and will try to allocate instances to it even if it restarts and the allocation fails? In that case, with our current code, sled-agent will be down due to the failed allocation and will not respond to requests. With the new code we'd get a different level of failure. But Nexus would still try to allocate to that sled in either case due to lack of fault detection machinery.

@gjcolombo (Contributor):

@andrewjstone I agree with both your comments. It definitely makes sense to me to decouple "this is the size Nexus thinks the reservoir should be given what the sled has said about its physical memory capacity" from "this is the last reservoir size sled agent reported" and have instance placement decisions depend only on the latter figure.

> In that case, with our current code, sled-agent will be down due to the failed allocation and will not respond to requests. With the new code we'd get a different level of failure. But Nexus would still try to allocate to that sled in either case due to lack of fault detection machinery.

This seems right to me. In fact I'd say these are both specific cases of a more general problem where Nexus thinks that some sled is a viable host for an instance, but in fact it's not for $REASONS. We should certainly handle this problem more robustly, but I don't think I'd make splitting the desired/actual reservoir states contingent on fixing it, unless I've missed something and splitting those responsibilities will make that problem significantly worse.

@andrewjstone (author):

> @andrewjstone I agree with both your comments. It definitely makes sense to me to decouple "this is the size Nexus thinks the reservoir should be given what the sled has said about its physical memory capacity" from "this is the last reservoir size sled agent reported" and have instance placement decisions depend only on the latter figure.

YAY!!!

> This seems right to me. In fact I'd say these are both specific cases of a more general problem where Nexus thinks that some sled is a viable host for an instance, but in fact it's not for $REASONS. We should certainly handle this problem more robustly, but I don't think I'd make splitting the desired/actual reservoir states contingent on fixing it, unless I've missed something and splitting those responsibilities will make that problem significantly worse.

Totally agree this seems like a more general problem, and the split won't make it any worse. I just wanted to point it out to show that we are already kinda screwed but can improve incrementally and fix what you mentioned.

@smklein (Collaborator) commented Feb 26, 2024:

> I realized the reservoir has to be reallocated on each sled-agent startup. This is a particular problem on sled reboot. Don't we already have the problem after RSS where Nexus knows of the sled and will try to allocate instances to it even if it restarts and the allocation fails?

I think it's possible for sleds to disappear at any time, from Nexus' point of view -- I think the concern I have is: "could this request appear to succeed, but actually fail later because of a condition that we aren't able to overcome (reservoir allocation failure)?"

Namely, if we return "instance launched successfully", but we know we're over-provisioning the reservoir, that seems like it'll just make a large gap between "expected invariant violated" (overprovisioning) and "observed misbehavior" (misbehavior in the guest, due to insufficient memory, maybe).

In contrast, if we fail to put an instance on a sled because the sled is offline, we can at least unwind the stack, give a reason to the user ("sled didn't respond!"), and retry somewhere else. That at least gives an operator a chance to intervene, and IMO doesn't present as a harder-to-understand error later due to violated system constraints.


/// Return a [`tokio::sync::watch::Receiver`] whose value changes when the
/// reservoir size is updated.
#[allow(unused)]
A Collaborator commented on the code above:

This is used now, right?

@@ -1179,6 +1189,20 @@ impl InstanceRunner {
&mut self,
migration_params: Option<InstanceMigrationTargetParams>,
) -> Result<(), Error> {
// Wait for enough allocated reservoir space for at least this VM.
@smklein (Collaborator) commented Feb 26, 2024, on the hunk above:

I'm a little concerned by this, since it feels kinda like a partial solution? Yes, this would be correct if we only provisioned a single VM on this sled, but that case seems unlikely to me.

Should we just wait for the entire reservoir to be "provisioned according to what Nexus expects", for now?

Here's an example of the "wait for a single VM's worth of RAM" case that I think would be bad:

  • Sled Agent boots, starts prepping reservoir (slowly). Assume total reservoir size of 1 TiB.
  • Nexus tries to provision 50 VMs to this sled. Each wants to use 10 GiB of RAM.
  • At the time Nexus sends the "instance ensure" requests to the sled agent, the reservoir is only 1% set-up. There's 10 GiB RAM total in there.
  • This check succeeds!

In this case: we over-provision the sled! The VMs might misbehave, since they're operating with a reservoir that doesn't have enough space.

If we're concerned about Nexus being able to successfully provision instances to sleds quickly -- this can still fail! From a user's perspective, this would look like "I tried to start a VM, Nexus thought it was ready, the sled said no, so my allocation failed". So it's kinda a roll of the dice on "does my request fail because Nexus was too eager" or "does my request succeed but violate system invariants because sled agent is too eager".

Alternatives?

There seem like a few other options here that avoid this issue, but they have other tradeoffs.

Option 1: The Instance Manager refuses to provision VMs until the reservoir is "at least the size that Nexus expected it to be". This means that we'd never overprovision, but we would make it more likely for instance ensure requests to fail during boot. This could be mitigated in a few ways - blocking the instance provision request until the reservoir is sized, using HTTP error codes and a retry loop during Nexus instance provision saga, etc.

Option 2: The "Instance Manager" part of Sled Agent refuses to be online until it gets this sizing information from Nexus, and as a part of that handshake, Nexus updates how it perceives the Sled. The interaction could look like:

  • Sled Agent -> Nexus: I'm booting, what size should my reservoir be?
  • Nexus -> Sled Agent: It should be XXX % of your RAM. I'm going to flag you as booting (temporarily non-provisionable).
  • Sled Agent -> Nexus: My reservoir is ready! (Flag self as "no longer booting" for provisioning).

There are certainly other options in this space too. But I think that we should be very careful to avoid overprovisioning, even in boot-only edge cases, if that's a system property we claim to have.
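
A sketch of what Option 1 might look like on the instance-start path, assuming a hypothetical watch channel carrying the current reservoir size and a Nexus-provided expected size; none of these names come from the PR:

```rust
use std::time::Duration;
use tokio::sync::watch;

/// Block instance starts until the reservoir has reached the size Nexus
/// expects, or give up after a timeout so the request fails visibly.
async fn wait_for_full_reservoir(
    mut size_rx: watch::Receiver<u64>,
    expected_bytes: u64,
    timeout: Duration,
) -> Result<(), &'static str> {
    tokio::time::timeout(timeout, size_rx.wait_for(|size| *size >= expected_bytes))
        .await
        .map_err(|_| "timed out waiting for the VMM reservoir")?
        .map_err(|_| "reservoir manager task went away")?;
    Ok(())
}
```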

@andrewjstone (author) replied:

> I'm a little concerned by this, since it feels kinda like a partial solution? Yes, this would be correct if we only provisioned a single VM on this sled, but that case seems unlikely to me.
>
> Should we just wait for the entire reservoir to be "provisioned according to what Nexus expects", for now?

I treated this check as a proxy for the entire reservoir being ready. The reservoir doesn't report getting partially filled right now, and the size only gets set once at RSS time. However, I fully agree with what you said that this is only a partial solution and Nexus can increase the reservoir size. The old size may still be big enough for each individual new instance, but not big enough for all instances intended to run on the sled. Your option 1 would fix this immediate overprovisioning problem. This is also the reason I proposed splitting the current reservoir allocation into "what Nexus wants the next reservation to be" and "what the sled-agent tells it it has allocated". If the real change in size hasn't reached Nexus yet, then Nexus won't provision any extra VMs. Nexus will always know how much total space it has and what it's capable of provisioning. This is similar to your option 2.

Unfortunately, the above solution that splits between desired and actual states of the reservoir doesn't work when a sled is rebooted, as reservoir allocation may fail on the reboot and Nexus will not know about it until after it tries to allocate a VM and it fails because the reservoir hasn't been allocated. I think this is also a problem with Option 2, as there is a TOCTOU where the sled-agent can say "hey I'm ready", and then crash and restart and fail allocation. All in all, I think I prefer option 1.

A Contributor replied:

FWIW, I don't think sled agent will end up overprovisioning memory for VMs if an instance start request arrives while the reservoir is smaller than Nexus expects: if I understand bhyve's behavior correctly, attempting to start a VM with insufficient memory in the reservoir will fail outright and won't use any non-reservoir memory. The instance has still failed to start, of course, and we have to decide how to deal with that, but I don't think the change as written will overcommit sled memory.

A Collaborator replied:

Gotcha, okay, that's good to know! The problem still exists in the form of "will this instance provisioning request fail in a visible way for a client", but that's good that we wouldn't end up violating that constraint either way.

@andrewjstone (author):

I'm changing a bunch of stuff related to pushing notifications to Nexus, so I'm turning this back into a draft temporarily so I can push up WIP changes.

@andrewjstone andrewjstone marked this pull request as ready for review March 1, 2024 23:54
@andrewjstone andrewjstone marked this pull request as draft March 1, 2024 23:58
@andrewjstone andrewjstone marked this pull request as ready for review March 5, 2024 20:01
@andrewjstone (author):

I believe this PR is ready for a real review now. It still needs testing on real hardware, and maybe also a Nexus test to verify the sled-picking behavior changes. Looking at the code, though, I can't see how it won't do the right thing, and this PR is getting relatively large already.

@andrewjstone (author):

John and I tested this on madrid. We see the generation number getting bumped as expected and the reservoir starting at size 0 and then increasing in value after 2 minutes. Instances deployed to the sled fine after that.

@gjcolombo (Contributor) left a comment:

This looks great--thanks for putting the work in on the notification worker (and for taking the time to discuss it with me on Matrix!). I have a few small nits/questions but LGTM overall.

(Resolved review threads on nexus/db-model/src/sled.rs, nexus/db-queries/src/db/datastore/sled.rs, sled-agent/src/nexus.rs, and sled-agent/src/vmm_reservoir.rs.)
@andrewjstone (author):

A couple of things still to do after my most recent changes before I merge this in. I added a new column to the sled-table, so I need to test again on madrid. Ideally I'd test an upgrade to make sure nothing breaks, followed by the same testing done earlier today.

Also, I changed the code in sled_upsert in Nexus to error on a decommissioned sled, but the code in sled-agent will currently keep retrying and erroring. I need to ensure that, on an error after a Get request for SledAgentInfo, the sled-agent just turns itself into a pumpkin.
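
A loose sketch of that intended behavior; `SledInfo`, `NexusResponse`, `notify_nexus`, and the loop structure are illustrative stand-ins for the real `SledAgentInfo`/`NexusNotifierTask` machinery, not its actual API:

```rust
use std::time::Duration;

/// Stand-in for the payload sled-agent pushes to Nexus; it carries a
/// generation and the current reservoir size.
#[derive(Clone)]
struct SledInfo {
    generation: u64,
    reservoir_size_bytes: u64,
}

enum NexusResponse {
    Ok,
    Decommissioned,
}

/// Stand-in for the Nexus internal API call.
async fn notify_nexus(_info: &SledInfo) -> Result<NexusResponse, ()> {
    Ok(NexusResponse::Ok)
}

/// Simplified notifier loop: keep pushing the latest state, but stop
/// entirely ("turn into a pumpkin") once Nexus says the sled is
/// decommissioned, instead of retrying forever.
async fn notifier_task(mut latest: tokio::sync::watch::Receiver<SledInfo>) {
    loop {
        let info = latest.borrow_and_update().clone();
        match notify_nexus(&info).await {
            Ok(NexusResponse::Decommissioned) => return,
            Ok(NexusResponse::Ok) => {
                // Wait for the next local change (e.g. reservoir resized,
                // generation bumped) before notifying again.
                if latest.changed().await.is_err() {
                    return;
                }
            }
            Err(()) => {
                // Transient failure: back off and retry.
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
        }
    }
}
```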

@andrewjstone (author):

Sled-agent now handles decommissioned sleds in the NexusNotifierTask. Ready for testing again.

@andrewjstone (author):

John and I successfully tested all the changes. Hopefully I'll get this built and committed before yet another merge conflict.

@andrewjstone andrewjstone enabled auto-merge (squash) March 7, 2024 22:41
@andrewjstone andrewjstone merged commit 87c9b13 into main Mar 7, 2024
22 checks passed
@andrewjstone andrewjstone deleted the ajs/vmm-reservoir-bg-thread branch March 7, 2024 22:57