trigger inventory collection after blueprint execution #5130

davepacheco · 2024-02-23T19:09:41Z

This came up in the context of #5111 and here in particular:

FWIW, there were 2 periods where I ended up waiting on a collection:
* After add sled, as discussed above and we now understand
* In between making the blueprint that added an NTP zone to the new sled and regenerating the next blueprint

In the second case, even though the sled had already started the NTP zone and successfully timesync'd, I was still waiting for an inventory collection because the planner won't add anything else to the sled until the latest collection shows the NTP zone. I'm not sure where we could trigger a collection here, though - if sled-agent told us in the response whether the PUT /omicron-zones actually started new zones, could the reconfigurator task trigger a collection if new zones were added?

I figured that more generally, any time we execute a blueprint, successfully or otherwise, we may have made some changes and we ought to re-collect inventory.

jgallagher · 2024-02-23T19:22:17Z

nexus/src/app/background/blueprint_execution.rs

@@ -71,6 +78,10 @@ impl BackgroundTask for BlueprintExecutor {
            )
            .await;

+            // Trigger anybody waiting for this to finish.
+            self.count = self.count + 1;
+            self.tx.send(self.count);


I'll take "watch is hard to use correctly for $200, Alex"; this should probably be send_replace or send_modify so it doesn't spuriously fail if there are no subscribers.

Actually, if we used send_modify could we drop self.count altogether?

self.tx.send_modify(|count| *count += 1)

Fixed in 70dfed3
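
For readers unfamiliar with the distinction raised in this thread, here is a minimal sketch of why send can fail spuriously while send_modify cannot. ToyWatch is a hypothetical std-only stand-in written for illustration; it is not tokio's implementation, and it only models the one behavior under discussion (send returns an error and leaves the value untouched when there are no receivers, whereas send_modify always updates the value).

```rust
// Toy model of a watch-style channel, written only to illustrate the
// semantics discussed above. This is NOT tokio's implementation.
struct ToyWatch<T> {
    value: T,
    receivers: usize, // number of live subscribers (hypothetical bookkeeping)
}

impl<T> ToyWatch<T> {
    fn new(value: T) -> Self {
        ToyWatch { value, receivers: 0 }
    }

    // Register one more subscriber.
    fn subscribe(&mut self) {
        self.receivers += 1;
    }

    // Like `send`: fails, leaving the stored value untouched, when nobody
    // is subscribed.
    fn send(&mut self, value: T) -> Result<(), T> {
        if self.receivers == 0 {
            return Err(value);
        }
        self.value = value;
        Ok(())
    }

    // Like `send_modify`: updates the value in place whether or not anyone
    // is subscribed.
    fn send_modify<F: FnOnce(&mut T)>(&mut self, modify: F) {
        modify(&mut self.value);
    }
}

fn main() {
    let mut tx = ToyWatch::new(0u64);

    // No subscribers yet: `send` reports an error and the counter stays at 0.
    assert!(tx.send(1).is_err());
    assert_eq!(tx.value, 0);

    // `send_modify` still bumps the counter, so no separate `self.count`
    // field is needed on the sending side.
    tx.send_modify(|count| *count += 1);
    assert_eq!(tx.value, 1);

    // With a subscriber present, `send` succeeds.
    tx.subscribe();
    assert!(tx.send(7).is_ok());
    assert_eq!(tx.value, 7);
}
```

In this model the counter lives inside the channel, which is the reason the suggestion above can drop self.count.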

jgallagher · 2024-02-23T19:29:02Z

nexus/src/app/background/init.rs

+                config.inventory.period_secs,
+                Box::new(collector),
+                opctx.child(BTreeMap::new()),
+                vec![Box::new(rx_blueprint_exec)],


I think we currently have blueprint exec set to every 60s, and inventory collection set to every 600s. Is this going to 10x the inventory collection frequency?

If a target is set, yes. You could view the 600s after this as "if there's nothing going on, collect at least this often".

I imagine we'll want to revisit these triggers and timeouts when we get to fully automating this but I think this is okay for now.
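
The "collect at least this often" reading can be sketched as a task that wakes on whichever comes first: an explicit trigger or the end of its period. This is a simplified std-only illustration, not the Nexus background-task driver; Activation and wait_for_activation are hypothetical names, and a std mpsc channel stands in for the watch receiver passed in the diff above.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Activation {
    Triggered, // another task (e.g. blueprint execution) signaled us
    Timeout,   // nothing happened for a full period
}

// Block until either a trigger arrives or `period` elapses, whichever
// happens first.
fn wait_for_activation(rx: &Receiver<()>, period: Duration) -> Activation {
    match rx.recv_timeout(period) {
        Ok(()) => Activation::Triggered,
        Err(RecvTimeoutError::Timeout) => Activation::Timeout,
        Err(RecvTimeoutError::Disconnected) => {
            // The trigger side is gone; fall back to the plain periodic timer.
            thread::sleep(period);
            Activation::Timeout
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    // Stands in for the 600s inventory period, shortened for the demo.
    let period = Duration::from_millis(50);

    // A pending trigger activates the task immediately.
    tx.send(()).unwrap();
    assert_eq!(wait_for_activation(&rx, period), Activation::Triggered);

    // With nothing going on, the task still activates once per period.
    assert_eq!(wait_for_activation(&rx, period), Activation::Timeout);
}
```

Under this model, a trigger source that fires every 60s means the 600s timer never gets a chance to expire, which is the 10x concern raised above.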

Even as we're doing this by hand, we'll go from 600s to 60s as soon as we set a target, but it will never go back, right? Once we've set a target there will always be a target.

I guess what I'm kinda wishing for here is "only trigger the collection if blueprint realization actually changed something", but that might be quite difficult.

Agreed, but yeah, that seems hard to know. Certainly if we get almost any errors, we have to assume something may have changed. We could have each module in the execution part return information about whether it might have done anything that would require re-inventorying. Concretely, we'd need sled agent to tell us whether it made any changes when we did PUT /omicron-zones. This seems like a potentially fine optimization, but also just an optimization (and bugs in this area would be really annoying). I think it makes sense to wait until this becomes more of a problem. What do you think?
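
One possible shape for that optimization, as a rough sketch only: each execution step reports whether it might have changed anything, and any error is treated as a possible change. Outcome and needs_inventory are hypothetical names; nothing like this exists in the PR as merged.

```rust
// Hypothetical report from one step of blueprint execution.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Unchanged,    // the step is sure it did nothing
    MaybeChanged, // the step did, or might have done, something
}

// Decide whether to trigger an inventory collection after execution.
// Errors are strings here purely to keep the sketch self-contained.
fn needs_inventory(results: &[Result<Outcome, String>]) -> bool {
    results.iter().any(|result| match result {
        Ok(Outcome::Unchanged) => false,
        Ok(Outcome::MaybeChanged) => true,
        // On almost any error we have to assume something may have changed.
        Err(_) => true,
    })
}

fn main() {
    // Every step reports no change: skip the extra collection.
    let quiet = vec![Ok(Outcome::Unchanged), Ok(Outcome::Unchanged)];
    assert!(!needs_inventory(&quiet));

    // One step (say, PUT /omicron-zones) reports a change: collect.
    let changed = vec![Ok(Outcome::Unchanged), Ok(Outcome::MaybeChanged)];
    assert!(needs_inventory(&changed));

    // A failed step counts as a possible change.
    let failed = vec![Ok(Outcome::Unchanged), Err("request failed".to_string())];
    assert!(needs_inventory(&failed));
}
```

The risk noted above still applies: a step that wrongly reports Unchanged would silently delay the next collection.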

Waiting seems fine. I just have vague concerns around "we thought inventory is relatively expensive, so we put it on a pretty long timer, but now we're speeding that up by a factor of 10".

I guess while we're still in manual land, we could disable the current target after we get to the desired state, and that would let us turn this back down, since the executor bails out for a disabled target the same way it does for no target.
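
A small sketch of the workaround described here, assuming (as the comment implies) that bailing out early also skips the trigger. Target and execute_and_trigger are hypothetical names, and a plain counter stands in for the watch channel.

```rust
// Hypothetical stand-in for the current blueprint target.
struct Target {
    enabled: bool,
}

// Returns true if execution ran and the trigger counter was bumped.
// Assumption: a missing target and a disabled target both bail out
// before the trigger, so neither causes an inventory collection.
fn execute_and_trigger(target: Option<&Target>, count: &mut u64) -> bool {
    match target {
        Some(t) if t.enabled => {
            // ... blueprint execution would happen here ...
            *count += 1; // trigger anybody waiting for this to finish
            true
        }
        // No target, or target disabled: nothing to do and nothing to trigger.
        _ => false,
    }
}

fn main() {
    let mut count = 0u64;

    // No target set: no execution, no trigger.
    assert!(!execute_and_trigger(None, &mut count));
    assert_eq!(count, 0);

    // Enabled target: execution runs and the trigger fires.
    let enabled = Target { enabled: true };
    assert!(execute_and_trigger(Some(&enabled), &mut count));
    assert_eq!(count, 1);

    // Disabled target: treated the same as no target.
    let disabled = Target { enabled: false };
    assert!(!execute_and_trigger(Some(&disabled), &mut count));
    assert_eq!(count, 1);
}
```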

andrewjstone · 2024-02-23T22:33:43Z

This is great. Thanks @davepacheco. While it still may be worth it to implement #5058, it is now much less of a priority!

jgallagher · 2024-02-26T18:35:10Z

Auto-merge failed because of #5027. Rerunning that job...

trigger inventory collection after blueprint execution

ef298f5

davepacheco requested a review from jgallagher February 23, 2024 19:09

jgallagher reviewed Feb 23, 2024

fix

70dfed3

jgallagher mentioned this pull request Feb 23, 2024

Failed to fully add a new sled on madrid #5111

Closed

jgallagher approved these changes Feb 23, 2024

davepacheco enabled auto-merge (squash) February 23, 2024 20:10

davepacheco merged commit d7db26d into main Feb 26, 2024
20 checks passed

davepacheco deleted the dap/more-inventory-collection branch February 26, 2024 19:54
