Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trigger inventory collection after blueprint execution #5130

Merged
merged 2 commits into from
Feb 26, 2024

Conversation

davepacheco
Copy link
Collaborator

This came up in the context of #5111 and here in particular:

FWIW, there were 2 periods where I ended up waiting on a collection:

* After add sled, as discussed above and we now understand

* In between making the blueprint that added an NTP zone to the new sled and regenerating the next blueprint

In the second case, even though the sled had already started the NTP zone and successfully timesync'd, I was still waiting for an inventory collection because the planner won't add anything else to the sled until the latest collection shows the NTP zone. I'm not sure where we could trigger a collection here, though - if sled-agent told us in the response whether the PUT /omicron-zones actually started new zones, could the reconfigurator task trigger a collection if new zones were added?

I figured that more generally, any time we execute a blueprint, successfully or otherwise, we may have made some changes and we ought to re-collect inventory.

@@ -71,6 +78,10 @@ impl BackgroundTask for BlueprintExecutor {
)
.await;

// Trigger anybody waiting for this to finish.
self.count = self.count + 1;
self.tx.send(self.count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take "watch is hard to use correctly for $200, Alex"; this should probably be send_replace or send_modify so it doesn't spuriously fail if there are no subscribers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if we used send_modify could we drop self.count altogether?

self.tx.send_modify(|count| *count += 1)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

Copy link
Collaborator Author

@davepacheco davepacheco Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 70dfed3

config.inventory.period_secs,
Box::new(collector),
opctx.child(BTreeMap::new()),
vec![Box::new(rx_blueprint_exec)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we currently have blueprint exec set to every 60s, and inventory collection set to every 600s. Is this going to 10x the inventory collection frequency?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a target is set, yes. You could vie the 600s after this as "if there's nothing going on, collect at least this often".

I imagine we'll want to revisit these triggers and timeouts when we get to fully automating this but I think this is okay for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even as we're doing this by hand, we'll go from 600s to 60s as soon as we set a target, but it will never go back, right? Once we've set a target there will always be a target.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I'm kinda wishing for here is "only trigger the collection if blueprint realization actually changed something", but that might be quite difficult.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but yeah, that seems hard to know. Certainly if we get almost any errors, we have to assume something may have changed. We could have each module in the execution part return information about whether it might have done anything that would require re-inventorying. Concretely, we'd need sled agent to tell us whether it made any changes when we did PUT /omicron-zones. This seems like a potentially fine optimization, but also just an optimization (and bugs in this area would be really annoying). I think it makes sense to wait until this becomes more of a problem. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting seems fine. I just have vague concerns around "we thought inventory is relatively expensive, so we put it on a pretty long timer, but now we're speeding that up by a factor of 10".

I guess while we're still in manual land, we could disable the current target after we get to the desired state, and that would let us turn this back down, since the executor bails out for a disabled target the same way it does for no target.

@davepacheco davepacheco enabled auto-merge (squash) February 23, 2024 20:10
@andrewjstone
Copy link
Contributor

This is great. Thanks @davepacheco. While it still may be worthy it to implement #5058, it is now much less of a priority!

@jgallagher
Copy link
Contributor

Auto-merge failed because of #5027. Rerunning that job...

@davepacheco davepacheco merged commit d7db26d into main Feb 26, 2024
20 checks passed
@davepacheco davepacheco deleted the dap/more-inventory-collection branch February 26, 2024 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants