trigger inventory collection after blueprint execution #5130

Merged · 2 commits · Feb 26, 2024
11 changes: 10 additions & 1 deletion nexus/src/app/background/blueprint_execution.rs
@@ -20,6 +20,7 @@ pub struct BlueprintExecutor {
datastore: Arc<DataStore>,
rx_blueprint: watch::Receiver<Option<Arc<(BlueprintTarget, Blueprint)>>>,
nexus_label: String,
tx: watch::Sender<usize>,
}

impl BlueprintExecutor {
@@ -30,7 +31,12 @@ impl BlueprintExecutor {
>,
nexus_label: String,
) -> BlueprintExecutor {
BlueprintExecutor { datastore, rx_blueprint, nexus_label }
let (tx, _) = watch::channel(0);
BlueprintExecutor { datastore, rx_blueprint, nexus_label, tx }
}

pub fn watcher(&self) -> watch::Receiver<usize> {
self.tx.subscribe()
}
}

@@ -71,6 +77,9 @@ impl BackgroundTask for BlueprintExecutor {
)
.await;

// Trigger anybody waiting for this to finish.
self.tx.send_modify(|count| *count = *count + 1);

// Return the result as a `serde_json::Value`
match result {
Ok(()) => json!({}),
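The hook added above uses a `tokio::sync::watch` channel as a generation counter: the executor bumps the value each time a run completes, and any subscriber is woken when the value changes. As a minimal, std-only sketch of that pattern (names here are hypothetical; the real code uses `watch::Sender::send_modify` and `watch::Receiver::changed`):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// A generation counter analogous to a `watch` channel: the producer
// bumps the count when a run finishes, and a waiter blocks until the
// count moves past the last value it observed.
struct Generation {
    count: Mutex<usize>,
    cv: Condvar,
}

impl Generation {
    fn new() -> Self {
        Generation { count: Mutex::new(0), cv: Condvar::new() }
    }

    // Producer side: analogous to `self.tx.send_modify(|c| *c += 1)`.
    fn bump(&self) {
        *self.count.lock().unwrap() += 1;
        self.cv.notify_all();
    }

    // Consumer side: analogous to awaiting `rx.changed()`. Returns the
    // first count observed that is greater than `seen`.
    fn wait_past(&self, seen: usize) -> usize {
        let mut g = self.count.lock().unwrap();
        while *g <= seen {
            g = self.cv.wait(g).unwrap();
        }
        *g
    }
}

fn trigger_demo() -> usize {
    let counter = Arc::new(Generation::new());
    let waiter = {
        let counter = Arc::clone(&counter);
        thread::spawn(move || counter.wait_past(0))
    };
    counter.bump(); // "blueprint execution finished"
    waiter.join().unwrap()
}

fn main() {
    assert_eq!(trigger_demo(), 1);
}
```

Because only the latest value matters (not every intermediate one), a waiter that misses several bumps still sees a single "something happened" signal, which is exactly the semantics needed to re-trigger a dependent background task.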
56 changes: 32 additions & 24 deletions nexus/src/app/background/init.rs
@@ -170,30 +170,6 @@ impl BackgroundTasks {
)
};

// Background task: inventory collector
let task_inventory_collection = {
let collector = inventory_collection::InventoryCollector::new(
datastore.clone(),
resolver,
&nexus_id.to_string(),
config.inventory.nkeep,
config.inventory.disable,
);
let task = driver.register(
String::from("inventory_collection"),
String::from(
"collects hardware and software inventory data from the \
whole system",
),
config.inventory.period_secs,
Box::new(collector),
opctx.child(BTreeMap::new()),
vec![],
);

task
};

// Background task: phantom disk detection
let task_phantom_disks = {
let detector =
@@ -230,6 +206,7 @@
rx_blueprint.clone(),
nexus_id.to_string(),
);
let rx_blueprint_exec = blueprint_executor.watcher();
let task_blueprint_executor = driver.register(
String::from("blueprint_executor"),
String::from("Executes the target blueprint"),
@@ -239,6 +216,37 @@
vec![Box::new(rx_blueprint)],
);

// Background task: inventory collector
//
// This currently depends on the "output" of the blueprint executor in
// order to automatically trigger inventory collection whenever the
// blueprint executor runs. In the limit, this could become a problem
// because the blueprint executor might also depend indirectly on the
// inventory collector. In that case, we may need to do something more
// complicated. But for now, this works.
let task_inventory_collection = {
let collector = inventory_collection::InventoryCollector::new(
datastore.clone(),
resolver,
&nexus_id.to_string(),
config.inventory.nkeep,
config.inventory.disable,
);
let task = driver.register(
String::from("inventory_collection"),
String::from(
"collects hardware and software inventory data from the \
whole system",
),
config.inventory.period_secs,
Box::new(collector),
opctx.child(BTreeMap::new()),
vec![Box::new(rx_blueprint_exec)],
**Contributor:** I think we currently have blueprint exec set to every 60s, and inventory collection set to every 600s. Is this going to 10x the inventory collection frequency?

**Collaborator (author):** If a target is set, yes. You could view the 600s after this as "if there's nothing going on, collect at least this often." I imagine we'll want to revisit these triggers and timeouts when we get to fully automating this, but I think this is okay for now.

**Contributor:** Even as we're doing this by hand, we'll go from 600s to 60s as soon as we set a target, but it will never go back, right? Once we've set a target there will always be a target.

**Contributor:** I guess what I'm kinda wishing for here is "only trigger the collection if blueprint realization actually changed something", but that might be quite difficult.

**Collaborator (author):** Agreed, but yeah, that seems hard to know. Certainly if we get almost any errors, we have to assume something may have changed. We could have each module in the execution part return information about whether it might have done anything that would require re-inventorying. Concretely, we'd need sled agent to tell us whether it made any changes when we did PUT /omicron-zones. This seems like a potentially fine optimization, but also just an optimization (and bugs in this area would be really annoying). I think it makes sense to wait until this becomes more of a problem. What do you think?

**Contributor:** Waiting seems fine. I just have vague concerns around "we thought inventory is relatively expensive, so we put it on a pretty long timer, but now we're speeding that up by a factor of 10". I guess while we're still in manual land, we could disable the current target after we get to the desired state, and that would let us turn this back down, since the executor bails out for a disabled target the same way it does for no target.
);

task
};

let task_service_zone_nat_tracker = {
driver.register(
"service_zone_nat_tracker".to_string(),
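The wiring in `init.rs` passes `rx_blueprint_exec` as an activation dependency, so the inventory collector now runs either on its periodic timer (`period_secs`) or as soon as the blueprint executor's counter advances, whichever comes first. A std-only sketch of that "timer or dependency, whichever fires first" behavior (the real driver is tokio-based; `activation_reason` and its tuple-based dependency are illustrative names, not the actual API):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

// Wait until either the dependency's generation counter moves past
// `seen` (a dependent task ran) or `period` elapses (periodic timer).
fn activation_reason(
    dep: &(Mutex<usize>, Condvar),
    seen: usize,
    period: Duration,
) -> &'static str {
    let (lock, cv) = dep;
    let guard = lock.lock().unwrap();
    // Keep waiting while the counter has not advanced past `seen`.
    let (_guard, timeout) = cv
        .wait_timeout_while(guard, period, |count| *count <= seen)
        .unwrap();
    if timeout.timed_out() { "periodic" } else { "dependency" }
}

fn main() {
    let dep = Arc::new((Mutex::new(0usize), Condvar::new()));

    // No bump within the window: the periodic timer fires.
    assert_eq!(
        activation_reason(&dep, 0, Duration::from_millis(50)),
        "periodic"
    );

    // A bump from another thread triggers activation before the timer.
    let d2 = Arc::clone(&dep);
    std::thread::spawn(move || {
        std::thread::sleep(Duration::from_millis(10));
        *d2.0.lock().unwrap() += 1;
        d2.1.notify_all();
    });
    assert_eq!(
        activation_reason(&dep, 0, Duration::from_secs(5)),
        "dependency"
    );
}
```

This also illustrates the reviewers' frequency concern: once a target blueprint is set and the executor runs every 60s, the dependency path fires every 60s too, making the 600s period effectively a floor ("collect at least this often") rather than the actual cadence.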