Shutdown switch zone when sidecar disappears #9181

Nieuwejaar · 2025-10-09T14:31:40Z

Use a device contract to detect when a sidecar goes away.
Trigger a shutdown of the switch zone, so the tofino driver can be detached cleanly.

Nieuwejaar · 2025-10-09T14:45:57Z

This PR should allow us to cleanly handle the disappearance and/or re-appearance of a sidecar. The only known exception is if the transition happens while the switch zone is being set up. See: #9182.

jgallagher · 2025-10-09T18:22:37Z

sled-hardware/src/illumos/mod.rs

+                    let rt = tokio::runtime::Runtime::new().unwrap();
+                    rt.block_on(block_on_switch_zone());


This seems a little sketchy. It looks like we're kinda twisting ourselves in knots to bounce between sync and async, but IIUC we expect this monitor_tofino function to run practically forever, right? (It would only exit when the device disappears?) Should we spawn a real thread instead of using tokio::spawn_blocking? I believe that's what we do for the contract reaper bits on startup.

I don't see any reason not to spawn a new thread rather than a blocking task. That's not going to make this rt.block_on() go away though. The zone API is all async, so I need to jump back into async mode to watch for the zone disappearance. Either that, or write my own blocking version of zoneadm list.
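For reference, a minimal sketch of the "real thread" shape being discussed, using only the standard library. The names `SidecarEvent` and `spawn_monitor` are hypothetical, and the channel `recv()` merely stands in for the monitor's real blocking call; none of this is taken from the PR itself.

```rust
use std::io;
use std::sync::mpsc::Receiver;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the event the monitor waits on.
#[derive(Debug, PartialEq)]
pub enum SidecarEvent {
    Removed,
}

/// Spawn a dedicated, named OS thread for a monitor that is expected to
/// block for most of the process lifetime.
///
/// `events.recv()` stands in for the real blocking call; `on_removed` is
/// whatever synchronous cleanup should run once the device goes away.
pub fn spawn_monitor<F>(
    events: Receiver<SidecarEvent>,
    on_removed: F,
) -> io::Result<JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    thread::Builder::new()
        .name("tofino-monitor".to_string())
        .spawn(move || {
            // Blocks this thread only; no async runtime is involved.
            match events.recv() {
                Ok(SidecarEvent::Removed) => on_removed(),
                // Sender dropped: nothing left to watch, so just exit.
                Err(_) => {}
            }
        })
}
```

Using `thread::Builder` rather than `thread::spawn` surfaces a failed spawn as an `io::Result` instead of a panic, and the thread name makes the monitor easy to find in a debugger or core dump.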

> Either that, or write my own blocking version of zoneadm list.

This seems preferable - spawning a new runtime here (a) shouldn't .unwrap(), but handling errors seems painful and (b) is just using all the defaults, which means it would create a 128-thread worker pool just for this one task. We do already have a blocking zoneadm list: https://docs.rs/zone/latest/zone/struct.Adm.html#method.list_blocking

I think we ought to be one world or the other: either

- Make this a thread, and get rid of all of the async, or
- Make this monitor itself an async task, and get rid of any blocking calls it makes (which means it could call block_on_switch_zone().await without creating a new runtime)

Fair enough. I'll switch to a blocking zoneadm list.
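A rough sketch of what a fully blocking wait could look like, assuming the zone listing is injected as a closure. `wait_for_zone_gone` and its parameters are hypothetical names for illustration; in the real code the closure would presumably wrap the blocking list call from the zone crate linked above, and the polling interval is an arbitrary choice.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Poll a blocking zone lister until `zone_name` no longer appears.
///
/// `list_zones` is a hypothetical injection point: it returns the names of
/// the zones that currently exist, or an error if the listing failed.
/// On success, returns how many listings were needed.
pub fn wait_for_zone_gone<F>(
    zone_name: &str,
    poll_interval: Duration,
    mut list_zones: F,
) -> io::Result<usize>
where
    F: FnMut() -> io::Result<Vec<String>>,
{
    let mut polls = 0;
    loop {
        polls += 1;
        // Propagate listing errors to the caller instead of unwrapping.
        let zones = list_zones()?;
        if !zones.iter().any(|z| z == zone_name) {
            return Ok(polls);
        }
        // Plain thread sleep: this function is meant to run on a
        // dedicated thread, not inside an async task.
        thread::sleep(poll_interval);
    }
}
```

Because everything here is synchronous, the caller needs neither a new runtime nor a handle to an existing one, and a failed listing comes back as an ordinary `Err` rather than a panic.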

We can't get rid of the task's blocking calls, because its core operation is a blocking call to ct_event_read_critical(). Unfortunately, despite the name, the underlying system call is ioctl rather than read, so we can't turn this into an AsyncFd and poll() on it.

I think that's the way I'd go, but a couple other possible options if there's some problem with that down the road:

- I think (?) we could pass in a handle to the existing tokio runtime, and use that instead of having to create our own. (I still don't love mixing sync / async in that way, but at least we'd solve the issues with creating a new runtime.)
- If we wanted to make this an async task, we could spawn_blocking() the ioctl (and any other blocking calls).

Nieuwejaar added 5 commits October 6, 2025 22:34

Handle surprise removal of a sidecar

18731b9

fix linux build

1e819ce

try again

0fd9ae8

cleanup

c61e606

cleanup

df846ca

Nieuwejaar requested a review from jgallagher October 9, 2025 14:31

jgallagher reviewed Oct 9, 2025

Nieuwejaar commented Oct 9, 2025

Nieuwejaar commented Oct 9, 2025

jgallagher Oct 9, 2025

Nieuwejaar Oct 13, 2025

jgallagher Oct 13, 2025 (edited)

Nieuwejaar Oct 13, 2025

jgallagher Oct 13, 2025

2 participants
