Nieuwejaar
Contributor

Use a device contract to detect when a sidecar goes away.
Trigger a shutdown of the switch zone, so the tofino driver can be detached cleanly.

@Nieuwejaar Nieuwejaar requested a review from jgallagher October 9, 2025 14:31
@Nieuwejaar
Contributor Author

This PR should allow us to cleanly handle the disappearance and/or re-appearance of a sidecar. The only known exception is if the transition happens while the switch zone is being set up. See: #9182.

Comment on lines +784 to +785
let rt = tokio::runtime::Runtime::new().unwrap();
rt.block_on(block_on_switch_zone());
Contributor

This seems a little sketchy. It looks like we're kinda twisting ourselves in knots to bounce between sync and async, but IIUC we expect this monitor_tofino function to run practically forever, right? (It would only exit when the device disappears?) Should we spawn a real thread instead of using tokio::task::spawn_blocking? I believe that's what we do for the contract reaper bits on startup.
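
For illustration, the "real thread" option might look roughly like the sketch below. It uses only the standard library. The names `wait_for_device_removal` and `spawn_monitor` are hypothetical and not taken from this PR, and the sleep merely simulates the blocking contract read.

```rust
use std::io;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the blocking wait on the device contract.
/// It only sleeps briefly to simulate the device disappearing.
fn wait_for_device_removal() -> io::Result<()> {
    thread::sleep(Duration::from_millis(10));
    Ok(())
}

/// Run the long-lived monitor on its own named OS thread rather than on a
/// runtime's blocking pool. The outcome is reported over a channel.
fn spawn_monitor(
    removed_tx: mpsc::Sender<io::Result<()>>,
) -> io::Result<thread::JoinHandle<()>> {
    thread::Builder::new()
        .name("tofino-monitor".to_string())
        .spawn(move || {
            let result = wait_for_device_removal();
            // The receiver may already be gone during shutdown; ignore that.
            let _ = removed_tx.send(result);
        })
}
```

A caller would create an `mpsc::channel()`, pass the sender to `spawn_monitor`, and call `recv()` on the receiver to learn when the device went away. Using `thread::Builder` instead of `thread::spawn` surfaces a spawn failure as an `io::Result` rather than a panic.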

Contributor Author

I don't see any reason not to spawn a new thread rather than a blocking task. That's not going to make this rt.block_on() go away though. The zone API is all async, so I need to jump back into async mode to watch for the zone disappearance. Either that, or write my own blocking version of zoneadm list.

Contributor

@jgallagher jgallagher Oct 13, 2025

> Either that, or write my own blocking version of zoneadm list.

This seems preferable - spawning a new runtime here (a) shouldn't .unwrap(), but handling errors seems painful and (b) is just using all the defaults, which means it would create a 128-thread worker pool just for this one task. We do already have a blocking zoneadm list: https://docs.rs/zone/latest/zone/struct.Adm.html#method.list_blocking

I think we ought to be in one world or the other: either

  • Make this a thread, and get rid of all of the async, or
  • Make this monitor itself an async task, and get rid of any blocking calls it makes (which means it could call block_on_switch_zone().await without creating a new runtime)
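
As a rough sketch of the first option (all sync, no runtime), waiting for the zone to disappear could be a plain polling loop. Everything here is an assumption for illustration: `block_until_zone_gone` is a hypothetical name, and `list_zones` is a caller-supplied closure standing in for a blocking `zoneadm list` wrapper, so the sketch does not depend on the zone crate's actual signatures.

```rust
use std::thread;
use std::time::Duration;

/// Block the calling thread until `zone_name` no longer appears in the
/// listing returned by `list_zones`.
///
/// `list_zones` is a hypothetical stand-in for a blocking `zoneadm list`
/// call; it returns the names of the zones that currently exist.
fn block_until_zone_gone<E>(
    mut list_zones: impl FnMut() -> Result<Vec<String>, E>,
    zone_name: &str,
    poll_interval: Duration,
) -> Result<(), E> {
    loop {
        // A listing failure is returned to the caller instead of unwrapped.
        let zones = list_zones()?;
        if !zones.iter().any(|z| z.as_str() == zone_name) {
            return Ok(());
        }
        thread::sleep(poll_interval);
    }
}
```

Because this never touches an async API, the monitor thread needs no `Runtime::new()` and no `block_on()`, and errors propagate with `?`.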

Contributor Author

Fair enough. I'll switch to a blocking zoneadm list.

We can't get rid of the task's blocking calls, because its core operation is a blocking call to ct_event_read_critical(). Unfortunately, despite the name, the underlying system call is ioctl rather than read, so we can't turn this into an AsyncFd and poll() on it.

Contributor

I think that's the way I'd go, but a couple other possible options if there's some problem with that down the road:

  • I think (?) we could pass in a handle to the existing tokio runtime, and use that instead of having to create our own. (I still don't love mixing sync / async in that way, but at least we'd solve the issues with creating a new runtime.)
  • If we wanted to make this an async task, we could spawn_blocking() the ioctl (and any other blocking calls).
