Shutdown switch zone when sidecar disappears #9181
Conversation
This PR should allow us to cleanly handle the disappearance and/or re-appearance of a sidecar. The only known exception is if the transition happens while the switch zone is being set up. See: #9182.
let rt = tokio::runtime::Runtime::new().unwrap();
rt.block_on(block_on_switch_zone());
This seems a little sketchy. It looks like we're kinda twisting ourselves in knots to bounce between sync and async, but IIUC we expect this `monitor_tofino` function to run practically forever, right? (It would only exit when the device disappears?) Should we spawn a real thread instead of using `tokio::spawn_blocking`? I believe that's what we do for the contract reaper bits on startup.
I don't see any reason not to spawn a new thread rather than a blocking task. That's not going to make this `rt.block_on()` go away though. The zone API is all async, so I need to jump back into async mode to watch for the zone disappearance. Either that, or write my own blocking version of `zoneadm list`.
> Either that, or write my own blocking version of `zoneadm list`.

This seems preferable: spawning a new runtime here (a) shouldn't `.unwrap()`, but handling errors seems painful, and (b) is just using all the defaults, which means it would create a 128-thread worker pool just for this one task. We do already have a blocking `zoneadm list`: https://docs.rs/zone/latest/zone/struct.Adm.html#method.list_blocking
I think we ought to be one world or the other: either
- Make this a thread, and get rid of all of the async, or
- Make this monitor itself an async task, and get rid of any blocking calls it makes (which means it could call `block_on_switch_zone().await` without creating a new runtime)
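The first option (a plain thread with no async inside it) might look roughly like the sketch below. It uses only the standard library. `MonitorEvent`, `wait_for_device_removal`, `spawn_monitor`, and the thread name are hypothetical names made up for illustration, and the blocking wait is simulated with a channel receive rather than the real contract or zone calls. Note that `thread::Builder::spawn` returns an `io::Result`, so a spawn failure can be propagated instead of unwrapped.

```rust
use std::io;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Events the monitor reports. Hypothetical; not a type from this PR.
#[derive(Debug, PartialEq, Eq)]
enum MonitorEvent {
    SidecarGone,
}

/// Stand-in for a blocking wait on a device event. It blocks on a
/// channel receive so the sketch runs anywhere; returns false once the
/// event source is closed.
fn wait_for_device_removal(device_events: &Receiver<()>) -> bool {
    device_events.recv().is_ok()
}

/// Spawn the monitor on its own named OS thread. Everything inside the
/// closure is synchronous, so no tokio runtime is created or entered.
fn spawn_monitor(
    device_events: Receiver<()>,
    out: Sender<MonitorEvent>,
) -> io::Result<JoinHandle<()>> {
    thread::Builder::new()
        .name("tofino-monitor".to_string())
        .spawn(move || {
            // Block until the simulated device goes away, then report it.
            // A real monitor would do its (blocking) zone handling here.
            while wait_for_device_removal(&device_events) {
                if out.send(MonitorEvent::SidecarGone).is_err() {
                    break; // nobody is listening any more
                }
            }
        })
}

/// Drive the monitor with one simulated removal and collect its output.
fn run_demo() -> io::Result<Vec<MonitorEvent>> {
    let (dev_tx, dev_rx) = mpsc::channel();
    let (out_tx, out_rx) = mpsc::channel();
    let handle = spawn_monitor(dev_rx, out_tx)?;

    dev_tx.send(()).expect("monitor thread is alive");
    drop(dev_tx); // closing the event source ends the monitor loop
    handle.join().expect("monitor thread panicked");

    // The monitor's sender was dropped when the thread exited, so this
    // iterator terminates.
    Ok(out_rx.iter().collect())
}

fn main() -> io::Result<()> {
    let events = run_demo()?;
    println!("{events:?}");
    Ok(())
}
```

The thread owns both channel ends it needs, so shutting it down is just a matter of dropping the event source and joining.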
Fair enough. I'll switch to a blocking `zoneadm list`.
We can't get rid of the task's blocking calls, because its core operation is a blocking call to `ct_event_read_critical()`. Unfortunately, despite the name, the underlying system call is `ioctl` rather than `read`, so we can't turn this into an `AsyncFd` and `poll()` on it.
I think that's the way I'd go, but a couple other possible options if there's some problem with that down the road:
- I think (?) we could pass in a handle to the existing tokio runtime, and use that instead of having to create our own. (I still don't love mixing sync / async in that way, but at least we'd solve the issues with creating a new runtime.)
- If we wanted to make this an async task, we could `spawn_blocking()` the `ioctl` (and any other blocking calls).
Use a device contract to detect when a sidecar goes away.
Trigger a shutdown of the switch zone, so the tofino driver can be detached cleanly.