Skip to content

Conversation

@Brian-Perkins
Copy link
Contributor

There have been several issues with starting the primary worker before the last message was processed (specifically the message to switch the data path). These have been resolved as discovered, but there is still at least one race, and generally this should be made harder to get wrong. Change the protocol to put the primary worker in a "waiting" state, and only after the message has been processed is it returned to 'ready'. The state is used to determine if it should be restarted, so it should never actually start running in the 'waiting' state, but just in case that will pend instead of running the main_loop.

I was not able to hit the latest race condition via unit test, but the test code has been updated with fixes and behavior updates that more closely match real devices.

…er thread by changing its state to waiting instead of ready.

Test modifications/fixes.
@Brian-Perkins Brian-Perkins requested a review from a team as a code owner November 8, 2025 02:43
Copilot AI review requested due to automatic review settings November 8, 2025 02:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the Virtual Function (VF) state management and worker coordination in the netvsp device to address race conditions and improve state transitions. The key changes introduce a new WaitingForCoordinator worker state that suspends the primary worker while the coordinator processes messages, ensuring proper synchronization.

Key changes:

  • Introduces WorkerState::WaitingForCoordinator state to suspend workers during coordinator message processing
  • Refactors VF state tracking to separate current state from state change notifications
  • Improves test infrastructure with VF state tracking and send offset management for more accurate testing

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
vm/devices/net/netvsp/src/lib.rs Adds WaitingForCoordinator worker state, refactors coordinator message handling, and improves worker-coordinator synchronization
vm/devices/net/netvsp/src/test.rs Adds VF state tracking to test endpoint, splits VF ready state into current and update tracking, fixes test assertions, and improves send buffer offset handling
Comments suppressed due to low confidence (1)

vm/devices/net/netvsp/src/lib.rs:257

  • The ready() and ready_mut() methods should handle the WaitingForCoordinator state variant, which can contain a ReadyState. Currently, these methods return None when the worker is in WaitingForCoordinator(Some(ready)), but callers like line 1778 (save state), line 3858 (inspect), line 4365 (restart_queues), line 4425 (get_queues), line 4535 (clear pending rx), and line 4549 (primary_mut) may need access to the ready state even when waiting for the coordinator. Consider updating these methods to also match WaitingForCoordinator(Some(state)).
impl WorkerState {
    fn ready(&self) -> Option<&ReadyState> {
        if let Self::Ready(state) = self {
            Some(state)
        } else {
            None
        }
    }

    fn ready_mut(&mut self) -> Option<&mut ReadyState> {
        if let Self::Ready(state) = self {
            Some(state)
        } else {
            None
        }
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant