Deadlock possible between BalancedThreadSched and RatedUnqueue #394

Open
mihaibrodschi opened this issue Aug 14, 2024 · 3 comments
Labels
bug vanilla Not related to FastClick's specific code

Comments

@mihaibrodschi
Contributor

For elements that have both a task and a timer which schedule each other (such as RatedUnqueue), BalancedThreadSched moves the task to the least busy thread (T2) but leaves the timer on its original thread (T1). The following deadlock can then occur:

Thread 1
RouterThread::driver
 |_ TimerSet::run_timers -> locks the _timer_lock for T1
     |_ TimerSet::run_one_timer
         |_ BalancedThreadSched::run_timer
             |_ RouterThread::scheduled_tasks -> wants to check the scheduled tasks for T2
                 |_ RouterThread::block_tasks
                     |_ waits for T2's _task_blocker to become >= 0.

Thread 2
RouterThread::driver
 |_ RouterThread::driver_lock_tasks -> sets T2's _task_blocker to -1
 |  (...)
 |_ RouterThread::run_tasks
     |_ BandwidthRatedUnqueue::run_task
         |_ Timer::schedule_after
             |_ Timer::schedule_at_steady -> gets the TimerSet this timer belongs to (which is T1's).
                 |_ TimerSet::lock_timers -> waits to lock the _timer_lock for T1. DEADLOCK
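
To make the cycle in the two traces explicit, here is a minimal sketch that models them as a wait-for graph and checks it for a cycle. It is not Click code: `WaitForGraph` and `trace_from_issue` are hypothetical names, the node labels only mirror the trace, and treating `_task_blocker` as a resource "held" by T2 is a simplifying assumption.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy wait-for graph. An edge A -> B means "A cannot proceed until B lets go".
// Threads point at the resource they wait for; resources point at their holder.
class WaitForGraph {
public:
    void add_edge(const std::string& from, const std::string& to) {
        assert(from != to);  // self-waits are not part of this model
        edges_[from].insert(to);
    }

    // True if some chain of waits leads back to its starting node.
    bool has_cycle() const {
        std::set<std::string> done, on_path;
        for (const auto& entry : edges_)
            if (visit(entry.first, done, on_path))
                return true;
        return false;
    }

private:
    // Depth-first search; a node seen again while still on the path closes a cycle.
    bool visit(const std::string& node, std::set<std::string>& done,
               std::set<std::string>& on_path) const {
        if (on_path.count(node)) return true;
        if (done.count(node)) return false;
        on_path.insert(node);
        auto it = edges_.find(node);
        if (it != edges_.end())
            for (const auto& next : it->second)
                if (visit(next, done, on_path)) return true;
        on_path.erase(node);
        done.insert(node);
        return false;
    }

    std::map<std::string, std::set<std::string>> edges_;
};

// Encodes the two traces above (assumed simplification of the real state).
WaitForGraph trace_from_issue() {
    WaitForGraph g;
    g.add_edge("T1", "T2._task_blocker");  // T1 waits in block_tasks
    g.add_edge("T2._task_blocker", "T2");  // set to -1 by T2's driver
    g.add_edge("T2", "T1._timer_lock");    // T2 waits in lock_timers
    g.add_edge("T1._timer_lock", "T1");    // taken by T1 in run_timers
    return g;
}
```

Calling `trace_from_issue().has_cycle()` reports the cycle T1 -> T2's `_task_blocker` -> T2 -> T1's `_timer_lock` -> T1. Dropping either wait edge breaks it, which is why keeping the timer on the same thread as the task would avoid this particular deadlock.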

A potential solution would be to ensure that the timer is always moved to the same thread as the task.
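
A rough sketch of that idea, using toy types only: `ToyTask`, `ToyTimer`, `ToyElement`, `move_element` and `colocated` are all hypothetical and do not correspond to Click's `Task`, `Timer` or `Element` APIs. The point is just that the balancer would move an element's task and timers in one step rather than the task alone.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; these are not Click's classes.
struct ToyTimer { int home_thread; };
struct ToyTask  { int home_thread; };

struct ToyElement {
    ToyTask task;
    std::vector<ToyTimer> timers;
};

// Move the task and every timer of the same element together,
// so they never end up on different threads.
void move_element(ToyElement& e, int new_thread) {
    assert(new_thread >= 0);
    e.task.home_thread = new_thread;
    for (ToyTimer& t : e.timers)
        t.home_thread = new_thread;
}

// True if the task and all timers of the element share one thread.
bool colocated(const ToyElement& e) {
    for (const ToyTimer& t : e.timers)
        if (t.home_thread != e.task.home_thread)
            return false;
    return true;
}
```

With this shape, a balancer that only ever calls `move_element` keeps `colocated` true, so the task running on T2 would lock T2's own timer set instead of T1's. How a timer can actually be re-homed in Click is not covered here and would need checking against the real code.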

@tbarbette
Owner

Is it something you ran into? Do you have a test case to demonstrate?

In practice though I would recommend using the RSS++ solution instead of the thread balancer.

@mihaibrodschi
Contributor Author

Yes, I ran into this with a simple setup in which two instances of my application communicate through Click. I'll see if I can reproduce it using dpdk-testpmd.
I used the thread balancer because it looked easy. I will check out RSS++.

@tbarbette
Owner

The thing is, this bug comes straight from Click. I haven't used or modified those parts (scheduling and the thread balancer) myself, so it's hard to help here. You can try posting it on the kohler/click repo, as it's not related to the FastClick-specific parts. Maybe Eddie will like this bug and solve it; that happens from time to time :)

@tbarbette tbarbette added bug vanilla Not related to FastClick's specific code labels Aug 22, 2024