AFAIK stealing_loop_backoff::pause is called on every iteration of the task_dispatcher::local_wait_for_all loop, and internally it calls std::this_thread::yield().
However, if I comment out the d0::yield() line inside stealing_loop_backoff::pause, then on a system with N cores running N tasks in parallel, performance degrades by an order of magnitude. How so? Why does hinting the OS scheduler to move my thread off its core increase performance when there are N threads and N tasks?
Basically, stealing_loop_backoff::pause is designed to decrease thread contention in the scheduler. It is called when a thread cannot find work in the arena: instead of loading the system with another immediate attempt to find a task, the thread pauses.
At first we issue only processor pause instructions, because we assume there are tasks in the system that will be distributed across threads shortly, so a short wait is enough. After 2 * P attempts we start calling yield, and the thread then has 100 attempts to find a task before it tries to transfer the arena to the Empty state and leave.
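The two-phase behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the actual oneTBB source: the class name backoff_sketch, the action enum, and cpu_relax are hypothetical, the constructor argument stands in for P, and the sketch assumes the 100 attempts are counted from the point where yielding starts. A compiler-only fence is used as a portable stand-in for a real processor pause instruction.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical sketch of a two-phase backoff; NOT oneTBB's implementation.
class backoff_sketch {
public:
    enum class action { spin, yield, give_up };

    // p stands in for "P" from the explanation above (assumption).
    explicit backoff_sketch(std::size_t p)
        : pause_threshold_(2 * p),
          yield_threshold_(2 * p + 100) {}

    // Call once per failed attempt to find a task.
    action pause() {
        if (failures_ < pause_threshold_) {
            ++failures_;
            cpu_relax();                 // phase 1: short busy wait
            return action::spin;
        }
        if (failures_ < yield_threshold_) {
            ++failures_;
            std::this_thread::yield();   // phase 2: let other threads run
            return action::yield;
        }
        // Budget exhausted: the caller may now try to leave the arena.
        return action::give_up;
    }

    // Call after a task was found, so the next search starts in phase 1.
    void reset() { failures_ = 0; }

private:
    static void cpu_relax() {
        // Stand-in only: a real implementation would issue a CPU pause hint.
        std::atomic_signal_fence(std::memory_order_seq_cst);
    }

    std::size_t pause_threshold_;
    std::size_t yield_threshold_;
    std::size_t failures_ = 0;
};
```

In a work-search loop, a thread would call pause() after each failed attempt and reset() as soon as it finds a task; only a give_up result would lead it to attempt leaving the arena.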
To summarize, stealing_loop_backoff::pause might help performance by:
- Decreasing contention in the system (this might also lower the frequency of the cores whose threads are waiting, which in turn might help raise the frequency of the core doing the serial part of the job)
- Keeping threads in the scheduler until a new portion of work arrives, for example at the start of the next parallel_for:
tbb::parallel_for();
/* serial work */
tbb::parallel_for();