Offending code (weave/weave/loop_splitting.nim, lines 22 to 24 in 5d90172):
task.cur + ((task.stop - task.cur + task.stride-1) div task.stride) shr 1
Test case:
func splitHalfBuggy*(cur, stop, stride: int): int {.inline.} =
  ## Split loop iteration range in half
  cur + ((stop - cur + stride-1) div stride) shr 1

echo splitHalfBuggy(32, 128, 32) # 33 <---- the caller only keeps a single iteration

# fixed
func splitHalf*(cur, stop, stride: int): int {.inline.} =
  ## Split loop iteration range in half
  cur + (stop - cur) shr 1

echo splitHalf(32, 128, 32) # 80
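An anti-regression check for this case could look roughly like the sketch below. It restates the two versions from the test case under local names, and `keptIterations` is a hypothetical helper (not part of Weave) that counts how many strided iterations the victim keeps in [cur, split).

```nim
# Sketch only: local copies of the buggy and fixed formulas from the test case.
func splitHalfBuggy(cur, stop, stride: int): int =
  cur + ((stop - cur + stride-1) div stride) shr 1

func splitHalf(cur, stop, stride: int): int =
  cur + (stop - cur) shr 1

func keptIterations(cur, split, stride: int): int =
  ## Hypothetical helper: number of strided iterations in [cur, split).
  (split - cur + stride - 1) div stride

when isMainModule:
  # The buggy split leaves the victim with a single iteration.
  doAssert splitHalfBuggy(32, 128, 32) == 33
  doAssert keptIterations(32, splitHalfBuggy(32, 128, 32), 32) == 1
  # The fixed split leaves the victim with more than one iteration.
  doAssert splitHalf(32, 128, 32) == 80
  doAssert keptIterations(32, splitHalf(32, 128, 32), 32) > 1
```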
splitHalf is fairly easy to fix. What follows is an explanation of splitGuided, splitAdaptative and splitAdaptativeDelegated, so that proper tests can be written for them.
splitGuided
Split-guided is similar to OpenMP guided split. Assuming N iterations and P workers, you first deal thieves work chunks of size N/P. When the iterations left are less than N/P you deal exponentially decreasing work chunks.
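The description above can be sketched as follows. This is not the code from loop_splitting.nim: `splitGuidedSketch` and its parameters are hypothetical, the stride is assumed to be 1, and "exponentially decreasing" is modeled here by halving whatever is left once fewer than N/P iterations remain.

```nim
func splitGuidedSketch(cur, stop, totalIters, numWorkers: int): int =
  ## Hypothetical sketch (stride assumed to be 1).
  ## Returns the split point: the victim keeps [cur, split),
  ## the thief receives [split, stop).
  let stepsLeft = stop - cur
  # Regular chunk size N/P, at least one iteration
  let chunk = max(totalIters div numWorkers, 1)
  if stepsLeft <= chunk:
    # Assumption: below one full chunk, give away half of what remains
    cur + stepsLeft shr 1
  else:
    # Otherwise give away one full chunk from the end of the range
    stop - chunk

when isMainModule:
  echo splitGuidedSketch(0, 100, 100, 4) # 75: thief gets [75, 100)
  echo splitGuidedSketch(0, 25, 100, 4)  # 12: thief gets [12, 25)
  echo splitGuidedSketch(0, 12, 100, 4)  # 6:  thief gets [6, 12)
```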
splitAdaptative
In practice, if a victim is at iteration 19 of a [0, 100) task, we have task.cur = 20 (task.cur is the next splittable iteration, so the victim does not give up its current work):
https://github.com/mratsim/weave/blob/5d9017239ca9792cc37e3995f422f86ac57043ab/weave/parallel_for.nim#L24-L49
Assuming we have approximately 7 thieves (because of concurrency, we only have a lower bound) plus the victim, we want to distribute 10 iterations to each.
But the split is done one thief at a time in a loop so the algorithm does:
task.cur = 20, task.stop = 100, thieves = 7 => split at 90
task.cur = 20, task.stop = 90, thieves = 6 => split at 80
task.cur = 20, task.stop = 80, thieves = 5 => split at 70
task.cur = 20, task.stop = 70, thieves = 4 => split at 60
task.cur = 20, task.stop = 60, thieves = 3 => split at 50
task.cur = 20, task.stop = 50, thieves = 2 => split at 40
task.cur = 20, task.stop = 40, thieves = 1 => split at 30
No thieves: we do [20, 30)
And if there is only one thief, it is equivalent to split half.
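The trace above can be reproduced with a small simulation. This is a sketch under assumptions, not the runtime's code: `splitAdaptativeSketch` and `simulateSplits` are hypothetical names, the stride is assumed to be 1, and the chunk size is taken to be the remaining iterations divided by the number of thieves plus the victim, which is what the numbers in the trace suggest.

```nim
func splitAdaptativeSketch(cur, stop, approxNumThieves: int): int =
  ## Hypothetical sketch (stride assumed to be 1).
  ## The victim keeps [cur, split), one thief receives [split, stop).
  let stepsLeft = stop - cur
  # Share the remaining iterations between the thieves and the victim
  let chunk = max(stepsLeft div (approxNumThieves + 1), 1)
  stop - chunk

func simulateSplits(cur, stop, thieves: int): seq[int] =
  ## Serve the thieves one at a time, shrinking the range after each split.
  var curStop = stop
  var remaining = thieves
  while remaining > 0:
    let split = splitAdaptativeSketch(cur, curStop, remaining)
    result.add(split)
    curStop = split
    dec remaining

when isMainModule:
  echo simulateSplits(20, 100, 7) # @[90, 80, 70, 60, 50, 40, 30]
  # With a single thief the range is cut in half
  echo splitAdaptativeSketch(20, 100, 1) # 60
```

After the last split the victim is left with [20, 30), matching the final line of the trace.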
splitAdaptativeDelegated
When Weave is compiled with Backoff, workers that backed off from stealing are sleeping and cannot respond to steal requests.
They have a parent that will check their steal requests queue and their children's on their behalf.
When a parent has a loop task with which it can wake up a child worker, it can't just use splitAdaptative. To see why, assume we have a leftChild sleeping with 6 steal requests (7 thieves total):
task.cur = 20, task.stop = 100, leftsubtreeThieves = 7 => split at 90
# oops the leftChild is woken up, we can't check its thief queue anymore
task.cur = 20, task.stop = 90, leftsubtreeThieves = 0 => we are left with work imbalance
# And now there is communication overhead because the left child cannot satisfy all steal requests of its tree.
So the parent sends enough work for the whole subtree before waking the left child, which will do the same. This avoids latency and reduces the number of messages to log(n) tasks instead of many recirculated steal requests.
Note that the parent has its own thieves and also another child so it needs to keep enough for them as well.
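One way to picture the delegated split is the sketch below. It is an illustration under assumptions rather than Weave's implementation: `splitAdaptativeDelegatedSketch` and its parameter names are hypothetical, the stride is assumed to be 1, and the work package is assumed to be one per-thief chunk for every thief in the woken subtree, capped so that the parent always keeps at least one iteration.

```nim
func splitAdaptativeDelegatedSketch(
       cur, stop, approxNumThieves, delegateNumThieves: int): int =
  ## Hypothetical sketch (stride assumed to be 1).
  ## approxNumThieves: all thieves the parent knows about.
  ## delegateNumThieves: thieves in the subtree of the child being woken.
  let stepsLeft = stop - cur
  # Per-worker share, counting the parent itself
  let chunk = max(stepsLeft div (approxNumThieves + 1), 1)
  # One chunk per thief in the subtree, but never the whole range
  let workPackage = min(stepsLeft - 1, chunk * delegateNumThieves)
  stop - workPackage

when isMainModule:
  # All 7 thieves sit in the left subtree: it receives [30, 100) in one message
  echo splitAdaptativeDelegatedSketch(20, 100, 7, 7) # 30
  # 9 thieves in total, 7 of them in the left subtree: the parent keeps
  # [20, 44) for itself, its own thieves and its other child
  echo splitAdaptativeDelegatedSketch(20, 100, 9, 7) # 44
```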
So while introducing support for loop strides, I also broke splitHalf.
AFAIK splitAdaptative is working fine but it's an untested part of the runtime.
The splitting bugs should be fixed and an anti-regression test added.
This is completely self-contained in the loop-splitting file.
Offending code: weave/weave/loop_splitting.nim, lines 22 to 24 in 5d90172
splitGuided code: weave/weave/loop_splitting.nim, lines 36 to 46 in 5d90172
splitAdaptative
splitAdaptative is described on p. 120 of https://epub.uni-bayreuth.de/2990/