
Silly bug on splitHalf / splitGuided #89

Open
mratsim opened this issue Jan 2, 2020 · 0 comments
Labels
bug 🪲 · good first issue 🔧 · Testing 🛂

Comments

mratsim commented Jan 2, 2020

So while introducing support for loop strides, I also broke splitHalf.
AFAIK splitAdaptative is working fine, but it's an untested part of the runtime.

The splitting bugs should be fixed and a regression test added.
This is completely self-contained in the loop-splitting file.

Offending code:

func splitHalf*(task: Task): int {.inline.} =
  ## Split loop iteration range in half
  task.cur + ((task.stop - task.cur + task.stride-1) div task.stride) shr 1

Test case:

func splitHalfBuggy*(cur, stop, stride: int): int {.inline.} =
  ## Split loop iteration range in half
  cur + ((stop - cur + stride-1) div stride) shr 1

echo splitHalfBuggy(32, 128, 32) # 33 <---- the caller only keeps a single iteration

# fixed
func splitHalf*(cur, stop, stride: int): int {.inline.} =
  ## Split loop iteration range in half
  cur + (stop - cur) shr 1

echo splitHalf(32, 128, 32) # 80
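To make the precedence issue explicit, here is a Python sketch of the same arithmetic (a hypothetical port for illustration, not the Weave API). In Nim, `div` and `shr` bind tighter than `+`, so the buggy expression halves the step count and then adds that tiny number to `cur`:

```python
def split_half_buggy(cur: int, stop: int, stride: int) -> int:
    # Mirrors the Nim parse: cur + (((stop-cur+stride-1) div stride) shr 1)
    # i.e. it adds half the *number of steps* to cur, not half the range.
    return cur + (((stop - cur + stride - 1) // stride) >> 1)

def split_half_fixed(cur: int, stop: int, stride: int) -> int:
    # Split the raw iteration range in half; stride rounding is the caller's job.
    return cur + ((stop - cur) >> 1)

print(split_half_buggy(32, 128, 32))  # 33: the victim keeps a single iteration
print(split_half_fixed(32, 128, 32))  # 80
```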

splitHalf is fairly easy to fix. Below is an explanation of splitGuided, splitAdaptative and splitAdaptativeDelegated so that proper tests can be written.

splitGuided

splitGuided is similar to OpenMP's guided scheduling. Assuming N iterations and P workers, thieves are first dealt work chunks of size N/P. When fewer than N/P iterations are left, exponentially decreasing work chunks are dealt.

func splitGuided*(task: Task): int {.inline.} =
  ## Split iteration range based on the number of workers
  let stepsLeft = (task.stop - task.cur + task.stride-1) div task.stride
  preCondition: stepsLeft > 0
  {.noSideEffect.}:
    let numWorkers = workforce()
  let chunk = max(((task.stop - task.start + task.stride-1) div task.stride) div numWorkers, 1)
  if stepsLeft <= chunk:
    return task.splitHalf()
  return roundPrevMultipleOf(task.stop - chunk*task.stride, task.stride)
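A Python sketch of the same logic may help when writing tests (a hypothetical port: the real code queries workforce() and uses the roundPrevMultipleOf helper):

```python
def round_prev_multiple_of(x: int, step: int) -> int:
    # Round x down to the previous multiple of step.
    return x - x % step

def split_guided(cur: int, start: int, stop: int, stride: int, num_workers: int) -> int:
    steps_left = (stop - cur + stride - 1) // stride
    assert steps_left > 0
    # Chunk of size N/P, computed from the current [start, stop) range.
    chunk = max(((stop - start + stride - 1) // stride) // num_workers, 1)
    if steps_left <= chunk:
        # Few steps left: fall back to splitting in half.
        return cur + ((stop - cur) >> 1)
    return round_prev_multiple_of(stop - chunk * stride, stride)

# 128 iterations, stride 1, 4 workers: deal a chunk of 32, i.e. split at 96.
print(split_guided(cur=0, start=0, stop=128, stride=1, num_workers=4))  # 96
```

Note that because `stop` shrinks after each split while the chunk is recomputed from the remaining range, the dealt chunks decrease over time, matching the guided behaviour described above.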

splitAdaptative

SplitAdaptative is described here p120: https://epub.uni-bayreuth.de/2990/

In practice, if a victim is at iteration 19 of a [0, 100) task,
we have task.cur = 20 (task.cur is the next splittable iteration, so the victim doesn't give up its current work):
https://github.com/mratsim/weave/blob/5d9017239ca9792cc37e3995f422f86ac57043ab/weave/parallel_for.nim#L24-L49

Assuming we have approximately 7 thieves (concurrency means we only have a lower bound) plus the victim, we want to distribute 10 iterations to each.
But the split is done one thief at a time in a loop, so the algorithm does:

task.cur = 20, task.stop = 100, thieves = 7 => split at 90
task.cur = 20, task.stop = 90, thieves = 6 => split at 80
task.cur = 20, task.stop = 80, thieves = 5 => split at 70
task.cur = 20, task.stop = 70, thieves = 4 => split at 60
task.cur = 20, task.stop = 60, thieves = 3 => split at 50
task.cur = 20, task.stop = 50, thieves = 2 => split at 40
task.cur = 20, task.stop = 40, thieves = 1 => split at 30
No thieves: we do [20, 30)

And if there is only one thief, it is equivalent to split half.
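The walkthrough above can be simulated with a short Python sketch (a hypothetical port: each split deals the requesting thief one fair share of the remaining range, one thief at a time):

```python
def split_adaptative(cur: int, stop: int, thieves: int) -> int:
    # Deal the thief one share out of (thieves + 1): the thief takes
    # [split, stop), the victim keeps [cur, split) for itself and later thieves.
    return stop - (stop - cur) // (thieves + 1)

cur, stop, thieves = 20, 100, 7
splits = []
while thieves > 0:
    split = split_adaptative(cur, stop, thieves)
    splits.append(split)
    stop = split      # the thief took [split, old stop)
    thieves -= 1

print(splits)         # [90, 80, 70, 60, 50, 40, 30]
print((cur, stop))    # (20, 30): the victim is left with [20, 30)
```

With thieves = 1 the formula reduces to splitting the remaining range in half, as noted above.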

splitAdaptativeDelegated

When Weave is compiled with Backoff, workers that backed off from stealing are sleeping and cannot respond to steal requests.
They have a parent that will check their steal requests queue and their children's on their behalf.

When a parent has a loop task with which it can wake up a child worker, it can't just use splitAdaptative, because of the following. Assuming a leftChild is sleeping with 6 steal requests queued (7 thieves total):

task.cur = 20, task.stop = 100, leftsubtreeThieves = 7 => split at 90
# oops the leftChild is woken up, we can't check its thief queue anymore
task.cur = 20, task.stop = 90, leftsubtreeThieves = 0 => we are left with work imbalance
# And now there is communication overhead because the left child cannot satisfy all steal requests of its tree.

So the parent sends enough work for the whole subtree before waking the left child, which then does the same; this avoids latency and reduces the communication to log(n) task messages instead of many recirculated steal requests.

Note that the parent has its own thieves and also another child so it needs to keep enough for them as well.
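The intended outcome can be sketched in Python (a simplified illustration, ignoring strides, the parent's own thieves and the message passing): instead of one split per thief, the parent carves off the whole left subtree's share in a single split before waking the child.

```python
def bulk_split(cur: int, stop: int, subtree_thieves: int) -> int:
    # The victim keeps one share out of (subtree_thieves + 1);
    # the sleeping subtree receives the rest in a single message.
    return cur + (stop - cur) // (subtree_thieves + 1)

split = bulk_split(20, 100, 7)
print(split)  # 30: the subtree gets [30, 100) at once, the victim keeps [20, 30),
              # the same end state as 7 one-at-a-time adaptive splits
```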
