Silly bug on splitHalf / splitGuided #89

mratsim · 2020-01-02T20:30:50Z

So while introducing support for loop strides, I also broke splitHalf.
AFAIK splitAdaptative is working fine but it's an untested part of the runtime.

The splitting bugs should be fixed with an anti-regression added.
This is completely self-contained in the loop-splitting file.

Offending code:

weave/weave/loop_splitting.nim

Lines 22 to 24 in 5d90172

    
           func splitHalf*(task: Task): int {.inline.} = 
        
             ## Split loop iteration range in half 
        
             task.cur + ((task.stop - task.cur + task.stride-1) div task.stride) shr 1

Test case:

func splitHalfBuggy*(cur, stop, stride: int): int {.inline.} =
  ## Split loop iteration range in half
  cur + ((stop - cur + stride-1) div stride) shr 1

echo splitHalfBuggy(32, 128, 32) # 33 <---- the caller only keeps a single iteration

# fixed
func splitHalf*(cur, stop, stride: int): int {.inline.} =
  ## Split loop iteration range in half
  cur + (stop - cur) shr 1

echo splitHalf(32, 128, 32) # 80

SplitHalf is fairly easy to fix. Explanation of splitGuided, splitAdaptative, splitAdaptativeDelegated to have proper tests

splitGuided

Split-guided is similar to OpenMP guided split. Assuming N iterations and P workers, you first deal thieves work chunks of size N/P. When the iterations left are less than N/P you deal exponentially decreasing work chunks.

weave/weave/loop_splitting.nim

Lines 36 to 46 in 5d90172

    
           func splitGuided*(task: Task): int {.inline.} = 
        
             ## Split iteration range based on the number of workers 
        
             let stepsLeft = (task.stop - task.cur + task.stride-1) div task.stride 
        
             preCondition: stepsLeft > 0 
        
             {.noSideEffect.}: 
        
               let numWorkers = workforce() 
        
             let chunk = max(((task.stop - task.start + task.stride-1) div task.stride) div numWorkers, 1) 
        
             if stepsLeft <= chunk: 
        
               return task.splitHalf() 
        
             return roundPrevMultipleOf(task.stop - chunk*task.stride, task.stride)

splitAdaptative

SplitAdaptative is described here p120: https://epub.uni-bayreuth.de/2990/

In practice, if a victim is at iteration 19 of a [0, 100) task
we have task.cur = 20 (task.cur is the next splittable iteration, so you don't give up your current work)
https://github.com/mratsim/weave/blob/5d9017239ca9792cc37e3995f422f86ac57043ab/weave/parallel_for.nim#L24-L49

Assuming we have approximately 7 thiefs (concurrency, we only have a lower-bound) + the victim we want to distribute 10 iterations each.
But the split is done one thief at a time in a loop so the algorithm does:

task.cur = 20, task.stop = 100, thieves = 7 => split at 90
task.cur = 20, task.stop = 90, thieves = 6 => split at 80
task.cur = 20, task.stop = 80, thieves = 5 => split at 70
task.cur = 20, task.stop = 70, thieves = 4 => split at 60
task.cur = 20, task.stop = 60, thieves = 3 => split at 50
task.cur = 20, task.stop = 50, thieves = 2 => split at 40
task.cur = 20, task.stop = 40, thieves = 1 => split at 30
No thieves: we do [20, 30)

And if there is only one thief, it is equivalent to split half.

splitAdaptativeDelegated

When Weave is compiled with Backoff, workers that backed off from stealing are sleeping and cannot respond to steal requests.
They have a parent that will check their steal requests queue and their children's on their behalf.

When a parent has a loop task it can wake up a child worker with, it can't just do splitAdaptative because of the following, assuming we have a leftChild sleeping with 6 steal requests (7 thieves total):

task.cur = 20, task.stop = 100, leftsubtreeThieves = 7 => split at 90
# oops the leftChild is woken up, we can't check its thief queue anymore
task.cur = 20, task.stop = 90, leftsubtreeThieves = 0 => we are left with work imbalance
# And now there is communication overhead because the left child cannot satisfy all steal requests of its tree.

So the parent sends enough work to the whole subtree before waking the left child which will do the same, avoiding latency and reducing the number of messages to log(n) tasks instead of many recirculated steal requests.

Note that the parent has its own thieves and also another child so it needs to keep enough for them as well.

The text was updated successfully, but these errors were encountered:

mratsim added bug 🪲 Something isn't working good first issue 🔧 Good for newcomers Testing 🛂 labels Jan 2, 2020

mratsim mentioned this issue Jan 2, 2020

load balancing on backed off child wake-up. #90

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Silly bug on splitHalf / splitGuided #89

Silly bug on splitHalf / splitGuided #89

mratsim commented Jan 2, 2020

Silly bug on splitHalf / splitGuided #89

Silly bug on splitHalf / splitGuided #89

Comments

mratsim commented Jan 2, 2020

splitGuided

splitAdaptative

splitAdaptativeDelegated