
ThreadPool cleanup (3/n): Switch to vectorized API & remove unused/co… #32

Merged
saeta merged 6 commits into master from threadpool-cleanup on May 24, 2020

Conversation

@saeta saeta (Owner) commented May 22, 2020

…nfusing extensions and implementations.

This change takes a first^H^H^H^H^H^Hthird whack at a bunch of tech debt:

  1. Removes the Naive thread pool implementation from PJoin.
  2. Removes the unnecessary TypedComputeThreadPool protocol refinement.
  3. Removes the badly implemented extensions that implemented parallelFor
    in terms of join.
  4. Removes use of rethrows, as the rethrows language feature is not
    expressive enough to allow the performance optimizations for the non-throwing
    case.
  5. Adds a vectorized API (which improves performance).
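To illustrate the difference the vectorized API makes (a simplified sketch using names styled after the diff, with a serial runner standing in for the real pool): instead of invoking a closure once per index, the pool hands each invocation a contiguous chunk, amortizing scheduling overhead across many iterations.

```swift
// Sketch of the two closure shapes (simplified; not the library's code).
// Element-wise: invoked once per index.
typealias ParallelForBody = (Int) -> Void
// Vectorized: invoked once per contiguous chunk [start, end), amortizing
// scheduling overhead across many iterations.
typealias VectorizedParallelForBody = (Int, Int) -> Void

// Serial stand-in for the pool: a real pool splits [0, n) across workers.
func parallelFor(n: Int, _ fn: VectorizedParallelForBody) {
  fn(0, n)
}

var sum = 0
parallelFor(n: 10) { start, end in
  for i in start..<end { sum += i }
}
// sum == 45
```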

Performance measurements:

After:

name                                                                   time         std                   iterations  
--------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 700.0 ns     ± 70289.84998218225   127457      
NonBlockingThreadPool: join, two levels                                2107.0 ns    ± 131041.5070696377   31115       
NonBlockingThreadPool: join, three levels                              4960.0 ns    ± 178122.9562964306   15849       
NonBlockingThreadPool: join, four levels, three on thread pool thread  5893.0 ns    ± 224021.47900401088  13763       
NonBlockingThreadPool: parallel for, one level                         22420.0 ns   ± 203689.69689780468  7581        
NonBlockingThreadPool: parallel for, two levels                        500985.5 ns  ± 642136.0139757036   1390        

Before:

name                                                                   time          std                   iterations  
---------------------------------------------------------------------------------------------------------------------
NonBlockingThreadPool: join, one level                                 728.0 ns      ± 78662.43173968921   115554      
NonBlockingThreadPool: join, two levels                                2149.0 ns     ± 144611.11773139169  30425       
NonBlockingThreadPool: join, three levels                              5049.0 ns     ± 188450.6773907647   15157       
NonBlockingThreadPool: join, four levels, three on thread pool thread  5951.0 ns     ± 229270.51587738466  10255       
NonBlockingThreadPool: parallel for, one level                         4919427.5 ns  ± 887590.5386061076   302         
NonBlockingThreadPool: parallel for, two levels                        4327151.0 ns  ± 855302.611386676    313         

@saeta saeta requested review from dabrahams and pschuh May 22, 2020 17:42
@saeta saeta (Owner, Author) commented May 22, 2020

Note: I'm trying to break up a very large refactoring I've been working on in #11 (and related branches) into more easily reviewable pieces. Happy to explain how they all fit together out-of-band as appropriate.

@dabrahams dabrahams (Collaborator):

IIUC the descriptions of parts 3/4 aren't quite right

  • You don't just remove parallelFor; you reimplement it.
  • You replace rethrows with overloads rather than simply removing it.

I know it isn't always easy to break up a big stack of work, but personally, I'd have put parts 3, 4, 5 each in its own PR. If you want a magit tutorial I could probably show you how to make this sort of thing go more easily.
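The overload approach being described can be made concrete with a minimal sketch (simplified; not the library's actual implementation): a non-throwing overload skips error bookkeeping entirely, while a throwing overload collects and re-surfaces an error, in the style of the `if let e = err { throw e }` pattern visible in the diff below.

```swift
struct MiniPool {
  // Non-throwing overload: no error plumbing, so it can take the fast path.
  func join(_ a: () -> Void, _ b: () -> Void) {
    a()
    b()
  }

  // Throwing overload: runs both closures, then surfaces the first error.
  func join(_ a: () throws -> Void, _ b: () throws -> Void) throws {
    var err: Error? = nil
    do { try a() } catch { err = error }
    do { try b() } catch { if err == nil { err = error } }
    if let e = err { throw e }
  }
}

// Swift's overload resolution picks the non-throwing version for
// non-throwing closures, so the fast path needs no `try` at the call site.
let pool = MiniPool()
var x = 0
pool.join({ x += 1 }, { x += 2 })
// x == 3
```

This is what `rethrows` cannot express: with a single `rethrows` function, both throwing and non-throwing callers share one implementation, so the non-throwing case cannot be specialized.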

@dabrahams dabrahams (Collaborator) left a review:

Approving to unblock progress but please consider all the suggestions and file issues if addressing them needs to be delayed.

@@ -292,6 +292,40 @@ public class NonBlockingThreadPool<Environment: ConcurrencyPlatform>: ComputeThr
if let e = err { throw e }
}

public func parallelFor(n: Int, _ fn: VectorizedParallelForFunction) {
Collaborator:

Doc comment please!!!

Unless "parallel for on a thread pool" is a very well-established concept, I'd consider renaming these. for in Swift is something we do over a Sequence, and there's no sequence here. Is this something that could be written as an extension on Collection that accepts a thread pool as an argument? Your n repetitions could be well-represented by 0..<n.

Owner Author:

+1 to doc comment. PTAL?

I believe that this should most often be accessed as an operation on a random access (or likely some form of "splittable") collection. But in any case, that will have to be generic over the thread pool itself, so we don't get away from having this method and coming up with a name for it.

Note: I started going in this direction a while back but I think that direction needs a "reboot". For now, I'd like to focus on getting this low-level API implemented correctly and efficiently, and we can then refactor and/or stack on the further abstractions.

FWIW: I started out by having VectorizedParallelForFunction take a range instead of 2 integers representing the start and end, but that makes type inference not work as well (as code requires annotations because the alternative API induces an ambiguity between the non-vectorized and vectorized APIs).

Collaborator:

  • Will comment on the doc comment separately (GitHub doesn't make this super convenient)
  • All collections are "splittable" for reasonable definitions of the term, but maybe you mean collections whose disjoint slices can be mutated in parallel. The more general concept is those that “have disjoint parts that can be projected for mutation in parallel.” You could imagine a collection of pairs where you mutate the first of each pair in one thread and the second in another.
  • Not quite sure what you wanted me to notice at that link. The main thing I took away was, "why is he rebasing those slices?" which probably wasn't the point 😉
  • IMO it's questionable whether we really want the non-vectorized ones and whether they should have the same spelling anyway.

executeParallelFor(0, n)
}

public func parallelFor(n: Int, _ fn: ThrowingVectorizedParallelForFunction) throws {
Collaborator:

Doc comment please!

Owner Author:

+1; done. (Although I suspect that this comment could be improved...)

// Divide into 2 & recurse.
let rangeSize = end - start
let midPoint = start + (rangeSize / 2)
try self.join({ try executeParallelFor(start, midPoint) }, { try executeParallelFor(midPoint, end) })
Collaborator:

? Your change description gives the impression you are removing the implementation of parallelFor in terms of join, yet here it is.

Owner Author:

Ah, good point. That description is getting ahead of the actual implementation in this patch set. I'll update the description in the PR shortly.


try executeParallelFor(0, n)
}
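The recursive split shown in the diff can be sketched as a self-contained function (a serial `join` stands in here; the real pool may run the two halves on different threads):

```swift
// Serial stand-in for the pool's join: runs both closures to completion.
func join(_ a: () -> Void, _ b: () -> Void) {
  a()
  b()
}

// Recursively halve [start, end) until chunks are small, then run the body.
func executeParallelFor(_ start: Int, _ end: Int, grainSize: Int = 2,
                        _ fn: (Int, Int) -> Void) {
  if end - start <= grainSize {
    fn(start, end)
    return
  }
  // Divide into 2 & recurse, as in the diff above.
  let midPoint = start + (end - start) / 2
  join({ executeParallelFor(start, midPoint, grainSize: grainSize, fn) },
       { executeParallelFor(midPoint, end, grainSize: grainSize, fn) })
}

var total = 0
executeParallelFor(0, 8) { start, end in
  for i in start..<end { total += i }
}
// total == 28
```

The `grainSize` cutoff is an illustrative assumption; the point is that each `join` exposes the two halves as potentially parallel work.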

/// Shuts down the thread pool.
Collaborator:

Meaningful summary please. What does it mean to shut a thread pool down?

Sources/PenguinParallel/ThreadPool.swift (outdated; thread resolved)
// func parallelFor(blockingUpTo n: Int, blocksPerThread: Int, _ fn: ParallelForFunc)
// func parallelFor(blockingUpTo n: Int, _ fn: ParallelForFunc)
// func parallelFor(blockingUpTo n: Int, blocksPerThread: Int, _ fn: ParallelForFunction)
// func parallelFor(blockingUpTo n: Int, _ fn: ParallelForFunction)

/// The maximum amount of parallelism possible within this thread pool.
Collaborator:

Kind of redundant with the name, yeah?
Please find a description that doesn't beg the question, "what does it mean to have a maximum parallelism of N?" from the point-of-view of someone who just has the threadpool abstraction to work with.

Owner Author:

Took a quick pass, although this can probably be refined further.

var holder = ParallelForFunctionHolder(fn: fn)
try withUnsafePointer(to: &holder) { holder in
try runParallelFor(pool: self, start: 0, end: n, total: n, fn: holder)
/// Convert a non-vectorized operation to a vectorized operation.
Collaborator:

This summary makes no sense to me. I'd want it to read “Converts…” but then a function that converts one thing to another thing returns that other thing. Looking further, the doc comment appears to be a description of the implementation technique, not of what the function does.

Owner Author:

Hmmm, I thought that comments on extension methods that implement protocol requirements don't show up in typical doc generation, so I tried to write something different & more specific here. I can certainly just copy-pasta the doc comment from the protocol method itself if you think that's more appropriate... :-)

That said, I've attempted to refine this a bit (in the same direction, however).

Collaborator:

Hmmm, I thought that comments on extension methods that are implementations of methods on the protocols themselves don't show up in typical doc-generation

You are mistaken:

[screenshot: generated documentation showing the extension method's doc comment]

Copy 🍝 is always delicious though somewhat unoriginal. Doc comments are for the user of the API. If you want to write something for maintainers, use //.
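The holder/pointer pattern visible in the `ParallelForFunctionHolder` snippet above can be sketched in a self-contained way (simplified names; the real code differs — the assumed intent is to keep the closure in a stack-allocated holder and hand workers an `UnsafePointer` to it for the duration of the call):

```swift
// Stack-allocated wrapper around the user's body closure.
struct FunctionHolder {
  let fn: (Int, Int) -> Void
}

// Stand-in for the pool's internal runner; a real one would split the range
// across worker threads that all read the body through the same pointer.
func runParallelFor(start: Int, end: Int, fn: UnsafePointer<FunctionHolder>) {
  fn.pointee.fn(start, end)
}

func parallelFor(n: Int, _ fn: @escaping (Int, Int) -> Void) {
  var holder = FunctionHolder(fn: fn)
  // The pointer is only valid inside this scope, which bounds how long the
  // workers may hold onto the closure.
  withUnsafePointer(to: &holder) { p in
    runParallelFor(start: 0, end: n, fn: p)
  }
}
```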

/// A `ComputeThreadPool` that executes everything immediately on the current thread.
///
/// This threadpool implementation is useful for testing correctness, as well as avoiding context
/// switches when a computation is designed to be parallelized at a coarser level.
public struct InlineComputeThreadPool: TypedComputeThreadPool {
public struct InlineComputeThreadPool: ComputeThreadPool {
Collaborator:

Suggested change
public struct InlineComputeThreadPool: ComputeThreadPool {
public struct SerialExecution: ParallelizableExecutionStrategy {

Consider the protocol name an initial suggestion. ComputeThreadPool seems wrong to me, as models are not necessarily thread pools.

When we do mention Threads, is there any point in adding Compute?

Owner Author:

In terms of thread-pools, there can be a number of different designs with different properties. In the same way that you can implement a random access collection in terms of a collection (just really inefficiently), I wanted to clearly distinguish what properties the thread-pool has. Concretely, there are I/O-focused thread-pools, where you can perform blocking and/or non-blocking I/O. This thread pool abstraction is focused on compute-bound tasks, and is tuned / structured with APIs focused on that domain. Does that make sense?

Happy to ponder the names further... related work also uses ConcurrentWorkQueue.

Collaborator:

In terms of thread-pools, there can be a number of different designs with different properties.

Naturally.

In the same way that you can implement a random access collection in terms of a collection (just really inefficiently),

You mean like Sampling<SomeCollection, [SomeCollection.Index]>? (Phew, glad I looked at that just now!)

I wanted to clearly distinguish what properties the thread-pool has.

Also naturally. I don't get the connection to random access, though.

Concretely, there are I/O-focused thread-pools, where you can perform blocking and/or non-blocking I/O. This thread pool abstraction is focused on compute-bound tasks, and is tuned / structured with APIs focused on that domain. Does that make sense?

In principle. Are they really separate abstractions though, or at least, isn't the IO one a refinement of this one? Wouldn't you want to write most algorithms once rather than replicate them for different kinds of pools?

Sources/PenguinParallel/ThreadPool.swift (thread resolved)
/// Executes `a` and `b` optionally in parallel, and returns when both are complete.
///
/// Note: this implementation simply executes them serially.
public func join(_ a: () -> Void, _ b: () -> Void) {
Collaborator:

I think of join as an operation that waits on the completion of one or more already concurrently-executing things.

Suggested change
public func join(_ a: () -> Void, _ b: () -> Void) {
public func concurrently(_ a: () -> Void, _ b: () -> Void) {

Just an idea.

Owner Author:

For context: I picked join as the typical term-of-art in this space. I'm not fully sold on concurrently yet, because join represents optional concurrency, which is important for performance at scale.

I think that it would be good to go over this API and think hard about naming & how the abstractions compose, but only once we understand the performance limitations & constraints. (Concretely, some of the (internal) abstractions are being re-written due to performance limitations in the current structure of things.)

Collaborator:

It is an established term-of-art, which means we should use it in the way that has been established. You join stuff after you've forked it, in my experience, Rayon notwithstanding. And IME when you fork, you're running the same closure in both threads of execution, with a parameter passed to indicate which invocation of the closure you got, similar to what you did with parallelFor, which I think might be better called forkJoin ersump'n.

I'm not sure the concurrency is optional from the programming model P.O.V., which is what matters here. There may or may not be any actual parallelism between a and b's execution, but that would be true even if you unconditionally launched separate threads for a and b, so surfacing that distinction as though it's significant seems like a mistake to me.

Also the word “optional” tends to imply it's up to the user, but it's not; this is up to the library.

As for putting off talking about naming and abstractions of "this API" until we know its performance, I think if we don't do both at once, we don't know what "this API" is. You don't want to design yourself into performance constraints based on assumptions about the programming model that don't actually apply.

@saeta saeta merged commit 64d2335 into master May 24, 2020
@saeta saeta deleted the threadpool-cleanup branch May 24, 2020 19:37
Comment on lines +292 to +293
/// Executes `fn`, optionally in parallel, spanning the range `0..<n`.
public func parallelFor(n: Int, _ fn: VectorizedParallelForBody) {
Collaborator:

Not a fan of this doc. When you say “optionally” it implies the caller gets to pass an option, but that's not the case. Also, I have no idea what it means to “execute a function spanning a range.” Was going to make some suggestions but you've already merged.
