Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linear slowdown with parallel execution? #786

Open
namin opened this issue Jan 3, 2024 · 1 comment
Open

linear slowdown with parallel execution? #786

namin opened this issue Jan 3, 2024 · 1 comment

Comments

@namin
Copy link

namin commented Jan 3, 2024

Hi,

I defined pcall like Kent Dybvig on slide 10 of https://web.archive.org/web/20170626072601/http://www.ccs.neu.edu/events/wand-symposium/talks/mitchfest-09-dybvig.pdf
and was surprised by how it seems to have a linear slowdown with the number of expressions in parallel.

(define-syntax pcall
  (syntax-rules ()
    [(_ e0 e1 ...)
     (let ([m (make-mutex)]
           [c (make-condition)]
           [ls (list (lambda () e0) (lambda () e1) ...)]
           [n (length '(e0 e1 ...))])
       (with-mutex m
         (do ([ls ls (cdr ls)])
             ((null? ls))
           (fork-thread
            (lambda ()
              (set-car! ls ((car ls)))
              (with-mutex m
                (set! n (- n 1))
                (when (= n 0) (condition-signal c))))))
         (condition-wait c m))
       (apply (car ls) (cdr ls)))]))

(define thunk (lambda () (let loop ((i 0)) (if (= i 3000000000) (lambda args 'done) (loop (+ i 1))))))

(printf "1\n")
(time (pcall (thunk)))

(printf "3\n")
(time (pcall (thunk) (thunk) (thunk)))

(printf "10\n")
(time (pcall (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk)))

(printf "20\n")
(time (pcall (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk) (thunk)))
Chez Scheme Version 9.6.4
Copyright 1984-2023 Cisco Systems, Inc.

1
(time (pcall (thunk)))
    no collections
    0.000057763s elapsed cpu time
    5.575509000s elapsed real time
    84336 bytes allocated
3
(time (pcall (thunk) ...))
    no collections
    0.000065180s elapsed cpu time
    6.022160000s elapsed real time
    252464 bytes allocated
10
(time (pcall (thunk) ...))
    no collections
    0.000395990s elapsed cpu time
    7.696845000s elapsed real time
    840912 bytes allocated
20
(time (pcall (thunk) ...))
    no collections
    0.000328870s elapsed cpu time
    11.988880000s elapsed real time
    1681552 bytes allocated

I am using a modified version which executes expressions in parallel and returns the first value that succeeds, stopping the others (each thunk is wrapped in an engine).
https://github.com/metareflection/synthesis-scheme/blob/main/threads.scm
It also suffers from this linear slowdown.

Is this linear slowdown for parallel execution expected?

Thanks. Best wishes, ~n

@jltaylor-us
Copy link
Contributor

This is almost certainly not a linear "slowdown", but rather a polynomial growth in overhead. The overhead also appears to grow with the amount of work done in thunk (based on running some more tests with an extra zero in the loop condition), which I would expect if the work in thunk included some contention for a shared resource. That could easily happen in non-contrived code, since there are internal mutexes for allocation and garbage collection. In this case, however, it looks like the code inside thunk performs no allocation, and the results of time show that no GC occurred. I haven't quite managed to figure out whether the "trap check" happening in each thread that determines whether to fire a GC is a synchronization point if no allocation is being done. That would explain the extra overhead, but seems like it shouldn't be necessary when no allocation has happened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants