feat: support intra-operator parallelism #856

wangrunji0408 · 2024-11-24T13:59:25Z

This PR adds data partitioning and intra-operator parallelism.

TPC-H performance improved on my M1 Pro (10 cores).

This seems to resolve #748.

Signed-off-by: Runji Wang <[email protected]>

skyzh · 2024-11-27T05:22:05Z

Two quick questions: what is the schema plan node? And what is the definition of the exchange node? Is it the distribution of the child, or the expected distribution of the output node?

wangrunji0408 · 2024-11-27T14:01:44Z

> what is the schema plan node?

The schema node is a virtual node that only changes the output schema of the child node. It was introduced to resolve a tricky issue in 2-phase aggregation.

Let's say we have a query: select sum(a) * 2 from t;

The original plan is:

Proj: sum(a) * 2
    Agg: sum(a)
        Scan: t(a)

After parallelization (by pushing down the ToParallel node), the Agg is transformed into a 2-phase aggregation:

Proj: sum(a) * 2
    Agg: sum(sum(a))
        Exchange: merge
            Agg: sum(a)
                Scan: t(a)

You may notice that the output schema of the Agg node is changed from sum(a) to sum(sum(a)). Therefore, the Proj node will throw an error when trying to resolve the physical column index of its expression sum(a).

So, in order to keep the schema unchanged, we can insert a Schema node between Proj and Agg:

Proj: sum(a) * 2
    Schema: sum(a)
        Agg: sum(sum(a))
            Exchange: merge
                Agg: sum(a)
                    Scan: t(a)

The Schema node is simply ignored when building executors.
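To make the column-resolution problem concrete, here is a minimal Python sketch of the rewrite described above. It is not the code from this PR: the `Node` class and the `schema`, `resolve`, `two_phase`, and `strip_schema` helpers are hypothetical names chosen for illustration, and the rewrite only handles `sum`, where the global phase is again a `sum`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Toy plan node (hypothetical, not RisingLight's real plan type)."""
    kind: str                       # "Proj", "Agg", "Schema", "Exchange" or "Scan"
    exprs: List[str]                # output expressions (the distribution for Exchange)
    child: Optional["Node"] = None


def schema(node: Node) -> List[str]:
    """Output column names of a node; Exchange passes its child's schema through."""
    if node.kind == "Exchange":
        return schema(node.child)
    return node.exprs


def resolve(expr: str, node: Node) -> int:
    """Physical column index of `expr` in the output of `node` (ValueError if absent)."""
    return schema(node).index(expr)


def two_phase(agg: Node, keep_schema: bool) -> Node:
    """Split a sum-only Agg into local Agg -> Exchange(merge) -> global Agg."""
    local = Node("Agg", agg.exprs, agg.child)
    merged = Node("Exchange", ["merge"], local)
    global_agg = Node("Agg", [f"sum({e})" for e in agg.exprs], merged)
    # The Schema node restores the column names the parent was planned against.
    return Node("Schema", agg.exprs, global_agg) if keep_schema else global_agg


def strip_schema(node: Optional[Node]) -> Optional[Node]:
    """Drop Schema nodes, mimicking an executor builder that ignores them."""
    if node is None:
        return None
    if node.kind == "Schema":
        return strip_schema(node.child)
    return Node(node.kind, node.exprs, strip_schema(node.child))


agg = Node("Agg", ["sum(a)"], Node("Scan", ["a"]))

without_schema = two_phase(agg, keep_schema=False)
print(schema(without_schema))        # the parent's `sum(a)` is no longer in this list

with_schema = two_phase(agg, keep_schema=True)
print(resolve("sum(a)", with_schema))  # the parent can resolve `sum(a)` again
```

In this sketch, the parent's expressions are resolved to column indices while the Schema node is still in the plan. Stripping it afterwards does not change the physical column layout, which is why it can be skipped at executor-build time.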

wangrunji0408 · 2024-11-27T14:05:23Z

> what is the definition of exchange node? is it the distribution of the child, or the expected distribution of the output node?

The node is written as (exchange dist child), where dist is the expected distribution of the output. The child can have any distribution.
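As a rough illustration of "dist describes the output, not the input", the Python sketch below models two target distributions over in-memory partitions. The function names `exchange_hash` and `exchange_merge` are hypothetical, and the exact set of distributions supported by the PR is not spelled out in this thread; only "merge" and a hash partition executor are mentioned.

```python
from typing import List, Tuple

Row = Tuple[int, ...]
Partition = List[Row]


def exchange_hash(child_parts: List[Partition], key_index: int, num_outputs: int) -> List[Partition]:
    """Repartition rows so that equal keys land in the same output partition.

    The input partitioning is irrelevant: every row is re-routed by its key.
    """
    outputs: List[Partition] = [[] for _ in range(num_outputs)]
    for part in child_parts:
        for row in part:
            outputs[hash(row[key_index]) % num_outputs].append(row)
    return outputs


def exchange_merge(child_parts: List[Partition]) -> List[Partition]:
    """Gather all input partitions into a single output partition."""
    return [[row for part in child_parts for row in part]]


# Three input partitions with no particular distribution on column 0.
parts = [[(1, 10), (2, 20)], [(1, 30), (3, 40)], [(2, 50)]]

by_key = exchange_hash(parts, key_index=0, num_outputs=2)
single = exchange_merge(parts)
print(len(by_key), len(single))
```

Either way the caller only states the distribution it needs on the output side. This matches the description above, where the child is free to have any distribution.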

wangrunji0408 · 2024-11-27T14:12:59Z

By the way, after this optimization, the bottleneck of some queries (such as Q6) has shifted to the table scan.
As a next step, it is critical to support parallel partition scans in the storage layer. 🥹

wangrunji0408 added 30 commits April 20, 2024 18:25

stash

10d7faa

Signed-off-by: Runji Wang <[email protected]>

basic support for converting to distributed plan

f608270

Signed-off-by: Runji Wang <[email protected]>

rename distributed to parallel

344a5f8

Signed-off-by: Runji Wang <[email protected]>

hash partition executor

d51e7c4

Signed-off-by: Runji Wang <[email protected]>

fix

5a25a2e

Signed-off-by: Runji Wang <[email protected]>

Merge remote-tracking branch 'origin/main' into wrj/mpp

c8f0b68

Signed-off-by: Runji Wang <[email protected]>

fix metrics and improve debug info

8fe3343

Signed-off-by: Runji Wang <[email protected]>

add a pragma to control parallel plan

cdea038

Signed-off-by: Runji Wang <[email protected]>

two-phase aggregation

fe2ee6c

Signed-off-by: Runji Wang <[email protected]>

update rust toolchain and dependencies

67bf63b

Signed-off-by: Runji Wang <[email protected]>

upgrade dependencies

be64142

Signed-off-by: Runji Wang <[email protected]>

fix warnings

bff93fe

Signed-off-by: Runji Wang <[email protected]>

support keyword completion

68f0ec3

Signed-off-by: Runji Wang <[email protected]>

support cursor in completed line

9710023

Signed-off-by: Runji Wang <[email protected]>

fix clippy

c77c0b0

Signed-off-by: Runji Wang <[email protected]>

Merge branch 'wrj/update-toolchain' into wrj/partition

0f6f712

Signed-off-by: Runji Wang <[email protected]>

Merge branch 'wrj/completion' into wrj/partition

c90782b

fix to_parallel for left outer join and DDL statements

685b148

Signed-off-by: Runji Wang <[email protected]>

fix hash exchange

322d8f1

Signed-off-by: Runji Wang <[email protected]>

replace pragma enable_parallel_execution by set variable parallelism

0961aa1

Signed-off-by: Runji Wang <[email protected]>

fix 2-phase count agg

0588dff

Signed-off-by: Runji Wang <[email protected]>

enable partitioning in unit test. fix bugs

db3a019

Signed-off-by: Runji Wang <[email protected]>

fix DDL to parallel

7f56f92

Signed-off-by: Runji Wang <[email protected]>

add unit test for Expr size

87a7fb9

Signed-off-by: Runji Wang <[email protected]>

Merge remote-tracking branch 'origin/main' into wrj/partition

fac57a1

Signed-off-by: Runji Wang <[email protected]>

fix timing

85d131b

Signed-off-by: Runji Wang <[email protected]>

add counted instrument

4ff9450

Signed-off-by: Runji Wang <[email protected]>

correctly show the time of exchange operator

a47bc49

Signed-off-by: Runji Wang <[email protected]>

use ahash to optimize hash

cea7429

Signed-off-by: Runji Wang <[email protected]>

decouple rows and time of exchange operator

35c56fd

Signed-off-by: Runji Wang <[email protected]>

do not eliminate duplicate exchange

4a2d2ad

Signed-off-by: Runji Wang <[email protected]>

wangrunji0408 requested a review from skyzh November 24, 2024 13:59

wangrunji0408 added 2 commits November 24, 2024 23:01

fix clippy

321e330

Signed-off-by: Runji Wang <[email protected]>

fix unit test

1bc5611

Signed-off-by: Runji Wang <[email protected]>

wangrunji0408 force-pushed the wrj/partition branch from 3943c98 to 1bc5611 Compare November 24, 2024 15:30

wangrunji0408 requested a review from TennyZhuang December 5, 2024 13:49
