Feat: Revive to use upstream arrow coalesce #17105
base: main
Conversation
FYI @alamb @Dandandan: I am trying to revive PR #16249, and we may want to run the benchmarks for this PR to see whether anything has changed since then, thanks!
I was just thinking about this PR last night -- thank you @zhuqi-lucas -- I'll kick off the benchmarks just to make sure.
🤖
Thank you @zhuqi-lucas -- this looks great
As long as the benchmarks look good (I expect no substantial change) I think we should merge this
Poll::Ready(Some(Ok(batch)))
};
Some(Ok(batch)) => {
    if self.coalescer.push_batch(batch)? {
I found this API to be somewhat confusing (the fact that a true return value means the limit was reached).
Maybe returning an enum would be clearer here.
I don't think this is a correctness issue, just a readability thing I noticed.
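For illustration only, here is a minimal sketch of the enum-based alternative; the names PushBatchStatus, Continue, LimitReached, and handle are hypothetical stand-ins, not the identifiers actually used in this PR.

// Sketch only: hypothetical names, not the PR's actual API.
// Returning an enum makes "limit reached" explicit at the call site
// instead of an unexplained bool.
enum PushBatchStatus {
    // More input can still be pushed into the coalescer.
    Continue,
    // The configured fetch/limit has been reached; stop pulling input.
    LimitReached,
}

// Hypothetical call site: the branch meaning is now self-describing.
fn handle(status: PushBatchStatus) -> bool {
    match status {
        PushBatchStatus::LimitReached => true, // emit remaining batches, then stop
        PushBatchStatus::Continue => false,    // keep polling the input stream
    }
}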
Thank you @alamb for the good suggestion; I agree this is a better way.
Addressed it in the latest revision of the PR; it returns an enum now.
Thank you @alamb for the review. My follow-up plan, correct me if I am wrong:
🤖: Benchmark completed
The ClickBench results look good, but tpch_mem seems to show some regression in the benchmark results. 🤔
Weird, I will rerun and see if we can see it again.
🤖
🤖: Benchmark completed
🤔 The new kernel seems to be slower. I wonder if the overhead of producing precisely sized output batches is causing the issue.
Good point @alamb, I agree that is the only difference. I can open a test PR so that upstream does not generate precisely sized output batches, but when we reserve capacity for the incrementally growing buffer, it seems we need to change the sizing, since we no longer keep the same target size with this change. The latest benchmark looks a little better.
Thanks @zhuqi-lucas -- what I was thinking about was something like the following:

let target_batch_size = 4;
let mut coalescer = BatchCoalescer::new(batch1.schema(), 4)
    .with_exact_size(false);

Before we spend a lot of time polishing / testing a PR for that, it would probably be good to hack up a POC and verify it actually improves performance.

Thank you for your willingness to help along with this project. It is something I have thought was important (but not critical) for a long time, and so having someone to help really makes a big difference.
Thank you @alamb for the good suggestion! It looks pretty cool to me, and a config for this is a very clever idea:

let target_batch_size = 4;
let mut coalescer = BatchCoalescer::new(batch1.schema(), 4)
    .with_exact_size(false);

I will try to address this in upstream first, so we can easily test it for DataFusion.
Which issue does this PR close?
- Revive Draft: Use upstream arrow coalesce kernel in DataFusion #16249
- Related to Optimize take/filter/concat from multiple input arrays to a single large output array arrow-rs#6692
- Related to Enable parquet filter pushdown (filter_pushdown) by default #3463

Rationale for this change
- Revive Draft: Use upstream arrow coalesce kernel in DataFusion #16249, and fix conflicts
- Related to Enable parquet filter pushdown (filter_pushdown) by default #3463

What changes are included in this PR?
This PR refactors the BatchCoalescer in DataFusion to use the proposed upstream API, to show that it:
- Can be used (the API is complete enough)
- Is not any slower

A rough sketch of the intended usage pattern is shown after this list.
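As a hedged illustration (not the exact code in this PR), the sketch below shows how an operator could drive the upstream arrow BatchCoalescer: push input batches in, drain completed target-sized batches out, then flush the remainder. It assumes BatchCoalescer is exported from arrow::compute with push_batch / next_completed_batch / finish_buffered_batch, as in recent arrow-rs releases; the coalesce_all function name is made up for this example.

// Sketch only (assumptions noted above): feed input batches through
// arrow's BatchCoalescer and collect completed output batches.
use arrow::compute::BatchCoalescer;
use arrow::datatypes::SchemaRef;
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

fn coalesce_all(
    schema: SchemaRef,
    input: impl IntoIterator<Item = RecordBatch>,
    target_batch_size: usize,
) -> Result<Vec<RecordBatch>, ArrowError> {
    let mut coalescer = BatchCoalescer::new(schema, target_batch_size);
    let mut output = Vec::new();
    for batch in input {
        // Buffer the incoming batch; once enough rows accumulate,
        // completed batches of roughly target_batch_size rows become available.
        coalescer.push_batch(batch)?;
        while let Some(completed) = coalescer.next_completed_batch() {
            output.push(completed);
        }
    }
    // Flush whatever is still buffered as a final (possibly smaller) batch.
    coalescer.finish_buffered_batch()?;
    while let Some(completed) = coalescer.next_completed_batch() {
        output.push(completed);
    }
    Ok(output)
}

In the actual CoalesceBatchesExec the same push/drain pattern would presumably be driven by the stream's poll loop rather than a plain iterator, with the limit handling layered on top.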
Are these changes tested?
Yes
Are there any user-facing changes?
No