[GLUTEN-7261][CORE] Use pushedFilters instead of dataFilters to offload scan #8082

zml1206 · 2024-11-28T08:37:36Z

What changes were proposed in this pull request?

dataFilters may contain complex expressions such as UDFs. Vanilla Spark derives pushedFilters from the cheap expressions in dataFilters. Gluten currently uses dataFilters as the native scan filter. When dataFilters cause a fallback, we can use pushedFilters as the native scan filter instead, which improves performance.

(Fixes: #7261)

How was this patch tested?

Unit tests (UT).

github-actions · 2024-11-28T08:37:53Z

#7261

github-actions · 2024-11-28T08:38:07Z

Run Gluten Clickhouse CI on x86

github-actions · 2024-11-28T08:55:50Z

Run Gluten Clickhouse CI on x86

zml1206 · 2024-11-28T09:00:08Z

gluten-substrait/src/main/scala/org/apache/gluten/execution/BatchScanExecTransformer.scala

+    val runtimeFiltersString = s"RuntimeFilters: ${filterExprs().mkString("[", ",", "]")}"
+    val result = s"$nodeName$truncatedOutputString ${scan.description()} $runtimeFiltersString"
+    redact(result)
+  }


filterExprs is the real RuntimeFilters.

zml1206 · 2024-11-28T09:02:08Z

cc @FelixYBW it can resolve #7261

zhztheplayer · 2024-12-04T02:07:14Z

cc @rui-mo

zhztheplayer · 2024-12-04T02:12:27Z

gluten-substrait/src/main/scala/org/apache/gluten/execution/ScanTransformerFactory.scala

+          transform.copy(dataFilters = PushDownUtil.pushFilters(scanExec.dataFilters))
+        } else {
+          transform
+        }


The code in ScanTransformerFactory is used by both the validator and the offload rules. It feels a little odd to do validation there. Do we have a better option?

How about using only pushedFilters here and relying on PushDownFilterToScan for any subsequent pushdown?

Sounds feasible to me. Thanks.

github-actions · 2024-12-04T06:15:38Z

Run Gluten Clickhouse CI on x86

zml1206 · 2024-12-04T08:31:43Z

Test failure seems unrelated.

rui-mo

Thanks. Added some questions.

rui-mo · 2024-12-04T10:02:12Z

gluten-substrait/src/main/scala/org/apache/spark/sql/utils/PushDownUtil.scala

+    val translatedFilters = mutable.ArrayBuffer.empty[sources.Filter]
+    for (filterExpr <- dataFilters) {
+      val translated =
+        DataSourceStrategy.translateFilterWithMapping(


Would you elaborate on how this translation happens and how the resulting pushed filters differ from Spark's in most cases? If it follows Spark's rules, we cannot control which expressions are pushed down. Would it be more reasonable to adopt specific rules based on what the backend supports?

This logic is the same as the one vanilla Spark uses to generate pushedFilters from dataFilters; the result is then converted from Seq[Filter] back to Seq[Expression]. dataFilters do not contain non-deterministic expressions, but they can contain expensive expressions such as UDFs. pushedFilters contain only cheap expressions, such as a > 1 or a in (1, 2).
See:
https://github.com/apache/spark/blob/1eb558c3a6fbdd59e5a305bc3ab12ce748f6511f/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L374
https://github.com/apache/spark/blob/1eb558c3a6fbdd59e5a305bc3ab12ce748f6511f/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScanBuilder.scala#L72
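To make the dataFilters to pushedFilters round trip concrete, here is a minimal toy sketch. It does not use Spark's classes: Expr, SourceFilter, translate, toExpr and this pushFilters are all hypothetical stand-ins for Catalyst expressions, sources.Filter and the translation step, and the set of shapes that translate is an assumption chosen only for illustration.

```scala
object PushedFilterSketch {
  // Toy stand-ins for Catalyst expressions; these are NOT Spark classes.
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class In(value: Expr, list: Seq[Int]) extends Expr
  final case class UdfCall(name: String, args: Seq[Expr]) extends Expr

  // Toy stand-ins for data source filters (Seq[Filter] in the discussion above).
  sealed trait SourceFilter
  final case class GtFilter(column: String, value: Int) extends SourceFilter
  final case class InFilter(column: String, values: Seq[Int]) extends SourceFilter

  // Only simple "column op literal" shapes translate; anything else yields None.
  def translate(e: Expr): Option[SourceFilter] = e match {
    case GreaterThan(Col(c), Lit(v)) => Some(GtFilter(c, v))
    case In(Col(c), vs)              => Some(InFilter(c, vs))
    case _                           => None
  }

  // Convert a translated filter back into an expression for the scan.
  def toExpr(f: SourceFilter): Expr = f match {
    case GtFilter(c, v)  => GreaterThan(Col(c), Lit(v))
    case InFilter(c, vs) => In(Col(c), vs)
  }

  // Keep only the filters that survive translation, as expressions.
  def pushFilters(dataFilters: Seq[Expr]): Seq[Expr] =
    dataFilters.flatMap(e => translate(e)).map(toExpr)
}
```

In this toy model, passing a > 1, a in (1, 2) and my_udf(a) > 0 to pushFilters keeps the first two and drops the UDF predicate, which would then have to be evaluated by a separate filter above the scan.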

Perhaps the above information could be documented. Also, I have an idea: could we check whether each expression can be converted to a Gluten expression transformer? Unsupported expressions would then be left out of the pushed filters.

https://github.com/apache/incubator-gluten/blob/main/gluten-substrait/src/main/scala/org/apache/gluten/expression/ExpressionConverter.scala
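A rough sketch of that idea, under stated assumptions: the converter below (convertToNative) is a hypothetical stand-in, not Gluten's ExpressionConverter API, and it is assumed to throw for anything the backend cannot run. Filters that convert are pushed to the scan; the rest are kept for a post-scan filter.

```scala
import scala.util.Try

object SupportedFilterSketch {
  // Toy expression model; NOT Spark or Gluten classes.
  sealed trait Expr
  final case class Compare(column: String, op: String, value: Int) extends Expr
  final case class UdfCall(name: String) extends Expr

  // Hypothetical converter: throws for expressions the backend cannot run natively.
  def convertToNative(e: Expr): String = e match {
    case Compare(c, op, v) => s"native($c $op $v)"
    case UdfCall(n) =>
      throw new UnsupportedOperationException(s"UDF $n is not supported")
  }

  // Returns (filters that convert, filters left for a post-scan filter).
  def splitBySupport(filters: Seq[Expr]): (Seq[Expr], Seq[Expr]) =
    filters.partition(f => Try(convertToNative(f)).isSuccess)
}
```

Compared with following Spark's translation rules, this would let the backend's actual support decide what is pushed, at the cost of attempting a conversion for every filter during planning.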

[CORE] Use pushedFilters to offload scan when filter need fallbac

cf5982c

github-actions bot added the CORE (works for Gluten Core) and VELOX labels Nov 28, 2024

zml1206 changed the title ~~[GLUTEN-7261][CORE] Use pushedFilters to offload scan when filter need fallbac~~ [GLUTEN-7261][CORE] Use pushedFilters to offload scan when filter need fallback Nov 28, 2024

fix style

ca66d7b

zml1206 commented Nov 28, 2024


zml1206 requested a review from zhztheplayer November 29, 2024 00:57

zhztheplayer reviewed Dec 4, 2024


[CORE] Use pushedFilters to offload scan

eaeb8f2

zml1206 changed the title ~~[GLUTEN-7261][CORE] Use pushedFilters to offload scan when filter need fallback~~ [GLUTEN-7261][CORE] Use pushedFilters instead of dataFilters to offload scan Dec 4, 2024

rui-mo reviewed Dec 4, 2024



rui-mo Dec 5, 2024
