[VL] offload table scan when filter need fallback #7261

Open
FelixYBW opened this issue Sep 18, 2024 · 5 comments · May be fixed by #8082
Labels
enhancement New feature or request

Comments

@FelixYBW
Contributor

Description

Currently, when a filter contains a UDF and falls back, the table scan falls back as well. We should still offload the table scan, since it benefits from native execution and is much more expensive than the filter plus the extra C2R.

In this query, the Parquet scan outputs only int and string columns.

image
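
To make the request above concrete, here is a toy Python sketch of the two behaviours. It is not Gluten code: `Op`, `plan_current`, and `plan_proposed` are hypothetical names, and the plan is reduced to a scan followed by a filter. It only assumes what the description states, namely that today a filter fallback also pulls the scan back to vanilla Spark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical plan operator; native_supported is an assumed flag."""
    name: str
    native_supported: bool


def assemble(scan, scan_native, filt):
    """Build a scan -> filter plan, inserting a transition where engines differ."""
    filter_native = filt.native_supported
    plan = [f"{scan.name}[{'native' if scan_native else 'vanilla'}]"]
    if scan_native and not filter_native:
        plan.append("C2R")  # columnar-to-row before the vanilla filter
    elif filter_native and not scan_native:
        plan.append("R2C")  # row-to-columnar before the native filter
    plan.append(f"{filt.name}[{'native' if filter_native else 'vanilla'}]")
    return plan


def plan_current(scan, filt):
    # Behaviour described in this issue: the scan falls back with the filter.
    return assemble(scan, scan.native_supported and filt.native_supported, filt)


def plan_proposed(scan, filt):
    # Requested behaviour: decide scan offload on the scan's own support.
    return assemble(scan, scan.native_supported, filt)


scan = Op("ParquetScan", native_supported=True)
udf_filter = Op("Filter(udf)", native_supported=False)

print(plan_current(scan, udf_filter))
print(plan_proposed(scan, udf_filter))
```

In the proposed variant the only added cost is the single C2R between the native scan and the vanilla filter.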

@FelixYBW FelixYBW added the enhancement New feature or request label Sep 18, 2024
@zhztheplayer
Member

zhztheplayer commented Sep 18, 2024

There is a pending PR that is possibly related to this topic: #7215

@zml1206 Would we still be able to optimize for such cases after that PR lands?

@FelixYBW
Contributor Author

It should be. Let me do a quick test once @zml1206 resolves the conflicts.

@zml1206
Contributor

zml1206 commented Sep 18, 2024

Conflicts resolved. However, I don't think this can be resolved by #7215; it requires rewriting the dataFilters of FileSourceScanExec. After this PR, I can add a rewrite rule that rewrites dataFilters to resolve it. @FelixYBW @zhztheplayer

@zml1206
Contributor

zml1206 commented Sep 18, 2024

Can we determine whether an expression supports offload? @zhztheplayer There are three layers of filters here. The first layer is the pushedDownFilters of FileSourceScanExec, which is also used by the vanilla Spark scan. The second layer is the dataFilters of FileSourceScanExec, which is used to generate the pushedDownFilters of FileSourceScanExec and also serves as the filters applied after offload in the current implementation. The third layer is the condition of the filter. In theory, the filter condition is a superset of dataFilters, and dataFilters is a superset of pushedDownFilters.
The current implementation pushes down either dataFilters or the condition filters. Simply rewriting dataFilters to pushedDownFilters may mean that only pushedDownFilters get pushed down, whereas dataFilters were pushed down before.
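
For illustration, the three layers and the concern above can be modeled roughly as follows. This is a toy Python sketch, not Spark or Gluten code: the `Predicate` class, its three boolean flags, and the helper functions are hypothetical, and the flags are simply assumed per predicate rather than derived from real expression analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for one conjunct of a filter condition."""
    sql: str
    data_columns_only: bool    # assumed flag: eligible for dataFilters
    source_translatable: bool  # assumed flag: eligible for pushedDownFilters
    native_supported: bool     # assumed flag: the native backend can evaluate it


def data_filters(condition):
    """Second layer: conjuncts of the condition that the scan may evaluate."""
    return [p for p in condition if p.data_columns_only]


def pushed_down_filters(data):
    """First layer: the subset of dataFilters translatable to source filters."""
    return [p for p in data if p.source_translatable]


def native_pushable(data):
    """dataFilters that an offloaded scan could evaluate natively."""
    return [p for p in data if p.native_supported]


# Third layer: the full filter condition (all values here are made up).
condition = [
    Predicate("id > 10", True, True, True),
    Predicate("length(name) > 3", True, False, True),
    Predicate("my_udf(name) = 1", True, False, False),
    Predicate("part = '2024'", False, False, True),  # partition-column predicate
]

data = data_filters(condition)
pushed = pushed_down_filters(data)

# The layers nest: pushedDownFilters within dataFilters within the condition.
assert set(pushed) <= set(data) <= set(condition)

# Filters a native scan could still evaluate, but which would no longer be
# pushed down if dataFilters were simply replaced by pushedDownFilters.
lost = [p.sql for p in native_pushable(data) if p not in pushed]
print(lost)  # ['length(name) > 3']
```

In this model, avoiding the regression needs a per-expression check like `native_pushable`, which is what the opening question asks for.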

@zhztheplayer
Member

zhztheplayer commented Sep 18, 2024

@zml1206 Thanks for the explanation.

Can we determine whether expression supports offload?

Let's see if #6754 on which @rui-mo is working could finally help here.
