Describe the bug
This is a follow-up to a discussion in #16325 (comment). The behaviour described here is not directly related to table sampling, but it could affect it.
I'd like to double-check whether pushing a volatile filter down to the Parquet executor is expected. I implemented disabling pushdown of volatile filters for the logical plan in #13268, but the physical optimiser still seems to push such predicates down to the executor. Should we implement a similar mechanism that marks volatile predicates as unsupported filters? The current physical plan implementation already has a concept of "unsupported" filters, which could easily be reused for this.
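The reuse of "unsupported" filters proposed above could follow a rule like the one below. This is a minimal, self-contained sketch, not DataFusion code: `Expr`, `PushdownSupport`, and `classify_filters` are hypothetical stand-ins for the real physical expression and filter-pushdown types. It only illustrates the decision rule: any filter that contains a volatile call anywhere in its tree is reported as unsupported, with the intent that it stays in a `FilterExec` above the scan.

```rust
// Hypothetical, simplified filter expression; NOT the DataFusion
// `PhysicalExpr` API, just enough structure to show the rule.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(f64),
    /// A function call; `volatile` marks functions such as `random()`
    /// whose result can change on every evaluation.
    Call { name: String, volatile: bool, args: Vec<Expr> },
    Lt(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// True if this expression or any sub-expression is volatile.
    fn is_volatile(&self) -> bool {
        match self {
            Expr::Column(_) | Expr::Literal(_) => false,
            Expr::Call { volatile, args, .. } => {
                *volatile || args.iter().any(Expr::is_volatile)
            }
            Expr::Lt(l, r) => l.is_volatile() || r.is_volatile(),
        }
    }
}

/// Hypothetical answer a data source gives for each offered filter.
#[derive(Debug, PartialEq)]
enum PushdownSupport {
    Supported,
    Unsupported,
}

/// Volatile filters are reported as unsupported so they are not
/// moved into the scan; deterministic filters may be pushed down.
fn classify_filters(filters: &[Expr]) -> Vec<PushdownSupport> {
    filters
        .iter()
        .map(|f| {
            if f.is_volatile() {
                PushdownSupport::Unsupported
            } else {
                PushdownSupport::Supported
            }
        })
        .collect()
}

fn main() {
    // random() < 0.1, the predicate from the reproducer
    let volatile = Expr::Lt(
        Box::new(Expr::Call { name: "random".into(), volatile: true, args: vec![] }),
        Box::new(Expr::Literal(0.1)),
    );
    // a < 0.1, a deterministic predicate on a hypothetical column `a`
    let stable = Expr::Lt(Box::new(Expr::Column("a".into())), Box::new(Expr::Literal(0.1)));
    println!("{:?}", classify_filters(&[volatile, stable])); // [Unsupported, Supported]
}
```

Checking the whole tree, rather than only the top-level call, matters because volatility is usually nested inside a comparison, as in `random() < 0.1`.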
Current behaviour:
Before:
[2025-06-18T18:20:07Z TRACE datafusion::physical_planner] Optimized physical plan by LimitedDistinctAggregation:
OutputRequirementExec
  ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
    AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
      AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
        FilterExec: random() < 0.1
          DataSourceExec: file_groups={1 group: [[sample.parquet]]}, file_type=parquet
After:
[2025-06-18T18:20:07Z TRACE datafusion::physical_planner] Optimized physical plan by FilterPushdown:
OutputRequirementExec
  ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
    AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
      AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
        DataSourceExec: file_groups={1 group: [[sample.parquet]]}, file_type=parquet, predicate=random() < 0.1
To Reproduce
set datafusion.execution.parquet.pushdown_filters=true;
create external table data stored as parquet location 'sample.parquet';
SELECT count(*) FROM data WHERE random() < 0.1;
Expected behavior
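I expect the physical plan optimiser not to push volatile predicates down into the data source, so that `FilterExec: random() < 0.1` remains above `DataSourceExec` as in the "Before" plan.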
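One way to express this expectation as a regression check is to inspect the plan text. The sketch below is a plain string heuristic over EXPLAIN-style output in the format shown in this report; `has_volatile_scan_predicate` and the `VOLATILE_FUNCS` list are hypothetical, and a real test would more likely compare the full expected plan. Applied to the two plans from this report, it flags only the "After" plan.

```rust
/// Hypothetical list of volatile function call prefixes to look for;
/// only `random()` comes from this report.
const VOLATILE_FUNCS: &[&str] = &["random("];

/// True if any DataSourceExec line carries a pushed-down predicate
/// that calls one of the volatile functions above.
fn has_volatile_scan_predicate(plan: &str) -> bool {
    plan.lines()
        .map(str::trim)
        .filter(|line| line.starts_with("DataSourceExec:"))
        .filter_map(|line| line.split_once("predicate=").map(|(_, pred)| pred))
        .any(|pred| VOLATILE_FUNCS.iter().any(|f| pred.contains(f)))
}

fn main() {
    // "Before" plan from the report: the filter stays above the scan.
    let before = "\
OutputRequirementExec
  ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
    AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
      AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
        FilterExec: random() < 0.1
          DataSourceExec: file_groups={1 group: [[sample.parquet]]}, file_type=parquet";
    // "After" plan from the report: the volatile filter was pushed into the scan.
    let after = "\
OutputRequirementExec
  ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
    AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
      AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
        DataSourceExec: file_groups={1 group: [[sample.parquet]]}, file_type=parquet, predicate=random() < 0.1";
    println!("before: {}", has_volatile_scan_predicate(before)); // before: false
    println!("after: {}", has_volatile_scan_predicate(after)); // after: true
}
```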
Additional context
No response