fix: skip predicates on struct unnest in PushDownFilter #16790

Open: wants to merge 3 commits into base: main

akoshchiy
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Added a new test case in push_down_filter.slt.

Are there any user-facing changes?

No.

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jul 15, 2025
@alamb alamb requested a review from adriangb July 15, 2025 17:55
@alamb
Contributor

alamb commented Jul 15, 2025

@adriangb is there any chance you have time to review this PR?

@akoshchiy akoshchiy changed the title fix: skip predicates on struct unnest in FilterPushdown fix: skip predicates on struct unnest in PushDownFilter Jul 15, 2025
Contributor

@adriangb adriangb left a comment

My understanding is that we should not be pushing down filters that touch unnested columns. This seems to achieve that goal.
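For future readers, the rule described above can be sketched in a few lines. This is a minimal, standalone illustration and not DataFusion's actual `PushDownFilter` code: the `Predicate` struct, the `split_predicates` helper, and the `other_col` column are all hypothetical, and a real predicate is an expression tree rather than a list of column names.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a filter expression: the original text plus
/// the names of the columns it references.
struct Predicate {
    text: &'static str,
    columns: Vec<&'static str>,
}

/// Split predicates into (pushable, kept).
/// A predicate is pushable below the unnest only if it references none of
/// the columns that the unnest itself produces.
fn split_predicates<'a>(
    predicates: &'a [Predicate],
    unnest_outputs: &HashSet<&str>,
) -> (Vec<&'a Predicate>, Vec<&'a Predicate>) {
    predicates
        .iter()
        .partition(|p| !p.columns.iter().any(|c| unnest_outputs.contains(*c)))
}

fn main() {
    // Columns produced by unnesting the struct column (names as quoted in the test).
    let unnest_outputs = HashSet::from([
        "__unnest_placeholder(d.column1).a",
        "__unnest_placeholder(d.column1).b",
    ]);

    let predicates = [
        Predicate {
            text: "\"__unnest_placeholder(d.column1).b\" > 5",
            columns: vec!["__unnest_placeholder(d.column1).b"],
        },
        // Hypothetical predicate on a column that the unnest does not produce.
        Predicate {
            text: "other_col = 1",
            columns: vec!["other_col"],
        },
    ];

    let (pushable, kept) = split_predicates(&predicates, &unnest_outputs);
    for p in &pushable {
        println!("push below unnest: {}", p.text);
    }
    for p in &kept {
        println!("keep above unnest: {}", p.text);
    }
}
```

Under these assumptions, the predicate on `.b` stays above the unnest, because that column does not exist until the struct has been unnested, while the unrelated predicate remains free to move down.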

CREATE TABLE d AS VALUES (named_struct('a', 1, 'b', 2)), (named_struct('a', 3, 'b', 4)), (named_struct('a', 5, 'b', 6));

query II
select * from (select unnest(column1) from d) where "__unnest_placeholder(d.column1).b" > 5;
Contributor

@akoshchiy is there documentation anywhere on what "__unnest_placeholder" is? Is it a private implementation detail we are using to test, or something that users could / should use in their queries?

Contributor Author

@akoshchiy akoshchiy Jul 16, 2025

As far as I can see, it's a result of the struct unnest preprocessing in SqlToRel::try_process_unnest: unnest(struct_col) is transformed into __unnest_placeholder(struct_col).field1, __unnest_placeholder(struct_col).field2, etc.
To be honest, I'm not sure whether it was meant to work that way.
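
To make the naming concrete, here is a small sketch that builds the output column names in the pattern described above. The helper `unnest_struct_column_names` is hypothetical and only mirrors the name format quoted in this thread; it is not the code in `SqlToRel::try_process_unnest`.

```rust
/// Hypothetical helper: given the struct column as written in the query and
/// its field names, build one output column name per field, following the
/// `__unnest_placeholder(<column>).<field>` pattern quoted in this thread.
fn unnest_struct_column_names(column: &str, fields: &[&str]) -> Vec<String> {
    fields
        .iter()
        .map(|field| format!("__unnest_placeholder({column}).{field}"))
        .collect()
}

fn main() {
    // The struct column from the test case has fields `a` and `b`.
    for name in unnest_struct_column_names("d.column1", &["a", "b"]) {
        println!("{name}");
    }
    // prints:
    // __unnest_placeholder(d.column1).a
    // __unnest_placeholder(d.column1).b
}
```

The second name is the one the test case filters on, which is why it has to be quoted in the SQL.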

Contributor

@adriangb adriangb left a comment

I think a little more docs / context / comments are needed otherwise this is good to merge.

@adriangb
Contributor

@akoshchiy could you add some comments explaining what's going on for future reference, and documenting what the __unnest_placeholder is? I realize that's been around and is used for other things already but I think it's important to improve documentation and understanding as we go along so that next time we have an easier time fixing things.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jul 18, 2025
@akoshchiy
Contributor Author

I've added some comments to the docs.

By the way, I checked the behaviour in DuckDB, and it looks cleaner: there are no prefixes at all. Maybe we can do the same?

cat nested_2.ndjson 
{"metadata": {"product": "Product Name 1", "price": 1}}
{"metadata": {"product": "Product Name 2", "price": 2}}
{"metadata": {"product": "Product Name 3", "price": 3}}
{"metadata": {"product": "Product Name 4", "price": 4}}
{"metadata": {"product": "Product Name 5", "price": 5}}
{"metadata": {"product": "Product Name 6", "price": 6}}
{"metadata": {"product": "Product Name 7", "price": 7}}
{"metadata": {"product": "Product Name 8", "price": 8}}
{"metadata": {"product": "Product Name 9", "price": 9}}


duckdb 
DuckDB v1.3.2 (Ossivalis) 0b83e5d2f6
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D SELECT unnest(metadata) FROM 'nested_2.ndjson';
┌────────────────┬───────┐
│    product     │ price │
│    varchar     │ int64 │
├────────────────┼───────┤
│ Product Name 1 │     1 │
│ Product Name 2 │     2 │
│ Product Name 3 │     3 │
│ Product Name 4 │     4 │
│ Product Name 5 │     5 │
│ Product Name 6 │     6 │
│ Product Name 7 │     7 │
│ Product Name 8 │     8 │
│ Product Name 9 │     9 │
└────────────────┴───────┘

Successfully merging this pull request may close these issues.

Unnested fields are not filterable when using subqueries.