Add total files filtered #123

Merged
merged 2 commits into from
Nov 20, 2024

samansmink
Collaborator

This makes performance issues easier to debug, since we can now report how many files the delta kernel skips. It also helps when writing tests: I updated the existing tests now that I can actually verify that file skipping is done correctly.

The feature is enabled by a new option, delta_scan_explain_files_filtered, which is ON by default. The feature is really useful, and I feel that outweighs the risk of people getting tripped up by slightly different timings for the delta scan operator when running with EXPLAIN ANALYZE.

Example

EXPLAIN ANALYZE SELECT value1, value2, value3
FROM delta_scan('./data/generated/test_file_skipping/int/delta_lake')
WHERE
    value1 > 1 and
    value2 > 2 and
    value3 < 4

will now print:

┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│        Projections:       │
│           value1          │
│           value2          │
│           value3          │
│                           │
│          Filters:         │
│ value1>1 AND value1 IS NOT│
│            NULL           │
│ value2>2 AND value2 IS NOT│
│            NULL           │
│ value3<4 AND value3 IS NOT│
│            NULL           │
│                           │
│       File Filters:       │
│ value1>1 AND value1 IS NOT│
│            NULL           │
│ value2>2 AND value2 IS NOT│
│            NULL           │
│ value3<4 AND value3 IS NOT│
│            NULL           │
│                           │
│    Scanning Files: 1/5    │
│                           │
│           1 Rows          │
│          (0.00s)          │
└───────────────────────────┘
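
For tests or scripts that want to check the skipping numbers, the "Scanning Files" line can be pulled out of the rendered plan text. The following is only a minimal sketch, not part of this PR: the helper name parse_scanning_files is hypothetical, it assumes the plan is available as a plain string, and it assumes "1/5" reads as files scanned over total files.

```python
import re
from typing import Optional, Tuple

# Matches the "Scanning Files: <scanned>/<total>" line inside the plan box.
# Assumption: the two numbers mean files scanned / total files.
_SCANNING_FILES = re.compile(r"Scanning Files:\s*(\d+)\s*/\s*(\d+)")


def parse_scanning_files(plan_text: str) -> Optional[Tuple[int, int]]:
    """Return (scanned, total) from EXPLAIN ANALYZE text, or None if absent.

    Hypothetical helper for illustration; it only does string matching
    and does not talk to DuckDB or the delta extension.
    """
    match = _SCANNING_FILES.search(plan_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# A fragment of the plan shown above, used as sample input.
sample_plan = """
│       File Filters:       │
│ value3<4 AND value3 IS NOT│
│            NULL           │
│                           │
│    Scanning Files: 1/5    │
│                           │
│           1 Rows          │
"""

scanned, total = parse_scanning_files(sample_plan)
print(f"scanned {scanned} of {total} files, skipped {total - scanned}")
```

The helper returns None when the line is missing, for example when delta_scan_explain_files_filtered is turned off. It would also miss the line if the box renderer wrapped it across two rows, so treat it as a starting point.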

@samansmink samansmink merged commit 0c815b9 into duckdb:feature Nov 20, 2024
18 of 19 checks passed