
Support attaching delta tables as catalogs #110

Merged: 9 commits into duckdb:feature on Nov 8, 2024

Conversation

@samansmink (Collaborator) commented Oct 24, 2024

This draft PR adds support for attaching delta tables as catalogs.

This feature relies on several recent changes to DuckDB; therefore, this PR targets the feature branch of the delta extension.

How to use

With this PR, you can now do:

ATTACH './some/delta/table' AS my_delta_table (TYPE delta);
SELECT * FROM my_delta_table;

The ATTACH statement allows setting the PIN_SNAPSHOT option, which enforces caching of the snapshot. With a cached snapshot, DuckDB reads the snapshot only once, which can offer a small performance boost at the cost of potentially querying a stale snapshot:

ATTACH './some/delta/table' AS my_delta_table (TYPE delta, PIN_SNAPSHOT);
SELECT * FROM my_delta_table;

Performance implications

With this PR, using ATTACH should generally be as fast as, or faster than, using the delta_scan function. Note that even without the PIN_SNAPSHOT option, using ATTACH can result in a significant performance increase when a table is scanned more than once in the same query. For example:

ATTACH './some/delta/table' AS my_delta_table (TYPE delta);
FROM my_delta_table UNION ALL FROM my_delta_table;

will only scan the delta metadata once, whereas

FROM delta_scan('./some/delta/table') UNION ALL FROM delta_scan('./some/delta/table');

will scan the metadata twice.

Optimizations in this PR

The initial naive implementation resulted in a lot of snapshot metadata being read unnecessarily. The following optimizations address this:

Caching the Snapshot in the Transaction

Snapshots are cached in the transaction. This ensures both that every scan of a delta table within the same transaction sees the same version of the table, and that the delta metadata is reused instead of re-read on every scan.
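
A minimal sketch of the idea, with hypothetical names (DeltaTransaction, DeltaSnapshot, and LoadSnapshot below are illustrative, not the extension's actual API):

```c++
// Illustrative sketch only: a transaction-scoped snapshot cache, so that all
// scans of the same table within one transaction share a single snapshot.
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

struct DeltaSnapshot {
	// resolved table version, file list, etc. (illustrative)
};

// Stand-in for the expensive kernel metadata read
std::shared_ptr<DeltaSnapshot> LoadSnapshot(const std::string &path) {
	return std::make_shared<DeltaSnapshot>();
}

class DeltaTransaction {
public:
	std::shared_ptr<DeltaSnapshot> GetSnapshot(const std::string &table_path) {
		std::lock_guard<std::mutex> guard(lock);
		auto entry = snapshots.find(table_path);
		if (entry != snapshots.end()) {
			// Cache hit: same table version, no repeated metadata scan
			return entry->second;
		}
		auto snapshot = LoadSnapshot(table_path);
		snapshots.emplace(table_path, snapshot);
		return snapshot;
	}

private:
	std::mutex lock;
	std::unordered_map<std::string, std::shared_ptr<DeltaSnapshot>> snapshots;
};
```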

Sharing kernel snapshots

The raw ffi::SharedSnapshot that the delta kernel returns is wrapped in a thread-safe wrapper. This means that when a DuckDB DeltaSnapshot object is copied, we simply copy a reference to the thread-safe wrapper instead of reconstructing the snapshot.
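
Conceptually this is the usual shared-ownership pattern around a raw FFI handle; a sketch under assumed names (the ffi_* functions below are illustrative stand-ins, not the real kernel FFI):

```c++
#include <memory>

namespace ffi {
struct SharedSnapshot { /* opaque kernel state (illustrative) */ };
} // namespace ffi

// Illustrative stand-ins for acquiring/releasing the kernel handle
ffi::SharedSnapshot *ffi_acquire_snapshot() {
	return new ffi::SharedSnapshot();
}
void ffi_free_snapshot(ffi::SharedSnapshot *snapshot) {
	delete snapshot;
}

class DeltaSnapshot {
public:
	explicit DeltaSnapshot(ffi::SharedSnapshot *raw)
	    : kernel_snapshot(raw, ffi_free_snapshot) {
	}
	// Copying a DeltaSnapshot only bumps the (atomic) reference count of the
	// shared_ptr; the kernel snapshot itself is never reconstructed.

private:
	std::shared_ptr<ffi::SharedSnapshot> kernel_snapshot;
};
```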

Not emitting a FileExpandResult

DeltaSnapshot::GetExpandResult no longer materializes the file list in order to return FileExpandResult::NO_FILES, FileExpandResult::SINGLE_FILE, or FileExpandResult::MULTIPLE_FILES.
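
One way to achieve this, as a sketch (only the FileExpandResult values come from this PR; the file source below is an illustrative stand-in, not the real implementation): check just the first two entries instead of expanding the whole list.

```c++
#include <string>
#include <vector>

enum class FileExpandResult { NO_FILES, SINGLE_FILE, MULTIPLE_FILES };

struct DeltaSnapshot {
	std::vector<std::string> files; // stand-in for a lazily-read delta log

	// Returns the i-th file path, or "" once the log is exhausted
	std::string GetFile(size_t i) {
		return i < files.size() ? files[i] : "";
	}

	// Peek at the first two entries instead of materializing the whole list
	FileExpandResult GetExpandResult() {
		if (GetFile(0).empty()) {
			return FileExpandResult::NO_FILES;
		}
		if (GetFile(1).empty()) {
			return FileExpandResult::SINGLE_FILE;
		}
		return FileExpandResult::MULTIPLE_FILES;
	}
};
```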

TODOs

This PR is still a draft because of some remaining TODOs:

  • Fix the current regression in the parquet reader in duckdb/duckdb that is causing the CI to fail
  • Fix an issue in DuckDB's clients where rebinding is causing the bind of a delta snapshot to be called too many times

@djouallah

What's the type of this item: is it a table or a view? It seems like an external table to me:
ATTACH './some/delta/table' AS my_delta_table (TYPE delta)

@samansmink (Collaborator, Author)

@djouallah great question! It's sort of a new thing: a catalog with a default schema and table.

See duckdb/duckdb#14118, where the feature is added to DuckDB, and
https://github.com/duckdb/duckdb_delta/blob/1261ea985c0b9707b3924bec2c64ddecd35efbb6/test/sql/dat/attach.test#L72 for how you can actually switch the default catalog to this delta catalog.

Note that this is the first step towards write support: through the newly added DeltaCatalog we can start adding appends, updates etc.

@samansmink marked this pull request as ready for review November 7, 2024 15:57
@samansmink (Collaborator, Author)

CI failure is expected: it's due to a regression test comparing against a previous version that doesn't support the ATTACH statement.

@samansmink merged commit 49c902b into duckdb:feature Nov 8, 2024
18 of 19 checks passed
Mytherin added a commit to duckdb/duckdb that referenced this pull request Nov 14, 2024
Before this PR, various codepaths would internally use a separate call to
`Connection::Prepare` followed by `PreparedStatement::Execute`, even though
the two would be executed right after each other. This could lead to
expensive rebinding in queries on tables in catalogs like the new Delta
Catalog.

## Background

The Prepare-Execute model as DuckDB uses it is sensible when driven by a
user, but not always the best option when the two steps immediately follow
each other. The reason is that in most cases DuckDB runs in so-called
auto-commit mode, in which DuckDB automatically wraps each query in a
transaction. For the Prepare-Execute model this means that the prepare and
the execute each run in their own transaction. To understand the
consequences, let's consider the following:

```c++
DuckDB db(nullptr);
Connection con(db);
Connection con2(db);

// Create some test table
con.Query("CREATE TABLE a(i TINYINT)");
con.Query("INSERT INTO a VALUES (11), (12), (13)");

// We prepare our statement
auto prepared = con.Prepare("SELECT * FROM a WHERE i=$1");

// The table is altered in the meantime by a different connection
con2.Query("ALTER TABLE a ADD COLUMN b VARCHAR");

// Now during execution, our binding from the prepare is invalid! However, DuckDB will rebind to fix this.
auto result = prepared->Execute(12);
```
As the example shows, rebinding is used to recover from situations where the
binding has become invalidated between preparing and executing.

## The problem
Rebinding is in general a good thing: it allows recovery from situations
like the one above. However, it can get expensive. For the (draft)
implementation of the new Delta Catalog feature (see
duckdb/duckdb-delta#110), I would like to ensure that we don't bind delta
tables more often than absolutely necessary, because we currently can't
efficiently verify during the execution phase whether the table version we
read in the bind phase is still the latest. This means every delta table
would need to be bound twice (unless we use aggressive and user-unfriendly
caching mechanisms). I verified that the sqlite extension also suffers from
the same problem.

## Solution
The fix is to ensure we use a single transaction across the Prepare and
Execute steps where possible. This means we need to be careful about
implementing `Query` functions in clients with separate `Prepare` and
`Execute` calls, and should instead go through `Connection::SendQuery`,
`Connection::Query`, and `Connection::PendingQuery` as much as possible.

This allows catalogs to cache state in their transactions, avoiding
expensive rebinds. In the case of Delta, this allows me to cache the
Delta Snapshot in the DeltaTransaction: a safe and unobtrusive way to
ensure the snapshot stays pinned for the duration of the transaction.
Even if a rebind is triggered, the delta catalog can simply use the
cached snapshot and avoid redoing IO.
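
To make the difference concrete, a sketch against DuckDB's C++ API (the
exact signatures of the new parameterized `PendingQuery` overloads may
differ from what is shown here):

```c++
#include "duckdb.hpp"

using namespace duckdb;

int main() {
	DuckDB db(nullptr);
	Connection con(db);
	con.Query("CREATE TABLE a(i TINYINT)");
	con.Query("INSERT INTO a VALUES (11), (12), (13)");

	// Prepare and Execute as separate calls: in auto-commit mode each step
	// runs in its own transaction, so execution may have to rebind.
	auto prepared = con.Prepare("SELECT * FROM a WHERE i=$1");
	auto rebound_result = prepared->Execute(12);

	// PendingQuery keeps bind and execution in a single transaction, so a
	// catalog can cache state (e.g. a delta snapshot) across both phases.
	auto pending = con.PendingQuery("SELECT * FROM a");
	auto result = pending->Execute();
	return 0;
}
```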

In the end I added the following in this PR:
- Added new `Connection::PendingQuery` methods that allow passing
parameters
- Fixed the Python client's `Connection::Execute` by using the new
`PendingQuery` method
- Fixed the CLI by switching to `PendingQuery` instead of prepare/execute
when there are no parameters in the statement
- Fixed the C++ API's variadic `Connection::QueryParamsRecursive` by using
the new `PendingQuery` method

## Testing
I wrote some minor extra tests for the C++ API. I've also built the
aforementioned draft delta catalog PR with this branch and can confirm
that this allows us to fetch the delta snapshot only once per table per
query.

## TODOs
- Double-check other clients