-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support attaching delta tables as catalogs #110
Conversation
what's the type of this item, is it a table or a view, that's seems like an external table to me ? |
@djouallah great question! its sortof a new thing: it's a catalog with a default schema and table: see duckdb/duckdb#14118 where the feature is added to duckdb and Note that this is the first step towards write support: through the newly added DeltaCatalog we can start adding appends, updates etc. |
Ci Failure is expected: it's due to a regression test comparing to a previous version that doesn't support the attach statement |
Before this PR, various codepaths would internally use a separate call to `Connection::Prepare` followed by `PreparedStatement::Execute` even though they would be executed right after eachother. This could lead to expensive rebinding in queries on tables in Catalogs like the new Delta Catalog. ## Background The Prepare - Execute model as DuckDB uses is it sensible when called by a users, but not always the best option when they immediately follow eachother. The reason for this is that in most cases, duckdb will be running in so-called auto-commit mode. This mode means that DuckDB will wrap each query in a transaction for automatically. What this mean the the Prepare Execute model is that the prepare and execute will each run in their own transaction. To properly understand lets consider the following: ```c++ DuckDB db(nullptr); Connection con1(db); Connection con2(db); // Create some test table con->Query("CREATE TABLE a(i TINYINT)"); con->Query("INSERT INTO a VALUES (11), (12), (13)"); // We prepare our statement auto prepared = con->Prepare("SELECT * FROM a WHERE i=$1"); // The table is altered in the meantime by a different connection con2->Query("ALTER TABLE a ADD COLUMN b VARCHAR"); // Now during execution, our binding from prepare is incorrect! However DuckDB will rebind to fix this. auto result = prepared->Execute(12); ``` So as we can see in the example rebinding is used to recover from situations where the binding has become invalidated between preparing and executing. ## The problem Rebinding is in general a good thing: it allows recovery from situations like the one above. However, it can get expensive. For the (draft) implementation of the new Delta Catalog feature (see duckdb/duckdb-delta#110), I would like to ensure that we don't bind delta tables more than absolutely necessary, because we currently can't efficiently verify if the table version we have read in the bind phase is still the latest during execution phase. This means for every delta table you would need to bind twice (unless we use aggressive and user-unfriendly caching mechanisms). I verified that the sqlite extension also suffer from the same problem. ## Solution The fix is to ensure we use a single transaction across the Prepare and Execute step where possible. This means we need to be careful with implementing `Query` functions in clients using the `Prepare` and `Execute`, and instead going through `Connection::SendQuery`, `Connection::Query`, and `Connection::PendingQuery` as much as possible. This allows Catalogs to cache state in their transactions avoiding expensive rebinds. In the case of Delta this allows me to cache the Delta Snapshot in the DeltaTransaction: a safe and unobtrusive way to ensure the snapshot stays pinned for the duration of the transaction. Even if a rebind would be triggered, the delta catalog can simply use the cached snapshot and avoid redoing IO. In the end I added the following in this PR: - Added new `Connection::PendingQuery` methods that allow passing parameters - Python clients `Connection::Execute` by using the new `PendingQuery` method - Fixed the CLI by switching to use `PendingQuery` instead of prepare execute when there are no parameters in the statement - Fixed C++ APIs variadic `Connection::QueryParamsRecursive` by using new `PendingQuery` method ## Testing I wrote some minor extra tests for the C++ API, I've also built the aforementioned draft delta catalog PR with this branch and can confirm that this allows us fetch the delta snapshot only once per table per query. ## Todo's - double check other clients
This draft PR adds support for attaching delta tables as catalogs.
This feature relies on several recent changes made to DuckDB recently. Therefore this PR is made to the feature branch of the delta extension.
How to use
With this PR, you can now do:
The attach statement allows setting the
PIN_SNAPSHOT
option, which will enforce caching the snapshot. Caching the snapshot will result in DuckDB only reading the snapshot once, which can offer a little performance boost at the cost of potentially querying an old snapshot:Performance implications
With this PR, using
ATTACH
should generally be as fast as, or faster, as using thedelta_scan
function. Note that even without using thePIN_SNAPSHOT
function, using ATTACH can result in a significant performance increase when a table is scanned more than once in the same query, for example:will only scan the delta metadata once, whereas
will scan the metadata twice
Optimizations in this PR
Initial naive implementation resulting in a lot of snapshot metadata being read unnecessarily.
Caching the Snapshot in the Transaction
Snapshots are cached in transaction, this both ensures that every time a delta table is scanned within the same transaction, not only will they see the same version of the table, the delta metadata is reused instead of re-scanning the table.
Sharing kernel snapshots
The raw
ffi::SharedSnapshot
that the delta kernel returns, is wrapped in a thread-safe wrapper. This means that when making a copy of a DuckDBDeltaSnapshot
object, we simply copy the reference to the thread-safe wrapper instead of reconstructing the snapshotNot emitting a FileExpandResult
DeltaSnapshot::GetExpandResult
will no longer materialize the file list to returnFileExpandResult::NO_FILES
,FileExpandResult::SINGLE_FILE
orFileExpandResult::MULTIPLE_FILES
.TODOs
This PR is still draft because of some remaining TODOs