updates duckdb/motherduck load job, adds full ci for motherduck and updates docs #1674

rudolfix · 2024-08-08T13:22:46Z

Description

adds full CI for MotherDuck
updates docs
enables multi-statement transactions and sets parallelism to 8 threads for MotherDuck
matches parquet columns to table columns by name in the load job; fixes #1553 (Fix: handle column order gracefully when loading into ddb from pqt)
loads jsonl with read_json instead of COPY FROM, which allows fields to be skipped
drops the internal per-table locks used when loading parquet files into the same table from multiple threads
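To illustrate the column-order problem behind #1553, here is a minimal pure-Python sketch (not dlt's or DuckDB's code) contrasting positional and by-name mapping of a file row onto a table's columns. The helpers `map_by_position` and `map_by_name` and the sample columns are hypothetical; the column order mirrors the test comment where "time" ends up in "_dlt_id".

```python
from typing import Any, Dict, Optional, Sequence

Row = Dict[str, Optional[Any]]


def map_by_position(table_columns: Sequence[str], values: Sequence[Any]) -> Row:
    """Assign the i-th file value to the i-th table column, ignoring names."""
    return {col: values[i] if i < len(values) else None for i, col in enumerate(table_columns)}


def map_by_name(
    table_columns: Sequence[str], file_columns: Sequence[str], values: Sequence[Any]
) -> Row:
    """Assign each file value to the table column of the same name; absent columns get None."""
    by_name = dict(zip(file_columns, values))
    return {col: by_name.get(col) for col in table_columns}


# hypothetical destination table; "time" was added later, so it sits last
table_columns = ["id", "_dlt_id", "time"]
# hypothetical file written by a schema where "time" comes before "_dlt_id"
file_columns = ["id", "time", "_dlt_id"]
values = [1, "2024-08-08", "row-1"]

print(map_by_position(table_columns, values))
# {'id': 1, '_dlt_id': '2024-08-08', 'time': 'row-1'}  <- "time" lands in "_dlt_id"
print(map_by_name(table_columns, file_columns, values))
# {'id': 1, '_dlt_id': 'row-1', 'time': '2024-08-08'}
```

Matching by name makes the load independent of the order in which a given schema happens to list its columns.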

netlify · 2024-08-08T13:23:01Z

✅ Deploy Preview for dlt-hub-docs canceled.

🔨 Latest commit: `1112687`
🔍 Latest deploy log: https://app.netlify.com/sites/dlt-hub-docs/deploys/66b5ff666a3c150008e8f085

burnash · 2024-08-08T13:27:44Z

docs/website/docs/dlt-ecosystem/destinations/motherduck.md

@@ -1,11 +1,10 @@
 ---
-title: 🧪 MotherDuck
+title: MotherDuck



sh-rp · 2024-08-12T07:22:18Z

dlt/destinations/impl/duckdb/duck.py

-                    qualified_table_name, threading.Lock()
-                )
+            source_format = "read_parquet"
+            options = ", union_by_name=true"
        elif self._file_path.endswith("jsonl"):
            # NOTE: loading JSON does not work in practice on duckdb: the missing keys fail the load instead of being interpreted as NULL


Judging from my work on the datasets PR, this is not (or is no longer) true. I have a test there that migrates a table, and loading still works in duckdb with both json and parquet.
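The behaviour under discussion, missing JSONL keys being read as NULL rather than failing the load, can be illustrated in plain Python. This is a sketch under assumptions (a hypothetical `load_jsonl` helper and in-memory input), not how DuckDB's `read_json` is implemented.

```python
import io
import json
from typing import Any, Dict, Iterable, List, Optional, Sequence


def load_jsonl(lines: Iterable[str], columns: Sequence[str]) -> List[Dict[str, Optional[Any]]]:
    """Parse newline-delimited JSON and project each object onto `columns`.

    Keys missing from an object become None (NULL) instead of raising;
    keys not listed in `columns` are ignored.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines such as a trailing newline
        obj = json.loads(line)
        rows.append({col: obj.get(col) for col in columns})
    return rows


data = io.StringIO('{"id": 1, "name": "a"}\n{"id": 2}\n')
print(load_jsonl(data, ["id", "name"]))
# [{'id': 1, 'name': 'a'}, {'id': 2, 'name': None}]
```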

sh-rp · 2024-08-12T07:27:13Z

tests/load/pipeline/test_duckdb.py

+    # we will use a different pipeline with a separate schema but writing to the same dataset and to the same table
+    # the table schema is identical to the previous one with a single field ("time") added
+    # this will create a different order of columns than in the destination database ("time" will map to "_dlt_id")
+    # duckdb copies columns by column index so that will fail


I don't understand the last line: as far as I know, you are testing union_by_name here, and this should NOT fail.
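For context on `union_by_name`: DuckDB documents it as unifying the columns of multiple files by name rather than by position, with columns a file lacks filled with NULL. The pure-Python sketch below mimics that idea for two hypothetical files resembling the test (the second schema adds `time`). `union_by_name` here is an illustrative helper, not DuckDB's code, and the first-appearance column order is a choice made for this sketch.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

File = Tuple[Sequence[str], List[Sequence[Any]]]  # (column names, rows)


def union_by_name(files: Sequence[File]) -> Tuple[List[str], List[Dict[str, Optional[Any]]]]:
    """Combine rows from files with differing schemas by aligning columns on name."""
    # unified column list, in order of first appearance across files
    columns: List[str] = []
    for names, _ in files:
        for name in names:
            if name not in columns:
                columns.append(name)
    out = []
    for names, rows in files:
        for row in rows:
            record = dict(zip(names, row))
            # columns this file does not have are filled with None (NULL)
            out.append({col: record.get(col) for col in columns})
    return columns, out


first = (["id", "_dlt_id"], [[1, "r1"]])
second = (["id", "time", "_dlt_id"], [[2, "2024-08-12", "r2"]])
cols, rows = union_by_name([first, second])
print(cols)  # ['id', '_dlt_id', 'time']
print(rows)
# [{'id': 1, '_dlt_id': 'r1', 'time': None}, {'id': 2, '_dlt_id': 'r2', 'time': '2024-08-12'}]
```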

sh-rp

LGTM

adds full ci for motherduck and updates docs

91adec8

rudolfix self-assigned this Aug 8, 2024

burnash reviewed Aug 8, 2024


docs/website/docs/dlt-ecosystem/destinations/motherduck.md

@@ -1,11 +1,10 @@
 ---
-title: 🧪 MotherDuck
+title: MotherDuck


burnash Aug 8, 2024


👍

rudolfix added 3 commits August 8, 2024 19:56

drops parquet locks from duckdb, matches parquet to columns by name, allows full jsonl loading

042c533

fixes basic job and sql client tests so motherduck+parquet runs

fc4b8dd

adds parallel parquet loading test

1112687

rudolfix changed the title ~~adds full ci for motherduck and updates docs~~ updates duckdb/motherduck load job, adds full ci for motherduck and updates docs Aug 9, 2024

rudolfix mentioned this pull request Aug 9, 2024

Fix: handle column order gracefully when loading into ddb from pqt #1553

Closed

sh-rp reviewed Aug 12, 2024


sh-rp approved these changes Aug 12, 2024


rudolfix merged commit 250c2e2 into devel Aug 12, 2024
56 checks passed

rudolfix deleted the feat/runs-motherduck-ci branch August 12, 2024 07:53
