enables duckdb 0.9.1 and improves motherduck docs
rudolfix committed Oct 16, 2023
1 parent d3db284 commit 5ac4007
Showing 5 changed files with 68 additions and 63 deletions.
9 changes: 5 additions & 4 deletions dlt/destinations/motherduck/configuration.py
```diff
@@ -30,13 +30,14 @@ def _token_to_password(self) -> None:
         self.password = TSecretValue(self.query.pop("token"))

     def borrow_conn(self, read_only: bool) -> Any:
-        from duckdb import HTTPException
+        from duckdb import HTTPException, InvalidInputException
         try:
             return super().borrow_conn(read_only)
-        except HTTPException as http_ex:
-            if http_ex.status_code == 403 and 'Failed to download extension "motherduck"' in str(http_ex):
+        except (InvalidInputException, HTTPException) as ext_ex:
+            if 'Failed to download extension' in str(ext_ex) and "motherduck" in str(ext_ex):
                 from importlib.metadata import version as pkg_version
-                raise MotherduckLocalVersionNotSupported(pkg_version("duckdb")) from http_ex
+                raise MotherduckLocalVersionNotSupported(pkg_version("duckdb")) from ext_ex

             raise

     def parse_native_representation(self, native_value: Any) -> None:
```
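For context, a minimal sketch (not part of this commit) of how a caller could react to the friendlier error raised above. The import path of `MotherduckLocalVersionNotSupported` is assumed from the file edited here, and `credentials` stands for an already configured MotherDuck credentials object:

```python
# Illustrative only: handle the error raised in borrow_conn() above when the
# locally installed duckdb cannot download the "motherduck" extension.
# The import path is an assumption based on the file shown in this commit.
from dlt.destinations.motherduck.configuration import MotherduckLocalVersionNotSupported


def open_md_connection(credentials, read_only: bool = False):
    try:
        return credentials.borrow_conn(read_only)
    except MotherduckLocalVersionNotSupported as ex:
        # the exception carries the locally installed duckdb version
        raise RuntimeError(
            f"Your local duckdb build cannot load the motherduck extension: {ex}"
        ) from ex
```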
17 changes: 16 additions & 1 deletion docs/website/docs/dlt-ecosystem/destinations/motherduck.md
@@ -8,6 +8,15 @@ keywords: [MotherDuck, duckdb, destination, data warehouse]

> 🧪 MotherDuck is still invitation-only and being intensively tested. Please see the limitations / problems at the end.
:::tip
If you see a lot of retries in your logs with various timeout messages, decrease the number of load workers to 3-5, depending on the quality of your internet connection. To do that, add the following to your `config.toml`:
```toml
[load]
workers=3
```
or export the **LOAD__WORKERS=3** env variable. See more in [performance](../../reference/performance.md).
:::
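Equivalently, the worker count can be supplied from the environment before the pipeline runs. A minimal sketch (not part of the docs page; pipeline and dataset names are illustrative):

```python
import os

# dlt maps the LOAD__WORKERS env variable to the [load] workers setting,
# equivalent to the config.toml snippet above.
os.environ["LOAD__WORKERS"] = "3"

import dlt

pipeline = dlt.pipeline(
    pipeline_name="chess_pipeline",  # illustrative name
    destination="motherduck",
    dataset_name="chess_data",       # illustrative name
)
# subsequent pipeline.run(...) calls will load with 3 workers
```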

## Setup Guide

**1. Initialize a project with a pipeline that loads to MotherDuck by running**
@@ -60,8 +69,14 @@ Each destination must pass few hundred automatic tests. MotherDuck is passing th

## Troubleshooting / limitations

### I see a lot of errors in the log like DEADLINE_EXCEEDED or Connection timed out
MotherDuck is very sensitive to the quality of the internet connection and to the **number of workers used to load data**. Decrease the number of workers and make sure your internet connection is stable. We could not find any way to increase those timeouts yet.
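To rule out `dlt` itself, connectivity can be checked with plain duckdb. A minimal sketch (not part of the docs page), assuming your MotherDuck token is already configured as described in the MotherDuck documentation:

```python
import duckdb

# Sanity check the MotherDuck connection outside of dlt.
# Assumes the MotherDuck token is configured in the environment.
con = duckdb.connect("md:")
print(con.sql("select 42 as answer").fetchall())
```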


### MotherDuck does not support transactions.
Do not use `begin`, `commit` and `rollback` on the `dlt` **sql_client** or on the duckdb dbapi connection. It has no effect on DML statements (they are autocommitted) and it confuses the query engine for DDL (tables not found etc.).
If your connection is of poor quality and you get a timeout while executing a DML query, it may happen that your transaction actually got executed.
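In practice this means issuing statements through the sql_client without any explicit transaction control. A minimal sketch (dataset and table names are illustrative):

```python
import dlt

# assumes this pipeline has already loaded some data into MotherDuck
pipeline = dlt.pipeline(destination="motherduck", dataset_name="chess_data")

# Each statement is autocommitted by MotherDuck - no begin/commit/rollback.
with pipeline.sql_client() as client:
    client.execute_sql("DELETE FROM chess_players WHERE wins = 0")
```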


### I see an exception about a missing home_dir when opening the `md:` connection.
Some internal component (HTTPS) requires the **HOME** env variable to be present. Export this variable in your shell before running the pipeline. Here is what we do in our tests:
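The exact test snippet is truncated in this view. A generic (hypothetical, not the `dlt` test code) way to make sure **HOME** is set before opening the connection:

```python
import os

# Hypothetical workaround: ensure HOME exists before duckdb opens "md:".
# The fallback path below is only an example.
os.environ.setdefault("HOME", "/tmp/dlt_home")
```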
97 changes: 42 additions & 55 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
```diff
@@ -58,7 +58,7 @@ psycopg2cffi = {version = ">=2.9.0", optional = true, markers="platform_python_i
 grpcio = {version = ">=1.50.0", optional = true}
 google-cloud-bigquery = {version = ">=2.26.0", optional = true}
 pyarrow = {version = ">=8.0.0", optional = true}
-duckdb = {version = ">=0.6.1,<0.9.0", optional = true}
+duckdb = {version = ">=0.6.1,<0.10.0", optional = true}
 dbt-core = {version = ">=1.2.0", optional = true}
 dbt-redshift = {version = ">=1.2.0", optional = true}
 dbt-bigquery = {version = ">=1.2.0", optional = true}
```
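A quick way (not part of this commit) to check that the locally installed duckdb falls inside the widened range; `packaging` is assumed to be available:

```python
from importlib.metadata import version
from packaging.version import Version  # assumed to be installed

installed = Version(version("duckdb"))
# Bounds taken from the pyproject.toml change above (now allows duckdb 0.9.x).
assert Version("0.6.1") <= installed < Version("0.10.0"), f"unsupported duckdb {installed}"
```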
6 changes: 4 additions & 2 deletions tests/pipeline/test_dlt_versions.py
@@ -1,7 +1,6 @@
import os
import tempfile
import pytest
import shutil
from importlib.metadata import version as pkg_version

import dlt
from dlt.common import json
@@ -32,6 +31,8 @@ def test_pipeline_with_dlt_update(test_storage: FileStorage) -> None:
with custom_environ({"DESTINATION__DUCKDB__CREDENTIALS": "duckdb:///test_github_3.duckdb"}):
# create virtual env with (0.3.0) before the current schema upgrade
with Venv.create(tempfile.mkdtemp(), ["dlt[duckdb]==0.3.0"]) as venv:
# NOTE: we force a newer duckdb into the 0.3.0 dlt version to get compatible duckdb storage
venv._install_deps(venv.context, ["duckdb" + "==" + pkg_version("duckdb")])
# load 20 issues
print(venv.run_script("../tests/pipeline/cases/github_pipeline/github_pipeline.py", "20"))
# load schema and check _dlt_loads definition
@@ -96,6 +97,7 @@ def test_load_package_with_dlt_update(test_storage: FileStorage) -> None:
with custom_environ({"DESTINATION__DUCKDB__CREDENTIALS": "duckdb:///test_github_3.duckdb"}):
# create virtual env with (0.3.0) before the current schema upgrade
with Venv.create(tempfile.mkdtemp(), ["dlt[duckdb]==0.3.0"]) as venv:
venv._install_deps(venv.context, ["duckdb" + "==" + pkg_version("duckdb")])
# extract and normalize on old version but DO NOT LOAD
print(venv.run_script("../tests/pipeline/cases/github_pipeline/github_extract.py", "70"))
# switch to current version and make sure the load package loads and schema migrates
