Rfix/docs standalone dbt #733

Merged 5 commits on Nov 8, 2023
30 changes: 28 additions & 2 deletions dlt/helpers/dbt/runner.py
@@ -271,5 +271,31 @@ def create_runner(
package_profile_name: str = None,
auto_full_refresh_when_out_of_sync: bool = None,
config: DBTRunnerConfiguration = None
) -> DBTPackageRunner:
return DBTPackageRunner(venv, credentials, working_dir, credentials.dataset_name, config)
) -> DBTPackageRunner:
"""Creates a Python wrapper over the `dbt` package at the specified location, which allows you to control it (i.e. run and test it) from Python code.

The created wrapper minimizes the effort required to run `dbt` packages. It clones the package repo and keeps it up to date,
optionally shares the `dlt` destination credentials with `dbt`, and allows isolated execution via the `venv` parameter.

Note that you can pass config and secrets in `DBTRunnerConfiguration` as configuration in the "dbt_package_runner" section.

Args:
venv (Venv): A virtual environment with required dbt dependencies. Pass None to use current environment.
credentials (DestinationClientDwhConfiguration): Any configuration deriving from DestinationClientDwhConfiguration, e.g. ConnectionStringCredentials
working_dir (str): A working dir to which the package will be cloned
package_location (str): A git repository url to be cloned or a local path where dbt package is present
package_repository_branch (str, optional): A branch name, tag name or commit-id to check out. Defaults to None.
package_repository_ssh_key (TSecretValue, optional): SSH key to be used to clone private repositories. Defaults to TSecretValue("").
package_profiles_dir (str, optional): Path to the folder where "profiles.yml" resides
package_profile_name (str, optional): Name of the profile in "profiles.yml"
auto_full_refresh_when_out_of_sync (bool, optional): If True (the default behavior when left as None), the wrapper will automatically fall back to full-refresh mode when the schema is out of sync.
See: https://docs.getdbt.com/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change. Defaults to None.
config (DBTRunnerConfiguration, optional): Explicit additional configuration for the runner.

Returns:
DBTPackageRunner: A Python `dbt` wrapper
"""
    dataset_name = credentials.dataset_name if credentials else ""
    if venv is None:
        venv = Venv.restore_current()
    return DBTPackageRunner(venv, credentials, working_dir, dataset_name, config)
1 change: 1 addition & 0 deletions docs/website/.gitignore
@@ -8,6 +8,7 @@
.docusaurus
.cache-loader
docs/api_reference
jaffle_shop

# Misc
.DS_Store
13 changes: 12 additions & 1 deletion docs/website/docs/dlt-ecosystem/destinations/snowflake.md
@@ -66,7 +66,7 @@
Now you can use the user named `LOADER` to access database `DLT_DATA` and log in with specified password
You can also decrease the suspend time for your warehouse to 1 minute (**Admin**/**Warehouses** in Snowflake UI)

### Authentication types
Snowflake destination accepts two authentication type
Snowflake destination accepts three authentication types
- password authentication
- [key pair authentication](https://docs.snowflake.com/en/user-guide/key-pair-auth)
- external (OAuth) authentication
@@ -95,6 +95,17 @@
If you pass a passphrase in the connection string, please URL-encode it.
destination.snowflake.credentials="snowflake://loader:<password>@kgiotue-wn98412/dlt_data?private_key=<base64 encoded pem>&private_key_passphrase=<url encoded passphrase>"
```
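Reserved characters in the passphrase (spaces, `#`, `&`, ...) must be percent-encoded before they go into the URL. A minimal sketch using the standard library — the passphrase shown is a placeholder, not a real secret:

```python
from urllib.parse import quote

# placeholder secret -- every reserved character must be percent-encoded
passphrase = "my pass#phrase"
encoded = quote(passphrase, safe="")

# assembled the same way as the connection string above; account and key are placeholders
conn_str = (
    "snowflake://loader:<password>@kgiotue-wn98412/dlt_data"
    f"?private_key=<base64 encoded pem>&private_key_passphrase={encoded}"
)
print(encoded)  # -> my%20pass%23phrase
```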

With **external authentication** you can use an OAuth provider like Okta, or an external browser, to authenticate. Pass your authenticator and refresh token as below:
```toml
[destination.snowflake.credentials]
database = "dlt_data"
username = "loader"
authenticator="..."
token="..."
```
or pass them in the connection string as query parameters.
Refer to the Snowflake [OAuth documentation](https://docs.snowflake.com/en/user-guide/oauth-intro) for more details.
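The query-parameter form can be assembled with the standard library — a sketch with placeholder values (`externalbrowser` is one of Snowflake's authenticator options; the token value here is not real):

```python
from urllib.parse import urlencode

# placeholder credentials -- substitute your own authenticator and refresh token
params = {"authenticator": "externalbrowser", "token": "..."}
conn_str = "snowflake://loader@kgiotue-wn98412/dlt_data?" + urlencode(params)
print(conn_str)
```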

## Write disposition
All write dispositions are supported.

Empty file.
@@ -0,0 +1,20 @@
def run_dbt_standalone_snippet() -> None:
    # @@@DLT_SNIPPET_START run_dbt_standalone
    import os

    from dlt.helpers.dbt import create_runner

    runner = create_runner(
        None,  # use the current virtual env to run dbt
        None,  # no destination credentials: we neither need a dataset name nor pass any credentials via the environment
        working_dir=".",  # the package below will be cloned to the current dir
        package_location="https://github.com/dbt-labs/jaffle_shop.git",
        package_profiles_dir=os.path.abspath("."),  # profiles.yml must be placed in this dir
        package_profile_name="duckdb_dlt_dbt_test"  # name of the profile
    )

    models = runner.run_all()
    # @@@DLT_SNIPPET_END run_dbt_standalone

    for m in models:
        print(f"Model {m.model_name} materialized in {m.time} with status {m.status} and message {m.message}")
44 changes: 44 additions & 0 deletions docs/website/docs/dlt-ecosystem/transformations/dbt/dbt.md
@@ -55,6 +55,8 @@
pipeline = dlt.pipeline(
)

# make or restore venv for dbt, using latest dbt version
# NOTE: if you have dbt installed in your current environment, just skip this line
# and the `venv` argument to dlt.dbt.package()
venv = dlt.dbt.get_venv(pipeline)

# get runner, optionally pass the venv
@@ -78,6 +80,48 @@
for m in models:
)
```

## How to run the dbt runner without a pipeline
You can use the dbt runner without a dlt pipeline. The example below clones and runs **jaffle shop** using a dbt profile that you supply.
It assumes that dbt is installed in the current Python environment and that `profiles.yml` is in the same folder as the Python script.
<!--@@@DLT_SNIPPET_START ./dbt-snippets.py::run_dbt_standalone-->
```py
import os

from dlt.helpers.dbt import create_runner

runner = create_runner(
    None,  # use the current virtual env to run dbt
    None,  # no destination credentials: we neither need a dataset name nor pass any credentials via the environment
    working_dir=".",  # the package below will be cloned to the current dir
    package_location="https://github.com/dbt-labs/jaffle_shop.git",
    package_profiles_dir=os.path.abspath("."),  # profiles.yml must be placed in this dir
    package_profile_name="duckdb_dlt_dbt_test"  # name of the profile
)

models = runner.run_all()
```
<!--@@@DLT_SNIPPET_END ./dbt-snippets.py::run_dbt_standalone-->

Here's an example **duckdb** profile:
```yaml
config:
  # do not track usage, do not create .user.yml
  send_anonymous_usage_stats: False

duckdb_dlt_dbt_test:
  target: analytics
  outputs:
    analytics:
      type: duckdb
      # schema: "{{ var('destination_dataset_name', var('source_dataset_name')) }}"
      path: "duckdb_dlt_dbt_test.duckdb"
      extensions:
        - httpfs
        - parquet
```
You can run the example with the dbt debug log enabled: `RUNTIME__LOG_LEVEL=DEBUG python dbt_standalone.py`
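The same debug level can be set from Python before the runner is created — a trivial sketch (this assumes, as the shell command above does, that `dlt` reads `RUNTIME__LOG_LEVEL` from the environment):

```python
import os

# equivalent to prefixing the command with RUNTIME__LOG_LEVEL=DEBUG
os.environ["RUNTIME__LOG_LEVEL"] = "DEBUG"
print(os.environ["RUNTIME__LOG_LEVEL"])  # -> DEBUG
```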


## Other transforming tools

If you want to transform the data before loading, you can use Python. If you want to transform the
14 changes: 14 additions & 0 deletions docs/website/docs/dlt-ecosystem/transformations/dbt/profiles.yml
@@ -0,0 +1,14 @@
config:
  # do not track usage, do not create .user.yml
  send_anonymous_usage_stats: False

duckdb_dlt_dbt_test:
  target: analytics
  outputs:
    analytics:
      type: duckdb
      # schema: "{{ var('destination_dataset_name', var('source_dataset_name')) }}"
      path: "duckdb_dlt_dbt_test.duckdb"
      extensions:
        - httpfs
        - parquet