Signed-off-by: Bruno Campos <[email protected]>
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,29 +1,56 @@ | ||
# dbt package: source_db | ||
This is a dbt package that allows the user to specify where the dbt command should read the data from. | ||
# dbt-source_db | ||
|
||
## Use case | ||
Use this when you work with multiple environments, warehouses, accounts, or schemas and sometimes want to WRITE to one place but READ from another. | ||
The `dbt-source_db` package allows you to specify the source database that dbt should read from. This enables reading from one database and writing to another. | ||
|
||
dbt already covers the WRITE side through the `--target` flag: you create multiple profiles with the appropriate settings and then specify where you want your `run` to READ from and WRITE to ([Choosing the right Snowflake warehouse when running dbt](https://about.gitlab.com/handbook/business-technology/data-team/platform/dbt-guide/#choosing-the-right-snowflake-warehouse-when-running-dbt)). | ||
## Getting Started | ||
|
||
However, in some cases you may want to READ from a particular warehouse and WRITE to the warehouse specified in your `profile`. For example, you might work in a "sandbox" environment before sending the PR that pulls the code into a production environment. | ||
Declare the package in your `packages.yml` file: | ||
|
||
## Example use | ||
Let's say you want to run a model in your dbt project called `my_model_a`. | ||
```yml | ||
packages: | ||
  - package: dbt-labs/source_db | ||
``` | ||
|
||
Then install it with `dbt deps`: | ||
|
||
```bash | ||
dbt deps | ||
``` | ||
## Usage | ||
Set the `SRC_DB` environment variable to the source database you want dbt to read from: | ||
|
||
```bash | ||
export SRC_DB=dev_db | ||
``` | ||
|
||
Then run dbt as usual. The `ref()` and `source()` macros will read from `SRC_DB` instead of the default target database. | ||
|
||
Or you can set the variable within the same command. | ||
|
||
In your project you have a *sandbox* environment where you are free to develop and try different solutions, and a *prd* environment where once the code has been looked over and approved, those changes get released to *prd*. | ||
For example: | ||
|
||
For you to develop in your *sandbox* environment you need to have the tables copied or cloned from *prd*. This can be easy to do if you only need one or two tables, but when you need multiple, this can become a pain. | ||
```bash | ||
SRC_DB=dev_db dbt run | ||
``` | ||
|
||
The default behaviour of `dbt run --models +my_model_a` is to compile all the dbt code and READ all `ref`s and `source`s from the specified warehouse. | ||
This will read all sources and refs from `dev_db`, but write to the database in your profile/target. | ||
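
The difference between the two ways of setting `SRC_DB` (a persistent `export` versus a per-command prefix) is plain shell behaviour and can be checked without dbt. The sketch below is illustrative only: `sh -c` stands in for `dbt run`, and the variable names `inline_value`, `after_inline`, and `exported_value` are made up for this example.

```shell
#!/bin/sh
# Illustration only: `sh -c` stands in for `dbt run`, so this runs without dbt installed.
unset SRC_DB

# Inline assignment: SRC_DB is visible to this one command only.
inline_value=$(SRC_DB=dev_db sh -c 'printf "%s" "${SRC_DB:-unset}"')

# Afterwards, the current shell still has no SRC_DB.
after_inline=${SRC_DB:-unset}

# export: SRC_DB stays set for every later command in this shell session.
export SRC_DB=dev_db
exported_value=$(sh -c 'printf "%s" "${SRC_DB:-unset}"')

echo "inline=$inline_value after_inline=$after_inline exported=$exported_value"
```

The inline form is the safer choice for a one-off run, since a forgotten `export` keeps redirecting reads for the rest of the session.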
|
||
> So it compiles `SELECT * FROM {{ ref('my_upstream_model') }}` to `SELECT * FROM sandbox.schema.my_upstream_model`. | ||
## Example | ||
|
||
What if we want to develop __only__ our new model but with data from a particular environment like *dev*? With this package, you can run: | ||
You have a _sandbox_ and _production_ database. You want to test a new model `my_model` in _sandbox_, but reading data from _production_. | ||
|
||
Run the model: | ||
|
||
```bash | ||
SRC_DB=DEV dbt run --models +my_model_a | ||
SRC_DB=prod_db dbt run --models my_model | ||
``` | ||
This compiles `SELECT * FROM {{ ref('my_upstream_model') }}` to `SELECT * FROM DEV.schema.my_upstream_model`, and it still writes to the profile environment as expected: `CREATE OR REPLACE TABLE sandbox.schema.my_model AS ( ... )`. | ||
|
||
This can be really handy when you need to test something locally without copying everything one by one, all done directly from within your dbt project. | ||
This will read from `prod_db` but write `my_model` to _sandbox_. | ||
|
||
## Macro reference | ||
|
||
- `ref(model_name)`: Reads `model_name` from `SRC_DB` instead of target database. | ||
|
||
- `source(source_name, table_name)`: Reads `table_name` from `SRC_DB` instead of target database. |
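
The read/write split described for `ref()` and `source()` can be mimicked in a few lines of shell. This is a rough sketch of the behaviour, not the package's implementation: `resolve_relation` and `TARGET_DB` are hypothetical names, and falling back to the target database when `SRC_DB` is unset is an assumption based on the default behaviour described in this README.

```shell
#!/bin/sh
# Hypothetical helper, not part of the package: mimics how a read relation is named.
# TARGET_DB stands in for the database configured in the active profile/target.
TARGET_DB=sandbox

resolve_relation() {
    # $1 = schema, $2 = model or table name.
    # Assumption: reads use SRC_DB when it is set, otherwise the target database.
    printf '%s.%s.%s\n' "${SRC_DB:-$TARGET_DB}" "$1" "$2"
}

unset SRC_DB
default_read=$(resolve_relation schema my_upstream_model)

# Set SRC_DB inside the subshell only, so the override does not leak out.
override_read=$(SRC_DB=DEV; resolve_relation schema my_upstream_model)

# Writes always go to the target database, whatever SRC_DB says.
write_relation="$TARGET_DB.schema.my_model"

echo "default read:  $default_read"
echo "override read: $override_read"
echo "write:         $write_relation"
```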