README

Signed-off-by: Bruno Campos <[email protected]>
BfdCampos · Aug 31, 2023 · dfedbc3
1 parent bb96a33
commit dfedbc3
Showing 1 changed file with 43 additions and 16 deletions.
diff --git a/README.md b/README.md
@@ -1,29 +1,56 @@
-# dbt package: source_db
-This is a dbt package that allows the user to specify where the dbt command should read the data from.
+# dbt-source_db
 
-## Use case
-For when you have multiple environments/warehouses/accounts/schema's in your work and at times you want to WRITE to one place, but READ from another. 
+The `dbt-source_db` package allows you to specify the source database that dbt should read from. This enables reading from one database and writing to another. 
 
-dbt already comes with the WRITE solution with the `--target` by creating multiple profiles with the appropriate settings, and then specifying where you want your `run` to READ from and WRITE to. ([Choosing the right Snowflake warehouse when running dbt](https://about.gitlab.com/handbook/business-technology/data-team/platform/dbt-guide/#choosing-the-right-snowflake-warehouse-when-running-dbt)).
+## Getting started
 
-However, for some cases you may want to READ from a particular warehouse, and WRITE to the warehouse you have specified in your `profile`. Say if you work with a "sandbox" environment before sending the PR that pulls the code into a production environment.
+Install the package:
 
-## Example use
-Let's say you want to run a model in your dbt project called `my_model_a`.
+```bash
+dbt deps
+```
+
+`dbt deps` installs the packages listed in your `packages.yml` file, so add the package there first:
+
+```yml
+packages:
+  - package: dbt-labs/source_db
+```
+
+## Usage
+
+Set the `SRC_DB` environment variable to the source database you want dbt to read from:
+
+```bash
+export SRC_DB=dev_db
+```
+
+Then run dbt as usual. The `ref()` and `source()` macros will read from `SRC_DB` instead of the default target database. 
+
+Alternatively, set the variable inline so that it applies to a single command only.
 
-In your project you have a *sandbox* environment where you are free to develop and try different solutions, and a *prd* environment where once the code has been looked over and approved, those changes get released to *prd*.
+For example:
 
-For you to develop in your *sandbox* environment you need to have the tables copied or cloned from *prd*. This can be easy to do if you only need one or two tables, but when you need multiple, this can become a pain.
+```bash
+SRC_DB=dev_db dbt run
+```
 
-The default behaviour of `dbt run --models +my_model_a` is to compile all the dbt code and READ all `ref`s and `source`s from the specified warehouse.
+This will read all sources and refs from `dev_db`, but write to the database in your profile/target.
 
-> So it compiles the code: `SELECT * FROM { ref('my_upstream_model') }` to `SELECT * FROM sandbox.schema.my_upstream_model`.
+## Example
 
-What if we want to develop __only__ our new model but with data from a particular environment like *dev*? With this package, you can run:
+Suppose you have a _sandbox_ database and a _production_ database, and you want to test a new model `my_model` in _sandbox_ while reading data from _production_.
+
+Run the model:
 
 ```bash
-SRC_DB=DEV dbt run --models +my_model_a
+SRC_DB=prod_db dbt run --models my_model
 ```
-What this will do, is it will compile` SELECT * FROM { ref('my_upstream_model') }` to `SELECT * FROM DEV.schema.my_upstream_model`, and it will write into the profile env as expected: `CREATE OR REPLACE TABLE sandbox.schema.my_model AS ( ... )`.
 
-This can be really handy for when you need to test something locally without copying everyting one by one, all done directly from within your dbt project.
+This reads upstream data from `prod_db` and writes `my_model` to _sandbox_, assuming your profile's target points at _sandbox_.
+
+## Macro reference
+
+- `ref(model_name)`: Reads `model_name` from `SRC_DB` instead of the target database.
+
+- `source(source_name, table_name)`: Reads `table_name` from `SRC_DB` instead of the target database.
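
A note on the macro reference in the new README: it says `ref()` and `source()` read from `SRC_DB` instead of the target database. The shell sketch below illustrates that lookup under one assumption the README does not state, namely that reads fall back to the target database when `SRC_DB` is unset or empty. The helper `resolve_read_db` and the database names are hypothetical and not part of the package; the package itself presumably does this inside its macros rather than in the shell.

```shell
#!/bin/sh
# Hypothetical helper, not part of dbt-source_db: prints the database a
# ref()/source() lookup would read from. Assumption: when SRC_DB is unset
# or empty, reads fall back to the target database passed as $1.
resolve_read_db() {
    target_db="$1"
    printf '%s\n' "${SRC_DB:-$target_db}"
}

unset SRC_DB

# No override: reads and writes both use the target database.
default_db=$(resolve_read_db sandbox_db)

# Inline override: SRC_DB is set only for this one invocation.
override_db=$(SRC_DB=prod_db resolve_read_db sandbox_db)

# SRC_DB is still unset in the calling shell, so the next lookup
# goes back to the target database.
after_db=$(resolve_read_db sandbox_db)

echo "default:  $default_db"
echo "override: $override_db"
echo "after:    $after_db"
```

The inline form mirrors `SRC_DB=prod_db dbt run --models my_model` from the README: the override lasts for one command, whereas `export SRC_DB=...` keeps it in effect for the rest of the shell session.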