Welcome to your new dbt project

Steps to recreate

Note: Check the variable settings in the .env file.

Start the Postgres database and the Postgres admin web interface (see the images further below):

docker-compose up 

In another shell, start the Python dev environment container (Python 3.9 on Debian 9 with pandas and dbt-postgres installed):

docker run -it --network=dv_tutorial_dbt_postgress_db-server-network --mount src="$(pwd)",target=/app,type=bind dv_tutorial_dbt_postgress-dbt bash

A shell now opens inside the dev container. In this container, run:

dbt init

You will be asked to provide some information:

  1. name: dv_tutorial
  2. database: choose 1 -> postgres
  3. connection details: as shown below (or save the content to /root/.dbt/profiles.yml in the container):
dv_tutorial:
  outputs:
    dev:
      dbname: dv_test
      host: sqlNode1
      pass: abcd
      port: 5432
      schema: dbo
      threads: 1
      type: postgres
      user: dev
  target: dev

Database connection

Let us now check the database connection and setup:

dbt debug

This should show a working setup and database connection.

[Image: dbt debug output]

Getting some test data from Kaggle

Now copy the data files:

  • Products.csv
  • SalesOrderItems.csv

from Kaggle (Link) and save them into the seed directory.

Load the raw data into the database

Next, load the raw data into the database by running:

dbt seed

This takes a moment and will insert the data into the database.
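To double-check the load, you can query the seeded tables directly in the database admin web interface or psql. The schema and table names below assume the dbo schema from the profile above; adjust case and quoting to match the relations dbt actually created:

```sql
-- Sanity check after dbt seed (schema/table names assumed from profiles.yml and the seed file names;
-- drop the quotes or lowercase the names if dbt created the tables unquoted).
select count(*) from dbo."Products";
select count(*) from dbo."SalesOrderItems";
```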

Note: Use raw-stage SQL models to build a combined table from multiple sources. This lets you pull the data into one table first and then use the staging SQL statements to generate the staging table.
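As a rough illustration of such a raw-stage model, the sketch below pulls the two seeds into one table. The file name and column names are assumptions (taken from typical Kaggle CSV headers); the repo's actual raw models may differ:

```sql
-- models/raw/raw_sales.sql (hypothetical name and columns; adjust to the repo's raw models)
-- Combines the two seed tables into a single raw table for the staging layer.
{{ config(materialized='table') }}

select
    soi.salesorderid,
    soi.salesorderitem,
    soi.productid,
    p.prodcategoryid,
    p.price,
    current_timestamp as load_dts,
    'kaggle_seed'     as record_source
from {{ ref('SalesOrderItems') }} as soi
left join {{ ref('Products') }} as p
    on soi.productid = p.productid
```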

[Image: dbt seed output]

In the image below we can see that the raw tables are loaded into the database.

[Image: seeded raw tables in the database]

dbt lets you run the individual stage models separately so you can trace step by step what is happening. If you want to run everything at once, use dbt run.

Creating the staging tables

# dbt run -s name_of_component
dbt run -s stg_product --full-refresh  # a + in front of the selector (+stg_product) also runs its upstream raw models, if present
dbt run -s stg_salesorderitems --full-refresh
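For orientation, a staging model such as stg_product.sql adds the Data Vault metadata columns (hash key, hashdiff, load date, record source) on top of the seed data. The sketch below is plain SQL with assumed column names, not necessarily how the repo implements it:

```sql
-- models/staging/stg_product.sql (sketch; column names assumed from the Kaggle CSV)
{{ config(materialized='table') }}

select
    md5(upper(trim(productid::varchar)))         as hk_product,        -- hub hash key
    md5(concat_ws('||', prodcategoryid, price))  as hashdiff_product,  -- change detection
    productid,
    prodcategoryid,
    price,
    current_timestamp                            as load_dts,
    'kaggle_seed'                                as record_source
from {{ ref('Products') }}
```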

[Image: staging tables in the database]

Loading Raw Vault

Creating hubs

dbt run -s hub_product  --full-refresh 
dbt run -s hub_salesorderitems  --full-refresh 
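Conceptually, a hub keeps one row per distinct business key together with the hash key and load metadata. A minimal plain-SQL sketch (the repo's hub models may implement this differently, e.g. via a helper package):

```sql
-- models/raw_vault/hub_product.sql (plain-SQL sketch; assumes the staging columns shown above)
{{ config(materialized='table') }}

select
    hk_product,
    productid,
    min(load_dts) as load_dts,   -- first time the business key was seen
    record_source
from {{ ref('stg_product') }}
group by hk_product, productid, record_source
```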

[Image: raw vault hub tables]

Creating links

dbt run -s lnk_product_salesorderitem  --full-refresh 
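A link stores each distinct relationship between the two hubs, again keyed by a hash. A rough sketch, assuming the staging model for sales order items carries both hash keys:

```sql
-- models/raw_vault/lnk_product_salesorderitem.sql (plain-SQL sketch; assumed staging columns)
{{ config(materialized='table') }}

select
    md5(concat_ws('||', hk_product, hk_salesorderitem)) as hk_product_salesorderitem,
    hk_product,
    hk_salesorderitem,
    min(load_dts) as load_dts,
    record_source
from {{ ref('stg_salesorderitems') }}
group by hk_product, hk_salesorderitem, record_source
```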

[Image: raw vault link table]

Creating satellites

dbt run -s sat_product_details  --full-refresh 
dbt run -s sat_salesorder_details  --full-refresh 
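A satellite carries the descriptive attributes of a hub and uses the hashdiff to track changes over time. A minimal sketch with assumed columns:

```sql
-- models/raw_vault/sat_product_details.sql (plain-SQL sketch; assumed staging columns)
{{ config(materialized='table') }}

select
    hk_product,
    hashdiff_product,
    prodcategoryid,
    price,
    load_dts,
    record_source
from {{ ref('stg_product') }}
```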

[Image: raw vault satellite tables]

dbt auto doc generation

dbt can auto-generate documentation:

dbt docs generate
dbt docs serve 

[Image: dbt docs start page]

Data lineage view of the docs

[Image: data lineage view]

Implementation details

[Images: implementation detail views in the docs]