❗ Remember to create a development branch `lab-9` at the beginning of the lab. At the end, commit your changes to it and then merge the branch back to `main`.
- How much of what we just discussed is already in place at your company?
- How much of what we just discussed would be useful to implement at your company?
- What would be the right fields to cluster by in the `customers` model?
- What would be the right fields to cluster by in the `orders` model? (For a reminder of how clustering keys are configured in dbt, see the sketch after this list.)
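For reference, clustering keys on a Snowflake model are set with dbt's `cluster_by` config. A minimal sketch, with an illustrative field rather than the answer to the questions above:

```sql
{{
    config(
        materialized='table',
        cluster_by=['order_date']  -- illustrative; use the fields you identified above
    )
}}
```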
Despite making the `orders` model incremental, it's still taking too long to build. Add a configuration to the `orders` model that has it run with the larger `COMPUTE_WH_M` warehouse.
Things to think about:
- How do you verify that it's actually building with the new warehouse setting?
👉 Section 1
(1) Change the config in our `orders` model by adding the following:

```sql
snowflake_warehouse='COMPUTE_WH_M'
```
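For context, a minimal sketch of what the config block in `orders.sql` might look like after this change, assuming the model was already made incremental earlier in the lab (the `unique_key` shown is hypothetical):

```sql
{{
    config(
        materialized='incremental',
        unique_key='id',  -- hypothetical; keep whatever incremental settings you already have
        snowflake_warehouse='COMPUTE_WH_M'
    )
}}
```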
(2) Execute `dbt run -s orders`. Can you see your query in the Snowflake query history with the larger warehouse?
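One way to check, sketched here with Snowflake's `INFORMATION_SCHEMA.QUERY_HISTORY` table function (the Query History page in the Snowflake UI, filtered by warehouse, works too):

```sql
-- Recent queries and the warehouse each one ran on
select query_text, warehouse_name, start_time
from table(information_schema.query_history())
where warehouse_name = 'COMPUTE_WH_M'
order by start_time desc
limit 20;
```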
Because dbt queries can come from a long list of users (all of you), we want to add a query tag `dbt_run` to all queries executed by dbt. Add the query tag to your project and then verify that it's working.
👉 Section 2
(1) To add this config to all our models, we'll want to make the change in our `dbt_project.yml` file. We need it to be under the `models` key:

```yaml
models:
  +query_tag: 'dbt_run'
```
(2) Execute `dbt run`. Can you see the query tags in Snowflake?
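To verify, a similar sketch against the query history, this time filtering on the tag:

```sql
-- Queries tagged by the query_tag config in dbt_project.yml
select query_text, query_tag, start_time
from table(information_schema.query_history())
where query_tag = 'dbt_run'
order by start_time desc;
```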
We want to calculate the total amount of orders placed each day. We want to make this calculation dynamic so that we don't need to run the dbt model every time we want the latest data. To achieve this, we'll create a new model `daily_orders` that uses the `dynamic_table` materialization. But first, we need to clone the `RAW.ECOMM.ORDERS` table to our development schema so that we can emulate new orders being placed without affecting the production data.
👉 Section 3
(1) Create a zero-copy clone of the `RAW.ECOMM.ORDERS` table in your development schema:

```sql
create table analytics.dbt_<first_initial><last_name>.raw_orders clone raw.ecomm.orders;
```
(2) Create a new model `daily_orders` that calculates the total amount of orders placed each day. Use the `dynamic_table` materialization:

```sql
{{
    config(
        materialized='dynamic_table',
        target_lag='1 minute',
        snowflake_warehouse='compute_wh'
    )
}}

-- Note that we're directly referring to the cloned table without using the source macro.
-- This is a bad practice and you should avoid it in your own projects.
-- In this case, we're doing it to simplify the lab.

with orders as (

    select *
    from analytics.dbt_<first_initial><last_name>.raw_orders

),

daily_orders as (

    select
        created_at::date as order_date,
        count(*) as order_count
    from orders
    group by 1

),

final as (

    select *
    from daily_orders

)

select *
from final
```
(3) Run the `daily_orders` model and verify that it's working as expected:

```shell
dbt run -s daily_orders
```
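To confirm the dynamic table was created with the expected settings, one option is to inspect it in Snowflake (a sketch; the schema follows the lab's naming convention):

```sql
show dynamic tables like 'daily_orders' in schema analytics.dbt_<first_initial><last_name>;
```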
(4) Emulate a new order being placed by inserting a row into the `raw_orders` table. Note that since we're inserting into the cloned table, we're not affecting the production data.

```sql
insert into analytics.dbt_<first_initial><last_name>.raw_orders (total_amount, created_at)
select uniform(5, 100, random()), current_timestamp();
```
(5) Verify that the new order is reflected in the `daily_orders` model. Is it showing up? If not, why?

```sql
select *
from analytics.dbt_<first_initial><last_name>.daily_orders
order by order_date desc;
```
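If the new row isn't there yet, keep the model's 1-minute `target_lag` in mind; one way to see whether a refresh has completed is Snowflake's refresh history function (a sketch):

```sql
-- Recent refreshes of dynamic tables visible to you
select name, state, refresh_start_time, refresh_end_time
from table(information_schema.dynamic_table_refresh_history())
order by refresh_start_time desc;
```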
(6) Add a `total_amount` column to the `daily_orders` model and rerun the model. Can you see the new column? Why not? Fix it!
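One possible shape of the change, hedged since working out the fix is the point of the exercise: add the aggregation to the `daily_orders` CTE, and if the new column still doesn't appear after a plain `dbt run`, rebuild the dynamic table with a full refresh so the changed SQL is applied:

```sql
-- daily_orders CTE with the extra column (sketch)
daily_orders as (

    select
        created_at::date as order_date,
        count(*) as order_count,
        sum(total_amount) as total_amount
    from orders
    group by 1

),
```

```shell
dbt run -s daily_orders --full-refresh
```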
The `daily_orders` dynamic table model is being automatically updated every minute, and that would keep the warehouse running 24/7. Your task is to delete the `daily_orders` table to avoid incurring unnecessary costs.
👉 Section 4
(1) Drop the `daily_orders` table:

```sql
drop table analytics.dbt_<first_initial><last_name>.daily_orders;
```
The following links will be useful for these exercises: