❗Remember to create a development branch lab-2
at the beginning of the lab and at the end commit your changes to it and then merge the branch back to main
.
- What are instances where more testing would have been useful at your company? What was the issue? What test would you have added?
- What are some examples of SQL that you've written recently that might benefit from using Jinja?
When setting up your project, a models/example/
folder was created with models and tests. If you haven't already, please delete it now or it will pose problems during this lab.
We've got two models, orders
and customers
. Each should have a primary key. We want to make sure that the primary keys are unique and not null.
Things to think about:
- If the tests fail, is there a problem with our query?
👉 Section 1
(1) Add unique
and not_null
tests to the schema.yml
files. For the orders
table, it will contain the following information:
version: 2
models:
- name: orders
columns:
- name: order_id
tests:
- unique
- not_null
(2) Execute dbt test
in the console to make sure all the tests pass.
2. Add columns to your customers
model that contain how many orders the customer had in the last 30, 90 and 360 days.
More work for our retention team. They want to easily be able to see how order patterns affect customer behaviour. They want to be able to easily write a query that would tell them how many customers have more than 5 orders in the last 30 days.
Things to think about:
- Are there ways that Jinja could be helpful here?
👉 Section 2
(1) Given the SQL for the three columns will be almost identical, we could use a Jinja for
loop here. Add the following SQL to your customer_metrics
CTE in the customers
model:
{% for days in [30,90,360] %}
count(case when ordered_at > current_date - {{ days }} then 1 end) as count_orders_last_{{ days }}_days{% if not loop.last %},{% endif %}
{% endfor %}
(2) Add your three new columns to the joined
CTE.
(3) Execute dbt run
in the console to make sure everything runs successfully.
In the last lab, we added two columns to each of the orders
and customers
models. In theory, when populated, they should always be greater than zero. We'll need to write a custom schema test that ensures that's always the case.
👉 Section 3
(1) Given this feels like a test that will be broadly re-usable, we'll likely want to create a custom schema test. Create a new file in the macros/
directory called test_greater_than_zero.sql
that contains the following code:
{% test greater_than_zero(model, column_name) %}
select
*
from {{ model }}
where {{ column_name }} <= 0
{% endtest %}
(2) Add the tests to your schema.yml
file. For the column delivery_time_from_collection
in the orders
model, it would look as follows:
- name: orders
columns:
- name: delivery_time_from_collection
tests:
- greater_than_zero
(3) Execute dbt test
in the console to make sure all the tests pass.
4. Add a test to ensure that the number of orders in the last 90 days from our customers
table doesn't exceed the total number of orders in our orders
table.
Having added the new columns in step 2, we want to double-check that the sum of the column on the customers
model doesn't exceed the total number of orders in our orders
model.
Given the specificity of this test, we likely don't want to write a custom schema test. Could we use a data test to do it?
👉 Section 4
(1) Create a new file in the tests/
directory called count_orders_check.sql
that contains the following SQL:
with orders as (
select
count(*) as orders_count
from {{ ref('orders') }}
),
customers as (
select
sum(count_orders_last_90_days) as customers_count
from {{ ref('customers') }}
),
joined as (
select
*
from orders
cross join customers
where customers_count > orders_count
)
select
*
from joined
(2) Execute dbt test
in the console to make sure all the tests pass.
👉 What your dbt project should look like after this lab
analysis/
logs/
macros/
├─ test_greater_than_zero.sql
models/
├─ customers.sql
├─ orders.sql
├─ schema.yml
├─ sources.yml
├─ stg_ecomm__customers.sql
├─ stg_ecomm__deliveries.sql
├─ stg_ecomm__orders.sql
seeds/
snapshots/
target/
tests/
├─ count_orders_check.sql
.gitignore
dbt_project.yml
README.md
The following links will be useful for these exercises: