Skip to content

Latest commit

 

History

History
118 lines (85 loc) · 3.82 KB

lab5.md

File metadata and controls

118 lines (85 loc) · 3.82 KB

Lab 5: Snapshots and Seeds

❗Remember to create a development branch lab-5 at the beginning of the lab and at the end commit your changes to it and then merge the branch back to main.

Kick-off discussion

  • Are there any tables at your company that would benefit from being snapshotted?
  • Are there any instances where having a snapshot table would have solved a problem at your company?
  • Share some of your best case when horror stories.
  • Are there any at your company that would benefit from a seed file?

1. Snapshot the customers table

Our application doesn't keep track of changes to any of the tables. When a record is updated, the prior state is lost as far as the application is concerned. From an analytics point of view, the full history would be really beneficial.

Set up a snapshot model on the customers source table (not the model).

Things to think about:

  • What type of snapshot strategy is best for this table?
  • What database and schema should it get built in?
👉 Section 1

(1) Create a file in the snapshots/ directory called customers_snapshot.sql that contains the following code (replace <initials> with your initials):

  {% snapshot customers_snapshot %}

  {{
      config(
      target_database='analytics',
      target_schema='snapshots_<initials>',
      unique_key='id',

      strategy='check',
      check_cols = 'all',
      )
  }}

  select
    *
  from {{ source('ecomm', 'customers') }}

  {% endsnapshot %}

(2) Execute dbt snapshot in the console to make sure your snapshot run correctly.

2. Snapshot the orders table

Similarly, we want a snapshot of the orders source table. Set another snapshot up for that table.

Things to think about:

  • What type of snapshot strategy is best for this table?
  • Should this snapshot get built in the same database and schema as the other snapshot?
👉 Section 2

(1) Create a file in the snapshots/ directory called orders_snapshot.sql that contains the following code (again, replace <initials> with your initials)):

  {% snapshot orders_snapshot %}

  {{
      config(
      target_database='analytics',
      target_schema='snapshots_<initials>',
      unique_key='id',

      strategy='timestamp',
      updated_at='_synced_at',
      )
  }}

  select
    *
  from {{ source('ecomm', 'orders') }}

  {% endsnapshot %}

(2) Execute dbt snapshot in the console to make sure your snapshots run correctly.

3. Add the stores seed file to our project

There's a store_id column on the orders table that we haven't leveraged yet. It looks like it should join to a stores table, but it doesn't seem to exist in our application database.

It turns out, the engineers haven't yet built that table.

Create a seed file with the store IDs and names. Add a number column to our orders model called store_name.

The store mappings are as follows:

  • Store ID 1: New York
  • Store ID 2: Los Angeles
  • Store ID 3: Dallas
👉 Section 3

(1) Create a file in the seeds/ directory called stores.csv that contains the following data:

  store_id,store_name
  1,New York
  2,London
  3,Tokyo

(2) Execute dbt seed in the console to make sure your seed uploads correctly. (3) You can now reference that data as {{ ref('stores') }}. Add code in your orders model that adds a store_name column. (4) Execute dbt run -s orders to make sure your updates run successfully.

Links and Walkthrough Guides

The following links will be useful for these exercises: