Including 2 new references to the required config block to ensure users are aware (#434)
jhpyke authored Nov 14, 2024
1 parent 0338cd1 commit e1d8837
Showing 2 changed files with 29 additions and 7 deletions.
@@ -1,5 +1,17 @@
# Deploying to Dev

## Important Note - Before you deploy

As detailed in the section on [models](/tools/create-a-derived-table/models), you will need to add the standardised **config block** to your model before attempting to deploy. The block is as follows:

```
{{ config(
external_location=generate_s3_location()
) }}
```

If you do not include this block, your models will attempt to write to an invalid path in S3 and will therefore fail to deploy. You can include other config values in the block if they are useful for your model, but all models **must** include the `external_location=generate_s3_location()` statement to deploy successfully.
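
One way to catch a missing config block before you deploy is to search your model files for the required statement. The following is a minimal sketch, not part of dbt or the official tooling: the function name `find_models_missing_location` is hypothetical, and it assumes your models are `.sql` files under a single directory and that the statement is written exactly as `external_location=generate_s3_location()`, with no extra whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not an official tool):
# print every .sql file under a directory that does NOT contain the
# required external_location statement.
find_models_missing_location() {
  local models_dir="$1"
  # -r  search the directory recursively
  # -L  print only the names of files with no matching line
  # -F  treat the pattern as a fixed string, not a regular expression
  # The exit status of `grep -L` differs between grep versions, so rely on
  # the printed file names and ignore the status.
  grep -rL -F --include='*.sql' -- \
    'external_location=generate_s3_location()' "$models_dir" || true
}

# Example usage (the path is illustrative):
# find_models_missing_location models/some_domain/some_database
```

If the function prints nothing, every model it scanned contains the statement. Any file it lists would need the config block adding before deployment.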

You can run any dbt command in the terminal in RStudio (JupyterLab coming soon) to deploy models and seeds, or to run tests. When you deploy models and seeds from RStudio, the database they are built into will be suffixed with `_dev_dbt`, and the underlying data that gets generated will be written to the following S3 path:

```
@@ -68,8 +80,7 @@ To run tests on models with tests defined, run:
dbt test --select models/.../path/to/my/models/
```


## <a id="using-the-plus-prefix"></a>Using the + prefix

The `+` prefix is a `dbt` syntax feature which helps disambiguate between resource paths and configurations in the `dbt_project.yml` file. If you see it used in the `dbt_project.yml` file and wonder what it is, read [dbt's guidance on using the `+` prefix](https://docs.getdbt.com/reference/resource-configs/plus-prefix). It is also used for configurations that take a dictionary of values in a model, seed or test config `.yaml`. For example, use `+column_types` rather than `column_types`, since what follows are further key and value pairs defining the column names and the required data types. Using the `+` prefix does no harm, so it is recommended that you always use it.

@@ -89,7 +100,6 @@ models:
column_3: string
```


## How to use the incremental materialisation with the append strategy

You may want your final derived table to retain previous versions of itself rather than be overwritten each time your table is deployed. The following example shows how to create snapshots of the data and partition the table by those snapshots.
@@ -135,4 +145,4 @@ group by table_snapshot_date

You can also inspect the S3 bucket and folder where your data will be saved. In the case of this example it would be `mojap_derived_tables/dev/models/domain_name=some_domain/database_name=some_database/table_name=final_derived_table/`. You'd expect to see a number of timestamped folders, each containing a partition of your table's data (based on how many times you've run your models).
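
The folder layout quoted in this example can be sketched as a small shell function. This only illustrates the path pattern shown here; it is not the `generate_s3_location()` macro, and the function name `dev_s3_prefix` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration): build the dev folder
# prefix for a table from its domain, database and table names, following
# the pattern quoted in this example.
dev_s3_prefix() {
  local domain="$1" database="$2" table="$3"
  printf 'mojap_derived_tables/dev/models/domain_name=%s/database_name=%s/table_name=%s/\n' \
    "$domain" "$database" "$table"
}

# Prints the folder used in this example:
# mojap_derived_tables/dev/models/domain_name=some_domain/database_name=some_database/table_name=final_derived_table/
dev_s3_prefix some_domain some_database final_derived_table
```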

If you want to run your models and disregard all previous snapshots, add the `--full-refresh` flag to `dbt run`, e.g. `dbt run --select models/some_domain/some_database/some_database__final_derived_table.sql --full-refresh`.
18 changes: 15 additions & 3 deletions source/documentation/tools/create-a-derived-table/models.md
@@ -59,7 +59,7 @@ models:
partitioned_by: ['snapshot_date']
+column_types:
  column_1: varchar(5)
  column_2: int
```

If for some reason it is not possible or reasonable to apply a configuration in a property file, you can use a `config()` Jinja macro within a model or test SQL file. The following example shows how the same configuration above can be applied in a model or test file.
@@ -70,20 +70,32 @@ If for some reason it is not possible or reasonable to apply a configuration in
materialized='incremental',
incremental_strategy='append',
partitioned_by=['snapshot_date'],
external_location=generate_s3_location()
)
}}
```

## Important - Required Config

To ensure that data is stored in an orderly manner within the bucket, we enforce a specific naming convention through a built-in macro that generates the S3 path your data will be saved to. This is done through the config block within the SQL definition of the model, which usually means starting the file with the following:

```
{{ config(
external_location=generate_s3_location()
) }}
```

Although you can optionally apply further config values using the methods described above, the config block itself **must** include the `external_location=generate_s3_location()` statement at a minimum. If you do not supply this config value, the table will attempt to deploy to an incorrect location and then fail with a generic `access denied` error.

## Config inheritance

Configurations are prioritised in order of specificity, which is generally the inverse of the order above: an in-file `config()` block takes precedence over properties defined in a `.yaml` property file, which takes precedence over a configuration defined in the `dbt_project.yml` file. (Note that generic tests work a little differently when it comes to specificity. See dbt's documentation on [test configs](https://docs.getdbt.com/reference/test-configs).)


## Materialisations

Materialisations are strategies for persisting dbt models in a warehouse. There are four types of materialisation built into dbt. They are:

- [table](https://docs.getdbt.com/docs/build/materializations#table)
- [view](https://docs.getdbt.com/docs/build/materializations#view) ⚠️ not currently supported ⚠️
- [incremental](https://docs.getdbt.com/docs/build/materializations#incremental)
- [ephemeral](https://docs.getdbt.com/docs/build/materializations#ephemeral)
