Commit

refactor the way subschemas are handled
MaxHalford committed Oct 18, 2023
1 parent d3f92fa commit 15e8cfc
Showing 26 changed files with 194 additions and 141 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -3,3 +3,5 @@
*.db
.env
dist/
/*.ipynb
.DS_Store
26 changes: 19 additions & 7 deletions README.md
@@ -31,6 +31,7 @@ lea aims to be simple and opinionated, and yet offers the possibility to be exte
Right now lea is compatible with BigQuery (used at Carbonfact) and DuckDB (quack quack).

- [Example](#example)
- [Teaser](#teaser)
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
@@ -53,6 +54,8 @@ Right now lea is compatible with BigQuery (used at Carbonfact) and DuckDB (quack

- [Jaffle shop 🥪](examples/jaffle_shop/)

## Teaser

## Installation

```sh
@@ -69,7 +72,6 @@ lea is configured by setting environment variables. The following variables are

```sh
# General configuration
LEA_SCHEMA=kaya
LEA_USERNAME=max
LEA_WAREHOUSE=bigquery

@@ -79,6 +81,7 @@ LEA_DUCKDB_PATH=duckdb.db
# BigQuery 🦏
LEA_BQ_LOCATION=EU
LEA_BQ_PROJECT_ID=carbonfact-dwh
LEA_BQ_DATASET_NAME=kaya
LEA_BQ_SERVICE_ACCOUNT=<a JSON dump of the service account file>
```

@@ -117,7 +120,7 @@ views/
table_6.sql
```

Each view will be named according to its location, following the warehouse convention. For instance, `schema_1/table_1.sql` will be named `schema_1__table_1` in BigQuery and DuckDB.
Each view will be named according to its location, following the warehouse convention. For instance, `schema_1/table_1.sql` will be named `dataset.schema_1__table_1` in BigQuery and `schema_1.table_1` in DuckDB.
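
The naming rule can be sketched in a few lines of Python. This is only an illustration, not lea's actual code: `view_name` is a hypothetical helper, `dataset` is a placeholder for the BigQuery dataset name, and the handling of nested directories (flattening everything below the top level with `__`) is an assumption extrapolated from the single-level example above.

```python
from pathlib import PurePosixPath


def view_name(relative_path: str, warehouse: str, dataset: str = "dataset") -> str:
    """Map a view file path (relative to the views directory) to a warehouse name.

    Hypothetical helper for illustration; assumes the file sits in at least
    one directory.
    """
    path = PurePosixPath(relative_path)
    # Keep only the part before the first dot, so `.sql` and `.sql.jinja` both work
    stem = path.name.split(".")[0]
    parts = [*path.parent.parts, stem]
    if warehouse == "bigquery":
        # A single dataset holds every view; the path is flattened with `__`
        return f"{dataset}.{'__'.join(parts)}"
    if warehouse == "duckdb":
        # The top-level directory becomes the schema; the rest is flattened
        schema, *rest = parts
        return f"{schema}.{'__'.join(rest)}"
    raise ValueError(f"unsupported warehouse: {warehouse}")


print(view_name("schema_1/table_1.sql", "bigquery"))  # dataset.schema_1__table_1
print(view_name("schema_1/table_1.sql", "duckdb"))  # schema_1.table_1
```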

The schemas are expected to be placed under a `views` directory. This can be changed by providing an argument to the `run` command:

@@ -165,6 +168,12 @@ You can select all views in a schema:
lea run --only core/
```

This also works with sub-schemas:

```sh
lea run --only analytics.finance/
```

There are thus 8 possible operators:

```
@@ -215,10 +224,10 @@ lea test views
There are two types of tests:

- Singular tests -- these are queries which return failing rows. They are stored in a `tests` directory.
- Annotation tests -- these are comment annotations in the queries themselves:
- Assertion tests -- these are comment annotations in the queries themselves:
- `@UNIQUE` -- checks that a column's values are unique.
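
To make the "queries which return failing rows" idea concrete, here is a minimal sketch of how a `@UNIQUE` check could be expressed as such a query. It is not lea's implementation: `unique_test_query` is a hypothetical helper, and `sqlite3` merely stands in for the warehouse so the example runs on its own.

```python
import sqlite3


def unique_test_query(table: str, column: str) -> str:
    """Build a query that returns one row per duplicated value of `column`.

    Hypothetical helper: an empty result means the uniqueness test passes.
    """
    return (
        f"SELECT {column}, COUNT(*) AS n "
        f"FROM {table} "
        f"GROUP BY {column} "
        f"HAVING COUNT(*) > 1"
    )


# In-memory database used purely as a stand-in for BigQuery or DuckDB
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (2,)])

failing_rows = con.execute(unique_test_query("orders", "order_id")).fetchall()
print(failing_rows)  # [(2, 2)]
```

Framing both kinds of test as "return the offending rows" means a single runner can treat them identically: any returned row is a failure.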

As with the `run` command, there is a `--production` flag to disable the `<user>` suffix.
As with the `run` command, there is a `--production` flag to disable the `<user>` suffix and thus test production data.

### `lea docs`

@@ -304,6 +313,8 @@ lea is meant to be used as a CLI. But you can import it as a Python library too.
>>> for view in sorted(views, key=str):
... print(view)
... print(sorted(view.dependencies))
analytics.finance.kpis
[('core', 'orders')]
analytics.kpis
[('core', 'customers'), ('core', 'orders')]
core.customers
@@ -328,14 +339,15 @@ staging.payments
>>> views = [v for v in views if v.schema != 'tests']
>>> dag = lea.views.DAGOfViews(views)
>>> while dag.is_active():
... for schema, table in sorted(dag.get_ready()):
... print(f'{schema}.{table}')
... dag.done((schema, table))
... for node in sorted(dag.get_ready()):
... print(dag[node])
... dag.done(node)
staging.customers
staging.orders
staging.payments
core.customers
core.orders
analytics.finance.kpis
analytics.kpis

```
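
The `is_active` / `get_ready` / `done` loop above has the same shape as the standard library's `graphlib.TopologicalSorter`. As a rough standalone sketch (it does not import lea, and whether `DAGOfViews` is built on `graphlib` is an assumption), the jaffle shop views can be walked in dependency order like this. The dependency map is written out by hand from the example's SQL files.

```python
from graphlib import TopologicalSorter

# Each view maps to the set of views it selects from (hand-written for illustration)
dependencies = {
    "staging.customers": set(),
    "staging.orders": set(),
    "staging.payments": set(),
    "core.customers": {"staging.customers", "staging.orders", "staging.payments"},
    "core.orders": {"staging.orders", "staging.payments"},
    "analytics.kpis": {"core.customers", "core.orders"},
    "analytics.finance.kpis": {"core.orders"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

order = []
while sorter.is_active():
    # Every node returned by get_ready() has all of its dependencies done,
    # so the nodes of one batch could also be processed in parallel
    for node in sorted(sorter.get_ready()):
        order.append(node)
        sorter.done(node)

print("\n".join(order))
```

Sorting each batch only makes the output deterministic; any order within a batch respects the dependencies.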
9 changes: 7 additions & 2 deletions examples/jaffle_shop/README.md
@@ -19,19 +19,24 @@ The first thing to do is create an `.env` file, as so:

```sh
echo "
LEA_SCHEMA=jaffle_shop
LEA_USERNAME=max
LEA_WAREHOUSE=duckdb
LEA_DUCKDB_PATH=duckdb.db
" > .env
```

Next, run the following command to create the `duckdb.db` file and the `jaffle_shop` schema therein:
Next, run the following command to create the `duckdb.db` file and the schemas therein:

```sh
lea prepare
```

```
Created schema analytics_max
Created schema core_max
Created schema staging_max
```
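
The schema names in that output can be derived from the views themselves: one schema per top-level directory, suffixed with the username. The snippet below is a sketch under that assumption; `schemas_to_create` is a hypothetical helper, not part of lea, and the view keys are typed in by hand.

```python
def schemas_to_create(view_keys, username=None):
    """Return the schema names needed for the given views.

    Hypothetical helper: each key is a tuple such as ("analytics", "finance", "kpis"),
    and only its first element is treated as the schema.
    """
    suffix = f"_{username}" if username else ""
    return sorted({f"{key[0]}{suffix}" for key in view_keys})


view_keys = [
    ("analytics", "finance", "kpis"),
    ("analytics", "kpis"),
    ("core", "customers"),
    ("core", "orders"),
    ("staging", "customers"),
    ("staging", "orders"),
    ("staging", "payments"),
]

for schema in schemas_to_create(view_keys, username="max"):
    print(f"Created schema {schema}")
```

Passing `username=None` drops the suffix, which is presumably what a production run would want.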

Now you can run the views:

```sh
3 changes: 2 additions & 1 deletion examples/jaffle_shop/docs/README.md
@@ -3,8 +3,8 @@
## Schemas

- [`analytics`](./analytics)
- [`core`](./core)
- [`staging`](./staging)
- [`core`](./core)

## Schema flowchart

@@ -32,6 +32,7 @@ flowchart TB
subgraph staging
end
core.orders --> analytics.finance.kpis
core.customers --> analytics.kpis
core.orders --> analytics.kpis
staging.customers --> core.customers
19 changes: 16 additions & 3 deletions examples/jaffle_shop/docs/analytics/README.md
@@ -2,15 +2,28 @@

## Table of contents

- [kpis](#kpis)
- [analytics.finance.kpis](#analytics.finance.kpis)
- [analytics.kpis](#analytics.kpis)

## Views

### kpis
### analytics.finance.kpis

```sql
SELECT *
FROM jaffle_shop_max.analytics__kpis
FROM analytics_max.finance__kpis
```

| Column | Type | Description | Unique |
|:--------------------|:---------|:--------------|:---------|
| average_order_value | `DOUBLE` | | |
| total_order_value | `DOUBLE` | | |

### analytics.kpis

```sql
SELECT *
FROM analytics_max.kpis
```

| Column | Type | Description | Unique |
12 changes: 6 additions & 6 deletions examples/jaffle_shop/docs/core/README.md
@@ -2,16 +2,16 @@

## Table of contents

- [customers](#customers)
- [orders](#orders)
- [core.customers](#core.customers)
- [core.orders](#core.orders)

## Views

### customers
### core.customers

```sql
SELECT *
FROM jaffle_shop_max.core__customers
FROM core_max.customers
```

| Column | Type | Description | Unique |
@@ -24,11 +24,11 @@ FROM jaffle_shop_max.core__customers
| most_recent_order | `VARCHAR` | | |
| number_of_orders | `BIGINT` | | |

### orders
### core.orders

```sql
SELECT *
FROM jaffle_shop_max.core__orders
FROM core_max.orders
```

| Column | Type | Description | Unique |
18 changes: 9 additions & 9 deletions examples/jaffle_shop/docs/staging/README.md
@@ -2,19 +2,19 @@

## Table of contents

- [customers](#customers)
- [orders](#orders)
- [payments](#payments)
- [staging.customers](#staging.customers)
- [staging.orders](#staging.orders)
- [staging.payments](#staging.payments)

## Views

### customers
### staging.customers

Docstring for the customers view.

```sql
SELECT *
FROM jaffle_shop_max.staging__customers
FROM staging_max.customers
```

| Column | Type | Description | Unique |
@@ -23,13 +23,13 @@ FROM jaffle_shop_max.staging__customers
| first_name | `VARCHAR` | | |
| last_name | `VARCHAR` | | |

### orders
### staging.orders

Docstring for the orders view.

```sql
SELECT *
FROM jaffle_shop_max.staging__orders
FROM staging_max.orders
```

| Column | Type | Description | Unique |
@@ -39,11 +39,11 @@ FROM jaffle_shop_max.staging__orders
| order_id | `BIGINT` | | |
| status | `VARCHAR` | | |

### payments
### staging.payments

```sql
SELECT *
FROM jaffle_shop_max.staging__payments
FROM staging_max.payments
```

| Column | Type | Description | Unique |
4 changes: 4 additions & 0 deletions examples/jaffle_shop/views/analytics/finance/kpis.sql
@@ -0,0 +1,4 @@
SELECT
SUM(amount) AS total_order_value,
AVG(amount) AS average_order_value
FROM core.orders
4 changes: 2 additions & 2 deletions examples/jaffle_shop/views/analytics/kpis.sql
@@ -2,12 +2,12 @@ SELECT
'n_customers' AS metric,
COUNT(*) AS value
FROM
jaffle_shop.core__customers
core.customers

UNION ALL

SELECT
'n_orders' AS metric,
COUNT(*) AS value
FROM
jaffle_shop.core__orders
core.orders
6 changes: 3 additions & 3 deletions examples/jaffle_shop/views/core/customers.sql
@@ -1,18 +1,18 @@
with customers as (

select * from jaffle_shop.staging__customers
select * from staging.customers

),

orders as (

select * from jaffle_shop.staging__orders
select * from staging.orders

),

payments as (

select * from jaffle_shop.staging__payments
select * from staging.payments

),

4 changes: 2 additions & 2 deletions examples/jaffle_shop/views/core/orders.sql.jinja
@@ -2,13 +2,13 @@

with orders as (

select * from jaffle_shop.staging__orders
select * from staging.orders

),

payments as (

select * from jaffle_shop.staging__payments
select * from staging.payments

),

2 changes: 1 addition & 1 deletion examples/jaffle_shop/views/tests/orders_are_dated.sql
@@ -1,3 +1,3 @@
SELECT *
FROM jaffle_shop.core__orders
FROM core.orders
WHERE order_date IS NULL
16 changes: 12 additions & 4 deletions lea/app/__init__.py
@@ -6,6 +6,8 @@
import rich.console
import typer

import lea

app = typer.Typer()
console = rich.console.Console()

@@ -27,10 +29,12 @@ def env_validate_callback(env_path: str | None):


@app.command()
def prepare(production: bool = False, env: str = EnvPath):
""" """
def prepare(views_dir: str = ViewsDir, production: bool = False, env: str = EnvPath):
client = _make_client(production)
client.prepare(console)
views = lea.views.load_views(views_dir, sqlglot_dialect=client.sqlglot_dialect)
views = [view for view in views if view.schema not in {"tests", "funcs"}]

client.prepare(views, console)


@app.command()
@@ -64,9 +68,13 @@ def run(
# The client determines where the views will be written
client = _make_client(production)

# Load views
views = lea.views.load_views(views_dir, sqlglot_dialect=client.sqlglot_dialect)
views = [view for view in views if view.schema not in {"tests", "funcs"}]

run(
client=client,
views_dir=views_dir,
views=views,
only=only,
dry=dry,
print_to_cli=print,