Refactor idea: Move queries of separate data artifacts into a single dbt model #277

atvaccaro · 2023-08-31T16:13:02Z

Currently, the generate_reports_data.py script queries several different tables (e.g. fct_monthly_reports_site_organization_gtfs_vendors and fct_daily_reports_site_organization_scheduled_service_summary), and the results are "joined" only in the sense that they are written into the same output folders. Rather than trying to combine these artifacts and/or add validation with something like Pydantic on top of the existing queries, it should be possible to create a single dbt model whose grain is year-month-itp_id, so that rows are 1:1 with final report pages. BigQuery rows can contain JSON and arrays to represent the nested nature of some of this data.
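To make the proposed grain concrete, here is a rough Python illustration of the row shape such a model might produce: one nested record per (year, month, itp_id). This is not the dbt SQL itself, just a sketch of the target structure. The function name and all column names (`vendor_name`, `service_date`, `service_hours`, `gtfs_vendors`, `daily_service`) are hypothetical, and the sample data is made up.

```python
from collections import defaultdict


def build_report_rows(vendor_rows, service_rows):
    """Combine two flat result sets into one nested record per (year, month, itp_id)."""
    # Each key is one report page; nested lists stand in for BigQuery arrays.
    reports = defaultdict(lambda: {"gtfs_vendors": [], "daily_service": []})

    for row in vendor_rows:
        key = (row["year"], row["month"], row["itp_id"])
        reports[key]["gtfs_vendors"].append(row["vendor_name"])

    for row in service_rows:
        key = (row["year"], row["month"], row["itp_id"])
        reports[key]["daily_service"].append(
            {"service_date": row["service_date"], "service_hours": row["service_hours"]}
        )

    # Flatten the key back into columns so each dict looks like one model row.
    return [
        {"year": year, "month": month, "itp_id": itp_id, **nested}
        for (year, month, itp_id), nested in sorted(reports.items())
    ]


# Made-up sample rows standing in for the two existing queries.
vendor_rows = [
    {"year": 2023, "month": 7, "itp_id": 4, "vendor_name": "Vendor A"},
    {"year": 2023, "month": 7, "itp_id": 4, "vendor_name": "Vendor B"},
]
service_rows = [
    {"year": 2023, "month": 7, "itp_id": 4, "service_date": "2023-07-01", "service_hours": 120.5},
]

report_rows = build_report_rows(vendor_rows, service_rows)
```

In the actual model this grouping would presumably happen in SQL, with the nested lists represented as BigQuery arrays or JSON.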

If this model is implemented, the "data generation" script could consist of just querying this single model and writing a single artifact per row. Some additional fields that are more difficult to produce in BigQuery, such as RT feed URLs, could be added post-query.
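A minimal sketch of what the simplified script could look like, assuming the rows have already been fetched from the single model as plain dicts. The function names, the `rt_feed_urls` lookup, the `year/month/itp_id/data.json` output layout, and the example URL are all assumptions for illustration, not the existing script's behavior.

```python
import json
import tempfile
from pathlib import Path


def add_post_query_fields(row, rt_feed_urls):
    """Attach fields that are awkward to compute in BigQuery (hypothetical lookup)."""
    enriched = dict(row)
    enriched["rt_feed_urls"] = rt_feed_urls.get(row["itp_id"], [])
    return enriched


def write_report_artifacts(rows, rt_feed_urls, output_dir):
    """Write one JSON artifact per model row; returns the paths written."""
    output_dir = Path(output_dir)
    written = []
    for row in rows:
        enriched = add_post_query_fields(row, rt_feed_urls)
        # Assumed layout: <output_dir>/<year>/<month>/<itp_id>/data.json
        path = output_dir / str(row["year"]) / f"{row['month']:02d}" / str(row["itp_id"]) / "data.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(enriched, indent=2, sort_keys=True))
        written.append(path)
    return written


# Made-up row standing in for the result of querying the single model.
rows = [{"year": 2023, "month": 7, "itp_id": 4, "gtfs_vendors": ["Vendor A"]}]
rt_feed_urls = {4: ["https://example.com/rt/vehicle_positions"]}  # placeholder URL

with tempfile.TemporaryDirectory() as tmp:
    paths = write_report_artifacts(rows, rt_feed_urls, tmp)
    loaded = json.loads(paths[0].read_text())
```

Because each row maps to exactly one report page, validation (e.g. with Pydantic) could be applied to a single record type rather than to several loosely related query results.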

atvaccaro mentioned this issue Aug 31, 2023

Refactor idea: Combine scripts and use Typer and Pydantic to improve user experience #278

Open
