From 96003e5884f8b61aff76e9a29f18c97eab19db0c Mon Sep 17 00:00:00 2001
From: Tatiana Al-Chueyr
Date: Fri, 7 Jun 2024 14:25:00 +0100
Subject: [PATCH] Fix docs so it does not reference non-existing `get_dbt_dataset` (#1034)

[The documentation](https://astronomer.github.io/astronomer-cosmos/configuration/scheduling.html)
was outdated. The method `get_dbt_dataset` no longer exists. It used to
exist in older versions of Cosmos (before 1.1) when the URIs respected
the format:

`Dataset(f"DBT://{connection_id.upper()}/{project_name.upper()}/{model_name.upper()}")`

More information on why we changed this:
https://github.com/astronomer/astronomer-cosmos/issues/305

Closes: #1032
(cherry picked from commit c47e1049cf4aa6457591a610e12c5dca97e1d169)
---
 .github/ISSUE_TEMPLATE/01-bug.yml     |  2 +-
 .github/ISSUE_TEMPLATE/02-feature.yml |  3 ++-
 docs/configuration/scheduling.rst     | 19 ++++++++++++-------
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/.github/ISSUE_TEMPLATE/01-bug.yml b/.github/ISSUE_TEMPLATE/01-bug.yml
index 658d0b9cb..4a5517338 100644
--- a/.github/ISSUE_TEMPLATE/01-bug.yml
+++ b/.github/ISSUE_TEMPLATE/01-bug.yml
@@ -1,7 +1,7 @@
 ---
 name: Bug Report
 description: File a bug report.
-title: "[Bug]: "
+title: "[Bug] "
 labels: ["bug", "triage-needed"]
 body:
   - type: markdown
diff --git a/.github/ISSUE_TEMPLATE/02-feature.yml b/.github/ISSUE_TEMPLATE/02-feature.yml
index e179d357d..f8cd9e24d 100644
--- a/.github/ISSUE_TEMPLATE/02-feature.yml
+++ b/.github/ISSUE_TEMPLATE/02-feature.yml
@@ -1,7 +1,8 @@
 ---
 name: Feature request
 description: Suggest an idea for this project
-labels: ["enhancement", "needs-triage"]
+title: "[Feature] "
+labels: ["enhancement", "triage-needed"]
 body:
   - type: markdown
     attributes:
diff --git a/docs/configuration/scheduling.rst b/docs/configuration/scheduling.rst
index a1275ee19..d96930395 100644
--- a/docs/configuration/scheduling.rst
+++ b/docs/configuration/scheduling.rst
@@ -24,11 +24,17 @@ To schedule a dbt project on a time-based schedule, you can use Airflow's schedu
 Data-Aware Scheduling
 ---------------------

-By default, Cosmos emits `Airflow Datasets `_ when running dbt projects. This allows you to use Airflow's data-aware scheduling capabilities to schedule your dbt projects. Cosmos emits datasets in the following format:
+Apache Airflow 2.4 introduced the concept of `scheduling based on Datasets `_.
+
+By default, if Airflow 2.4 or higher is used, Cosmos emits `Airflow Datasets `_ when running dbt projects. This allows you to use Airflow's data-aware scheduling capabilities to schedule your dbt projects. Cosmos emits datasets using the OpenLineage URI format, as detailed in the `OpenLineage Naming Convention `_.
+
+Cosmos calculates these URIs during the task execution, by using the library `OpenLineage Integration Common `_.
+
+This block illustrates a Cosmos-generated dataset for Postgres:

 .. code-block:: python

-    Dataset("DBT://{connection_id}/{project_name}/{model_name}")
+    Dataset("postgres://host:5432/database.schema.table")

 For example, let's say you have:

@@ -36,11 +42,13 @@ For example, let's say you have:
 - A dbt project (``project_one``) with a model called ``my_model`` that runs daily
 - A second dbt project (``project_two``) with a model called ``my_other_model`` that you want to run immediately after ``my_model``

+We are assuming that the Database used is Postgres, the host is ``host``, the database is ``database`` and the schema is ``schema``.
+
 Then, you can use Airflow's data-aware scheduling capabilities to schedule ``my_other_model`` to run after ``my_model``. For example, you can use the following DAGs:

 .. code-block:: python

-    from cosmos import DbtDag, get_dbt_dataset
+    from cosmos import DbtDag

     project_one = DbtDag(
         # ...
@@ -49,10 +57,7 @@ Then, you can use Airflow's data-aware scheduling capabilities to schedule ``my_
     )

     project_two = DbtDag(
-        # for airflow <=2.3
-        # schedule=[get_dbt_dataset("my_conn", "project_one", "my_model")],
-        # for airflow > 2.3
-        schedule=[get_dbt_dataset("my_conn", "project_one", "my_model")],
+        schedule=[Dataset("postgres://host:5432/database.schema.my_model")],
         dbt_project_name="project_two",
     )
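
Reviewer note: the updated docs hard-code the Postgres dataset URI. As a minimal sketch of how that string is assembled, assuming the `postgres://host:port/database.schema.table` layout shown in the patch, the helper below builds it from its parts. `postgres_dataset_uri` is a hypothetical name used only for illustration and is not part of Cosmos; Cosmos itself computes these URIs at task execution time through the OpenLineage library.

```python
def postgres_dataset_uri(host: str, port: int, database: str, schema: str, table: str) -> str:
    """Hypothetical helper (not a Cosmos API): format a Postgres dataset URI
    following the layout used in the patched docs."""
    return f"postgres://{host}:{port}/{database}.{schema}.{table}"


# Values taken from the assumptions stated in the docs example.
uri = postgres_dataset_uri("host", 5432, "database", "schema", "my_model")
print(uri)
```

The resulting string is what the downstream DAG would pass to `Dataset(...)`. The patched snippet uses `Dataset` without showing an import; in Airflow 2.4 and later the class is available from `airflow.datasets`.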