diff --git a/.github/ISSUE_TEMPLATE/01-bug.yml b/.github/ISSUE_TEMPLATE/01-bug.yml
index 658d0b9cb..4a5517338 100644
--- a/.github/ISSUE_TEMPLATE/01-bug.yml
+++ b/.github/ISSUE_TEMPLATE/01-bug.yml
@@ -1,7 +1,7 @@
 ---
 name: Bug Report
 description: File a bug report.
-title: "[Bug]: "
+title: "[Bug] "
 labels: ["bug", "triage-needed"]
 body:
   - type: markdown
diff --git a/.github/ISSUE_TEMPLATE/02-feature.yml b/.github/ISSUE_TEMPLATE/02-feature.yml
index e179d357d..f8cd9e24d 100644
--- a/.github/ISSUE_TEMPLATE/02-feature.yml
+++ b/.github/ISSUE_TEMPLATE/02-feature.yml
@@ -1,7 +1,8 @@
 ---
 name: Feature request
 description: Suggest an idea for this project
-labels: ["enhancement", "needs-triage"]
+title: "[Feature] "
+labels: ["enhancement", "triage-needed"]
 body:
   - type: markdown
     attributes:
diff --git a/docs/configuration/scheduling.rst b/docs/configuration/scheduling.rst
index a1275ee19..d96930395 100644
--- a/docs/configuration/scheduling.rst
+++ b/docs/configuration/scheduling.rst
@@ -24,11 +24,17 @@ To schedule a dbt project on a time-based schedule, you can use Airflow's schedu
 Data-Aware Scheduling
 ---------------------
 
-By default, Cosmos emits `Airflow Datasets `_ when running dbt projects. This allows you to use Airflow's data-aware scheduling capabilities to schedule your dbt projects. Cosmos emits datasets in the following format:
+Apache Airflow 2.4 introduced the concept of `scheduling based on Datasets `_.
+
+By default, if Airflow 2.4 or higher is used, Cosmos emits `Airflow Datasets `_ when running dbt projects. This allows you to use Airflow's data-aware scheduling capabilities to schedule your dbt projects. Cosmos emits datasets using the OpenLineage URI format, as detailed in the `OpenLineage Naming Convention `_.
+
+Cosmos calculates these URIs during the task execution, by using the library `OpenLineage Integration Common `_.
+
+This block illustrates a Cosmos-generated dataset for Postgres:
 
 .. code-block:: python
 
-    Dataset("DBT://{connection_id}/{project_name}/{model_name}")
+    Dataset("postgres://host:5432/database.schema.table")
 
 For example, let's say you have:
 
@@ -36,11 +42,13 @@ For example, let's say you have:
 - A dbt project (``project_one``) with a model called ``my_model`` that runs daily
 - A second dbt project (``project_two``) with a model called ``my_other_model`` that you want to run immediately after ``my_model``
 
+We are assuming that the Database used is Postgres, the host is ``host``, the database is ``database`` and the schema is ``schema``.
+
 Then, you can use Airflow's data-aware scheduling capabilities to schedule ``my_other_model`` to run after ``my_model``. For example, you can use the following DAGs:
 
 .. code-block:: python
 
-    from cosmos import DbtDag, get_dbt_dataset
+    from cosmos import DbtDag
 
     project_one = DbtDag(
         # ...
@@ -49,10 +57,7 @@ Then, you can use Airflow's data-aware scheduling capabilities to schedule ``my_
     )
 
     project_two = DbtDag(
-        # for airflow <=2.3
-        # schedule=[get_dbt_dataset("my_conn", "project_one", "my_model")],
-        # for airflow > 2.3
-        schedule=[get_dbt_dataset("my_conn", "project_one", "my_model")],
+        schedule=[Dataset("postgres://host:5432/database.schema.my_model")],
         dbt_project_name="project_two",
     )
 