Fix docs so it does not reference non-existing get_dbt_dataset (#1034)
[The documentation](https://astronomer.github.io/astronomer-cosmos/configuration/scheduling.html) was outdated.

The method `get_dbt_dataset` no longer exists. It used to exist in older
versions of Cosmos (before 1.1) when the URIs respected the format:

`Dataset(f"DBT://{connection_id.upper()}/{project_name.upper()}/{model_name.upper()}")`

More information on why we changed this:
#305

Closes: #1032
(cherry picked from commit c47e104)
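
For reference, the legacy format quoted above can be reproduced with a small helper. This is a sketch only: `legacy_dbt_dataset_uri` is a hypothetical name, not the removed `get_dbt_dataset` implementation, and it returns a plain string rather than an Airflow `Dataset`.

```python
def legacy_dbt_dataset_uri(connection_id: str, project_name: str, model_name: str) -> str:
    """Build a URI string in the pre-1.1 format quoted in the commit message.

    Hypothetical helper for illustration; Cosmos no longer emits URIs like this.
    """
    # Every component is upper-cased, exactly as in the quoted f-string.
    return f"DBT://{connection_id.upper()}/{project_name.upper()}/{model_name.upper()}"


legacy_uri = legacy_dbt_dataset_uri("my_conn", "project_one", "my_model")
print(legacy_uri)
```

A DAG still scheduled on a URI of this shape would presumably never be triggered by current Cosmos versions, since the emitted datasets now use the OpenLineage format instead.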
tatiana authored and pankajkoti committed Jun 7, 2024
1 parent 67071a1 commit 96003e5
Showing 3 changed files with 15 additions and 9 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/01-bug.yml
@@ -1,7 +1,7 @@
 ---
 name: Bug Report
 description: File a bug report.
-title: "[Bug]: "
+title: "[Bug] "
 labels: ["bug", "triage-needed"]
 body:
 - type: markdown
3 changes: 2 additions & 1 deletion .github/ISSUE_TEMPLATE/02-feature.yml
@@ -1,7 +1,8 @@
 ---
 name: Feature request
 description: Suggest an idea for this project
-labels: ["enhancement", "needs-triage"]
+title: "[Feature] "
+labels: ["enhancement", "triage-needed"]
 body:
 - type: markdown
 attributes:
19 changes: 12 additions & 7 deletions docs/configuration/scheduling.rst
@@ -24,23 +24,31 @@ To schedule a dbt project on a time-based schedule, you can use Airflow's schedu
 Data-Aware Scheduling
 ---------------------

-By default, Cosmos emits `Airflow Datasets <https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html>`_ when running dbt projects. This allows you to use Airflow's data-aware scheduling capabilities to schedule your dbt projects. Cosmos emits datasets in the following format:
+Apache Airflow 2.4 introduced the concept of `scheduling based on Datasets <https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html>`_.
+
+By default, if Airflow 2.4 or higher is used, Cosmos emits `Airflow Datasets <https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html>`_ when running dbt projects. This allows you to use Airflow's data-aware scheduling capabilities to schedule your dbt projects. Cosmos emits datasets using the OpenLineage URI format, as detailed in the `OpenLineage Naming Convention <https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md>`_.
+
+Cosmos calculates these URIs during the task execution, by using the library `OpenLineage Integration Common <https://pypi.org/project/openlineage-integration-common/>`_.
+
+This block illustrates a Cosmos-generated dataset for Postgres:

 .. code-block:: python

-    Dataset("DBT://{connection_id}/{project_name}/{model_name}")
+    Dataset("postgres://host:5432/database.schema.table")

 For example, let's say you have:

 - A dbt project (``project_one``) with a model called ``my_model`` that runs daily
 - A second dbt project (``project_two``) with a model called ``my_other_model`` that you want to run immediately after ``my_model``

+We are assuming that the Database used is Postgres, the host is ``host``, the database is ``database`` and the schema is ``schema``.
+
 Then, you can use Airflow's data-aware scheduling capabilities to schedule ``my_other_model`` to run after ``my_model``. For example, you can use the following DAGs:

 .. code-block:: python

-    from cosmos import DbtDag, get_dbt_dataset
+    from cosmos import DbtDag
     project_one = DbtDag(
         # ...
@@ -49,10 +57,7 @@ Then, you can use Airflow's data-aware scheduling capabilities to schedule ``my_
     )
     project_two = DbtDag(
-        # for airflow <=2.3
-        # schedule=[get_dbt_dataset("my_conn", "project_one", "my_model")],
-        # for airflow > 2.3
-        schedule=[get_dbt_dataset("my_conn", "project_one", "my_model")],
+        schedule=[Dataset("postgres://host:5432/database.schema.my_model")],
         dbt_project_name="project_two",
     )
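
The updated snippet passes a hand-written URI to `Dataset` but does not show where `Dataset` comes from; in Airflow 2.4 and later it is importable as `from airflow.datasets import Dataset`. The sketch below shows one way to assemble a URI of the shape used in the example. `postgres_dataset_uri` is a hypothetical helper, not part of Cosmos or OpenLineage, and it only mirrors the `postgres://host:5432/database.schema.table` pattern from the docs; the URI Cosmos actually emits is computed at task execution time, so the string has to match it exactly for the downstream DAG to be triggered.

```python
def postgres_dataset_uri(host: str, port: int, database: str, schema: str, table: str) -> str:
    """Assemble a URI shaped like the Postgres example in the updated docs.

    Hypothetical helper for illustration only; it does not query Cosmos or
    OpenLineage, it just formats the components into one string.
    """
    return f"postgres://{host}:{port}/{database}.{schema}.{table}"


# Values taken from the assumptions stated in the docs (host, database, schema).
uri = postgres_dataset_uri("host", 5432, "database", "schema", "my_model")
print(uri)

# With Airflow 2.4+ installed, the string would be wrapped before scheduling:
#   from airflow.datasets import Dataset
#   schedule = [Dataset(uri)]
```

Keeping the URI construction in one place makes it easier to update several consumer DAGs if the connection details change.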
