From 1bacd580343cd9f59280b2b72108d869abe07781 Mon Sep 17 00:00:00 2001
From: Tatiana Al-Chueyr
Date: Thu, 14 Sep 2023 15:07:21 +0100
Subject: [PATCH] Add docs comparing Airflow and dbt concepts (#523)

Recently @ReadytoRocc suggested adding documentation that compares dbt and
Airflow concepts. This PR aims to address this gap.
---
 docs/configuration/lineage.rst                |  2 +-
 docs/getting_started/dbt-airflow-concepts.rst | 37 +++++++++++++++++++
 docs/getting_started/index.rst                |  7 ++++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 docs/getting_started/dbt-airflow-concepts.rst

diff --git a/docs/configuration/lineage.rst b/docs/configuration/lineage.rst
index 54f9ad46d..bf099f344 100644
--- a/docs/configuration/lineage.rst
+++ b/docs/configuration/lineage.rst
@@ -26,7 +26,7 @@ Otherwise, install Cosmos using ``astronomer-cosmos[openlineage]``.
 Configuration
 -------------
 
-If using Airflow 2.7, follow `these instructions `_ on how to configure OpenLineage.
+If using Airflow 2.7, follow `the instructions `_ on how to configure OpenLineage.
 
 Otherwise, follow `these instructions `_.
 
diff --git a/docs/getting_started/dbt-airflow-concepts.rst b/docs/getting_started/dbt-airflow-concepts.rst
new file mode 100644
index 000000000..8dfe00582
--- /dev/null
+++ b/docs/getting_started/dbt-airflow-concepts.rst
@@ -0,0 +1,37 @@
+.. _dbt-airflow-concepts:
+
+Similar dbt & Airflow concepts
+==============================
+
+While dbt is an open-source tool for transforming and analyzing data using SQL, Airflow is a platform for developing,
+scheduling and monitoring batch-oriented workflows using Python. Although the two tools differ in many ways, they
+share several similar concepts.
+
+This page lists some of these concepts to help those who are new to Airflow or dbt and are
+considering using Cosmos.
+
+
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Airflow term   | dbt term     | Description                                                                     | Differences                                                                 | References                                                                           |
++================+==============+=================================================================================+=============================================================================+======================================================================================+
+| DAG            | Workflow     | Pipeline (Directed Acyclic Graph) that contains a group of steps                | By default, Airflow runs a task only after its upstream tasks succeed.      | https://airflow.apache.org/docs/apache-airflow/2.7.1/core-concepts/dags.html         |
+|                |              |                                                                                 | dbt can run a subset of nodes, assuming upstream nodes were already run.    | https://docs.getdbt.com/docs/introduction                                            |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Task           | Node         | Step within a pipeline (DAG or workflow)                                        | In dbt, nodes are usually transformations that run on a remote database.    | https://docs.getdbt.com/reference/node-selection/syntax                              |
+|                |              |                                                                                 | In Airflow, tasks can do anything, running locally in Airflow or remotely.  | https://airflow.apache.org/docs/apache-airflow/2.7.1/core-concepts/tasks.html        |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Language       | Language     | Programming or declarative language used to define pipelines and steps.         | In dbt, users write SQL, YAML and Python to define the steps of a pipeline. | https://docs.getdbt.com/docs/introduction#dbt-optimizes-your-workflow                |
+|                |              |                                                                                 | Airflow expects steps and pipelines to be written in Python.                | https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html  |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Variables      | Variables    | Key-value configuration that can be used in steps to avoid hard-coded values    |                                                                             | https://docs.getdbt.com/docs/build/project-variables                                 |
+|                |              |                                                                                 |                                                                             | https://airflow.apache.org/docs/apache-airflow/2.7.1/core-concepts/variables.html    |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Templating     | Macros       | Jinja templating used to access variables and configuration, and to reference   | dbt encourages using Jinja templating for control structures (if and for).  | https://docs.getdbt.com/docs/build/jinja-macros                                      |
+|                |              | other steps                                                                     | Airflow supports Jinja natively, with its own variables, macros and filters.| https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html             |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Connection     | Profile      | Configuration to connect to databases or other services                         |                                                                             | https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html          |
+|                |              |                                                                                 |                                                                             | https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles          |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Providers      | Adapter      | Additional Python libraries that support specific databases or services         |                                                                             | https://airflow.apache.org/docs/apache-airflow-providers/                            |
+|                |              |                                                                                 |                                                                             | https://docs.getdbt.com/guides/dbt-ecosystem/adapter-development/1-what-are-adapters |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
diff --git a/docs/getting_started/index.rst b/docs/getting_started/index.rst
index 6f0325128..c71589ec2 100644
--- a/docs/getting_started/index.rst
+++ b/docs/getting_started/index.rst
@@ -11,6 +11,7 @@
    Execution Modes
    Docker Execution Mode
    Kubernetes Execution Mode
+   dbt and Airflow Similar Concepts
 
 
 Getting Started
@@ -38,3 +39,9 @@ For specific guides, see the following:
 
 - `Executing dbt DAGs with Docker Operators `__
 - `Executing dbt DAGs with KubernetesPodOperators `__
+
+
+Concepts Overview
+-----------------
+
+How do dbt and Airflow concepts map to each other? Learn more `on this page `__.
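The DAG row of the table added by this patch contrasts how the two tools treat upstream steps. The plain-Python sketch below illustrates that difference under stated assumptions: the three model names are hypothetical, and the functions are not part of Airflow, dbt or Cosmos. `run_like_airflow` mimics Airflow's default `all_success` trigger rule, where a task whose upstream task did not succeed is marked `upstream_failed`, while `run_like_dbt_select` mimics running only a subset of dbt nodes (for example with `dbt run --select`), which assumes unselected upstream nodes were built earlier. It uses only the standard library (`graphlib`, Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each node maps to the set of nodes it depends on,
# similar to the dependencies dbt derives from ref() calls.
PIPELINE = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_report": {"stg_orders", "stg_customers"},
}


def run_like_airflow(pipeline, failed):
    """Simulate Airflow's default ``all_success`` trigger rule.

    Each node runs only if every upstream node succeeded; otherwise it is
    marked ``upstream_failed`` without running. Nodes listed in ``failed``
    fail when they run.
    """
    states = {}
    # static_order() yields every node after all of its upstream nodes.
    for node in TopologicalSorter(pipeline).static_order():
        if any(states[up] != "success" for up in pipeline[node]):
            states[node] = "upstream_failed"
        elif node in failed:
            states[node] = "failed"
        else:
            states[node] = "success"
    return states


def run_like_dbt_select(pipeline, selected):
    """Simulate running only a subset of dbt nodes.

    Selected nodes run in dependency order; unselected upstream nodes are
    not run and are assumed to have been built by an earlier invocation.
    """
    order = TopologicalSorter(pipeline).static_order()
    return [node for node in order if node in selected]


if __name__ == "__main__":
    print(run_like_airflow(PIPELINE, failed={"stg_orders"}))
    print(run_like_dbt_select(PIPELINE, selected={"orders_report"}))
```

Representing each node by its set of upstream nodes mirrors the dependency information dbt records for each node, so the same mapping can drive both simulated behaviors.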