(migrating_airflow_to_flyte)=

# Migrating Airflow to Flyte

Flyte can compile Airflow tasks into Flyte tasks without any code changes, which allows you
to migrate your Airflow DAGs to Flyte with minimal effort. This guide walks you through
that migration process.

## Prerequisites

- Install `flytekitplugins-airflow` in your Python environment (see the install command below).
- Deploy an Airflow agent to your Flyte cluster.
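
A minimal install sketch; exact pinning and virtual environment setup will vary by project:

```bash
pip install flytekitplugins-airflow
```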

## Use Airflow tasks inside a Flyte workflow
Under the hood, flytekit compiles Airflow tasks into Flyte tasks, so you can use
any Airflow sensor or operator inside a Flyte workflow.


```python
from flytekit import task, workflow
from airflow.operators.bash import BashOperator

@task
def say_hello() -> str:
    return "Hello, World!"

@workflow
def airflow_wf():
    flyte_task = say_hello()
    airflow_task = BashOperator(task_id="airflow_bash_operator", bash_command="echo hello")
    airflow_task >> flyte_task

if __name__ == "__main__":
    print(f"Running airflow_wf() {airflow_wf()}")
```
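
The same pattern works for Airflow sensors. Below is a hedged sketch, assuming the built-in `FileSensor` is available in your Airflow installation; the workflow name and file path are hypothetical:

```python
from flytekit import workflow
from airflow.sensors.filesystem import FileSensor

@workflow
def sensor_wf():
    # Block downstream work until a (hypothetical) file appears.
    sensor = FileSensor(task_id="wait_for_file", filepath="/tmp/data.csv")
    sensor >> say_hello()  # reuses the say_hello task defined above
```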

## Run your Airflow tasks locally
Although Airflow doesn't support local execution, you can run your Airflow tasks locally using Flyte.

```bash
pyflyte run workflows.py airflow_wf
```

:::{warning}
Some Airflow operators may require certain permissions to execute. For instance, `DataprocCreateClusterOperator` requires the `dataproc.clusters.create` permission.
When running Airflow tasks locally, you may need to set up the necessary permissions locally for the task to execute successfully.
:::
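
For example, to execute a GCP operator locally, you might first authenticate with application default credentials. This is a sketch assuming the Google Cloud SDK is installed and your account holds the required roles:

```bash
# Make local credentials available to locally executed GCP operators.
gcloud auth application-default login
```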

## Move to production
Flyte workflows that contain Airflow tasks can be executed on a Flyte cluster using the `--remote` flag.
In this case, Flyte creates a pod in the Kubernetes cluster to run the `say_hello` task, and then runs
your Airflow `BashOperator` on the Airflow agent.

```bash
pyflyte run --remote workflows.py airflow_wf
```

## Configure Airflow connection
During local execution, you can configure the [Airflow connection](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html) by setting the `AIRFLOW_CONN_{CONN_ID}` environment variable.
For example,
```bash
export AIRFLOW_CONN_MY_PROD_DATABASE='my-conn-type://login:password@host:port/schema?param1=val1&param2=val2'
```
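
To illustrate how the connection ID maps to that environment variable, here is a hedged sketch using the Postgres provider (it assumes `apache-airflow-providers-postgres` is installed; the task and query are hypothetical):

```python
from airflow.providers.postgres.operators.postgres import PostgresOperator

# "my_prod_database" resolves to the AIRFLOW_CONN_MY_PROD_DATABASE
# environment variable exported above.
run_query = PostgresOperator(
    task_id="run_query",
    postgres_conn_id="my_prod_database",
    sql="SELECT 1;",
)
```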

In production, we recommend storing connections in a [secrets backend](https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html).
Make sure the agent pod has the right permissions (for example, an IAM role) to access secrets in the external secrets backend.
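
As a sketch, Airflow can be pointed at an external secrets backend through its standard `AIRFLOW__SECTION__KEY` environment variables. The example below assumes GCP Secret Manager; the connections prefix and project ID are placeholders:

```bash
export AIRFLOW__SECRETS__BACKEND="airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend"
export AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "airflow-connections", "project_id": "my-gcp-project"}'
```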