Skip to content

Commit

Permalink
3688 docs rfc writing test for anagha (#3689)
Browse files Browse the repository at this point in the history
* feat: add Apache Airflow integration


Signed-off-by: Iain Cox <[email protected]>
Co-authored-by: Anastasiia Tovpeko <[email protected]>
  • Loading branch information
billy-the-fish and atovpeko committed Jan 17, 2025
1 parent b8f1db3 commit 8167561
Show file tree
Hide file tree
Showing 3 changed files with 165 additions and 0 deletions.
150 changes: 150 additions & 0 deletions use-timescale/integrations/apache-airflow.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
---
title: Integrate Apache Airflow with Timescale Cloud
excerpt: How to install the psql client for PostgreSQL
products: [cloud, mst, self_hosted]
keywords: [connect, integrate, apache, airflow]
---

import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx";

# Integrate Apache Airflow with $CLOUD_LONG

Apache Airflow® is a platform created by the community to programmatically author, schedule, and monitor workflows.

A [DAG (Directed Acyclic Graph)][Airflow-DAG] is the core concept of Airflow, collecting [Tasks][Airflow-Task] together,
organized with dependencies and relationships to say how they should run. You declare a DAG in a Python file
in the `$AIRFLOW_HOME/dags` folder of your Airflow instance.

This page shows you how to use a Python connector in a DAG to integrate Apache Airflow with a $SERVICE_LONG.

## Prerequisites

<IntegrationPrereqs />

* [Install Python3 and pip3][install-python-pip]
* [Install Apache Airflow][install-apache-airflow]

Ensure that your Airflow instance has network access to $CLOUD_LONG.

This example DAG uses the `company` table you create in [Create regular PostgreSQL tables for relational data][create-a-table-in-timescale]

## Install python connectivity libraries

To install the Python libraries required to connect to $CLOUD_LONG:

<Procedure>

1. **Enable PostgreSQL connections between Airflow and $CLOUD_LONG**

```bash
pip install psycopg2-binary
```

1. **Enable PostgreSQL connection types in the Airflow UI**

```bash
pip install apache-airflow-providers-postgres
```

</Procedure>

## Create a connection between Airflow and your $SERVICE_LONG

In your Airflow instance, securely connect to your $SERVICE_LONG:

<Procedure>

1. **Run Airflow**

On your development machine, run the following command:

```bash
airflow standalone
```

The username and password for Airflow UI are displayed in the `standalone | Login with username`
line in the output.

1. **Add a connection from Airflow to your $SERVICE_LONG**

1. In your browser, navigate to `localhost:8080`, then select `Admin` > `Connections`.
1. Click `+` (Add a new record), then use your [connection info][connection-info] to fill in
the form. The `Connection Type` is `Postgres`.

</Procedure>

## Exchange data between Airflow and your $SERVICE_LONG

To exchange data between Airflow and your $SERVICE_LONG:

<Procedure>

1. **Create and execute a DAG**

To insert data in your $SERVICE_LONG from Airflow:
1. In `$AIRFLOW_HOME/dags/timescale_dag.py`, add the following code:

```python
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.hooks.postgres_hook import PostgresHook
from datetime import datetime
def insert_data_to_timescale():
hook = PostgresHook(postgres_conn_id='the ID of the connenction you created')
conn = hook.get_conn()
cursor = conn.cursor()
"""
This could be any query. This example inserts data into the table
you create in:
https://docs.timescale.com/getting-started/latest/tables-hypertables/#create-regular-postgresql-tables-for-relational-data
"""
cursor.execute("INSERT INTO company (symbol, name) VALUES (%s, %s)",
('new_company_symbol', 'New Company Name'))
conn.commit()
cursor.close()
conn.close()
default_args = {
'owner': 'airflow',
'start_date': datetime(2023, 1, 1),
'retries': 1,
}
dag = DAG('timescale_dag', default_args=default_args, schedule_interval='@daily')
insert_task = PythonOperator(
task_id='insert_data',
python_callable=insert_data_to_timescale,
dag=dag,
)
```
This DAG uses the `company` table created in [Create regular PostgreSQL tables for relational data][create-a-table-in-timescale].

1. In your browser, refresh the [Airflow UI][Airflow_UI].
1. In `Search DAGS`, type `timescale_dag` and press ENTER.
1. Press the play icon and trigger the DAG:
![daily eth volume of assets](https://assets.timescale.com/docs/images/integrations-apache-airflow.png)
1. **Verify that the data appears in $CLOUD_LONG**

1. In [Timescale Console][console], navigate to your service and click `SQL editor`.
1. Run a query to view your data. For example: `SELECT symbol, name FROM company;`.

You see the new rows inserted in the table.

</Procedure>

You have successfully integrated Apache Airflow with $CLOUD_LONG and created a data pipeline.


[create-a-table-in-timescale]: /getting-started/:currentVersion:/tables-hypertables/#create-regular-postgresql-tables-for-relational-data
[install-apache-airflow]: https://airflow.apache.org/docs/apache-airflow/stable/start.html
[install-python-pip]: https://docs.python.org/3/using/index.html
[console]: https://console.cloud.timescale.com/
[create-service]: /getting-started/:currentVersion:/services/
[enable-timescaledb]: /self-hosted/:currentVersion:/install/
[connection-info]: /use-timescale/:currentVersion:/integrations/find-connection-details/
[Airflow-DAG]: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#dags
[Airflow-Task]:https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html
[Airflow_UI]: localhost:8080
10 changes: 10 additions & 0 deletions use-timescale/integrations/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,15 @@ Some of the most in-demand integrations for $CLOUD_LONG are listed below, with l
|:---------------------------:|-----------------------------------------------------------------------------------------------------------------------------|
| [Terraform][terraform] | An infrastructure-as-code tool that enables you to safely and predictably provision and manage infrastructure in any cloud. |


## Data engineering and extract, transform, load

| Name | Description |
|:--------------------------------:|----------------------------------------------------------|
| [Apache Airflow][apache-airflow] | Programmatically author, schedule, and monitor workflows. |



[psql]: /use-timescale/:currentVersion:/integrations/psql/
[qstudio]: /use-timescale/:currentVersion:/integrations/qstudio/
[dbeaver]: /use-timescale/:currentVersion:/integrations/dbeaver/
Expand All @@ -49,4 +58,5 @@ Some of the most in-demand integrations for $CLOUD_LONG are listed below, with l
[grafana]: /use-timescale/:currentVersion:/integrations/grafana/
[tableau]: /use-timescale/:currentVersion:/integrations/tableau/
[terraform]: /use-timescale/:currentVersion:/integrations/terraform
[apache-airflow]: /use-timescale/:currentVersion:/integrations/apache-airflow
[postgresql-integrations]: https://slashdot.org/software/p/PostgreSQL/integrations/
5 changes: 5 additions & 0 deletions use-timescale/page-index/page-index.js
Original file line number Diff line number Diff line change
Expand Up @@ -796,6 +796,11 @@ module.exports = [
href: "find-connection-details",
excerpt: "Learn about connecting to your Timescale database",
},
{
title: "Apache Airflow",
href: "apache-airflow",
excerpt: "Integrate Apache Airflow with Timescale Cloud",
},
{
title: "Azure Data Studio",
href: "azure-data-studio",
Expand Down

0 comments on commit 8167561

Please sign in to comment.