
3688 docs rfc writing test for anagha #3689

Merged
merged 21 commits
Jan 17, 2025
aac2270
feat: create writing test for Anagha.
billy-the-fish Jan 6, 2025
4b63bb8
feat: create writing test for Anagha.
billy-the-fish Jan 6, 2025
4478234
feat: create writing test for Anagha.
billy-the-fish Jan 6, 2025
3bbcf83
Merge branch 'latest' into 3688-docs-rfc-writing-test-for-anaga
billy-the-fish Jan 6, 2025
2c73da5
chore: updates on review.
billy-the-fish Jan 10, 2025
6a38410
Merge branch 'latest' into 3688-docs-rfc-writing-test-for-anaga
billy-the-fish Jan 13, 2025
8bd4237
chore: updates on review.
billy-the-fish Jan 13, 2025
21e3cb6
Combine psql pages into one
atovpeko Jan 13, 2025
92eb1e1
chore: redirects (#3704)
coelhucas Jan 13, 2025
bff1a0a
Flatten the structure of the integrations section (#3681)
atovpeko Jan 15, 2025
87a0299
Remove deprecated flag (#3708)
MetalBlueberry Jan 16, 2025
979643a
feat: create writing test for Anagha.
billy-the-fish Jan 6, 2025
bc75004
chore: add the Apache Airflow doc to the new integrations structure.
billy-the-fish Jan 16, 2025
fe413f0
chore: update on review.
billy-the-fish Jan 16, 2025
064d273
chore: update on review.
billy-the-fish Jan 16, 2025
ee032fa
Merge branch 'latest' into 3688-docs-rfc-writing-test-for-anaga
billy-the-fish Jan 16, 2025
7506deb
Apply suggestions from code review
billy-the-fish Jan 16, 2025
1699c60
chore: updates on review.
billy-the-fish Jan 16, 2025
fc5a6cd
chore: updates on review.
billy-the-fish Jan 16, 2025
eccd4d0
Merge branch 'latest' into 3688-docs-rfc-writing-test-for-anaga
billy-the-fish Jan 16, 2025
3d7b8ec
chore: updates on review.
billy-the-fish Jan 16, 2025
150 changes: 150 additions & 0 deletions use-timescale/integrations/apache-airflow.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
---
title: Integrate Apache Airflow with Timescale Cloud
excerpt: Integrate Apache Airflow with Timescale Cloud
products: [cloud, mst, self_hosted]
keywords: [connect, integrate, apache, airflow]
---

import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx";

# Integrate Apache Airflow with $CLOUD_LONG

Apache Airflow® is a platform created by the community to programmatically author, schedule, and monitor workflows.

A [DAG (Directed Acyclic Graph)][Airflow-DAG] is the core concept of Airflow. A DAG collects [Tasks][Airflow-Task]
and organizes them with dependencies and relationships that specify how they should run. You declare a DAG in a
Python file in the `$AIRFLOW_HOME/dags` folder of your Airflow instance.
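
The way a DAG turns task dependencies into an execution order can be sketched with the standard library's `graphlib`. This is an illustration only, with hypothetical task names; Airflow performs this dependency resolution internally:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key runs only after the tasks in its value set.
task_graph = {
    "insert_data": {"create_table"},
    "verify_data": {"insert_data"},
}

# A valid execution order respecting the dependencies.
order = list(TopologicalSorter(task_graph).static_order())
# create_table runs first, then insert_data, then verify_data
```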

This page shows you how to use a Python connector in a DAG to integrate Apache Airflow with a $SERVICE_LONG.

## Prerequisites

<IntegrationPrereqs />

* [Install Python3 and pip3][install-python-pip]
* [Install Apache Airflow][install-apache-airflow]

Ensure that your Airflow instance has network access to $CLOUD_LONG.

This example DAG uses the `company` table you create in [Create regular PostgreSQL tables for relational data][create-a-table-in-timescale].

## Install Python connectivity libraries

To install the Python libraries required to connect to $CLOUD_LONG:

<Procedure>

1. **Enable PostgreSQL connections between Airflow and $CLOUD_LONG**

```bash
pip install psycopg2-binary
```

1. **Enable PostgreSQL connection types in the Airflow UI**

```bash
pip install apache-airflow-providers-postgres
```

</Procedure>

## Create a connection between Airflow and your $SERVICE_LONG

In your Airflow instance, securely connect to your $SERVICE_LONG:

<Procedure>

1. **Run Airflow**

On your development machine, run the following command:

```bash
airflow standalone
```

The username and password for the Airflow UI are displayed in the `standalone | Login with username`
line in the output.

1. **Add a connection from Airflow to your $SERVICE_LONG**

1. In your browser, navigate to `localhost:8080`, then select `Admin` > `Connections`.
1. Click `+` (Add a new record), then use your [connection info][connection-info] to fill in
the form. The `Connection Type` is `Postgres`.

</Procedure>
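
The form fields map onto the standard PostgreSQL connection parameters. As an illustration only, the sketch below shows how those fields combine into a libpq-style connection URI; the host, user, and password values are placeholders, not real credentials, and Airflow stores the fields individually rather than requiring a URI:

```python
from urllib.parse import quote

# Placeholder connection details; substitute your own service's values.
host = "your-service.tsdb.cloud.timescale.com"
port = 5432
dbname = "tsdb"
user = "tsdbadmin"
password = "p@ss/word"

# Special characters in the password must be percent-encoded in a URI.
uri = f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{dbname}"
```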

## Exchange data between Airflow and your $SERVICE_LONG

To exchange data between Airflow and your $SERVICE_LONG:

<Procedure>

1. **Create and execute a DAG**

To insert data in your $SERVICE_LONG from Airflow:
1. In `$AIRFLOW_HOME/dags/timescale_dag.py`, add the following code:

       ```python
       from airflow import DAG
       from airflow.operators.python_operator import PythonOperator
       from airflow.hooks.postgres_hook import PostgresHook
       from datetime import datetime

       def insert_data_to_timescale():
           # Use the ID of the connection you created in the Airflow UI.
           hook = PostgresHook(postgres_conn_id='the ID of the connection you created')
           conn = hook.get_conn()
           cursor = conn.cursor()
           # This could be any query. This example inserts data into the table
           # you create in:
           # https://docs.timescale.com/getting-started/latest/tables-hypertables/#create-regular-postgresql-tables-for-relational-data
           cursor.execute("INSERT INTO company (symbol, name) VALUES (%s, %s)",
                          ('new_company_symbol', 'New Company Name'))
           conn.commit()
           cursor.close()
           conn.close()

       default_args = {
           'owner': 'airflow',
           'start_date': datetime(2023, 1, 1),
           'retries': 1,
       }

       dag = DAG('timescale_dag', default_args=default_args, schedule_interval='@daily')

       insert_task = PythonOperator(
           task_id='insert_data',
           python_callable=insert_data_to_timescale,
           dag=dag,
       )
       ```
This DAG uses the `company` table created in [Create regular PostgreSQL tables for relational data][create-a-table-in-timescale].

1. In your browser, refresh the [Airflow UI][Airflow_UI].
   1. In `Search DAGs`, type `timescale_dag` and press ENTER.
   1. Click the play icon to trigger the DAG:
      ![Trigger the timescale_dag DAG in the Airflow UI](https://assets.timescale.com/docs/images/integrations-apache-airflow.png)
1. **Verify that the data appears in $CLOUD_LONG**

1. In [Timescale Console][console], navigate to your service and click `SQL editor`.
1. Run a query to view your data. For example: `SELECT symbol, name FROM company;`.

You see the new rows inserted in the table.

</Procedure>
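
The DAG's insert passes values as query parameters instead of formatting them into the SQL string, which protects against SQL injection. A minimal, runnable sketch of the same pattern, using the standard library's `sqlite3` purely for illustration (note that sqlite uses `?` placeholders where psycopg2 uses `%s`):

```python
import sqlite3

# In-memory database stands in for the real service in this sketch.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE company (symbol TEXT, name TEXT)")

# Placeholders keep the values out of the SQL string itself.
cursor.execute("INSERT INTO company (symbol, name) VALUES (?, ?)",
               ("new_company_symbol", "New Company Name"))
conn.commit()

rows = cursor.execute("SELECT symbol, name FROM company").fetchall()
conn.close()
```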

You have successfully integrated Apache Airflow with $CLOUD_LONG and created a data pipeline.


[create-a-table-in-timescale]: /getting-started/:currentVersion:/tables-hypertables/#create-regular-postgresql-tables-for-relational-data
[install-apache-airflow]: https://airflow.apache.org/docs/apache-airflow/stable/start.html
[install-python-pip]: https://docs.python.org/3/using/index.html
[console]: https://console.cloud.timescale.com/
[create-service]: /getting-started/:currentVersion:/services/
[enable-timescaledb]: /self-hosted/:currentVersion:/install/
[connection-info]: /use-timescale/:currentVersion:/integrations/find-connection-details/
[Airflow-DAG]: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#dags
[Airflow-Task]: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html
[Airflow_UI]: http://localhost:8080
10 changes: 10 additions & 0 deletions use-timescale/integrations/index.md
Expand Up @@ -41,6 +41,15 @@ Some of the most in-demand integrations for $CLOUD_LONG are listed below, with l
|:---------------------------:|-----------------------------------------------------------------------------------------------------------------------------|
| [Terraform][terraform] | An infrastructure-as-code tool that enables you to safely and predictably provision and manage infrastructure in any cloud. |


## Data engineering and extract, transform, load

| Name | Description |
|:--------------------------------:|----------------------------------------------------------|
| [Apache Airflow][apache-airflow] | Programmatically author, schedule, and monitor workflows. |



[psql]: /use-timescale/:currentVersion:/integrations/psql/
[qstudio]: /use-timescale/:currentVersion:/integrations/qstudio/
[dbeaver]: /use-timescale/:currentVersion:/integrations/dbeaver/
Expand All @@ -49,4 +58,5 @@ Some of the most in-demand integrations for $CLOUD_LONG are listed below, with l
[grafana]: /use-timescale/:currentVersion:/integrations/grafana/
[tableau]: /use-timescale/:currentVersion:/integrations/tableau/
[terraform]: /use-timescale/:currentVersion:/integrations/terraform
[apache-airflow]: /use-timescale/:currentVersion:/integrations/apache-airflow
[postgresql-integrations]: https://slashdot.org/software/p/PostgreSQL/integrations/
5 changes: 5 additions & 0 deletions use-timescale/page-index/page-index.js
Expand Up @@ -778,6 +778,11 @@ module.exports = [
href: "find-connection-details",
excerpt: "Learn about connecting to your Timescale database",
},
{
title: "Apache Airflow",
href: "apache-airflow",
excerpt: "Integrate Apache Airflow with Timescale Cloud",
},
{
title: "Azure Data Studio",
href: "azure-data-studio",
Expand Down