To get a local copy up and running, follow the steps below. Prerequisites:
- Python 3.10
- PostgreSQL
- Airflow
- ClickHouse
1.1. Download the Online Retail II dataset from the UCI Machine Learning Repository and put the online_retail_II.xlsx
file into the python_script/dataset/raw
directory.
1.2. Install the required packages from requirements.txt
pip install -r requirements.txt
1.3. Modify the db_info.env
file:
PG_HOST=""
PG_PORT=5432
PG_NAME=""
PG_USER=""
PG_PASSWORD=""
- PG_HOST: "localhost" if the database is installed on your local system; otherwise the IP address of the system where the PostgreSQL database is installed
- PG_PORT: the default port for the PostgreSQL server is "5432"
- PG_NAME: the database name (default is "postgres")
- PG_USER: the user name (default user is "postgres")
- PG_PASSWORD: the password you set for the "postgres" user when you installed PostgreSQL
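The settings above can be loaded in Python before opening a connection. A minimal sketch (the parser is illustrative, and psycopg2 is an assumed driver, not confirmed by requirements.txt):

```python
# Sketch: read db_info.env and open a PostgreSQL connection.
# Key names follow the example above; psycopg2 is an assumption --
# any PostgreSQL DB-API driver works the same way.

def parse_env(path):
    """Parse simple KEY=VALUE lines (quotes optional) into a dict."""
    cfg = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            cfg[key.strip()] = value.strip().strip('"')
    return cfg

def connect(cfg):
    """Open a PostgreSQL connection from the parsed settings."""
    import psycopg2  # assumed driver; install it if it is not in requirements.txt
    return psycopg2.connect(
        host=cfg["PG_HOST"],
        port=int(cfg["PG_PORT"]),
        dbname=cfg["PG_NAME"],
        user=cfg["PG_USER"],
        password=cfg["PG_PASSWORD"],
    )
```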
1.4. Create the PostgreSQL table as defined in the assignment/sql_task/pg_online_retail.sql
file
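One way to apply the script from Python is a small helper that reads the file and executes it through a DB-API connection. A sketch (the helper name and statement splitting are illustrative, not part of the repo):

```python
# Hypothetical helper: run a .sql script through any DB-API 2.0 connection.
# For this project the connection would come from psycopg2 and the script
# would be assignment/sql_task/pg_online_retail.sql.

def run_sql_file(conn, path):
    """Execute each ';'-separated statement in the file, then commit."""
    with open(path) as f:
        script = f.read()
    cur = conn.cursor()
    for statement in script.split(";"):
        statement = statement.strip()
        if statement:
            cur.execute(statement)
    conn.commit()
```

Naive splitting on ";" is enough for plain DDL; scripts containing semicolons inside string literals would need a real SQL parser.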
1.5. Execute the Python script
- Open a terminal and run the
main.py
file with the --insert flag:
python3 main.py --insert
- To enable DEBUG mode, run only the following command:
python3 main.py
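A plausible shape for the flag handling in main.py (a sketch of the pattern; the real script's behavior beyond the two commands above is assumed):

```python
import argparse
import logging

def build_parser():
    # --insert loads the dataset into PostgreSQL; running without it
    # enables DEBUG mode (assumed behavior per the steps above).
    parser = argparse.ArgumentParser(description="Online Retail II loader")
    parser.add_argument("--insert", action="store_true",
                        help="insert the dataset into the database")
    return parser

def configure_logging(args):
    # Without --insert, switch to verbose DEBUG logging.
    level = logging.INFO if args.insert else logging.DEBUG
    logging.basicConfig(level=level)
    return level
```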
1.6. View the analysis notebook (optional)
Re-run the analysis.ipynb
file
Dependencies:
- airflow-clickhouse-plugin
- airflow-providers-clickhouse
- apache-airflow-providers-postgres
2.1. Deploy DAGs to the Airflow cluster
Note:
- Map the
dags
directory to dags_folder
in the Airflow config. To simplify, you can copy the DAG file into Airflow's dags_folder directory. Do the same with the plugins
directory, which must be mapped to plugins_folder
.
Verify the DAG information in the Airflow UI
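For orientation, a minimal DAG file of the shape the dags directory would hold, using the Airflow 2 API. Every name here (dag_id, task callables, schedule) is illustrative; the repo's real DAG moving data from PostgreSQL to ClickHouse will differ, so treat this as a configuration sketch rather than the project's code:

```python
# Hypothetical minimal DAG -- all identifiers are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_postgres():
    """Placeholder: read rows from PostgreSQL (e.g. via the postgres provider)."""

def load_into_clickhouse():
    """Placeholder: write rows to ClickHouse (e.g. via the clickhouse plugin)."""

with DAG(
    dag_id="online_retail_pg_to_clickhouse",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_postgres)
    load = PythonOperator(task_id="load", python_callable=load_into_clickhouse)
    extract >> load
```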
2.2. Create connection
2.2.1. Postgres connection
2.3. Trigger the DAG