vannguyende/assignment

Getting Started


To get a local copy up and running, follow these steps.

Prerequisites

  • Python 3.10
  • PostgreSQL
  • Airflow
  • ClickHouse

Usage

1. Python script

1.1. Download the dataset from UCI and put the online_retail_II.xlsx file into the python_script/dataset/raw directory.

1.2. Install the required packages from requirements.txt

pip install -r requirements.txt
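
Before running the script, it can help to confirm that the dataset from step 1.1 is where the script is expected to look for it. The following is a minimal sketch, assuming the check is run from the repository root; the helper name `dataset_path` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Location and file name taken from step 1.1 of this README.
RAW_DIR = Path("python_script/dataset/raw")
DATASET_NAME = "online_retail_II.xlsx"


def dataset_path(project_root: Path = Path(".")) -> Path:
    """Return the path to the raw dataset, or raise if it is missing.

    `dataset_path` is a hypothetical helper used only for illustration.
    """
    path = project_root / RAW_DIR / DATASET_NAME
    if not path.is_file():
        raise FileNotFoundError(f"Expected the dataset at {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"Dataset found: {dataset_path()}")
    except FileNotFoundError as error:
        print(error)
```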

1.3. Edit the db_info.env file

PG_HOST="" 
PG_PORT=5432
PG_NAME=""
PG_USER=""
PG_PASSWORD=""
  • PG_HOST: "localhost" if the database is installed on your local machine; otherwise, the IP address of the machine where PostgreSQL is installed
  • PG_PORT: the PostgreSQL server port (default is 5432)
  • PG_NAME: the database name (default is "postgres")
  • PG_USER: the user name (default is "postgres")
  • PG_PASSWORD: the password you set for the "postgres" user when you installed PostgreSQL
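
To illustrate how these five settings fit together, here is a minimal sketch that parses text in the db_info.env format and assembles a libpq-style connection string from it. This is an assumption about how the values could be consumed, not a description of what main.py actually does; `parse_env` and `build_dsn` are hypothetical names.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments, and strip quotes."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def build_dsn(settings: dict[str, str]) -> str:
    """Build a keyword/value connection string from the PG_* settings.

    Values containing spaces would need extra quoting, which this sketch omits.
    """
    return (
        f"host={settings['PG_HOST']} port={settings['PG_PORT']} "
        f"dbname={settings['PG_NAME']} user={settings['PG_USER']} "
        f"password={settings['PG_PASSWORD']}"
    )


if __name__ == "__main__":
    # Sample values for illustration only; replace them with your own.
    sample = 'PG_HOST="localhost"\nPG_PORT=5432\nPG_NAME="postgres"\n' \
             'PG_USER="postgres"\nPG_PASSWORD="change-me"\n'
    settings = parse_env(sample)
    print(sorted(settings))
```

A string in this form is accepted by PostgreSQL client libraries that take a libpq connection string, for example `psycopg2.connect(dsn)`.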

1.4. Create the PostgreSQL table using the assignment/sql_task/pg_online_retail.sql file

1.5. Execute python script

  • Open a terminal and run main.py with the --insert flag:
python3 main.py --insert
  • To enable DEBUG mode, run only the following command:
python3 main.py
  • Structure:
    (image: project structure)
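
The two commands above differ only in the --insert flag. As a rough illustration of that pattern, the sketch below shows how a script could switch between the two modes with argparse. The real main.py is not reproduced here, so the function names and the returned mode labels are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the single optional flag described in step 1.5."""
    parser = argparse.ArgumentParser(description="Process the Online Retail II dataset.")
    parser.add_argument(
        "--insert",
        action="store_true",
        help="insert the processed rows into PostgreSQL",
    )
    return parser


def select_mode(argv=None) -> str:
    """Return "insert" when --insert is given, otherwise "debug".

    Hypothetical helper: a real script would call its insert or debug
    routine here instead of returning a label.
    """
    args = build_parser().parse_args(argv)
    return "insert" if args.insert else "debug"


if __name__ == "__main__":
    print(f"Running in {select_mode()} mode")
```

With `action="store_true"`, the flag defaults to False, so running the script without arguments falls through to the debug path.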

1.6. View analysis notebook (optional)

Re-run the analysis.ipynb notebook to regenerate the analysis.

2. Airflow

Dependencies:

  • airflow-clickhouse-plugin
  • airflow-providers-clickhouse
  • apache-airflow-providers-postgres

2.1. Deploy DAGs to the Airflow cluster
Note:

  • The dags directory must be mapped to the dags_folder setting in the Airflow config. To keep things simple, you can copy the DAG file into Airflow's dags_folder directory. Likewise, the plugins directory must be mapped to plugins_folder.
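
The copy approach from the note can be sketched as a small helper. This assumes the DAG files are plain .py files sitting directly in the source directory; `deploy_dags` is a hypothetical name, and the destination has to be your own dags_folder (on Airflow 2.x, `airflow config get-value core dags_folder` prints it). The same function could be pointed at the plugins directory and plugins_folder.

```python
import shutil
from pathlib import Path


def deploy_dags(source_dir: Path, dags_folder: Path) -> list[Path]:
    """Copy every .py file from source_dir into dags_folder.

    Returns the paths of the copied files. Existing files with the same
    name are overwritten.
    """
    dags_folder.mkdir(parents=True, exist_ok=True)
    copied = []
    for dag_file in sorted(source_dir.glob("*.py")):
        target = dags_folder / dag_file.name
        shutil.copy2(dag_file, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Both paths are assumptions; adjust them to your checkout and Airflow setup.
    source = Path("dags")
    destination = Path.home() / "airflow" / "dags"
    if source.is_dir():
        for path in deploy_dags(source, destination):
            print(f"Copied {path}")
```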

Verify the DAG info in the Airflow UI
(screenshot)

2.2. Create connections
2.2.1. Postgres connection
(screenshot)

2.2.2. ClickHouse connection
(screenshot)
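
As an alternative to filling in the form in the UI, Airflow can also read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> whose value is a connection URI. The sketch below builds such a URI for the Postgres connection only. The connection ID `pg_online_retail` is a placeholder, since the ID the DAG expects is not stated here, and the helper names are hypothetical.

```python
from urllib.parse import quote


def conn_env_var(conn_id: str) -> str:
    """Return the environment variable name Airflow checks for this connection ID."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"


def postgres_conn_uri(host: str, port: int, dbname: str, user: str, password: str) -> str:
    """Build a connection URI, percent-encoding characters such as '@' and '/'."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


if __name__ == "__main__":
    # Placeholder values for illustration; use the ones from db_info.env.
    name = conn_env_var("pg_online_retail")
    uri = postgres_conn_uri("localhost", 5432, "postgres", "postgres", "change-me")
    print(f"export {name}='{uri}'")
```

Connections defined this way do not appear in the UI connection list, so the UI route shown above remains the simpler choice when you want to inspect them there.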

2.3. Trigger the DAG

Open the Graph view to check the status of each task
(screenshot)
