- Copy the GCP service account credentials file to the VM using sftp.
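  A minimal sketch of the transfer, assuming placeholder values for the user, host, and file paths:

  ```bash
  # Run from your local machine, not the VM. <user>, <vm-external-ip> and the
  # paths below are placeholders for your own values.
  sftp <user>@<vm-external-ip> <<< "put /local/path/service-account-authkeys.json /home/<user>/"
  ```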
- Set the necessary environment variables:

  ```bash
  export GCP_PROJECT_ID=<project-id>
  export GCP_GCS_BUCKET=<GCS BUCKET NAME>
  export GOOGLE_APPLICATION_CREDENTIALS="<path/to/your/service-account-authkeys>.json"
  ```

  Please note that you will have to repeat this every time you create a new shell session.
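  To avoid re-exporting on every login, one option (an assumption about your setup, not part of the project scripts) is to persist the variables in your shell profile:

  ```bash
  # Append the exports to ~/.bashrc so new shell sessions pick them up.
  # The <...> values are placeholders for your own settings.
  cat >> ~/.bashrc <<'EOF'
  export GCP_PROJECT_ID=<project-id>
  export GCP_GCS_BUCKET=<GCS BUCKET NAME>
  export GOOGLE_APPLICATION_CREDENTIALS="<path/to/your/service-account-authkeys>.json"
  EOF
  source ~/.bashrc
  ```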
- Start the Airflow service:

  ```bash
  cd ~/musicaly-project/airflow && bash airflow_startup.sh
  ```

  The Airflow UI should be available on port 8080. Log in with the default username and password, both `airflow`.
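  If port 8080 on the VM is not exposed publicly, one common approach (an assumption, not something the startup script does for you) is to tunnel it over SSH:

  ```bash
  # Forward the VM's port 8080 to your local machine; <user> and
  # <vm-external-ip> are placeholders. The UI is then at http://localhost:8080.
  ssh -L 8080:localhost:8080 <user>@<vm-external-ip>
  ```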
- Start the DAGs:
  - Trigger the `load_songs_dag` DAG first. This DAG is expected to run just once.
  - Trigger the `musicaly_dag` next. This DAG runs every hour at the 5th minute (i.e. a cron schedule like `5 * * * *`) and works as follows (see the CLI sketch after this list):
    - We first create an external table for the data that was received in the past hour.
    - We then create an empty table to which our hourly data will be appended; in practice, this only ever runs on the first run.
    - We then insert (append) the hourly data into that table.
    - We then delete the external table.
    - Finally, we run the dbt transformation to create our dimensions and facts.
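  As referenced above, both DAGs can also be triggered from the command line rather than the UI; a sketch, assuming the Airflow CLI is reachable inside one of the compose services (the service name `airflow-webserver` is an assumption, check `docker-compose ps` for yours):

  ```bash
  # Trigger the one-off songs load first, then the hourly pipeline.
  docker-compose exec airflow-webserver airflow dags trigger load_songs_dag
  docker-compose exec airflow-webserver airflow dags trigger musicaly_dag
  ```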
- Since Airflow will be running in detached mode, run the commands below as needed:

  ```bash
  # to view the logs
  docker-compose logs --follow

  # to stop airflow
  docker-compose down
  ```
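  Note that `docker-compose logs --follow` also accepts a service name (e.g. `docker-compose logs --follow <service>`) if you only want to follow a single container; `docker-compose ps` lists the service names for this setup.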