This repository provides an example Apache Airflow pipeline with Deeplake for local development of machine learning projects. The pipeline demonstrates how to load a dataset and parse it into Deeplake.
In this example, we leverage Apache Airflow to automate the following steps:
- `create_deep_lake_data`: retrieves images and annotations from a folder and prepares them for Deeplake.
- `show_example_in_deeplake`: shows an image with its bounding box from the Deeplake dataset.
Both tasks are executed in separate Docker containers, which ensures efficient resource utilization and keeps the pipeline's execution environment modular.
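As a rough illustration of the first task, the sketch below builds a Deeplake dataset from a folder of images with simple text annotations. The folder layout, tensor names, and annotation format are assumptions for illustration, not the repository's actual code.

```python
import glob
import os

import deeplake
import numpy as np

# Assumed layout (illustrative only): images in data/images/*.jpg and one
# "x y w h" line per bounding box in data/annotations/<name>.txt.
DATA_DIR = "data"

ds = deeplake.empty("results/example_ds", overwrite=True)
with ds:
    ds.create_tensor("images", htype="image", sample_compression="jpeg")
    ds.create_tensor("boxes", htype="bbox")

    for image_path in sorted(glob.glob(os.path.join(DATA_DIR, "images", "*.jpg"))):
        name = os.path.splitext(os.path.basename(image_path))[0]
        ann_path = os.path.join(DATA_DIR, "annotations", name + ".txt")
        with open(ann_path) as f:
            boxes = np.array([[float(v) for v in line.split()] for line in f],
                             dtype=np.float32)
        # deeplake.read loads the image file without decoding it to numpy first.
        ds.append({"images": deeplake.read(image_path), "boxes": boxes})
```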
To use this pipeline for local development, follow the steps below:
- Ensure that your Docker Engine has sufficient memory allocated, as running the pipeline may require more memory in certain cases.
- Change the paths to your local repo in `dags/test_dag.py`: replace `<absolute_path_to_your_airflow-ml_repo>/data` and `<absolute_path_to_your_airflow-ml_repo>/results` with your own paths (see the sketch after this list).
- Before the first Airflow run, prepare the environment by executing the following steps:
  - If you are working on Linux, specify the `AIRFLOW_UID` by running: `echo -e "AIRFLOW_UID=$(id -u)" > .env`
  - Perform the database migration and create the initial user account by running: `docker compose up airflow-init`. The created user account has the login `airflow` and the password `airflow`.
- Start Airflow and build the custom images used to run tasks in Docker containers: `docker compose up --build`
- Access the Airflow web interface in your browser at http://localhost:8080.
- Trigger the DAG `convert_to_deeplake` to initiate the pipeline execution.
- When you are finished working and want to clean up your environment, run: `docker compose down --volumes --rmi all`
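For reference, the two absolute paths from the setup step above are typically wired into the DAG as bind mounts for the task containers. The snippet below is a hedged sketch of what that wiring in `dags/test_dag.py` might look like; the task id matches the pipeline description, but the image name and mount targets are assumptions, not the file's exact contents.

```python
from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount

# Inside the DAG definition in dags/test_dag.py (hypothetical sketch):
create_deep_lake_data = DockerOperator(
    task_id="create_deep_lake_data",
    image="airflow-ml-task:latest",  # assumed name of the custom-built image
    mounts=[
        # Replace the source paths exactly as described in the setup steps.
        Mount(source="<absolute_path_to_your_airflow-ml_repo>/data",
              target="/data", type="bind"),
        Mount(source="<absolute_path_to_your_airflow-ml_repo>/results",
              target="/results", type="bind"),
    ],
)
```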
- Create an API manual in the `Airflow_API` folder.
- Trigger a DAG at a future execution time (e.g. in the next 5 minutes, 1 hour, ...), completed: toggle on the DAG you want to use at http://localhost:8080, then run `trigger_dag.py` (sketched below).
- XCom in `DockerOperator`.
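The second item above implies that `trigger_dag.py` schedules a run at a future time. Below is one hedged sketch of such a script using Airflow's stable REST API with a future `logical_date`; the endpoint, the `convert_to_deeplake` DAG id, and the `airflow`/`airflow` credentials match this setup, while everything else is an assumption (the API's basic-auth backend must be enabled, as it is in the stock docker-compose file).

```python
from datetime import datetime, timedelta, timezone

import requests

# Ask Airflow to run convert_to_deeplake 5 minutes from now.
logical_date = datetime.now(timezone.utc) + timedelta(minutes=5)

resp = requests.post(
    "http://localhost:8080/api/v1/dags/convert_to_deeplake/dagRuns",
    auth=("airflow", "airflow"),  # account created by `airflow-init`
    json={"logical_date": logical_date.isoformat()},
)
resp.raise_for_status()
print(resp.json())
```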