This is a demo project that shows how to orchestrate dlt data pipelines using dagster. In this demo, we create two data pipelines:
- GitHub Issues: A data pipeline that extracts issues data from a GitHub repository and ingests it into Google BigQuery.
- MongoDB: A data pipeline that extracts unstructured data from MongoDB and ingests it into BigQuery.
The demo uses dlt to create the pipelines and dagster to orchestrate them.
The diagram above illustrates the dlt data pipeline being orchestrated using dagster:
- A dlt resource (github_issues) yields data from the GitHub API and passes it to a dagster asset.
- A dagster configurable resource (Class: DltResource) provides a create_pipeline method that creates a dlt pipeline.
- A dagster asset takes the configurable resource (Class: DltResource) and the dlt resource (github_issues) and executes the pipeline; a minimal sketch of these pieces follows below.
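As a rough sketch of how these pieces might fit together (assuming recent dlt and dagster APIs; the repository URL, class fields, and asset name are illustrative, not the demo's exact code):

```python
import dlt
import requests
from dagster import ConfigurableResource, asset


# dlt resource: yields issues from the GitHub API.
# The repository and the single unpaginated request are illustrative.
@dlt.resource(write_disposition="append")
def github_issues():
    url = "https://api.github.com/repos/dlt-hub/dlt/issues"
    response = requests.get(url, params={"state": "all"})
    response.raise_for_status()
    yield response.json()


# dagster configurable resource that creates and runs a dlt pipeline.
class DltResource(ConfigurableResource):
    pipeline_name: str
    dataset_name: str
    destination: str

    def create_pipeline(self, resource):
        pipeline = dlt.pipeline(
            pipeline_name=self.pipeline_name,
            dataset_name=self.dataset_name,
            destination=self.destination,
        )
        return pipeline.run(resource)


# dagster asset that wires the two together and executes the load.
@asset
def issues(pipeline: DltResource):
    return pipeline.create_pipeline(github_issues())
```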
The repo consists of two separate dagster projects:
- github-issues
- mongodb_dlt
The steps to run the projects are the same for both:
- Clone this repository.
- Create a Python virtual environment with `python -m venv venv` and activate it with `source venv/bin/activate`.
- `cd` to the project you want to run.
- Install the dependencies with `pip install -r requirements.txt`.
- Create a `.env` file in the root directory of the project and add the relevant credentials:
  - github-issues: BigQuery
  - mongodb_dlt: BigQuery and MongoDB Atlas
- Start the dagster web server with `dagster dev` and materialize the assets.
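For reference, `dagster dev` picks up a Definitions object that wires the asset to the resource. Continuing the sketch above (the resource key and configuration values are assumptions, not the demo's exact settings):

```python
from dagster import Definitions

# `dagster dev` loads this object; the resource key "pipeline" must match
# the parameter name used in the asset above.
defs = Definitions(
    assets=[issues],
    resources={
        "pipeline": DltResource(
            pipeline_name="github_issues",
            dataset_name="github_issues_data",
            destination="bigquery",
        ),
    },
)
```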