Data Pipelines course material #56

Draft
bhupatiraju wants to merge 6 commits into base: main

bhupatiraju

This pull request adds tutorial notes for the Data Pipelines topic, part of the advanced topics section of the Python course. The notes use the BOOST financial data from Kenya to illustrate building a data pipeline with the medallion schema and then automating that pipeline.

It contains the following:

  1. Introduction File: Created a file named intro-to-data-pipelines that provides an overview of the data pipeline topic and illustrates its importance in data processing workflows.

  2. Data Processing Walk-through: Created files called Bronze, Silver and Gold, which contain the data processing code for the Kenya data, organised according to the medallion schema.

  3. Get additional data: Created a file called subnational_population that retrieves data from the WB API and keeps only the columns needed for merging with the cleaned Kenya data.

  4. Aggregation: Combined the subnational population data with the cleaned Kenya data in a simple aggregation to illustrate a basic use case.

  5. Orchestration: Added a section on orchestration using Databricks Workflows, detailing how to automate and manage the data processing pipeline effectively (contained in the intro-to-data-pipelines file).
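
For readers unfamiliar with the medallion layout, the Bronze, Silver, Gold and aggregation steps listed above can be sketched roughly as follows. This is a minimal, library-free illustration and not the code in this PR: the column names, the sample rows, the population figures and the function names are all made up, and the actual notebooks may well use Spark or pandas instead.

```python
import csv
import io

# Hypothetical raw extract; column names and values are invented for illustration.
RAW_CSV = """County,Year,Approved Budget
Nairobi,2020,"1,000"
nairobi ,2020,500
Mombasa,2020,
Mombasa,2020,300
"""

# Hypothetical population figures keyed by (county, year); not real data.
POPULATION = {("Nairobi", 2020): 100, ("Mombasa", 2020): 50}


def bronze(raw_text):
    """Bronze: load the source as-is, keeping every row and every value as a string."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def silver(rows):
    """Silver: standardise names and types, and drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = row["Approved Budget"].replace(",", "").strip()
        if not amount:
            continue  # no usable amount, so the row is left out of the clean layer
        cleaned.append({
            "county": row["County"].strip().title(),
            "year": int(row["Year"]),
            "approved": float(amount),
        })
    return cleaned


def gold(rows, population):
    """Gold: aggregate to county-year and add a per-capita measure."""
    totals = {}
    for row in rows:
        key = (row["county"], row["year"])
        totals[key] = totals.get(key, 0.0) + row["approved"]
    return {
        key: {"approved": total, "approved_per_capita": total / population[key]}
        for key, total in totals.items()
        if key in population  # inner-join behaviour: unmatched keys are dropped
    }


result = gold(silver(bronze(RAW_CSV)), POPULATION)
```

Keeping each layer as its own function mirrors the one-notebook-per-layer split described above, which is also what makes it straightforward to schedule the layers as separate, dependent tasks in an orchestrator.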

weilu (Member) commented Oct 10, 2024

Thanks @bhupatiraju! Following the convention of other modules, can you add a README.md and clear the outputs of the notebooks?
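
One way to clear the outputs, sketched here under the assumption that the notebooks are stored as `.ipynb` files in the standard Jupyter JSON layout (notebooks exported from Databricks in another format would need a different approach). The function names `clear_outputs` and `clear_file` are hypothetical helpers, not part of any library.

```python
import json


def clear_outputs(notebook):
    """Remove outputs and execution counts from every code cell of a notebook dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_file(path):
    """Rewrite the .ipynb file at `path` in place with its outputs cleared."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(clear_outputs(notebook), f, indent=1)
        f.write("\n")
```

Calling `clear_file` on each notebook before committing keeps the diffs limited to code and markdown changes.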

bhupatiraju marked this pull request as draft on December 10, 2024, 18:54