Data Pipelines course material #56

bhupatiraju · 2024-10-10T15:14:44Z

This pull request adds tutorial notes for the Data Pipelines topic, part of the advanced topics section of the Python course. Using the BOOST financial data from Kenya, the notes show how to build a data pipeline with the medallion architecture (bronze, silver and gold layers) and then how to automate that pipeline.

It contains the following:

Introduction file: Created a file named intro-to-data-pipelines that gives an overview of data pipelines and explains their importance in data processing workflows.
Data processing walk-through: Created files called Bronze, Silver and Gold, which contain the code that processes the Kenya data through the medallion layers.
Get additional data: Created a file called subnational_population that retrieves data from the World Bank (WB) API and restricts it to the columns needed for merging with the cleaned Kenya data.
Aggregation: Added a simple aggregation that combines the subnational population data with the cleaned Kenya data to illustrate a basic use case.
Orchestration: Added a section on orchestration with Databricks Workflows that explains how to automate and manage the data processing pipeline (contained in the intro-to-data-pipelines file).
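The flow described above (bronze, silver, gold, an extra population source, an aggregation, and an orchestrator that runs tasks in dependency order) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in the notebooks: the sample rows, the column names (`county`, `func`, `amount`), the population figures and every function name are hypothetical, and the actual notebooks presumably work on Databricks tables rather than in-memory lists. Ordering uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) as a stand-in for how a Databricks Workflows job runs each task only after the tasks it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical raw rows standing in for the BOOST Kenya extract (bronze input).
RAW_ROWS = [
    {"county": " Nairobi ", "func": "Health", "amount": "120.5"},
    {"county": "Mombasa", "func": "health", "amount": "80"},
    {"county": "Nairobi", "func": "Education", "amount": ""},  # missing amount
]

# Hypothetical population figures standing in for the WB API pull.
POPULATION = {"Nairobi": 4_400_000, "Mombasa": 1_200_000}


def bronze(ctx):
    """Bronze: land the raw rows unchanged."""
    ctx["bronze"] = [dict(row) for row in RAW_ROWS]


def silver(ctx):
    """Silver: trim labels, normalise case, cast amounts, drop empty rows."""
    cleaned = []
    for row in ctx["bronze"]:
        if not row["amount"].strip():
            continue  # skip rows with no recorded amount
        cleaned.append({
            "county": row["county"].strip(),
            "func": row["func"].strip().title(),
            "amount": float(row["amount"]),
        })
    ctx["silver"] = cleaned


def gold(ctx):
    """Gold: total spending per county."""
    totals = {}
    for row in ctx["silver"]:
        totals[row["county"]] = totals.get(row["county"], 0.0) + row["amount"]
    ctx["gold"] = totals


def population(ctx):
    """Additional data: keep only the county -> population mapping."""
    ctx["population"] = dict(POPULATION)


def per_capita(ctx):
    """Aggregation: merge gold totals with population (inner join on county)."""
    ctx["per_capita"] = {
        county: total / ctx["population"][county]
        for county, total in ctx["gold"].items()
        if county in ctx["population"]
    }


TASKS = {"bronze": bronze, "silver": silver, "gold": gold,
         "population": population, "per_capita": per_capita}

# Each task maps to the set of tasks that must finish before it runs.
DEPENDENCIES = {
    "silver": {"bronze"},
    "gold": {"silver"},
    "per_capita": {"gold", "population"},
}


def run_pipeline():
    """Run every task after its dependencies; return shared state and order."""
    ctx = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](ctx)
    return ctx, order


if __name__ == "__main__":
    state, run_order = run_pipeline()
    print("run order:", run_order)
    print("spend per person:", state["per_capita"])
```

Keeping each stage as a separate function that reads from and writes to shared state mirrors the idea of separate Bronze, Silver and Gold notebooks: any stage can be rerun on its own, and the dependency map, rather than the order of the code, decides what runs when.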


weilu · 2024-10-10T15:23:40Z

Thanks @bhupatiraju! Following the convention of other modules, can you add a README.md and clear the outputs of the notebooks?

bhupatiraju added 5 commits October 10, 2024 19:53

Create Medallion stages for data processing

8f6cb85

Create intro-data-pipelines.ipynb

e68a3f0

Entry point to the course with notes

Create kenya_func_agg.ipynb

d97832d

Create subnational_population.ipynb

3bc52eb

Reduced the code block for silver. Added comments. Added basic references

3983fa4

luisesanmartin requested review from luisesanmartin and weilu October 17, 2024 15:25

bhupatiraju marked this pull request as draft December 10, 2024 18:54

Added Readme

1577c85
