
Repository for the Data Engineering Course


kpokk/dataeng

 
 


Data Engineering

Repository for the Data Engineering Course (LTAT.02.007)

[Figure: Graph View]

Teaching Assistants:

Acknowledgments

Special thanks to Emanuele Della Valle and Marco Brambilla from Politecnico di Milano for letting me "steal" some of their great slides.

Lectures

Date Title Material Mandatory Reads Extras
01/09 Course Intro Slides - pdf slide 45-109)
03/09 Data Modeling Slides - pdf slide 1-44 Chp 4 p111-127, Chp 5 p151-156, Chp 6 p199-205 of [3]
10/09 DM for Relational Databases Slides - pdf slide 45-109 Chp 2, 6, and 7 (Normal Forms) of [1] Relational Model
10/09 DM for Data Warehouse Slides - pdf slide 109-118 pdf video Chp 2 of [2]
17/09 DM for Big Data Slides - pdf Chp 2 of [3], video paper
17/09 Key Value Stores Slides 1 Slides 2 pdf nosql
24/09 Column Oriented Databases Slides 1 Slides 2 pdf nosql
24/09 Document Databases Slides 1 Slides 2 pdf nosql
01/10 Graph Databases Slides 1 Slides 2 pdf1 pdf2 Chp 3 and 5 of [5] book
08/10 Data Ingestion Slides 1 Slide 2 Slide 3 Slide 4
15/10 Part 1 Recap Slides 1 pdf
22/10 Midterm
29/10 Data Engineering Pipelines (Part 1) Slides 1 Slides 2 pdf
05/11 Data Engineering Pipelines (Part 2) Slides 1 Slides 2 Slides 3 Chp 10 of [3] R. Chang Pt 2 R. Chang Pt 3
12/11 Streaming Data (Part 1) Slide 1 Slide 2 Chp 11 of [3] Streaming 101 Streaming 102
19/11 Data Journey Slides
26/11 Streaming Data (Part 2) Slide 1 Slide 2
03/12 Data Wrangling (Part 1) pdf
10/12 Data Wrangling (Part 2) pdf

Practices (videos will be available after the Group 2 session)

Date Title Material Reads Videos Branch Notes
07-08/09 Docker Slides - Video GP1 Video GP2 Lab Branch QA GP2 only
14-15/09 Modeling and Querying Relational Data with Postgres Slides Chp 32 of [1] Video Homework 1
21-22/09 Modeling and Querying Key Value Data with Redis Slides Video Homework 2
28-29/09 Modeling and Querying Document Data with MongoDB Slides Video Homework 3
5-6/10 Modeling and Querying Graph Data with Neo4J Slides Cypher Manual Video Homework 4
19-20-26-27/10 Data Ingestion with Apache Kafka Slides Video 1 Video 2 Video 3 Video 4 Homework 5
10-11/11 Apache Airflow Data Pipelines Slides Video 1 Video 2 Homework 6
16-17/11 Stream Processing with Kafka Streams Slides Video 1 Video 2 Homework 7
23-24/11 Stream Processing with KSQL Slides Video 1 Video 2 Homework 7
07-08/12 Data Cleansing Slides Video 1 Video 2 Homework 8
14-15/12 Data Augmentation Slides Video 1 Video 2 Homework 8

Extras

Contributing

  • Modeling and Querying RDF data: SPARQL
  • Domain Driven Design: a summary
  • Event Sourcing: a summary
  • Data Pipelines with Luigi
  • Data Pipelines with Apache NiFi
  • Data Processing with Apache Flink

Syllabus

  • What is (Big) Data?
  • The Role of the Data Engineer
  • Data Modeling
    • Data Replication
    • Data Partitioning
    • Transactions
  • Relational Data
  • NoSQL
    • Document
    • Graph
  • Data Warehousing
    • Star and Snowflake schemas
  • Data Vault
  • (Big) Data Pipelines
    • Big Data Systems Architectures
    • ETL and Data Pipelines
      • Best Practices and Anti-Patterns
    • Batch vs Streaming Processing
  • Data Cleansing
  • Data Augmentation

Books


Languages

  • Jupyter Notebook 75.4%
  • HTML 24.6%