Commit

Merged the feature-branch branch
iYs9ZPcR committed Jan 3, 2025
1 parent 8c06ce6 commit 77e456b
Showing 1 changed file with 181 additions and 0 deletions.
181 changes: 181 additions & 0 deletions awesome-data-engineering.md
@@ -0,0 +1,181 @@
Have you found any cool resources about data engineering? Put them here.

## Learning Data Engineering

### Courses

* [Data Engineering Zoomcamp](https://github.com/DataTalksClub/data-engineering-zoomcamp) by DataTalks.Club (free)
* [Big Data Platforms, Autumn 2022: Introduction to Big Data Processing Frameworks](https://big-data-platforms-22.mooc.fi/) by the University of Helsinki (free)
* [Awesome Data Engineering Learning Path](https://awesomedataengineering.com/)


### Books

* [Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321)
* [Big Data: Principles and Best Practices of Scalable Realtime Data Systems by Nathan Marz, James Warren](https://www.amazon.com/Big-Data-Principles-practices-scalable/dp/1617290343)
* [Practical DataOps: Delivering Agile Data Science at Scale by Harvinder Atwal](https://www.amazon.com/Practical-DataOps-Delivering-Agile-Science/dp/1484251032)
* [Data Pipelines Pocket Reference: Moving and Processing Data for Analytics by James Densmore](https://www.amazon.com/Data-Pipelines-Pocket-Reference-Processing/dp/1492087831)
* [Best books for data engineering](https://awesomedataengineering.com/data_engineering_best_books)
* [Fundamentals of Data Engineering: Plan and Build Robust Data Systems by Joe Reis, Matt Housley](https://www.amazon.com/Fundamentals-Data-Engineering-Robust-Systems/dp/1098108302)


### Introduction to Data Engineering Terms

* [Data Engineering Acronyms – DataTalks.Club](https://datatalks.club/podcast/s05e02-data-engineering-acronyms.html) (podcast)


### Data engineering in practice

Conference talks from companies, blog posts, etc.

* [Uber Data Archives](https://eng.uber.com/category/articles/uberdata/) (Uber engineering blog)
* [Data Engineering Weekly (DE-focused substack)](https://www.dataengineeringweekly.com/)
* [Seattle Data Guy (DE-focused substack)](https://seattledataguy.substack.com/)


## Doing Data Engineering

### Coding & Python

* [CS50's Introduction to Computer Science | edX](https://www.edx.org/course/introduction-computer-science-harvardx-cs50x) (course)
* [Python for Everybody Specialization](https://www.coursera.org/specializations/python) (course)
* [Practical Python programming](https://github.com/dabeaz-course/practical-python/blob/master/Notes/Contents.md)


### SQL

* [Intro to SQL: Querying and managing data | Khan Academy](https://www.khanacademy.org/computing/computer-programming/sql)
* [Mode SQL Tutorial](https://mode.com/sql-tutorial/)
* [Use The Index, Luke](https://use-the-index-luke.com/) (SQL Indexing and Tuning e-Book)
* [SQL Performance Explained](https://sql-performance-explained.com/) (book)


### Workflow orchestration

* [What is DAG?](https://youtu.be/1Yh5S-S6wsI) (video)
* [Airflow, Prefect, and Dagster: An Inside Look](https://towardsdatascience.com/airflow-prefect-and-dagster-an-inside-look-6074781c9b77) (blog post)
* [Open-Source Spotlight - Prefect - Kevin Kho](https://www.youtube.com/watch?v=ISLV9JyqF1w) (video)
* [Prefect as a Data Engineering Project Workflow Tool, with Mary Clair Thompson (Duke) - 11/6/2020](https://youtu.be/HuwA4wLQtCM) (video)


### ETL and ELT

* [ETL vs. ELT: What’s the Difference?](https://rivery.io/blog/etl-vs-elt/) (blog post) (print version)

### Data lakes

* [An Introduction to Modern Data Lake Storage Layers (Hudi, Iceberg, Delta Lake)](https://dacort.dev/posts/modern-data-lake-storage-layers/) (blog post)
* [Lake House Architecture @ Halodoc: Data Platform 2.0](https://blogs.halodoc.io/lake-house-architecture-halodoc-data-platform-2-0/amp/) (blog post)


### Data warehousing


* [Guide to Data Warehousing. Short and comprehensive information… | by Tomas Peluritis](https://towardsdatascience.com/guide-to-data-warehousing-6fdcf30b6fbe) (blog post)
* [Snowflake, Redshift, BigQuery, and Others: Cloud Data Warehouse Tools Compared](https://www.altexsoft.com/blog/snowflake-redshift-bigquery-data-warehouse-tools/) (blog post)


### Streaming


* Building Streaming Analytics: The Journey and Learnings - Maxim Lukichev

### DataOps

* [DataOps 101 with Lars Albertsson – DataTalks.Club](https://datatalks.club/podcast/s02e11-dataops.html) (podcast)


### Monitoring and observability

* [Data Observability: The Next Frontier of Data Engineering with Barr Moses](https://datatalks.club/podcast/s03e03-data-observability.html) (podcast)


### Analytics engineering

* [Analytics Engineer: New Role in a Data Team with Victoria Perez Mola](https://datatalks.club/podcast/s03e11-analytics-engineer.html) (podcast)
* [Modern Data Stack for Analytics Engineering - Kyle Shannon](https://www.youtube.com/watch?v=UmIZIkeOfi0) (video)
* [Analytics Engineering vs Data Engineering | RudderStack Blog](https://www.rudderstack.com/blog/analytics-engineering-vs-data-engineering) (blog post)
* [Learn the Fundamentals of Analytics Engineering with dbt](https://courses.getdbt.com/courses/fundamentals) (course)


### Data mesh

* [Data Mesh in Practice - Max Schultze](https://www.youtube.com/watch?v=ekEc8D_D3zY) (video)

### Cloud

* [Data Engineering Best Practices: How Netflix Keeps Its Data Infrastructure Cost-Effective](https://acceldataio.medium.com/data-engineering-best-practices-how-netflix-keeps-its-data-infrastructure-cost-effective-dee310bcc910) (blog post)


### Reverse ETL

* TODO: What is reverse ETL?
* [Data Engineering Acronyms – DataTalks.Club](https://datatalks.club/podcast/s05e02-data-engineering-acronyms.html) (podcast)
* [Open-Source Spotlight - Grouparoo - Brian Leonard](https://www.youtube.com/watch?v=hswlcgQZYuw) (video)
* [Open-Source Spotlight - Castled.io (Reverse ETL) - Arun Thulasidharan](https://www.youtube.com/watch?v=iW0XhltAUJ8) (video)

## Career in Data Engineering

* [From Data Science to Data Engineering with Ellen König – DataTalks.Club](https://datatalks.club/podcast/s07e08-from-data-science-to-data-engineering.html) (podcast)
* [Big Data Engineer vs Data Scientist with Roksolana Diachuk – DataTalks.Club](https://datatalks.club/podcast/s04e03-big-data-engineer-vs-data-scientist.html) (podcast)
* [What Skills Do You Need to Become a Data Engineer](https://www.linkedin.com/pulse/what-skills-do-you-need-become-data-engineer-peng-wang/) (blog post)
* [The future history of Data Engineering](https://groupby1.substack.com/p/data-engineering?s=r) (blog post)
* [What Skills Do Data Engineers Need](https://www.theseattledataguy.com/what-skills-do-data-engineers-need/) (blog post)

### Data Engineering Management

* [Becoming a Data Engineering Manager with Rahul Jain – DataTalks.Club](https://datatalks.club/podcast/s07e07-becoming-a-data-engineering-manager.html) (podcast)

## Data engineering projects

* [How To Start A Data Engineering Project - With Data Engineering Project Ideas](https://www.youtube.com/watch?v=WpN47Jddo7I) (video)
* [Data Engineering Project for Beginners - Batch edition](https://www.startdataengineering.com/post/data-engineering-project-for-beginners-batch-edition/) (blog post)
* [Building a Data Engineering Project in 20 Minutes](https://www.sspaeti.com/blog/data-engineering-project-in-twenty-minutes/) (blog post)
* [Automating Nike Run Club Data Analysis with Python, Airflow and Google Data Studio | by Rich Martin | Medium](https://medium.com/@rich_23525/automating-nike-run-club-data-analysis-with-python-airflow-and-google-data-studio-3c9556478926) (blog post)


## Data Engineering Resources

### Blogs

* [Start Data Engineering](https://www.startdataengineering.com/)

### Podcasts

* [The Data Engineering Podcast](https://www.dataengineeringpodcast.com/)
* [DataTalks.Club Podcast](https://datatalks.club/podcast.html) (only some episodes are about data engineering)

### Communities

* [DataTalks.Club](https://datatalks.club/)
* [/r/dataengineering](https://www.reddit.com/r/dataengineering)


### Meetups

* [Sydney Data Engineers](https://sydneydataengineers.github.io/)

### People to follow on Twitter and LinkedIn

* TODO

### YouTube channels

* [Karolina Sowinska - YouTube](https://www.youtube.com/channel/UCAxnMry1lETl47xQWABvH7g)
* [Seattle Data Guy - YouTube](https://www.youtube.com/c/SeattleDataGuy)
* [Andreas Kretz - YouTube](https://www.youtube.com/c/andreaskayy)
* [DataTalksClub - YouTube](https://youtube.com/c/datatalksclub) (only some videos are about data engineering)

### Resource aggregators

* [Reading List](https://www.scling.com/reading-list/) by Lars Albertsson
* [GitHub - igorbarinov/awesome-data-engineering](https://github.com/igorbarinov/awesome-data-engineering) (focus is more on tools)


## License

This work is licensed under a Creative Commons Attribution 4.0 International License.

CC BY 4.0
