diff --git a/awesome-data-engineering.md b/awesome-data-engineering.md new file mode 100644 index 0000000..da04649 --- /dev/null +++ b/awesome-data-engineering.md @@ -0,0 +1,181 @@ +Have you found any cool resources about data engineering? Put them here + +## Learning Data Engineering + +### Courses + +* [Data Engineering Zoomcamp](https://github.com/DataTalksClub/data-engineering-zoomcamp) by DataTalks.Club (free) +* [Big Data Platforms, Autumn 2022: Introduction to Big Data Processing Frameworks](https://big-data-platforms-22.mooc.fi/) by the University of Helsinki (free) +* [Awesome Data Engineering Learning Path](https://awesomedataengineering.com/) + + +### Books + +* [Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321) +* [Big Data: Principles and Best Practices of Scalable Realtime Data Systems by Nathan Marz, James Warren](https://www.amazon.com/Big-Data-Principles-practices-scalable/dp/1617290343) +* [Practical DataOps: Delivering Agile Data Science at Scale by Harvinder Atwal](https://www.amazon.com/Practical-DataOps-Delivering-Agile-Science/dp/1484251032) +* [Data Pipelines Pocket Reference: Moving and Processing Data for Analytics by James Densmore](https://www.amazon.com/Data-Pipelines-Pocket-Reference-Processing/dp/1492087831) +* [Best books for data engineering](https://awesomedataengineering.com/data_engineering_best_books) +* [Fundamentals of Data Engineering: Plan and Build Robust Data Systems by Joe Reis, Matt Housley](https://www.amazon.com/Fundamentals-Data-Engineering-Robust-Systems/dp/1098108302) + + +### Introduction to Data Engineering Terms + +* [https://datatalks.club/podcast/s05e02-data-engineering-acronyms.html](https://datatalks.club/podcast/s05e02-data-engineering-acronyms.html) + + +### Data engineering in practice + +Conference talks from companies, blog posts, 
etc + +* [Uber Data Archives](https://eng.uber.com/category/articles/uberdata/) (Uber engineering blog) +* [Data Engineering Weekly (DE-focused substack)](https://www.dataengineeringweekly.com/) +* [Seattle Data Guy (DE-focused substack)](https://seattledataguy.substack.com/) + + +## Doing Data Engineering + +### Coding & Python + +* [CS50's Introduction to Computer Science | edX](https://www.edx.org/course/introduction-computer-science-harvardx-cs50x) (course) +* [Python for Everybody Specialization](https://www.coursera.org/specializations/python) (course) +* [Practical Python programming](https://github.com/dabeaz-course/practical-python/blob/master/Notes/Contents.md) + + +### SQL + +* [Intro to SQL: Querying and managing data | Khan Academy](https://www.khanacademy.org/computing/computer-programming/sql) +* [Mode SQL Tutorial](https://mode.com/sql-tutorial/) +* [Use The Index, Luke](https://use-the-index-luke.com/) (SQL Indexing and Tuning e-Book) +* [SQL Performance Explained](https://sql-performance-explained.com/) (book) + + +### Workflow orchestration + +* [What is DAG?](https://youtu.be/1Yh5S-S6wsI) (video) +* [Airflow, Prefect, and Dagster: An Inside Look](https://towardsdatascience.com/airflow-prefect-and-dagster-an-inside-look-6074781c9b77) (blog post) +* [Open-Source Spotlight - Prefect - Kevin Kho](https://www.youtube.com/watch?v=ISLV9JyqF1w) (video) +* [Prefect as a Data Engineering Project Workflow Tool, with Mary Clair Thompson (Duke) - 11/6/2020](https://youtu.be/HuwA4wLQtCM) (video) + + +### ETL and ELT + +* [ETL vs. 
ELT: What’s the Difference?](https://rivery.io/blog/etl-vs-elt/) (blog post) (print version) + +### Data lakes + +* [An Introduction to Modern Data Lake Storage Layers (Hudi, Iceberg, Delta Lake)](https://dacort.dev/posts/modern-data-lake-storage-layers/) (blog post) +* [Lake House Architecture @ Halodoc: Data Platform 2.0](https://blogs.halodoc.io/lake-house-architecture-halodoc-data-platform-2-0/amp/) (blog post) + + +### Data warehousing + + +* [Guide to Data Warehousing. Short and comprehensive information… | by Tomas Peluritis](https://towardsdatascience.com/guide-to-data-warehousing-6fdcf30b6fbe) (blog post) +* [Snowflake, Redshift, BigQuery, and Others: Cloud Data Warehouse Tools Compared](https://www.altexsoft.com/blog/snowflake-redshift-bigquery-data-warehouse-tools/) (blog post) + + +### Streaming + + +* Building Streaming Analytics: The Journey and Learnings - Maxim Lukichev + +### DataOps + +* [DataOps 101 with Lars Albertsson – DataTalks.Club](https://datatalks.club/podcast/s02e11-dataops.html) (podcast) + + +### Monitoring and observability + +* [Data Observability: The Next Frontier of Data Engineering with Barr Moses](https://datatalks.club/podcast/s03e03-data-observability.html) (podcast) + + +### Analytics engineering + +* [Analytics Engineer: New Role in a Data Team with Victoria Perez Mola](https://datatalks.club/podcast/s03e11-analytics-engineer.html) (podcast) +* [Modern Data Stack for Analytics Engineering - Kyle Shannon](https://www.youtube.com/watch?v=UmIZIkeOfi0) (video) +* [Analytics Engineering vs Data Engineering | RudderStack Blog](https://www.rudderstack.com/blog/analytics-engineering-vs-data-engineering) (blog post) +* [Learn the Fundamentals of Analytics Engineering with dbt](https://courses.getdbt.com/courses/fundamentals) (course) + + +### Data mesh + +* [Data Mesh in Practice - Max Schultze](https://www.youtube.com/watch?v=ekEc8D_D3zY) (video) + +### Cloud + +* 
[https://acceldataio.medium.com/data-engineering-best-practices-how-netflix-keeps-its-data-infrastructure-cost-effective-dee310bcc910](https://acceldataio.medium.com/data-engineering-best-practices-how-netflix-keeps-its-data-infrastructure-cost-effective-dee310bcc910) + + +### Reverse ETL + +* TODO: What is reverse ETL? +* [https://datatalks.club/podcast/s05e02-data-engineering-acronyms.html](https://datatalks.club/podcast/s05e02-data-engineering-acronyms.html) +* [Open-Source Spotlight - Grouparoo - Brian Leonard](https://www.youtube.com/watch?v=hswlcgQZYuw) (video) +* [Open-Source Spotlight - Castled.io (Reverse ETL) - Arun Thulasidharan](https://www.youtube.com/watch?v=iW0XhltAUJ8) (video) + +## Career in Data Engineering + +* [From Data Science to Data Engineering with Ellen König – DataTalks.Club](https://datatalks.club/podcast/s07e08-from-data-science-to-data-engineering.html) (podcast) +* [Big Data Engineer vs Data Scientist with Roksolana Diachuk – DataTalks.Club](https://datatalks.club/podcast/s04e03-big-data-engineer-vs-data-scientist.html) (podcast) +* [What Skills Do You Need to Become a Data Engineer](https://www.linkedin.com/pulse/what-skills-do-you-need-become-data-engineer-peng-wang/) (blog post) +* [The future history of Data Engineering](https://groupby1.substack.com/p/data-engineering?s=r) (blog post) +* [What Skills Do Data Engineers Need](https://www.theseattledataguy.com/what-skills-do-data-engineers-need/) (blog post) + +### Data Engineering Management + +* [Becoming a Data Engineering Manager with Rahul Jain – DataTalks.Club](https://datatalks.club/podcast/s07e07-becoming-a-data-engineering-manager.html) (podcast) + +## Data engineering projects + +* [How To Start A Data Engineering Project - With Data Engineering Project Ideas](https://www.youtube.com/watch?v=WpN47Jddo7I) (video) +* [Data Engineering Project for Beginners - Batch edition](https://www.startdataengineering.com/post/data-engineering-project-for-beginners-batch-edition/) (blog 
post) +* [Building a Data Engineering Project in 20 Minutes](https://www.sspaeti.com/blog/data-engineering-project-in-twenty-minutes/) (blog post) +* [Automating Nike Run Club Data Analysis with Python, Airflow and Google Data Studio | by Rich Martin | Medium](https://medium.com/@rich_23525/automating-nike-run-club-data-analysis-with-python-airflow-and-google-data-studio-3c9556478926) (blog post) + + +## Data Engineering Resources + +### Blogs + +* [Start Data Engineering](https://www.startdataengineering.com/) + +### Podcasts + +* [The Data Engineering Podcast](https://www.dataengineeringpodcast.com/) +* [DataTalks.Club Podcast](https://datatalks.club/podcast.html) (only some episodes are about data engineering) + +### Communities + +* [DataTalks.Club](https://datatalks.club/) +* [/r/dataengineering](https://www.reddit.com/r/dataengineering) + + +### Meetups + +* [Sydney Data Engineers](https://sydneydataengineers.github.io/) + +### People to follow on Twitter and LinkedIn + +* TODO + +### YouTube channels + +* [Karolina Sowinska - YouTube](https://www.youtube.com/channel/UCAxnMry1lETl47xQWABvH7g) +* [Seattle Data Guy - YouTube](https://www.youtube.com/c/SeattleDataGuy) +* [Andreas Kretz - YouTube](https://www.youtube.com/c/andreaskayy) +* [DataTalksClub - YouTube](https://youtube.com/c/datatalksclub) (only some videos are about data engineering) + +### Resource aggregators + +* [Reading List](https://www.scling.com/reading-list/) by Lars Albertsson +* [GitHub - igorbarinov/awesome-data-engineering](https://github.com/igorbarinov/awesome-data-engineering) (focus is more on tools) + + +## License + +This work is licensed under a Creative Commons Attribution 4.0 International License. + +CC BY 4.0