codin-eric/Data-Engineering-Roadmap

 
 

Disclaimer

The purpose of this roadmap is to give you an idea of the landscape. It is meant to guide you if you are unsure what to learn next, rather than encourage you to pick whatever is hyped and trendy. You should develop an understanding of why one tool is better suited to some cases than another, and remember that hyped and trendy does not always mean best suited for the job.

Give a Star! ⭐

If you like or are using this project to learn or start your solution, please give it a star. Thanks!

Roadmap

Programming Languages

Learn Linux

There are two main parts to learning Linux: system administration and shell scripting. You can adjust how deep you go in each according to your preference.

Data Structures and Algorithms / System Design

SQL

There are a number of good introductory SQL resources available for free online. There are also some paid resources that I recommend for beginners; they are very effective and, in my opinion, well worth expensing. A couple of notes:

  • I haven’t used all of these resources, but they come strongly recommended around the web, by me, or by my peers.
  • You absolutely don’t need to use every single resource. Find a couple that work for you, and go to town.
  • You can always reach out to me if you have questions. I often share the following link with people who are new to asking very technical questions. It’s not meant to be snarky; it's a gentle guide on how to compose your questions and gather the necessary information so that technical people can give you a quick, effective response: http://www.mikeash.com/getting_answers.html

Video/Class/Mini-course based:

  1. Stanford Self-paced ‘Database’ course
  • The original Coursera course has been converted into a series of self-paced, thorough mini-courses.
  2. Portnov Computer School "SQL Tutorial for beginners". This is a mini-course (~4 hours in total) which is said to be quite good. Links:

Book/Tutorial Format (some interactive):

  1. SQL Problems and Solutions – Interactive book "…student[sic] can ask questions and get the answers even if such answers cannot be found in the textbook. To a certain extent interactive textbook is intended to substitute a teacher/advisor, which is, to our mind, indispensable requirement for the use of such teaching materials within the system of distance learning"
  2. Learn SQL The Hard Way "This book will teach you the 80% of SQL you probably need to use it effectively, and will mix in concepts in data modeling at the same time. If you've been fumbling around building web, desktop, or mobile applications because you don't know SQL, then this book is for you. It is written for people with no prior database, programming, or SQL knowledge, but knowing at least one programming language will help."
  3. GalaXQL "GalaXQL is a fun SQL tutorial where the database is a galaxy of stars that is rendered in 3D. Watch the galaxy change as your SQL commands create, modify, and destroy heavenly objects. What could be more fun?"
  4. PostgreSQL Tutorial "We developed the PostgreSQL tutorial to demonstrate the unique features of PostgreSQL that make it the most advanced open source database management system in the world. In addition, we will show you how to leverage those features to make your application faster and more secure."
  5. Head First SQL An excellent resource for beginners, which I went through years ago. I highly recommend picking up a copy if you truly want to start at the ground level. It’s a big book, but the font size is large, and there are exercises, pictures, etc. It takes about 1-2 days to get through, or maybe a week if spread out. "Is your data dragging you down? Are your tables all tangled up? Well we've got the tools to teach you just how to wrangle your databases into submission. Using the latest research in neurobiology, cognitive science, and learning theory to craft a multi-sensory SQL learning experience, Head First SQL has a visually rich format designed for the way your brain works, not a text-heavy approach that puts you to sleep. Maybe you've written some simple SQL queries to interact with databases. But now you want more, you want to really dig into those databases and work with your data. Head First SQL will show you the fundamentals of SQL and how to really take advantage of it. We'll take you on a journey through the language, from basic INSERT statements and SELECT queries to hardcore database manipulation with indices, joins, and transactions. We all know 'Data is Power', but we'll show you how to have 'Power over your Data'. Expect to have fun, expect to learn, and expect to be querying, normalizing, and joining your data like a pro by the time you're finished reading!"

Practice resources:

  1. SchemaVerse "The Schemaverse is a space-based strategy game implemented entirely within a PostgreSQL database. Compete against other players using raw SQL commands to command your fleet. Or, if your PL/pgSQL-foo is strong, wield it to write AI and have your fleet command itself!"
  2. SqlEx An extension of the sql-tutorial.ru book with practice exercises.
  3. SQLZoo Some tutorials and practice exercises
  4. PostgreSQL Exercises "This site was born when I noticed that there's a load of material out there to help people learn about SQL, but not a great deal to make it easy to learn by doing. PGExercises provides a series of questions and explanations built on a single, simple dataset. It's designed for use as a partner to a good book or Postgres' excellent documentation. The exercises on this site range from simple select and where clauses, through joins and case statements, and on to aggregations, window functions, and recursive queries. Most people who aren't already pros should find something to test themselves with."
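
To get a feel for a few of the query types mentioned in the practice resources above (joins, aggregations, and recursive queries), here is a minimal sketch you can run locally. It uses Python's built-in `sqlite3` module rather than PostgreSQL so that nothing needs to be installed; the `members` and `bookings` tables and their contents are made up for illustration and are not taken from any of the sites listed.

```python
import sqlite3


def run_demo():
    """Build a tiny in-memory database and run a join/aggregate and a recursive query."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data, for illustration only.
    conn.executescript("""
        CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT, referred_by INTEGER);
        CREATE TABLE bookings (id INTEGER PRIMARY KEY, member_id INTEGER, slots INTEGER);
        INSERT INTO members VALUES (1, 'Ana', NULL), (2, 'Ben', 1), (3, 'Cy', 2);
        INSERT INTO bookings VALUES (1, 1, 2), (2, 1, 3), (3, 2, 1);
    """)

    # Join + aggregation: total slots booked per member.
    # LEFT JOIN keeps members without bookings; COALESCE turns their NULL sum into 0.
    totals = conn.execute("""
        SELECT m.name, COALESCE(SUM(b.slots), 0) AS total_slots
        FROM members m
        LEFT JOIN bookings b ON b.member_id = m.id
        GROUP BY m.id, m.name
        ORDER BY m.id
    """).fetchall()

    # Recursive CTE: walk the referral chain upward, starting from member 3.
    chain = conn.execute("""
        WITH RECURSIVE chain(id, name, referred_by, depth) AS (
            SELECT id, name, referred_by, 0 FROM members WHERE id = 3
            UNION ALL
            SELECT m.id, m.name, m.referred_by, c.depth + 1
            FROM members m
            JOIN chain c ON m.id = c.referred_by
        )
        SELECT name FROM chain ORDER BY depth
    """).fetchall()

    conn.close()
    return totals, [row[0] for row in chain]


if __name__ == "__main__":
    totals, chain = run_demo()
    print(totals)
    print(chain)
```

The same SQL should carry over to PostgreSQL with little or no change, but treat that as an assumption to verify against whichever database you practice on.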

Testing

CI/CD and Virtualization

Database Fundamentals

  • SQL
  • Normalisation
  • ACID transactions
  • CAP Theorem
  • OLTP vs OLAP
  • Horizontal vs Vertical Scaling
  • Dimensional Modeling
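
As a small illustration of the "ACID transactions" item above, the sketch below shows atomicity: a transfer consists of two updates, and if the second one fails, the first is rolled back too. It uses Python's built-in `sqlite3` module; the `accounts` table, the `transfer` helper, and the balances are hypothetical names chosen for this example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction: both apply or neither does."""
    # The connection context manager commits on success and rolls back
    # if an exception is raised inside the block.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        # If this debit violates the CHECK constraint, the credit above is undone.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )


def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, "alice", "bob", 30)
    print(balances(conn))
    try:
        transfer(conn, "alice", "bob", 500)  # would overdraw alice
    except sqlite3.IntegrityError as exc:
        print("transfer rejected:", exc)
    print(balances(conn))  # unchanged by the failed transfer
```

The credit is deliberately executed before the debit so that the failure happens after one update has already been applied, which makes the rollback visible.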

Relational Database

Non-Relational Databases

Data Processing

Messaging

Cluster Computing Fundamentals

Object storage

Data Warehouses

Monitoring Data Pipelines

Data Visualization

Machine Learning and Deep Learning Tools

MLOps Tools

Cloud

Wrap Up

If you think the roadmap can be improved, please open a PR with your updates or submit an issue. I will continue to improve this, so you might want to star this repository to revisit it. Idea from: Python Developer Roadmap

Contribution

The roadmap is built using Draw.io. The project file is DataEngRoadmap.xml. To modify it, open draw.io, click Open Existing Diagram, and choose the XML project file. It will open the roadmap for you. Update it, then export it as PNG with 400% zoom, minify the image with Compressor.io, update the images in the readme, and create a PR.

  • Open a pull request with improvements
  • Discuss ideas in issues
  • Spread the word
