This repository contains my learnings and projects from this very insightful Nanodegree.
The program focuses on designing scalable data infrastructure and covers key areas such as the basics of data modeling, data warehousing, big data systems, and data governance.
Here I learned the basics of data architecture and data modeling. I created entity-relationship diagrams (ERDs) and built a physical database using PostgreSQL.
Project:
I've designed and built an HR database using PostgreSQL.
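To give an idea of how an ERD turns into a physical database, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the `department` and `employee` tables, their columns, and the helper functions are hypothetical examples, not the schema from the actual project.

```python
import sqlite3


def create_hr_schema(conn):
    """Create two hypothetical HR tables linked by a foreign key."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE department (
            department_id INTEGER PRIMARY KEY,
            name          TEXT NOT NULL UNIQUE
        );
        CREATE TABLE employee (
            employee_id   INTEGER PRIMARY KEY,
            full_name     TEXT NOT NULL,
            department_id INTEGER NOT NULL
                REFERENCES department (department_id)
        );
        """
    )


def add_employee(conn, full_name, department_id):
    """Insert an employee and return the generated primary key."""
    cur = conn.execute(
        "INSERT INTO employee (full_name, department_id) VALUES (?, ?)",
        (full_name, department_id),
    )
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_hr_schema(conn)
    conn.execute("INSERT INTO department (department_id, name) VALUES (1, 'Engineering')")
    print(add_employee(conn, "Jane Doe", 1))
```

The one-to-many relationship from the diagram (one department, many employees) becomes the `department_id` foreign key, so an employee pointing at a non-existent department is rejected by the database itself.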
I've focused on building a data warehouse using dimensional data models on the cloud data platform Snowflake. This involved ingesting data from multiple sources and transforming it for reporting.
Project:
I've designed a data warehouse to analyze how weather impacts restaurant ratings using Yelp and climate data.
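A minimal sketch of what such a dimensional model could look like, again using Python's built-in `sqlite3` as a stand-in (the real project ran on Snowflake). The table names, columns, and sample rows are all made up for illustration and are not taken from the Yelp or climate datasets.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimensions."""
    conn.executescript(
        """
        CREATE TABLE dim_business (
            business_key INTEGER PRIMARY KEY,
            name         TEXT NOT NULL
        );
        CREATE TABLE dim_weather (
            date_key         TEXT PRIMARY KEY,  -- ISO date, e.g. '2020-01-01'
            precipitation_mm REAL NOT NULL,
            max_temp_c       REAL NOT NULL
        );
        CREATE TABLE fact_review (
            review_id    INTEGER PRIMARY KEY,
            business_key INTEGER NOT NULL REFERENCES dim_business (business_key),
            date_key     TEXT NOT NULL REFERENCES dim_weather (date_key),
            stars        INTEGER NOT NULL
        );
        """
    )


def load_sample_rows(conn):
    """Insert a few hypothetical rows so the report query has data."""
    conn.execute("INSERT INTO dim_business VALUES (1, 'Hypothetical Diner')")
    conn.executemany(
        "INSERT INTO dim_weather VALUES (?, ?, ?)",
        [("2020-01-01", 0.0, 12.0), ("2020-01-02", 8.5, 9.0)],
    )
    conn.executemany(
        "INSERT INTO fact_review VALUES (?, ?, ?, ?)",
        [
            (1, 1, "2020-01-01", 5),
            (2, 1, "2020-01-01", 4),
            (3, 1, "2020-01-02", 2),
            (4, 1, "2020-01-02", 3),
        ],
    )


def avg_stars_by_day_type(conn):
    """Return the average rating for dry days and for rainy days."""
    rows = conn.execute(
        """
        SELECT CASE WHEN w.precipitation_mm > 0 THEN 'rainy' ELSE 'dry' END AS day_type,
               AVG(f.stars) AS avg_stars
        FROM fact_review AS f
        JOIN dim_weather AS w ON w.date_key = f.date_key
        GROUP BY day_type
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    load_sample_rows(conn)
    print(avg_stars_by_day_type(conn))
```

Keeping the ratings in one fact table and the descriptive attributes in dimensions means the reporting question (ratings by weather) becomes a single join and a group-by.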
In my most intense chapter I've explored big data tools like HDFS, Hive, and Spark to process large datasets. I've worked with NoSQL databases and built a scalable data lake solution. For the first time I worked with Amazon S3 and its various tools.
Project:
I've proposed a new data lake architecture for a medical company, solving real-world data challenges by meeting business and technical requirements.
In the last chapter I learned about data governance best practices, including metadata management, data quality checks, and master data management.
Project:
I've implemented a governance framework for an online shoe reseller to manage data quality, and designed a data catalog.
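As a rough illustration of what data quality checks can look like in practice, here is a small sketch in plain Python. The three rules (completeness, uniqueness, value range), the field names, and the sample shoe records are assumptions made up for this example and are not part of the actual framework.

```python
from collections import Counter


def check_completeness(rows, required):
    """Return the indexes of rows where a required field is missing or empty."""
    return [
        i
        for i, row in enumerate(rows)
        if any(row.get(field) in (None, "") for field in required)
    ]


def check_uniqueness(rows, key):
    """Return the key values that appear more than once."""
    counts = Counter(row.get(key) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1 and k is not None)


def check_range(rows, field, low, high):
    """Return the indexes of rows whose value lies outside [low, high]."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(field) is not None and not (low <= row[field] <= high)
    ]


if __name__ == "__main__":
    # Hypothetical product records for an online shoe shop.
    products = [
        {"sku": "A1", "brand": "Acme", "size_eu": 42},
        {"sku": "A2", "brand": "", "size_eu": 39},
        {"sku": "A1", "brand": "Acme", "size_eu": 43},
        {"sku": "A3", "brand": "Acme", "size_eu": 420},
    ]
    print(check_completeness(products, ("sku", "brand")))
    print(check_uniqueness(products, "sku"))
    print(check_range(products, "size_eu", 15, 55))
```

Each check returns the offending rows or values instead of a plain pass/fail flag, which makes it easier to route the findings to whoever owns the data.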