In this project, an actual YELP and climate datasets of Las Vegas is used to build an enterprise data warehouse utilized in analyze the effects the weather on customer reviews of restaurants.
Restaurant rating data are sourced fro Yelp, the covid data is sourced from kaggle and the temperature and precipitation observations data are source from the Global Historical Climatology Network-Daily (GHCN-D) database
A leading industry cloud-native data warehouse system called Snowflake architecting the data in this project. The Data Warehouse (DWH) design in this project will be used for the purpose of reporting and online analytical processing (OLAP).
INITIAL STEPS
i. Snowflakes account creation
ii. Snowsql download and installation
iii. Database Projectdb creation
iv. schema (Staging, ODS and Datawarehouse) Creation
v. Table creation in bothe schemas
PROJECT DATA ARCHITECT DIAGRAM
The conceptual data architect diagram for the project was drawn in lucid see below
The diagram show multiple data sources (Yelp, and weather data) ingesting data into the staging area. These raw JSON and CSV data were further transformed, curated and ingested into the Operational Data Store (ODS) and finally landed in the Datawarehouse for analytics work. All these data are stored in tables.
Tables are created in the staging schema and data are loaded into it for example the business;
JSON and CSV files are copied into the tables in the staging schema using snowsql CLI
Similarly, tables are also created in ODS schema and data from the Staging schema tables are transformed and loadedonto those table
In general, 6 tables are created from the yelp dataset in both schemas and 2 tables are created fro the weather data. See the figure above.
Using SQL queries, data ingested into thr staging area, is transformed and stored in ODS. See screenshot of example SQL code
Entity Relationship Diagram - ERD was drawn on lucid. THis was useful in visualizing the data structures
Using SQl queries particulaly SQL joins, yelp data was integrated with climate (weather data). For example;
Star schema was also drawn using lucid to understand the data structures and its relationship to facts (total_info) tables and the dimension tables
Also, using SQL queries, data ingested into the ODS was moved into the Data Warehouse DWH for analytics example;
While in the Data Warehouse (DWH), the cleaned data is ready for analytics. SQL queries was written to report the business name, temperature, precipitation, and rating. This was runned on the CLI
and the Snowflakes UI example;
PS: This is by no means, exhaustive. Alot can stil be done....