Skip to content

In this project scenario, I will use actual YELP and climate datasets of Las Vegas in order to analyze the effects the weather has on customer reviews of restaurants. The data for temperature and precipitation observations are from the Global Historical Climatology Network-Daily (GHCN-D) database. I will use a leading industry cloud-native data …

License

Notifications You must be signed in to change notification settings

isaacabrahamodeh/Architecting-Data-in-Snowflakes-Data-Architect-Nanodegree-

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Architecting-Data-in-Snowflakes-Data-Architect-Nanodegree-

In this project, an actual YELP and climate datasets of Las Vegas is used to build an enterprise data warehouse utilized in analyze the effects the weather on customer reviews of restaurants.

Restaurant rating data are sourced fro Yelp, the covid data is sourced from kaggle and the temperature and precipitation observations data are source from the Global Historical Climatology Network-Daily (GHCN-D) database

A leading industry cloud-native data warehouse system called Snowflake architecting the data in this project. The Data Warehouse (DWH) design in this project will be used for the purpose of reporting and online analytical processing (OLAP).

INITIAL STEPS

i. Snowflakes account creation

ii. Snowsql download and installation

iii. Database Projectdb creation

show db

iv. schema (Staging, ODS and Datawarehouse) Creation

v. Table creation in bothe schemas

PROJECT DATA ARCHITECT DIAGRAM

The conceptual data architect diagram for the project was drawn in lucid see below Data Architect Diagram

The diagram show multiple data sources (Yelp, and weather data) ingesting data into the staging area. These raw JSON and CSV data were further transformed, curated and ingested into the Operational Data Store (ODS) and finally landed in the Datawarehouse for analytics work. All these data are stored in tables.

Tables are created in the staging schema and data are loaded into it for example the business;

create table business

staging tables

JSON and CSV files are copied into the tables in the staging schema using snowsql CLI

Transfer file 1

Transfer file 2

Similarly, tables are also created in ODS schema and data from the Staging schema tables are transformed and loadedonto those table

ODS table

In general, 6 tables are created from the yelp dataset in both schemas and 2 tables are created fro the weather data. See the figure above.

Using SQL queries, data ingested into thr staging area, is transformed and stored in ODS. See screenshot of example SQL code

table transformation ODS schema

Entity Relationship Diagram - ERD was drawn on lucid. THis was useful in visualizing the data structures

ER Diagram

Using SQl queries particulaly SQL joins, yelp data was integrated with climate (weather data). For example;

Integrating Yelp and Weather data

Star schema was also drawn using lucid to understand the data structures and its relationship to facts (total_info) tables and the dimension tables

star schema

Also, using SQL queries, data ingested into the ODS was moved into the Data Warehouse DWH for analytics example;

ODS to DWH

While in the Data Warehouse (DWH), the cleaned data is ready for analytics. SQL queries was written to report the business name, temperature, precipitation, and rating. This was runned on the CLI

Query DWH schema

and the Snowflakes UI example;

Screenshot 2022-05-22 at 11 12 52

PS: This is by no means, exhaustive. Alot can stil be done....

About

In this project scenario, I will use actual YELP and climate datasets of Las Vegas in order to analyze the effects the weather has on customer reviews of restaurants. The data for temperature and precipitation observations are from the Global Historical Climatology Network-Daily (GHCN-D) database. I will use a leading industry cloud-native data …

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published