Project description from HiCounselor website: Link
This project will train you how to use SQL to analyze a real-world database, how to extract the most useful information from the dataset, how to pre-process the data using Python for improved performance, how to use a structured query language to retrieve useful information from the database, and how to visualize the data using the PowerBI tool.
download the superstore dataset: Click here
This project is divided into three modules:
In this Module you will be working on how to perform pre-processing of data and working on handling null values, deletion or transformation of irrelevant values, data type transformation, removing duplicates and data validations to get refined and cleaner data to perform further analysis.
Step 1: Removing duplicate rows ( there could be duplicate rows excluding Row_ID column ).
Step 2: Removing rows for which few values are missing.
Step 3: Remove irrelevant values from each column if any. Validation of all values for a column( order date and ship date value must be in correct date format ). For each entry in dataset ship date >= order date
Step 4: Export the cleaned dataset as a .csv file: prefer UTF-8 encoding.
Step 5: Convert the pre-processed dataset into an SQL file and import it to table named "superstore".
In this module, you will be working on performing data analysis on the pre-processed data from the previous module and conducting Data Analysis using SQL. You will generate queries for given problem statements.
#1. What percentage of total orders were shipped on the same date?
#2. Name top 3 customers with highest total value of orders.
#3. Find the top 5 items with the highest average sales per day.
#4. Write a query to find the average order value for each customer, and rank the customers by their average order value.
#5. Give the name of customers who ordered highest and lowest orders from each city.
#6. What is the most demanded sub-category in the west region?
#7. Which order has the highest number of items? And which order has the highest cumulative value?
#8. Which order has the highest cumulative value?
#9. Which segment’s order is more likely to be shipped via first class?
#10. Which city is least contributing to total revenue?
#11. What is the average time for orders to get shipped after order is placed?
#12. Which segment places the highest number of orders from each state and which segment places the largest individual orders from each state?
#13. Find all the customers who individually ordered on 3 consecutive days where each day’s total order was more than 50 in value. **
#14. Find the maximum number of days for which total sales on each day kept rising.