This project performs exploratory data analysis (EDA) on the Online Sales Dataset, which is available on Kaggle. The dataset records customer purchases, including product details, quantity, unit price, discount, and customer-specific data.
The analysis identifies patterns and trends in shopping behavior, such as:
- How customer spending varies with discount status
- Distribution of order priorities
- Seasonal trends in customer purchases
- Purchase behavior by payment method and more
The goal is to uncover valuable insights from the dataset that could be useful for business analysis, marketing strategies, and customer behavior prediction.
The dataset used for this analysis is the Online Sales Dataset by Yusuf Delikkaya, available on Kaggle. You can find the dataset here:
Online Sales Dataset - Kaggle
The dataset includes the following columns:
- InvoiceNo: Unique identifier for each invoice
- StockCode: Unique product code
- Description: Description of the purchased product
- Quantity: Number of units purchased
- InvoiceDate: Date and time of the purchase
- UnitPrice: Price per unit of the purchased item
- CustomerID: Unique identifier for each customer
- Country: Country where the purchase was made
- Discount: Discount applied to the purchase
- PaymentMethod: Customer's preferred payment method
- ShippingCost: Cost of shipping for the order
- Category: Product category
- SalesChannel: Whether the purchase was made online or in-store
- ReturnStatus: Whether the item was returned or not
- ShipmentProvider: Logistics provider
- WarehouseLocation: Warehouse location
- OrderPriority: Priority level of the order (High, Medium, Low)
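As a quick sanity check after downloading the file, the column list above can be compared against what pandas actually loads. The following is only a sketch: the helper name `missing_columns` and the one-row sample are hypothetical, and the in-memory CSV stands in for the real Kaggle file, whose file name is not stated in this README.

```python
import io

import pandas as pd

# Column names as listed in this README.
EXPECTED_COLUMNS = [
    "InvoiceNo", "StockCode", "Description", "Quantity", "InvoiceDate",
    "UnitPrice", "CustomerID", "Country", "Discount", "PaymentMethod",
    "ShippingCost", "Category", "SalesChannel", "ReturnStatus",
    "ShipmentProvider", "WarehouseLocation", "OrderPriority",
]


def missing_columns(df):
    """Return the expected columns that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Made-up sample standing in for the real CSV; replace the StringIO object
# with the path to the downloaded dataset.
sample_csv = io.StringIO(
    "InvoiceNo,StockCode,Quantity,InvoiceDate,UnitPrice\n"
    "1001,SKU_1,2,2024-01-05 10:30,9.99\n"
)
sample = pd.read_csv(sample_csv, parse_dates=["InvoiceDate"])

# Lists every README column that the loaded frame does not contain.
print(missing_columns(sample))
```

An empty list means every documented column is present. Parsing `InvoiceDate` at load time makes later date-based grouping simpler.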
Before running the notebook, make sure to have the following libraries installed:
pandas
numpy
matplotlib
seaborn
plotly
IPython
To install these dependencies, run the following command:
pip install pandas numpy matplotlib seaborn plotly ipython
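To confirm the installation worked before opening the notebook, a short standard-library check along these lines may help. The function name `find_missing` is a hypothetical helper, not part of the notebook.

```python
import importlib.util

# Import names of the libraries listed above. Note that the pip package
# "ipython" is imported as "IPython".
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "plotly", "IPython"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

This only checks that each module can be located; it does not verify versions.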
Option 1: Running on Jupyter Notebook
1. Download the notebook file online_sales_eda.ipynb from this repository.
2. Open Jupyter Notebook.
3. Navigate to the directory where the notebook is saved and open the file.
4. Make sure the Online Sales Dataset is available locally, or upload it to the same directory where the notebook is stored.
5. Run the notebook by executing the cells.
Option 2: Running on Google Colab
1. Open Google Colab (https://colab.research.google.com).
2. In Colab, go to File > Open notebook > GitHub tab.
3. Paste the URL of this repository and open the online_sales_eda.ipynb file.
4. Upload the dataset to Colab (you can either use files.upload() or mount Google Drive).
5. Execute the cells to run the EDA.
Option 3: Running Locally
1. Clone the repository to your local machine:
git clone https://github.com/yourusername/Identifying-Shopping-Trends-EDA.git
2. Navigate to the project directory:
cd Identifying-Shopping-Trends-EDA
3. Ensure that you have installed all the necessary libraries (mentioned in the requirements section).
4. Run the notebook in Jupyter Notebook or any IDE that supports Python and Jupyter notebooks.
The notebook will generate various analyses and visualizations, such as:
- Distribution of order priorities (High, Medium, Low)
- Spending behavior based on discounts
- Trends by payment methods, shipping cost, and other customer data
- Most frequently purchased products and seasonal trends
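The kinds of analyses listed above can be sketched with a few pandas group-by operations. This is an illustration, not the notebook's actual code: the rows are made up, and the spend formula (Quantity × UnitPrice × (1 − Discount), with Discount stored as a fraction) is an assumption about how the dataset encodes discounts.

```python
import pandas as pd

# Made-up rows using column names from this README; not real dataset values.
df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(
        ["2024-01-15", "2024-01-20", "2024-02-03", "2024-02-10"]
    ),
    "Quantity": [2, 1, 3, 4],
    "UnitPrice": [10.0, 20.0, 5.0, 2.5],
    "Discount": [0.0, 0.1, 0.0, 0.2],
    "OrderPriority": ["High", "Low", "Medium", "High"],
})

# Assumption: Discount is a fraction between 0 and 1.
df["Spend"] = df["Quantity"] * df["UnitPrice"] * (1 - df["Discount"])
df["DiscountStatus"] = (df["Discount"] > 0).map(
    {True: "Discounted", False: "Full price"}
)

# Distribution of order priorities.
priority_counts = df["OrderPriority"].value_counts()

# Average spend per order line, split by discount status.
spend_by_discount = df.groupby("DiscountStatus")["Spend"].mean()

# Total spend per calendar month, a simple view of seasonal trends.
monthly_spend = df.groupby(df["InvoiceDate"].dt.to_period("M"))["Spend"].sum()

print(priority_counts)
print(spend_by_discount)
print(monthly_spend)
```

Each resulting Series can be passed to a plotting call such as `Series.plot(kind="bar")` to produce a chart.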
This project is licensed under the MIT License - see the LICENSE file for details.
Harsh Gopal
@Harsh-Gopal
For any questions, please feel free to contact me at [[email protected]].