Product_Similarity

OBJECTIVE: The objective is to develop a model that includes:

Extracting relevant features from Samsung and LG washing machine datasets.
Employing clustering and Machine Learning algorithms (such as K-means, Random Forest, Topic Modeling, LDA) to group similar models from both brands.
Developing a similarity metric to quantify the resemblance between Samsung and LG washing machine models within the same cluster.
Providing recommendations of Samsung models that are closest in terms of features to a given LG model.
It will help LG Management team to find same model and making analysis of how ot improve there sales in view of their opponent. METHODOLOGY AND PROCEDURE:

Data Preparation:

Data Collecting: Created a selenium script that scrap all washing machines detail from LG & Samsung’s website in a schemas that matters while performing analsys.
Data Cleaning: Handle missing values, outliers, and noise in the data.
Data Visualization: Applying histogram to known the underlying distribution of variables.
Data Transformation: Normalize or standardize data to ensure that variables are on the same scale.

Choosing a Clustering Method:

Select an appropriate clustering algorithm based on the nature of the data and the specific goals of the analysis. It includes: a. K-means is a widely-used clustering algorithm that divides a dataset into a specified number (k) of clusters. It is an iterative algorithm that assigns each data
point to the nearest cluster center, then adjusts the cluster centers to be the mean of the points in the cluster.

b. BIRCH stands for Balanced Iterative Hierarchical Based Clustering. It is used on very large datasets where K-Means cannot practically scale. BIRCH algorithm divides large data into small clusters and tries to retain the maximum amount of information possible.

c. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised machine learning algorithm that is used to identify clusters of points in a dataset based on their density

d. Hierarchical clustering is a method of clustering that involves creating a hierarchy (tree like structure) of clusters. Determining the Number of Clusters
Use methods like the Elbow Method or Silhouette Score to decide on the optimal number of clusters. a. Elbow Method: Plots the within-cluster sum of squares (WCSS) against the number of clusters and looks for an "elbow point" where the rate of decrease sharply slows. b. Silhouette Score: Measures how similar a data point is to its own cluster compared to other clusters.

Clustering the Data:

Apply the chosen clustering algorithm to the dataset.
Assign each data point to a cluster.

Evaluating Clusters:

Assess the quality of the clusters using internal evaluation metrics like WCSS.
Validate the stability and robustness of the clusters by cross validation

Interpreting and Visualizing Results:

Interpret the characteristics of each cluster by examining the centroid or representative points.
Determine the average values of each variable within each cluster to understand the typical characteristics of products in that group.
Compare cluster profiles to uncover the key attributes that distinguish one cluster from another. These factors will help in defining the unique characteristics of each Product segment.

INTERPRETATION:

a. Model Similarity: The project will determine the degree of similarity between Samsung and LG washing machine models, aiding in product comparisons and competitive analysis.

b. Feature Importance: By examining the features that contribute most to model similarity, insights can be gained into the key factors influencing consumer preferences and product differentiation.

c. Market Segmentation: The clustering algorithms can potentially identify distinct groups of washing machines based on their features, allowing for targeted marketing and product development strategies.

d. Product Recommendations: The project can assist in recommending Samsung models to customers based on their preferences for specific LG features, enhancing customer satisfaction and sales.

e. Algorithm Evaluation: The performance of different machine learning algorithms in this context will be assessed, providing valuable insights for future projects involving product similarity analysis

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Lg_Washing_Machine_Data_Scraper.ipynb		Lg_Washing_Machine_Data_Scraper.ipynb
Lg_product.csv		Lg_product.csv
Main Output_2024-07-15_lg.xlsx		Main Output_2024-07-15_lg.xlsx
Product similarity.ipynb		Product similarity.ipynb
README.md		README.md
Samsung_Scraped_Data.xlsx		Samsung_Scraped_Data.xlsx
Samsungdata Scraper.ipynb		Samsungdata Scraper.ipynb
df.xlsx		df.xlsx

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Product_Similarity

OBJECTIVE: The objective is to develop a model that includes:

Data Preparation:

Choosing a Clustering Method:

Clustering the Data:

Evaluating Clusters:

Interpreting and Visualizing Results:

INTERPRETATION:

About

Releases

Packages

Languages

priyans877/Product_Similarity

Folders and files

Latest commit

History

Repository files navigation

Product_Similarity

OBJECTIVE: The objective is to develop a model that includes:

Data Preparation:

Choosing a Clustering Method:

Clustering the Data:

Evaluating Clusters:

Interpreting and Visualizing Results:

INTERPRETATION:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages