Skip to content

Extract features from Samsung and LG washing machine datasets, using clustering (K-means) and ML models (Random Forest, LDA) to group similar models. Develop similarity metrics to match Samsung models to LG equivalents, aiding LG management in competitor analysis and sales strategy.

Notifications You must be signed in to change notification settings

priyans877/Product_Similarity

Repository files navigation

Product_Similarity

OBJECTIVE: The objective is to develop a model that includes:

  1. Extracting relevant features from Samsung and LG washing machine datasets.
  2. Employing clustering and Machine Learning algorithms (such as K-means, Random Forest, Topic Modeling, LDA) to group similar models from both brands.
  3. Developing a similarity metric to quantify the resemblance between Samsung and LG washing machine models within the same cluster.
  4. Providing recommendations of Samsung models that are closest in terms of features to a given LG model.
  5. It will help LG Management team to find same model and making analysis of how ot improve there sales in view of their opponent. METHODOLOGY AND PROCEDURE:

Data Preparation:

  1. Data Collecting: Created a selenium script that scrap all washing machines detail from LG & Samsung’s website in a schemas that matters while performing analsys.

  2. Data Cleaning: Handle missing values, outliers, and noise in the data.

  3. Data Visualization: Applying histogram to known the underlying distribution of variables.

  4. Data Transformation: Normalize or standardize data to ensure that variables are on the same scale.

Choosing a Clustering Method:

  1. Select an appropriate clustering algorithm based on the nature of the data and the specific goals of the analysis. It includes: a. K-means is a widely-used clustering algorithm that divides a dataset into a specified number (k) of clusters. It is an iterative algorithm that assigns each data
    point to the nearest cluster center, then adjusts the cluster centers to be the mean of the points in the cluster.

    b. BIRCH stands for Balanced Iterative Hierarchical Based Clustering. It is used on very large datasets where K-Means cannot practically scale. BIRCH algorithm divides large data into small clusters and tries to retain the maximum amount of information possible.

    c. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised machine learning algorithm that is used to identify clusters of points in a dataset based on their density

    d. Hierarchical clustering is a method of clustering that involves creating a hierarchy (tree like structure) of clusters. Determining the Number of Clusters

  2. Use methods like the Elbow Method or Silhouette Score to decide on the optimal number of clusters. a. Elbow Method: Plots the within-cluster sum of squares (WCSS) against the number of clusters and looks for an "elbow point" where the rate of decrease sharply slows. b. Silhouette Score: Measures how similar a data point is to its own cluster compared to other clusters.

Clustering the Data:

  1. Apply the chosen clustering algorithm to the dataset.
  2. Assign each data point to a cluster.

Evaluating Clusters:

  1. Assess the quality of the clusters using internal evaluation metrics like WCSS.
  2. Validate the stability and robustness of the clusters by cross validation

Interpreting and Visualizing Results:

  1. Interpret the characteristics of each cluster by examining the centroid or representative points.
  2. Determine the average values of each variable within each cluster to understand the typical characteristics of products in that group.
  3. Compare cluster profiles to uncover the key attributes that distinguish one cluster from another. These factors will help in defining the unique characteristics of each Product segment.

INTERPRETATION:

a. Model Similarity: The project will determine the degree of similarity between Samsung and LG washing machine models, aiding in product comparisons and competitive analysis.

b. Feature Importance: By examining the features that contribute most to model similarity, insights can be gained into the key factors influencing consumer preferences and product differentiation.

c. Market Segmentation: The clustering algorithms can potentially identify distinct groups of washing machines based on their features, allowing for targeted marketing and product development strategies.

d. Product Recommendations: The project can assist in recommending Samsung models to customers based on their preferences for specific LG features, enhancing customer satisfaction and sales.

e. Algorithm Evaluation: The performance of different machine learning algorithms in this context will be assessed, providing valuable insights for future projects involving product similarity analysis

About

Extract features from Samsung and LG washing machine datasets, using clustering (K-means) and ML models (Random Forest, LDA) to group similar models. Develop similarity metrics to match Samsung models to LG equivalents, aiding LG management in competitor analysis and sales strategy.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published