🛍️ HomeDepot Product Search Relevance Prediction

📜 Overview

This project builds a ranking model that predicts a relevance score for each query-product pair in HomeDepot’s product search. It follows the pointwise approach to Learning to Rank (LTR): a regression model is trained on text similarity features computed between the user query and the product information, and each pair is scored independently of the others.
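To illustrate what pointwise means in practice, the sketch below scores every query-product pair on its own and then sorts the products of each query by that score. The `score_pair` function is a hypothetical stand-in for the trained regressor (here it is just the fraction of query words found in the title), and the sample query and titles are made up.

```python
from collections import defaultdict


def score_pair(query, title):
    """Hypothetical stand-in for the trained regressor:
    fraction of query words that also appear in the title."""
    query_words = set(query.lower().split())
    title_words = set(title.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & title_words) / len(query_words)


def rank_pointwise(pairs):
    """Score each (query, title) pair independently, then sort titles per query."""
    by_query = defaultdict(list)
    for query, title in pairs:
        by_query[query].append((score_pair(query, title), title))
    return {
        query: [title for _, title in sorted(scored, key=lambda item: item[0], reverse=True)]
        for query, scored in by_query.items()
    }


# Made-up example pairs.
pairs = [
    ("cordless drill", "Steel Claw Hammer"),
    ("cordless drill", "20V Cordless Drill Kit"),
    ("cordless drill", "Corded Hammer Drill"),
]
ranking = rank_pointwise(pairs)
```

No pair's score depends on the other candidates for the same query, which is what separates the pointwise approach from pairwise and listwise ones.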

📌 Note: The product description dataset is very large. Please contact me if you would like to use it.

📌 Dataset:

Train Set (train_new.csv) – Query-product pairs with ground-truth relevance scores.
Test Set (test_new.csv) – Query-product pairs for prediction.
Product Descriptions (product_descriptions_new.csv) – Additional product details.
Product Attributes (attributes_new.csv) – Additional structured product attributes.
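One way the four files could be combined into a single table of query-product pairs is sketched below. The column names (`product_uid`, `name`, `value`, `product_description`) are assumptions and may not match the `_new` files. Small in-memory frames stand in for the CSVs, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd


def build_pairs(pairs, descriptions, attributes):
    """Attach description and attribute text to each query-product pair.
    All column names are assumed for illustration."""
    # Collapse the attribute rows of each product into one text field.
    attr_text = (
        attributes.assign(
            text=attributes["name"].astype(str) + " " + attributes["value"].astype(str)
        )
        .groupby("product_uid")["text"]
        .agg(" ".join)
        .rename("product_attributes")
        .reset_index()
    )
    merged = pairs.merge(descriptions, on="product_uid", how="left")
    merged = merged.merge(attr_text, on="product_uid", how="left")
    # Products without a description or attributes get empty text.
    text_cols = ["product_description", "product_attributes"]
    merged[text_cols] = merged[text_cols].fillna("")
    return merged


# Made-up rows standing in for the CSV files.
pairs = pd.DataFrame({
    "id": [1, 2],
    "product_uid": [100, 101],
    "product_title": ["Steel Claw Hammer", "Garden Hose 50 ft."],
    "search_term": ["hammer", "water hose"],
})
descriptions = pd.DataFrame({
    "product_uid": [100],
    "product_description": ["Forged steel hammer with a curved claw."],
})
attributes = pd.DataFrame({
    "product_uid": [100, 100],
    "name": ["Material", "Color"],
    "value": ["Steel", "Black"],
})
merged = build_pairs(pairs, descriptions, attributes)
```

A left join keeps every query-product pair even when a product has no description or attribute rows.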

📌 Goal:

Compute text similarity between search_term and:
- product_title
- product_description
- product_attributes
Generate feature vectors for training and testing.
Train a machine learning model to predict relevance scores.
Evaluate performance using Mean Squared Error (MSE) & R² score.

📌 Programming Language: Python 3
📌 Libraries Used: pandas, scikit-learn, nltk, numpy, scipy, XGBoost

🚀 Approach

1️⃣ Data Preprocessing

Text Cleaning (e.g., spelling correction, numerical normalization).
Tokenization & Stopword Removal using NLTK.
TF-IDF Vectorization of the search term and the product text (title, description, attributes).
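A minimal sketch of the cleaning and tokenization steps follows. So that it runs without downloading corpora, a tiny inline stop list stands in for NLTK's English stopword list and a regular expression stands in for an NLTK tokenizer. The unit rules are hypothetical examples of numerical normalization, and spelling correction is not covered.

```python
import re

# Tiny stand-in stop list; the project itself uses NLTK's English stopwords.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

# Hypothetical unit rules for numerical normalization (assumed for illustration).
UNIT_RULES = [
    (re.compile(r"(\d)\s*(?:inches|inch|in\.)"), r"\1 inch"),
    (re.compile(r"(\d)\s*(?:feet|foot|ft\.?)"), r"\1 foot"),
]

# Words, or numbers with an optional decimal part.
TOKEN_RE = re.compile(r"[a-z]+|\d+(?:\.\d+)?")


def preprocess(text):
    """Lowercase, normalize units, tokenize, and drop stopwords."""
    text = text.lower()
    for pattern, replacement in UNIT_RULES:
        text = pattern.sub(replacement, text)
    tokens = TOKEN_RE.findall(text)
    return [tok for tok in tokens if tok not in STOPWORDS]


tokens = preprocess("2 in. x 4 ft. Pressure-Treated Lumber for the Deck")
```

Running the same function over both the search term and the product fields keeps the two sides comparable when similarity is computed later.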

2️⃣ Feature Engineering

Compute Cosine Similarity between search_term and:
- product_title
- product_description
- product_attributes
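The cosine feature for one product field could be computed as in the sketch below, which uses scikit-learn's `TfidfVectorizer`. Fitting the vocabulary on the queries and the field together is an assumption, not necessarily what the notebook does. The same function would be called once each for title, description, and attributes.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def rowwise_tfidf_cosine(queries, documents):
    """Cosine similarity between queries[i] and documents[i] for every i."""
    vectorizer = TfidfVectorizer()
    # Assumption: one shared vocabulary fitted on both sides.
    vectorizer.fit(list(queries) + list(documents))
    q = vectorizer.transform(queries)
    d = vectorizer.transform(documents)
    # TfidfVectorizer L2-normalizes rows by default, so the row-wise
    # dot product is already the cosine similarity.
    return np.asarray(q.multiply(d).sum(axis=1)).ravel()


# Made-up examples: an exact match and a pair with no shared words.
queries = ["cordless drill", "garden hose"]
titles = ["cordless drill", "steel hammer"]
cos_title = rowwise_tfidf_cosine(queries, titles)
```

Working row by row avoids building the full query-by-product similarity matrix, which is unnecessary because only the given pairs need scores.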
Compute additional similarity measures (e.g., Jaccard, Dice Coefficient, Overlap).
Generate a minimum of 6 similarity features for each query-product pair.
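The set-based measures named above can be sketched over token sets as follows. Returning 0.0 when a set is empty is a choice made for this example.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|)"""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|)"""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Made-up token lists for one query-product pair.
query_tokens = ["cordless", "drill"]
title_tokens = ["cordless", "drill", "kit", "battery"]
features = {
    "jaccard_title": jaccard(query_tokens, title_tokens),
    "dice_title": dice(query_tokens, title_tokens),
    "overlap_title": overlap(query_tokens, title_tokens),
}
```

Applying the three functions to each product field, together with the cosine similarities, gives more than the required six features per pair.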

3️⃣ Model Training & Evaluation

Train models using Supervised Learning Algorithms:
- Linear Regression
- Support Vector Regressor (SVR)
- XGBoost Regressor
- Neural Networks
Evaluate model performance using:
- Mean Squared Error (MSE)
- R² Score
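A compact version of the train-and-compare loop is sketched below with two of the listed models. The feature matrix is synthetic (random values with a linear target plus noise) and only stands in for the real similarity features. The split ratio and default hyperparameters are also assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the real data: 6 similarity features per pair.
rng = np.random.default_rng(0)
X = rng.random((300, 6))
weights = np.array([0.5, 0.4, 0.3, 0.3, 0.3, 0.2])
y = 1.0 + X @ weights + rng.normal(0.0, 0.05, size=300)

# Hold out part of the labeled data for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "svr": SVR(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    results[name] = {
        "mse": mean_squared_error(y_val, preds),
        "r2": r2_score(y_val, preds),
    }
```

`xgboost.XGBRegressor` and `sklearn.neural_network.MLPRegressor` expose the same `fit`/`predict` interface, so they should slot into the `models` dictionary in the same way.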

4️⃣ Predictions on Test Data

Generate predicted relevance scores for test_new.csv.
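The prediction step could look like the sketch below. The feature column names, the `id` column, and the output layout are hypothetical. A small made-up frame stands in for the features computed from test_new.csv, and linear regression is used only as an example model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURE_COLS = ["cos_title", "jaccard_title"]  # hypothetical feature names


def predict_relevance(model, test_features, feature_cols=FEATURE_COLS, id_col="id"):
    """Return one predicted relevance score per test pair."""
    preds = model.predict(test_features[feature_cols])
    return pd.DataFrame({
        id_col: test_features[id_col].to_numpy(),
        "relevance": preds,
    })


# Made-up training and test features.
train = pd.DataFrame({
    "cos_title": [0.9, 0.1, 0.5, 0.0],
    "jaccard_title": [0.8, 0.0, 0.4, 0.1],
    "relevance": [3.0, 1.0, 2.33, 1.0],
})
test = pd.DataFrame({
    "id": [11, 12],
    "cos_title": [0.7, 0.2],
    "jaccard_title": [0.6, 0.1],
})

model = LinearRegression().fit(train[FEATURE_COLS], train["relevance"])
submission = predict_relevance(model, test)

# CSV text that could be written to a file of predictions.
csv_text = submission.to_csv(index=False)
```

The test features need to be built with the same preprocessing and the same fitted vectorizer as the training features, otherwise the model sees inputs on a different scale.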

Repository: pngo1997/Learning-to-Rank-Algorithm

Folders and files

Latest commit

History

Repository files navigation

🛍️ HomeDepot Product Search Relevance Prediction

📜 Overview

🚀 Approach

1️⃣ Data Preprocessing

2️⃣ Feature Engineering

3️⃣ Model Training & Evaluation

4️⃣ Predictions on Test Data

About

Topics

Resources

Stars

Watchers

Forks

Languages