Kaggle competition to predict likelihood that pairs of Quora questions are duplicates

adammalpass/quora_duplicates

Quora Duplicate Questions

This project is my attempt to solve the Quora Duplicate Questions machine learning challenge on Kaggle.

Requirements

The program uses Python 2.7. The following additional packages must be installed for it to run correctly:

Additionally, the training and testing datasets must be downloaded from Kaggle and placed in the same folder as the main notebook.
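
Before running the notebook, it can help to confirm that the datasets are where it expects them. The following is a minimal sketch, not part of the project itself. The file names train.csv and test.csv and the helper missing_datasets are assumptions for illustration, so adjust them to match your Kaggle download.

```python
import os

# Assumed file names; change these if your Kaggle download differs.
DATASET_FILES = ("train.csv", "test.csv")


def missing_datasets(folder="."):
    """Return the expected dataset files that are not present in folder."""
    return [name for name in DATASET_FILES
            if not os.path.isfile(os.path.join(folder, name))]
```

Calling missing_datasets() from the notebook's folder returns an empty list when both files are in place.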

Usage

If all dependencies are installed correctly, the notebook should run without further modification. To change which machine learning algorithm is used, modify a single line of code: set the algorithm_choice parameter to one of the following values:

  • 0 = Linear Regression
  • 1 = SVR (not recommended - too slow)
  • 2 = Decision Tree
  • 3 = Random Forest
  • 4 = XGBoost
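
One way such a switch could be wired up is sketched below. This is an illustration under stated assumptions, not the notebook's actual code. The ALGORITHMS table and the make_model helper are hypothetical, and using the regressor variants of the scikit-learn and XGBoost classes for options 2 to 4 is a guess based on options 0 and 1 being regressors.

```python
import importlib

# Hypothetical mapping from algorithm_choice to (module, class name).
# The regressor variants for options 2-4 are an assumption; the notebook
# may use different classes or hyperparameters.
ALGORITHMS = {
    0: ("sklearn.linear_model", "LinearRegression"),
    1: ("sklearn.svm", "SVR"),  # not recommended: too slow
    2: ("sklearn.tree", "DecisionTreeRegressor"),
    3: ("sklearn.ensemble", "RandomForestRegressor"),
    4: ("xgboost", "XGBRegressor"),
}


def make_model(algorithm_choice, **params):
    """Build an unfitted model for the given algorithm_choice value."""
    if algorithm_choice not in ALGORITHMS:
        raise ValueError(
            "algorithm_choice must be one of %s" % sorted(ALGORITHMS))
    module_name, class_name = ALGORITHMS[algorithm_choice]
    # Import lazily so only the selected library needs to be installed.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)(**params)


algorithm_choice = 4  # the single line to change
```

With this layout, make_model(algorithm_choice) returns an estimator exposing the usual fit and predict methods, so the rest of the pipeline does not need to know which algorithm was selected.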
