This project is my attempt to solve the Quora Duplicate Questions machine learning challenge on Kaggle.
The program uses Python 2.7. The following additional packages must be installed for it to run correctly:
Additionally, the training and testing datasets must be downloaded from Kaggle here and placed in the same folder as the main notebook.
If all dependencies are installed correctly, the notebook should run without further modification.
The machine learning algorithm can be changed by editing a single line of code: set the algorithm_choice
parameter to one of the following values:
- 0 = Linear Regression
- 1 = SVR (not recommended - too slow)
- 2 = Decision Tree
- 3 = Random Forest
- 4 = XGBoost
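
As a rough illustration of how the settings above could map to model objects, here is a minimal sketch. It is not the notebook's actual code: the `ALGORITHMS` table and the `make_model` helper are hypothetical names, and the choice of regressor classes (rather than classifiers) for options 2 to 4 is an assumption based on options 0 and 1 being regressors. Modules are imported lazily so that only the selected library needs to be installed.

```python
import importlib

# Hypothetical mapping from algorithm_choice to (module, class name).
# The regressor variants for 2-4 are an assumption, not taken from the notebook.
ALGORITHMS = {
    0: ("sklearn.linear_model", "LinearRegression"),
    1: ("sklearn.svm", "SVR"),  # not recommended - too slow
    2: ("sklearn.tree", "DecisionTreeRegressor"),
    3: ("sklearn.ensemble", "RandomForestRegressor"),
    4: ("xgboost", "XGBRegressor"),
}


def make_model(algorithm_choice):
    """Return an untrained model with default settings for the given choice."""
    if algorithm_choice not in ALGORITHMS:
        raise ValueError(
            "algorithm_choice must be one of %s" % sorted(ALGORITHMS)
        )
    module_name, class_name = ALGORITHMS[algorithm_choice]
    # Import only the library that the selected algorithm needs.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()


# The single line to edit when switching algorithms.
algorithm_choice = 3
```

With this layout, `model = make_model(algorithm_choice)` would give an estimator with the usual `fit` and `predict` methods, and an out-of-range value fails early with a `ValueError` instead of silently falling back to a default.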