Wine price prediction Kaggle competition

This project was made for the AMMI Ghana Bootcamp Kaggle competition.

Problem statement

Given the following features:

country (String) The country that the wine is from
province (String) The province or state that the wine is from
region_1 (String) The wine-growing area in a province or state (e.g. Napa)
region_2 (String) A more specific region within the wine-growing area (e.g. Rutherford inside the Napa Valley); this value is sometimes blank
winery (String) The winery that made the wine
variety (String) The type of grapes used to make the wine (e.g. Pinot Noir)
designation (String) The vineyard within the winery where the grapes that made the wine are from
taster_name (String) The name of the taster
taster_twitter_handle (String) The taster's Twitter handle
description (String) A few sentences from a sommelier describing the wine's taste, smell, look, feel, etc.
points (Numeric) The number of points WineEnthusiast rated the wine on a scale of 1-100

The goal is to predict price (Numeric): the cost of a bottle of wine.

Dependencies

pip3 install -r requirements.txt

Feature engineering

To let the models handle the categorical features, the following preprocessing was applied:

country, region_2, province, taster_name and variety were one-hot encoded.
title, region_1 and designation were vectorized using scikit-learn's CountVectorizer.
taster_twitter_handle was dropped because it only adds redundant information (see visualization.ipynb).
Finally, description was encoded with Word2Vec by summing the vectors of all the words in each training example's description.
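
The encodings above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the repository's code: the toy rows are made up, only a subset of the columns is used, and the small `word_vectors` dictionary is a hypothetical stand-in for the vectors of a trained Word2Vec model. The helper `embed_description` is likewise a hypothetical name.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Toy rows (made up) with a subset of the competition's columns.
df = pd.DataFrame({
    "country": ["US", "France", "US"],
    "variety": ["Pinot Noir", "Merlot", "Pinot Noir"],
    "designation": ["Reserve", "Grand Vin", "Estate Reserve"],
    "description": ["ripe cherry and oak", "dry tannic finish", "ripe plum"],
})

# One-hot encode categorical columns; categories unseen during fit
# become an all-zero row at prediction time instead of raising an error.
onehot = OneHotEncoder(handle_unknown="ignore")
X_onehot = onehot.fit_transform(df[["country", "variety"]])

# Bag-of-words token counts for a free-text-like categorical column.
# fillna("") guards against blank values, which CountVectorizer cannot handle.
counter = CountVectorizer()
X_counts = counter.fit_transform(df["designation"].fillna(""))

# Hypothetical word vectors standing in for a trained Word2Vec model.
word_vectors = {
    "ripe": np.array([0.1, 0.3]),
    "cherry": np.array([0.5, -0.2]),
    "oak": np.array([0.0, 0.4]),
    "dry": np.array([-0.3, 0.1]),
    "plum": np.array([0.2, 0.2]),
}

def embed_description(text, vectors, dim=2):
    """Sum the vectors of all in-vocabulary words; unknown words are skipped."""
    total = np.zeros(dim)
    for word in text.lower().split():
        if word in vectors:
            total += vectors[word]
    return total

X_desc = np.vstack([embed_description(t, word_vectors) for t in df["description"]])

# Stack all feature groups into one sparse matrix for the models.
X = hstack([X_onehot, X_counts, csr_matrix(X_desc)]).tocsr()
print(X.shape)  # (3, 10): 4 one-hot + 4 token-count + 2 embedding columns
```

Summing means longer descriptions tend to get vectors with larger norms; averaging the word vectors is a common alternative when that effect is unwanted.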

Models

Linear regression
Decision trees
Random forests
Neural networks
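
A hedged sketch of how these four model families could be compared under K-fold cross-validation with scikit-learn. The data is synthetic (`make_regression`) rather than the wine features, RMSE is an assumed metric (not necessarily the competition's), and `MLPRegressor` is a swapped-in stand-in for the neural network; the repository's nn_model.ipynb may use a different library and architecture.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded feature matrix and the price target.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

models = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=50, random_state=0),
    # MLPRegressor is a stand-in; the repository's own network may differ.
    # Neural networks train better on standardized inputs, hence the scaler.
    "Neural network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

# The same 5 shuffled folds are reused for every model so scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so RMSE is reported as a negative value.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    results[name] = -scores.mean()
    print(f"{name}: mean RMSE = {results[name]:.2f}")
```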

Techniques used

K-fold cross-validation
Word embeddings (Word2Vec)
Grid search hyperparameter optimization
One-hot encoding
CountVectorizer
PCA
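
Grid search, K-fold cross-validation and PCA can be combined in a single scikit-learn pipeline, sketched below. The synthetic data and the parameter grid are illustrative assumptions, not the values used in the repository; pairing PCA with a random forest is likewise just one possible combination.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the (dense) encoded features and the price target.
X, y = make_regression(n_samples=150, n_features=20, noise=5.0, random_state=0)

# PCA reduces dimensionality before the forest; both steps are tuned together,
# and PCA is refit inside each training fold so no test-fold data leaks in.
pipeline = Pipeline([
    ("pca", PCA(random_state=0)),
    ("rf", RandomForestRegressor(random_state=0)),
])

# Illustrative grid only; "<step name>__<parameter>" addresses each step.
param_grid = {
    "pca__n_components": [5, 10],
    "rf__n_estimators": [20, 50],
    "rf__max_depth": [None, 8],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
print(f"Best CV RMSE: {-search.best_score_:.2f}")
```

scikit-learn's PCA has historically expected dense input; for the sparse matrix produced by one-hot encoding and CountVectorizer, TruncatedSVD accepts sparse matrices directly.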

Repository: TatianaMoteuN/AMMI-Ghana-Bootcamp-Kaggle-Competition-Group-6