wsm-final

Run ./run.sh from the command line.

Divide.py randomly selects files from the folder 'train' and puts them into the folders 'new_train' and 'new_test'. At the same time, it splits 'train_label' into the corresponding 'new_train_label' and 'new_test_label'.
PreProcess.py cleans the training and testing data.
Predict.py makes predictions for 'test' using the model trained on 'train'.
Evaluate.py scores submission.csv against 'new_test_label' (the ground truth).
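
The random split done by Divide.py can be sketched roughly as follows. This is only an illustration, not the actual code in Divide.py: the function name split_files, the test_fraction and seed parameters, and the assumption that labels are held in a dict keyed by file name are all hypothetical.

```python
import random


def split_files(filenames, labels, test_fraction=0.2, seed=0):
    """Randomly split file names, and their labels, into train and test parts.

    Hypothetical helper: `labels` is assumed to map file name -> label.
    """
    rng = random.Random(seed)      # seeded so the split is reproducible
    shuffled = sorted(filenames)   # sort first so the input order does not matter
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    test_files = shuffled[:n_test]
    train_files = shuffled[n_test:]

    # Split the labels the same way so each part keeps its own ground truth.
    train_labels = {name: labels[name] for name in train_files}
    test_labels = {name: labels[name] for name in test_files}
    return train_files, test_files, train_labels, test_labels


# Example with made-up file names and labels.
files = [f"doc{i}.txt" for i in range(10)]
labels = {name: i % 2 for i, name in enumerate(files)}
new_train, new_test, new_train_label, new_test_label = split_files(files, labels)
print(len(new_train), len(new_test))
```

Fixing the seed keeps the split stable between runs, so scores from Evaluate.py stay comparable.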

Do not run 'git add -A', or you will end up committing all the data.

