
01.112_ML_Project

An NLP project that builds a sentiment analysis system and a phrase chunking system for tweets in multiple languages (EN, FR, CN and SG).

Team Members:

Mok Jun Neng
Rachel Gan
You Song Shan

Instructions

Part2 - Emissions

Calculates the emission parameters for the HMM used in Part 3.

Run the following command to start training and testing. The output file, dev.p2.out, is generated in the data folder that contains the test set.

python part2/emission.py [train file] [dev.in file]
# for example
python part2/emission.py data/EN/train.dev data/EN/dev.in
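As a rough illustration of what estimating emission parameters involves, the sketch below computes maximum-likelihood estimates e(word | tag) = count(tag, word) / count(tag). It is not taken from part2/emission.py: the function name `estimate_emissions`, the in-memory list-of-pairs input, and the toy tags are all assumptions made for this example, and no handling of unseen words is shown.

```python
from collections import Counter, defaultdict


def estimate_emissions(tagged_sentences):
    """Return e[tag][word] = count(tag, word) / count(tag).

    `tagged_sentences` is a hypothetical in-memory format: a list of
    sentences, each a list of (word, tag) pairs.
    """
    pair_counts = defaultdict(Counter)  # tag -> Counter of words emitted by it
    tag_counts = Counter()              # tag -> total number of occurrences
    for sentence in tagged_sentences:
        for word, tag in sentence:
            pair_counts[tag][word] += 1
            tag_counts[tag] += 1
    return {
        tag: {word: count / tag_counts[tag] for word, count in words.items()}
        for tag, words in pair_counts.items()
    }


# Toy data with made-up tags, for illustration only.
toy = [
    [("good", "B-positive"), ("movie", "O")],
    [("bad", "B-negative"), ("movie", "O"), ("plot", "O")],
]
emissions = estimate_emissions(toy)
```

Here "O" occurs three times and emits "movie" twice, so `emissions["O"]["movie"]` is 2/3.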

Part3 - First-order HMM

Run the following command to start training and testing. The output file, dev.p3.out, is generated in the data folder that contains the test set.

python part3/viterbi.py [train_file] [test_file]
# for example
python part3/viterbi.py data/EN/train.dev data/EN/dev.in
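For readers unfamiliar with first-order HMM decoding, here is a minimal Viterbi sketch in log space. It is an illustration under assumptions, not the code in part3/viterbi.py: the function signature, the nested-dictionary parameter tables, the "START"/"STOP" markers, and the small probability `floor` used for missing entries are all hypothetical choices.

```python
import math


def viterbi(words, tags, trans, emit, floor=1e-12):
    """Return the most likely tag sequence under a first-order HMM.

    trans[a][b] is P(b | a) and emit[t][w] is P(w | t); entries missing
    from either table fall back to `floor` (an assumed smoothing choice).
    """
    if not words:
        return []

    def lp(table, a, b):
        return math.log(table.get(a, {}).get(b, floor))

    # best[i][t] = (log score of the best path ending in tag t at word i,
    #               tag chosen at word i - 1 on that path)
    best = [{t: (lp(trans, "START", t) + lp(emit, t, words[0]), None) for t in tags}]
    for word in words[1:]:
        prev = best[-1]
        column = {}
        for t in tags:
            score, arg = max((prev[p][0] + lp(trans, p, t), p) for p in tags)
            column[t] = (score + lp(emit, t, word), arg)
        best.append(column)

    # Pick the best final tag, including the transition into STOP.
    last = max(tags, key=lambda t: best[-1][t][0] + lp(trans, t, "STOP"))
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])  # follow back-pointers
    return path[::-1]


# Toy parameters, invented for this example.
tags = ["O", "B"]
trans = {
    "START": {"O": 0.8, "B": 0.2},
    "O": {"O": 0.6, "B": 0.3, "STOP": 0.1},
    "B": {"O": 0.7, "B": 0.1, "STOP": 0.2},
}
emit = {
    "O": {"the": 0.6, "is": 0.3, "food": 0.1},
    "B": {"food": 0.9, "the": 0.1},
}
decoded = viterbi(["the", "food", "is"], tags, trans, emit)
```

With these toy numbers, `decoded` is `["O", "B", "O"]`. Working in log space avoids numerical underflow on long sentences.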

Part4 - Second-order HMM

Run the following command to start training and testing. The output file, dev.p4.out, is generated in the data folder that contains the test set.

python part4/viterbi2.py [train_file] [test_file]
# for example
python part4/viterbi2.py data/EN/train.dev data/EN/dev.in
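A second-order HMM conditions each tag on the two preceding tags. The sketch below shows one way to estimate such transition parameters, q(c | a, b) = count(a, b, c) / count(a, b), by maximum likelihood. The function name, the double "START" padding, and the tuple-keyed dictionary are assumptions for illustration and may differ from what part4/viterbi2.py does.

```python
from collections import Counter


def estimate_trigram_transitions(tag_sequences):
    """Return q[(a, b, c)] = count(a, b, c) / count(a, b)."""
    trigram_counts = Counter()
    context_counts = Counter()
    for tags in tag_sequences:
        # Pad with two START symbols so the first tag has a full context.
        padded = ["START", "START", *tags, "STOP"]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            trigram_counts[(a, b, c)] += 1
            context_counts[(a, b)] += 1
    return {
        key: count / context_counts[key[:2]]
        for key, count in trigram_counts.items()
    }


# Toy tag sequences, invented for this example.
q = estimate_trigram_transitions([["O", "B", "O"], ["O", "O"]])
```

In this toy data both sequences start with "O", so `q[("START", "START", "O")]` is 1.0, while the context ("START", "O") is followed once by "B" and once by "O", giving 0.5 each.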

Part5 - Design Challenge

To compare the performance of different models, several approaches were implemented for the Part 5 design challenge. Results and explanations can be found in our final report:

CRF (built from scratch)

python part5/crf-nolib.py [train file] [dev.in file] [result filepath]

Perceptron (built from scratch)

python part5/structured_perceptron.py [train file] [dev.in file] [result filepath]
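The core of a structured perceptron is its update rule: when the predicted tag sequence differs from the gold one, add the gold sequence's features to the weights and subtract the predicted sequence's features. The sketch below illustrates only that step. The feature templates (emission and transition indicators) and the names `features` and `perceptron_update` are assumptions, not the contents of part5/structured_perceptron.py, and decoding is left out.

```python
from collections import Counter


def features(words, tags):
    """Count emission and transition indicator features for one sequence."""
    feats = Counter()
    prev = "START"
    for word, tag in zip(words, tags):
        feats[("emit", tag, word)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    feats[("trans", prev, "STOP")] += 1
    return feats


def perceptron_update(weights, words, gold_tags, predicted_tags):
    """Apply w += phi(x, gold) - phi(x, predicted) when the prediction is wrong."""
    if gold_tags == predicted_tags:
        return weights
    for feat, count in features(words, gold_tags).items():
        weights[feat] += count
    for feat, count in features(words, predicted_tags).items():
        weights[feat] -= count
    return weights


# Toy example with made-up tags: the model wrongly predicted "O" for "good".
weights = Counter()
perceptron_update(weights, ["good", "food"], ["B", "O"], ["O", "O"])
```

Features shared by both sequences, such as `("emit", "O", "food")`, cancel out, so only the features where the prediction went wrong are adjusted.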

CRF (built with external ML packages)

python part5/structured_perceptron.py [train file] [dev.in file] [result filepath]

MEMM (built with external ML packages)

python part5/MEMM.py [train file] [dev.in file]

HMM

python part5/HMM_turingsmoothing/viterbi.py [train file] [dev.in file]

Evaluate

To evaluate performance with the evaluation script, run the following:

python evalResult.py [gold truth file] [prediction file]

rgan19/01.112_ML_Project

Folders and files

Latest commit

History

Repository files navigation

01.112_ML_Project

Instruction

Part2 - Emissions

Part3 - First-order HMM

Part4 - Second-order HMM

Part5 - Design Challenge

Evaluate

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages