An NLP project that builds a sentiment analysis system and a phrase chunking system for Tweets in multiple languages (EN, FR, CN, SG).
Team Members:
- Mok Jun Neng
- Rachel Gan
- You Song Shan
Calculate emission parameters for the HMM in Part 2.
Run the following command to train and test. The output file dev.p2.out will be generated in the data folder that contains the test set.
python part2/emission.py [train file] [dev.in file]
# for example
python part2/emission.py data/EN/train.dev data/EN/dev.in
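For intuition, the sketch below shows what maximum-likelihood emission estimation typically looks like for this data format (one "word tag" pair per line, blank lines between sentences). The function name, the smoothing constant k, and the #UNK# handling are our assumptions for illustration, not necessarily what part2/emission.py does.

```python
from collections import defaultdict

def estimate_emissions(train_path, k=1):
    """Maximum-likelihood emission estimates with simple #UNK# smoothing:
    e(x|y) = count(y -> x) / (count(y) + k), with k counts reserved for
    unseen words. The exact scheme in part2/emission.py may differ."""
    emit_counts = defaultdict(lambda: defaultdict(int))
    tag_counts = defaultdict(int)
    with open(train_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                      # blank line = sentence boundary
                continue
            word, tag = line.rsplit(" ", 1)   # assumes "word tag" per line
            emit_counts[tag][word] += 1
            tag_counts[tag] += 1
    emissions = {
        tag: {w: c / (tag_counts[tag] + k) for w, c in words.items()}
        for tag, words in emit_counts.items()
    }
    # probability mass assigned to unseen words under each tag
    unk_prob = {tag: k / (tag_counts[tag] + k) for tag in tag_counts}
    return emissions, unk_prob
```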
Run the following command to train and test. The output file dev.p3.out will be generated in the data folder that contains the test set.
python part3/viterbi.py [train_file] [test_file]
# for example
python part3/viterbi.py data/EN/train.dev data/EN/dev.in
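part3/viterbi.py is the actual implementation; as a rough reference, first-order Viterbi decoding in log space looks like the sketch below. The dictionary layout (trans, emit, unk_prob) and the START/STOP keys are assumptions matching the emission sketch above, not necessarily the script's own data structures.

```python
import math

def viterbi(words, tags, trans, emit, unk_prob):
    """Standard first-order Viterbi decoding in log space.
    trans[u][v]: transition prob from tag u to tag v (with START/STOP keys),
    emit[t][w]:  emission prob of word w under tag t,
    unk_prob[t]: fallback probability for words unseen in training."""
    def log_e(tag, word):
        p = emit[tag].get(word, unk_prob[tag])
        return math.log(p) if p > 0 else float("-inf")

    def log_t(u, v):
        p = trans.get(u, {}).get(v, 0.0)
        return math.log(p) if p > 0 else float("-inf")

    n = len(words)
    pi = [{t: float("-inf") for t in tags} for _ in range(n)]
    back = [{} for _ in range(n)]

    for t in tags:                                # base case from START
        pi[0][t] = log_t("START", t) + log_e(t, words[0])
    for i in range(1, n):                         # recursion
        for v in tags:
            best_u = max(tags, key=lambda u: pi[i - 1][u] + log_t(u, v))
            pi[i][v] = pi[i - 1][best_u] + log_t(best_u, v) + log_e(v, words[i])
            back[i][v] = best_u

    last = max(tags, key=lambda t: pi[n - 1][t] + log_t(t, "STOP"))
    path = [last]
    for i in range(n - 1, 0, -1):                 # backtrack
        path.append(back[i][path[-1]])
    return list(reversed(path))
```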
Run the following command to train and test. The output file dev.p4.out will be generated in the data folder that contains the test set.
python part4/viterbi2.py [train_file] [test_file]
# for example
python part4/viterbi2.py data/EN/train.dev data/EN/dev.in
To compare the performance of different models, several approaches were implemented for the Part 5 design challenge; results and explanations can be found in our final report:
- CRF (built from scratch)
  python part5/crf-nolib.py [train file] [dev.in file] [result filepath]
- Structured perceptron (built from scratch; a sketch of the update rule follows this list)
  python part5/structured_perceptron.py [train file] [dev.in file] [result filepath]
- CRF (built with external ML packages)
  python part5/structured_perceptron.py [train file] [dev.in file] [result filepath]
- MEMM (built with external ML packages)
  python part5/MEMM.py [train file] [dev.in file]
- HMM (with Turing smoothing)
  python part5/HMM_turingsmoothing/viterbi.py [train file] [dev.in file]
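For the structured perceptron, the core training step is the familiar reward-gold / penalise-prediction update. The sketch below is illustrative only; the function names and the decode/feats interfaces are our assumptions, not the code in part5/structured_perceptron.py.

```python
from collections import defaultdict

def perceptron_epoch(sentences, weights, decode, feats):
    """One pass of the structured perceptron (illustrative sketch).
    sentences: list of (words, gold_tags) pairs,
    weights:   defaultdict(float) mapping feature ids to weights,
    decode:    returns the best tag sequence under the current weights
               (typically Viterbi over feature scores),
    feats:     yields feature ids for a (words, tag sequence) pair."""
    for words, gold in sentences:
        pred = decode(words, weights)
        if pred != gold:
            for f in feats(words, gold):   # reward the gold features
                weights[f] += 1.0
            for f in feats(words, pred):   # penalise the predicted features
                weights[f] -= 1.0
    return weights

# Typical setup (hypothetical): weights = defaultdict(float), then run
# perceptron_epoch over the training data for several epochs, often
# averaging the weights across updates for better generalisation.
```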
To evaluate the performance with the evaluation script, run the following:
python evalResult.py [gold truth file] [prediction file]
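Conceptually, the evaluation compares predicted chunks against the gold annotations. The sketch below shows a simplified entity-level precision/recall/F computation, assuming B-/I-/O style tags (e.g. B-positive, I-positive); it is an illustration of the metric, not the actual logic of evalResult.py.

```python
def extract_chunks(tag_seq):
    """Collect (start, end, type) spans from a B-/I-/O tag sequence.
    Assumed tag format: 'B-positive', 'I-positive', 'O'."""
    chunks, start, ctype = [], None, None
    for i, tag in enumerate(tag_seq + ["O"]):      # sentinel closes the last chunk
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                chunks.append((start, i - 1, ctype))
                start, ctype = None, None
            if tag.startswith("B-"):
                start, ctype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, ctype = i, tag[2:]              # tolerate I- without a B-
    return chunks

def f_score(gold_seqs, pred_seqs):
    """Entity-level precision, recall and F1 over lists of tag sequences."""
    gold = {(i, c) for i, seq in enumerate(gold_seqs) for c in extract_chunks(seq)}
    pred = {(i, c) for i, seq in enumerate(pred_seqs) for c in extract_chunks(seq)}
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```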