rgan19/01.112_ML_Project

 
 


01.112_ML_Project

An NLP project that builds a sentiment analysis system and a phrase chunking system for tweets in multiple languages: EN, FR, CN, and SG.

Team Members:

  • Mok Jun Neng
  • Rachel Gan
  • You Song Shan

Instructions

Part2 - Emissions

Calculates the emission parameters for the HMM used in Part 3.

Run the following command to start training and testing. The output file, dev.p2.out, is generated in the data folder that contains the test set.

python part2/emission.py [train file] [dev.in file]
# for example
python part2/emission.py data/EN/train.dev data/EN/dev.in 
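
As a rough illustration of what estimating emission parameters involves, the sketch below computes maximum-likelihood estimates e(word | tag) = count(tag -> word) / count(tag). It is not the code in part2/emission.py: the function name `estimate_emissions`, the in-memory data layout, and the toy tags are assumptions made for this example, and no smoothing or unknown-word handling is shown.

```python
from collections import Counter, defaultdict


def estimate_emissions(tagged_sentences):
    """Return e[tag][word] = count(tag -> word) / count(tag).

    `tagged_sentences` is a list of sentences, each a list of (word, tag)
    pairs. This layout is an assumption made for the example.
    """
    pair_counts = defaultdict(Counter)  # tag -> Counter of emitted words
    tag_counts = Counter()              # tag -> total occurrences
    for sentence in tagged_sentences:
        for word, tag in sentence:
            pair_counts[tag][word] += 1
            tag_counts[tag] += 1
    return {
        tag: {word: count / tag_counts[tag] for word, count in words.items()}
        for tag, words in pair_counts.items()
    }


# Toy data, made up for illustration only.
toy_train = [
    [("great", "B-positive"), ("movie", "O")],
    [("bad", "B-negative"), ("movie", "O"), ("today", "O")],
]
emissions = estimate_emissions(toy_train)
print(emissions["O"])
```

In this toy corpus the tag `O` appears three times and emits "movie" twice, so its estimated emission probability is 2/3.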

Part3 - First-order HMM

Run the following command to start training and testing. The output file, dev.p3.out, is generated in the data folder that contains the test set.

python part3/viterbi.py [train_file] [test_file]
# for example
python part3/viterbi.py data/EN/train.dev data/EN/dev.in 
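
For readers unfamiliar with first-order HMM decoding, here is a minimal Viterbi sketch in log space. It is an illustration under stated assumptions, not the contents of part3/viterbi.py: the `START`/`STOP` boundary symbols, the dictionary-based parameter tables, the `floor` probability for unseen events, and the toy numbers are all hypothetical.

```python
import math


def viterbi(words, tags, trans, emit, floor=1e-12):
    """Return the most likely tag sequence under a first-order HMM.

    trans[(prev_tag, tag)] and emit[(tag, word)] hold probabilities.
    Missing entries fall back to `floor` (an assumption, not real smoothing).
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][tag] = (log score of the best path ending in tag at i, back-pointer)
    best = [{t: (lp(trans, ("START", t)) + lp(emit, (t, words[0])), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[i - 1][p][0] + lp(trans, (p, cur)))
            score = (best[i - 1][prev][0] + lp(trans, (prev, cur))
                     + lp(emit, (cur, words[i])))
            column[cur] = (score, prev)
        best.append(column)

    # Include the transition to STOP, then follow the back-pointers.
    last = max(tags, key=lambda t: best[-1][t][0] + lp(trans, (t, "STOP")))
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Toy parameters, made up for illustration only.
toy_tags = ["O", "B-positive"]
toy_trans = {
    ("START", "O"): 0.8, ("START", "B-positive"): 0.2,
    ("O", "O"): 0.6, ("O", "B-positive"): 0.3, ("O", "STOP"): 0.1,
    ("B-positive", "O"): 0.7, ("B-positive", "B-positive"): 0.1,
    ("B-positive", "STOP"): 0.2,
}
toy_emit = {
    ("O", "the"): 0.5, ("O", "movie"): 0.4, ("O", "great"): 0.01,
    ("B-positive", "great"): 0.9,
}
decoded = viterbi(["the", "great", "movie"], toy_tags, toy_trans, toy_emit)
print(decoded)
```

Working in log space avoids numerical underflow on long sentences, which is why the sketch sums log probabilities instead of multiplying raw ones.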

Part4 - Second-order HMM

Run the following command to start training and testing. The output file, dev.p4.out, is generated in the data folder that contains the test set.

python part4/viterbi2.py [train_file] [test_file]
# for example
python part4/viterbi2.py data/EN/train.dev data/EN/dev.in 
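
A second-order HMM conditions each tag on the two preceding tags. The sketch below shows one way such transition parameters could be estimated by maximum likelihood, q(t | u, v) = count(u, v, t) / count(u, v). The function name, the double `START` padding, and the toy sequences are assumptions for illustration and may differ from what part4/viterbi2.py does.

```python
from collections import Counter


def estimate_trigram_transitions(tag_sequences):
    """Return q[(u, v, t)] = count(u, v, t) / count(u, v).

    Each sequence is padded with two START symbols and one STOP symbol
    (a common convention, assumed here).
    """
    trigram_counts = Counter()
    context_counts = Counter()
    for tags in tag_sequences:
        padded = ["START", "START"] + list(tags) + ["STOP"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            trigram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return {
        trigram: count / context_counts[trigram[:2]]
        for trigram, count in trigram_counts.items()
    }


# Toy tag sequences, made up for illustration only.
toy_sequences = [["O", "B-positive", "O"], ["O", "O"]]
q = estimate_trigram_transitions(toy_sequences)
print(q[("START", "O", "B-positive")])
```

Decoding then follows the same dynamic program as the first-order case, except that each state is a pair of adjacent tags.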

Part5 - Design Challenge

To compare the performance of different models, we implemented several approaches for the Part 5 design challenge. Results and explanations can be found in our final report:

  • CRF (built from scratch)

    python part5/crf-nolib.py [train file] [dev.in file] [result filepath]
  • Perceptron (built from scratch)

    python part5/structured_perceptron.py [train file] [dev.in file] [result filepath]
  • CRF (built with external ML packages)

    python part5/structured_perceptron.py [train file] [dev.in file] [result filepath]
  • MEMM (built with external ML packages)

    python part5/MEMM.py [train file] [dev.in file]
  • HMM

    python part5/HMM_turingsmoothing/viterbi.py [train file] [dev.in file]
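
To give a flavor of the structured perceptron approach listed above, the following sketch shows a single weight update: features of the gold sequence are rewarded and features of the wrongly predicted sequence are penalized. The feature templates, function names, and toy example are assumptions for illustration and are not taken from part5/structured_perceptron.py. The decoding step that produces the prediction is omitted.

```python
from collections import Counter


def sequence_features(words, tags):
    """Count emission and transition features for one tagged sentence.

    The two feature templates used here are hypothetical.
    """
    feats = Counter()
    prev = "START"
    for word, tag in zip(words, tags):
        feats[("emit", tag, word)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    feats[("trans", prev, "STOP")] += 1
    return feats


def perceptron_update(weights, words, gold_tags, predicted_tags, lr=1.0):
    """Apply one structured perceptron update in place and return weights."""
    if list(gold_tags) == list(predicted_tags):
        return weights  # correct prediction: no update
    gold = sequence_features(words, gold_tags)
    pred = sequence_features(words, predicted_tags)
    for feat in set(gold) | set(pred):
        delta = gold[feat] - pred[feat]
        if delta:
            weights[feat] = weights.get(feat, 0.0) + lr * delta
    return weights


# Toy example, made up for illustration only.
weights = perceptron_update(
    {}, ["great", "movie"], ["B-positive", "O"], ["O", "O"]
)
print(weights)
```

Features shared by the gold and predicted sequences cancel out, so only the parts of the prediction that were wrong change the weights.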

Evaluate

To evaluate performance with the evaluation script, run the following:

python evalResult.py [gold truth file] [prediction file]
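
Chunking systems are commonly scored with precision, recall, and F1 over exactly matching chunks. The sketch below illustrates that calculation only. Whether evalResult.py computes exactly these metrics is an assumption, and the `(start, end, label)` chunk representation and toy chunks are hypothetical.

```python
def precision_recall_f1(gold_chunks, predicted_chunks):
    """Score predicted chunks against gold chunks by exact match.

    Each chunk is a hashable tuple such as (start, end, label); this
    representation is an assumption made for the example.
    """
    gold, pred = set(gold_chunks), set(predicted_chunks)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy chunks, made up for illustration only.
toy_gold = [(0, 1, "positive"), (3, 5, "negative")]
toy_pred = [(0, 1, "positive"), (3, 4, "negative"), (6, 7, "neutral")]
precision, recall, f1 = precision_recall_f1(toy_gold, toy_pred)
print(precision, recall, f1)
```

Only the first predicted chunk matches a gold chunk exactly, so the second prediction counts as an error even though it overlaps the gold span.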
