
CZ4045-NLP-Assignment-2

Assignment 2

Python version 3.7 is used throughout this project.

Download or clone the repository before running any file.

Question One

First, install torch, the main dependency for this question. The code for this question is under Question1/code.

For Windows

   $ pip install torch===1.7.0 torchvision===0.8.1 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

For MacOS

   $ pip install torch torchvision torchaudio

(macOS binaries don't support CUDA; install from source if CUDA is needed.)

The version of torch used in this project is PyTorch 1.7.0.
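
To confirm the installation, the installed version can be checked from the command line:

   python -c "import torch; print(torch.__version__)"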

A language model is built to predict the probability of the next word in a sequence based on the words already observed. The first model given was implemented using an RNN (LSTM). LSTMs (Long Short-Term Memory networks) are capable of remembering long-term dependencies. We trained the default RNN model with the default parameters given to us, as shown below:

   python Question1/code/Q1_main.py --model LSTM --epochs 6 --lr 20

At the end of 6 epochs, the test perplexity was 141.13 and the test loss was 4.95.
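
Note that perplexity is simply the exponential of the cross-entropy loss, which these numbers are consistent with:

   import math
   math.exp(4.95)   # ≈ 141.2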

We then built an FNN (feed-forward neural network) model that performs the same task, processing inputs as n-grams (N = 8). The optimizer used was Adam with a learning rate of 0.0003; a rough sketch of this architecture is given below.
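
The following sketch is illustrative only; the class and attribute names are hypothetical, not the notebook's exact code. With N = 8, the previous N - 1 = 7 words predict the next one:

   import torch
   import torch.nn as nn

   class FNNModel(nn.Module):
       """Illustrative n-gram feed-forward LM; names are hypothetical."""
       def __init__(self, vocab_size, emb_dim=200, hidden_dim=200, context_size=7):
           super().__init__()
           self.encoder = nn.Embedding(vocab_size, emb_dim)
           self.fc = nn.Linear(context_size * emb_dim, hidden_dim)
           self.decoder = nn.Linear(hidden_dim, vocab_size)

       def forward(self, inputs):               # inputs: (batch, context_size)
           emb = self.encoder(inputs)           # (batch, context_size, emb_dim)
           emb = emb.view(inputs.size(0), -1)   # concatenate context embeddings
           hidden = torch.tanh(self.fc(emb))
           return self.decoder(hidden)          # logits over the vocabulary

   model = FNNModel(vocab_size=10000)
   optimizer = torch.optim.Adam(model.parameters(), lr=0.0003)  # Adam, lr as above

Training minimizes the cross-entropy between the predicted logits and the actual next word.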

The model can be run by using the ipynb notebook:

   Question1/code/NLP_Q1_FNN.ipynb

At the end of 6 epochs, the test perplexity was 3696.64 and the test loss was 8.22.

Similarly, a model is trained with weights shared between the embedding layer and the output layer (weight tying). The model can be run by using the ipynb notebook:

   Question1/code/NLP_Q1_FNNS.ipynb

At the end of 6 epochs, the test perplexity was 3743.54 and the test loss was 8.23.
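
In PyTorch, weight tying amounts to pointing the output layer's weight matrix at the embedding matrix. A minimal sketch, reusing the hypothetical names from the FNN sketch above:

   # inside FNNModel.__init__ (illustrative): share the decoder's weights
   # with the embedding layer; requires hidden_dim == emb_dim
   self.decoder = nn.Linear(hidden_dim, vocab_size)
   self.decoder.weight = self.encoder.weight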

Every 100 sequences, the epoch, sequence number, learning rate, loss and perplexity of the model are printed during training. Additionally, at the end of each epoch, the time taken, as well as the validation loss and perplexity, are displayed.

After both models are trained, text is generated by running Q1_generate.py. To get the output using the base FNN model, run the code as shown below:

   python Question1/code/Q1_generate.py --checkpoint ../models/modelFNN.pt 

To get the output using the base FNNS model, run the code as shown below:

   python Question1/code/Q1_generate.py --checkpoint ../models/modelFNNS.pt 

The number of words generated so far is printed as generation progresses, while the output texts are stored in a text file specified by the --outf argument. The already generated texts are stored in ../generated/generated FNN.txt and ../generated/generated FNNS.txt.
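
Internally, generation scripts of this kind typically sample each word from the model's output distribution and slide the context window forward. A simplified sketch (illustrative only, not the exact Q1_generate.py code):

   import torch

   def generate(model, context, num_words, temperature=1.0):
       """Sample num_words word ids; context is a (1, N-1) tensor of word ids."""
       words = []
       with torch.no_grad():
           for _ in range(num_words):
               logits = model(context)                     # (1, vocab_size)
               probs = torch.softmax(logits / temperature, dim=-1)
               next_word = torch.multinomial(probs, num_samples=1)
               words.append(next_word.item())
               # drop the oldest word and append the newly sampled one
               context = torch.cat([context[:, 1:], next_word], dim=1)
       return words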

Question Two

Named Entity Recognition (NER) is an information extraction technique used to identify and classify named entities in a corpus. A model is trained to identify the entities in a sequence. The data set contains four different types of named entities: PERSON, LOCATION, ORGANIZATION, and MISC. For example, in the sentence "John Smith works at Google in London", "John Smith" is a PERSON, "Google" an ORGANIZATION, and "London" a LOCATION.

For Question 2, the pre-trained GloVe embeddings for NER must be downloaded first. This is done by:

   cd Question2/data/
   mkdir glove.6B

Then download the txt file from the link given below:

   https://www.kaggle.com/danielwillgeorge/glove6b100dtxt
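
Once downloaded (assuming the file is glove.6B.100d.txt and is placed in Question2/data/glove.6B/), the embeddings can be read into a dictionary, for example (a minimal sketch; the notebook may load them differently):

   import numpy as np

   # map each word to its 100-dimensional GloVe vector
   embeddings = {}
   with open('Question2/data/glove.6B/glove.6B.100d.txt', encoding='utf-8') as f:
       for line in f:
           word, *vector = line.rstrip().split(' ')
           embeddings[word] = np.asarray(vector, dtype='float32')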

We have altered the default code given to us so that all models can be run by changing just one parameter. There are 4 different models built to perform the NER task. The main notebook is NLP_Q2.ipynb:

   Question2/NLP_Q2.ipynb

The default model given to us uses LSTMs as word-level encoders. This model can be run by setting the parameter as shown below in code cell 2 of the notebook:

   parameters['layers_mode']='LSTM'

The best validation F-score goes up to 0.92 when using LSTMs for encoding.

The next model was trained by replacing the LSTM word-level encoder with a CNN layer. This can be run by:

   parameters['layers_mode']='CNN'

The best validation F-score reaches 0.85 when using CNNs instead of LSTMs.
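
As a rough illustration of what a CNN word-level encoder does (hypothetical dimensions, not the notebook's exact layer), a 1-D convolution slides over the word-embedding sequence to produce contextual features:

   import torch
   import torch.nn as nn

   emb_dim, hidden_dim = 100, 200             # hypothetical sizes
   conv = nn.Conv1d(emb_dim, hidden_dim, kernel_size=3, padding=1)

   embeddings = torch.randn(1, 10, emb_dim)   # (batch, seq_len, emb_dim)
   # Conv1d expects (batch, channels, seq_len), so transpose in and out
   features = conv(embeddings.transpose(1, 2)).transpose(1, 2)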

We then added one more CNN layer to see how the accuracy would change.

   parameters['layers_mode']='2CNN'

The best validation F-score reached 0.82 when we added an additional CNN layer.

Finally, we set our models to have three CNN layers.

   parameters['layers_mode']='3CNN'

With three CNN layers, the best validation F-score was 0.78.

While training the model, we set our

   parameters['epoch']=3

and the output prints the F-score and best F-score for the train, validation and test sets.
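
In summary, the best validation F-scores observed for the four models were:

   layers_mode   Best validation F-score
   LSTM          0.92
   CNN           0.85
   2CNN          0.82
   3CNN          0.78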