PyTorch implementation of the BiLSTM-CRF model described at https://guillaumegenthial.github.io/.
This model builds on that work by adding ELMo embeddings as a feature representation option. (For more detail about ELMo, please see the publication "Deep contextualized word representations".)
For the Keras implementation (without ELMo), please refer to this link.
- Requirements:
a. Packages: Anaconda, PyTorch, AllenNLP (if on Linux and using ELMo)
b. Data: Train, validation, and test datasets in CoNLL 2003 NER format.
c. GloVe 300B embeddings (if not using ELMo)
- Configure Settings:
a. Change settings in model/config.py
b. Main settings to change: file directories, model hyperparameters, etc.
- Build Data:
a. Run build_data.py
i. Builds the embedding dictionary, text files of words, chars, and tags, and the idx-to-word and idx-to-char mappings that the model reads.
- Train Model:
a. Run train.py
- Test Model:
a. Run test.py
b. Evaluates on the test set. Also accepts other arguments to predict on a custom string.
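
For readers unfamiliar with the data format and the Build Data step, here is a minimal sketch of what reading CoNLL 2003-style data and building index mappings can look like. It is not the code in build_data.py: the function names `read_conll` and `build_vocab`, the returned dictionary keys, and the sample sentences are all illustrative assumptions. It assumes whitespace-separated columns with the token first and the NER tag last, blank lines between sentences, and `-DOCSTART-` lines marking document boundaries.

```python
from collections import Counter

# Illustrative sample in CoNLL 2003 layout: token, POS, chunk, NER tag.
SAMPLE = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""


def read_conll(lines):
    """Yield one (words, tags) pair per sentence (hypothetical helper)."""
    words, tags = [], []
    for line in lines:
        line = line.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if words:
                yield words, tags
                words, tags = [], []
            continue
        fields = line.split()
        words.append(fields[0])   # first column: the token
        tags.append(fields[-1])   # last column: the NER tag
    if words:                     # flush a final sentence with no trailing blank line
        yield words, tags


def build_vocab(sentences):
    """Build word, char, and tag index mappings (hypothetical helper)."""
    word_counts, chars, tag_set = Counter(), set(), set()
    for words, tags in sentences:
        word_counts.update(words)
        tag_set.update(tags)
        for word in words:
            chars.update(word)    # adds each character of the word
    # Sorting makes the assigned indices reproducible across runs.
    word_to_idx = {w: i for i, w in enumerate(sorted(word_counts))}
    char_to_idx = {c: i for i, c in enumerate(sorted(chars))}
    tag_to_idx = {t: i for i, t in enumerate(sorted(tag_set))}
    return {
        "word_to_idx": word_to_idx,
        "char_to_idx": char_to_idx,
        "tag_to_idx": tag_to_idx,
        "idx_to_word": {i: w for w, i in word_to_idx.items()},
        "idx_to_char": {i: c for c, i in char_to_idx.items()},
    }


sentences = list(read_conll(SAMPLE.splitlines()))
vocab = build_vocab(sentences)
```

A real pipeline would typically also reserve indices for padding and unknown tokens and write the mappings to disk so that training and testing read the same files; those details are left out here.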