This repo is a minimalist implementation of a BERT sentence classifier. Its goal is to show how to combine three of my favourite libraries to supercharge your NLP research.
My favourite libraries:
This project uses Python 3.6.
Create a virtual environment (outside the project folder) with:
virtualenv -p python3.6 sbert-env
source sbert-env/bin/activate
Install the requirements (inside the project folder):
pip install -r requirements.txt
Train the model with:
python training.py
Available command-line arguments:
Training arguments:
optional arguments:
--seed Training seed.
--batch_size Batch size to be used.
--accumulate_grad_batches Accumulates gradients over K small batches of size N before doing a backward pass.
--val_percent_check If you don't want to use the entire dev set, set the fraction of the dev set to use with this flag.
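For example, these flags can be combined as follows (a sketch with illustrative values; everything else keeps its defaults):
python training.py \
    --seed 3 \
    --batch_size 16 \
    --accumulate_grad_batches 2 \
    --val_percent_check 0.5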
Early Stopping/Checkpoint arguments:
optional arguments:
--metric_mode Whether we want to minimize ('min') or maximize ('max') the monitored quantity.
--min_epochs Limits training to a minimum number of epochs.
--max_epochs Limits training to a maximum number of epochs.
--save_top_k The best k models according to the monitored quantity will be saved.
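An illustrative combination of these flags (values are only an example, not recommended settings):
python training.py \
    --metric_mode min \
    --min_epochs 1 \
    --max_epochs 10 \
    --save_top_k 1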
Model arguments:
optional arguments:
--encoder_model BERT encoder model to be used.
--encoder_learning_rate Encoder specific learning rate.
--learning_rate Classification head learning rate.
--dropout Dropout to be applied to the BERT embeddings.
--train_csv Path to the file containing the train data.
--dev_csv Path to the file containing the dev data.
--test_csv Path to the file containing the test data.
--loader_workers How many subprocesses to use for data loading.
--label_set Set of labels to use in the classification task (e.g. 'pos,neg').
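For example, to train on your own CSV files with a custom label set (the paths and labels below are illustrative placeholders):
python training.py \
    --encoder_model bert-base-uncased \
    --train_csv data/train.csv \
    --dev_csv data/dev.csv \
    --test_csv data/test.csv \
    --label_set 'pos,neg'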
Note: After BERT, several BERT-like models were released. You can experiment with different model sizes, such as Mini-BERT and DistilBERT, which are much smaller.
- Mini-BERT contains only 2 encoder layers with a hidden size of 128. Use it with the flag:
--encoder_model google/bert_uncased_L-2_H-128_A-2
- DistilBERT contains only 6 layers with a hidden size of 768. Use it with the flag:
--encoder_model distilbert-base-uncased
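As a quick sanity check, you can inspect an encoder's size before training. This is a minimal sketch assuming the Hugging Face transformers library is installed in your environment:
# Load the Mini-BERT checkpoint and print its depth and hidden size.
from transformers import AutoModel

model = AutoModel.from_pretrained("google/bert_uncased_L-2_H-128_A-2")
print(model.config.num_hidden_layers)  # 2 encoder layers
print(model.config.hidden_size)        # 128-dimensional hidden states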
Training command example:
python training.py \
--gpus 1 \
--batch_size 32 \
--accumulate_grad_batches 1 \
--loader_workers 8 \
--nr_frozen_epochs 1 \
--encoder_model google/bert_uncased_L-2_H-128_A-2
Testing the model from the shell:
python interact.py --experiment experiments/version_{date}
Launch tensorboard with:
tensorboard --logdir="experiments/"
To make sure all the code follows the same style, we use the Black code formatter.
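Assuming Black is installed in your environment (pip install black), you can format the whole project with:
black .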