This is the original implementation of the EmBERT model from the paper "Multitask Learning Using BERT with Task-Embedded Attention". Our code is heavily based on the BERT and PALs implementation by Asa Cooper Stickland and Iain Murray.
Below one can find weights for the following models:

- BERT pretrained weights - we used the `uncased_L-12_H-768_A-12` model's weights shared by Google.
- EmBERT weights.

Moreover, we share a file with the EmBERT GLUE submission.
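As a quick sanity check after downloading, the shared PyTorch checkpoint can be inspected directly. The snippet below is a minimal sketch (not part of the repository); it only assumes the `uncased_L-12_H-768_A-12` directory is in the working directory and that PyTorch is installed.

```python
# Minimal sketch: inspect the downloaded BERT checkpoint before training.
# Assumes the uncased_L-12_H-768_A-12 directory mentioned above.
import torch

state_dict = torch.load(
    "uncased_L-12_H-768_A-12/pytorch_model.bin", map_location="cpu"
)

# Print a few parameter names and shapes to confirm the weights are readable.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```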
The config needed to train the EmBERT model can be found in `configs/embert_config.json`.
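The snippet below is a small sketch for peeking at that config; it only assumes the file is plain JSON in the usual BERT-config style (`hidden_size`, `num_attention_heads`, ...) plus whatever EmBERT-specific entries the repository adds.

```python
# Minimal sketch: print the EmBERT training config.
# Assumes configs/embert_config.json is plain JSON, like the standard BERT configs.
import json

with open("configs/embert_config.json") as f:
    config = json.load(f)

for key, value in sorted(config.items()):
    print(f"{key}: {value}")
```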
`run_multi_task.py` is a script that runs multitask model training. `run_test_multi_task` is a script that returns the model's predictions on the GLUE benchmark.
Below one can see how to run EmBERT training:
```bash
export BERT_BASE_DIR=uncased_L-12_H-768_A-12
export BERT_PYTORCH_DIR=uncased_L-12_H-768_A-12
export GLUE_DIR=glue/glue_data
export SAVE_DIR=save_dir

python run_multi_task.py \
  --seed 1 \
  --output_dir $SAVE_DIR/embert \
  --tasks all \
  --sample 'anneal' \
  --multi \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $GLUE_DIR/ \
  --vocab_file $BERT_BASE_DIR/vocab.txt \
  --bert_config_file $BERT_BASE_DIR/embert_config.json \
  --init_checkpoint $BERT_PYTORCH_DIR/pytorch_model.bin \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 25.0 \
  --gradient_accumulation_steps 1
```
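The `--sample 'anneal'` flag selects annealed task sampling. In the PALs approach this repository builds on, each training batch is drawn from task *i* with probability proportional to N_i^alpha, where N_i is that task's training-set size and alpha is annealed from 1 (size-proportional sampling) towards 0 (uniform sampling) over the epochs. The sketch below only illustrates that idea; the exact schedule and task sizes used by `run_multi_task.py` may differ.

```python
# Illustrative sketch of annealed task sampling (not the repository's exact code).
import random

def sample_task(dataset_sizes, epoch, total_epochs):
    """Pick a task index with probability proportional to N_i ** alpha.

    alpha anneals from 1 (sample proportionally to dataset size) towards 0
    (sample tasks roughly uniformly) as training progresses.
    """
    alpha = 1.0 - 0.8 * (epoch - 1) / max(total_epochs - 1, 1)
    weights = [size ** alpha for size in dataset_sizes]
    return random.choices(range(len(dataset_sizes)), weights=weights, k=1)[0]

# Example with illustrative training-set sizes for eight GLUE tasks and 25 epochs.
sizes = [392702, 363849, 104743, 67349, 8551, 5749, 3668, 2490]
print(sample_task(sizes, epoch=1, total_epochs=25))   # favours large tasks
print(sample_task(sizes, epoch=25, total_epochs=25))  # closer to uniform
```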