# PyTorch implementation of EmBERT

This is the original implementation of the EmBERT model from the paper "Multitask Learning Using BERT with Task-Embedded Attention". Our code is strongly based on the BERT and PALs implementation by Asa Cooper Stickland and Iain Murray.

## Model weights

Below one can find weights for the following models:

We also share a file with the EmBERT GLUE submission.
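
To sanity-check a downloaded checkpoint before training, a minimal sketch in PyTorch (the filename is an assumption; use whichever `.bin` file was downloaded, and note that this assumes the file is a plain state dict):

```python
import torch

# Load a released checkpoint on CPU; the filename below is an assumption,
# use whichever .bin file was downloaded from the links above.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Print the first few parameter names and shapes as a quick sanity check.
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(tensor.shape))
```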

## EmBERT training

In `configs/embert_config.json` one can find the config needed to train the EmBERT model.
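
To check the hyperparameters before launching a run, a minimal sketch (assuming the config is plain JSON, as in the BERT code base this repository builds on):

```python
import json

# Read the training config shipped with the repository and pretty-print it,
# so the settings can be reviewed before starting a run.
with open("configs/embert_config.json") as f:
    config = json.load(f)

print(json.dumps(config, indent=2))
```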

- `run_multi_task.py` is a script that runs multi-task model training.
- `run_test_multi_task` is a script that produces model predictions for the GLUE benchmark.

Below one can see how to run EmBERT training:

```bash
export BERT_BASE_DIR=uncased_L-12_H-768_A-12
export BERT_PYTORCH_DIR=uncased_L-12_H-768_A-12
export GLUE_DIR=glue/glue_data
export SAVE_DIR=save_dir

python run_multi_task.py \
  --seed 1 \
  --output_dir $SAVE_DIR/embert \
  --tasks all \
  --sample 'anneal' \
  --multi \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $GLUE_DIR/ \
  --vocab_file $BERT_BASE_DIR/vocab.txt \
  --bert_config_file $BERT_BASE_DIR/embert_config.json \
  --init_checkpoint $BERT_PYTORCH_DIR/pytorch_model.bin \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 25.0 \
  --gradient_accumulation_steps 1
```
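
A short note on the two batch-size flags above: in the BERT fine-tuning code this repository builds on, `--train_batch_size` is typically the number of examples per optimizer update, while each forward pass processes `train_batch_size // gradient_accumulation_steps` examples. A minimal sketch of that relationship (the numbers are illustrative, taken from the command above, not a recommendation):

```python
# Illustrative values copied from the training command above.
train_batch_size = 32
gradient_accumulation_steps = 1  # increase this to fit a smaller GPU

# Assumption: as in the BERT fine-tuning code this repository builds on,
# each forward pass sees train_batch_size // gradient_accumulation_steps
# examples, while the optimizer still updates on train_batch_size examples.
per_forward_batch = train_batch_size // gradient_accumulation_steps
print("examples per forward pass:", per_forward_batch)
print("examples per optimizer update:", train_batch_size)
```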