This folder contains the implementation for the NLU tasks.
Create and activate a conda environment, then install PyTorch:

```bash
conda create -n NLU python=3.7
conda activate NLU
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
```
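Optionally, you can sanity-check that the CUDA build of PyTorch was picked up (a quick check, assuming the install above succeeded):

```python
# Quick sanity check that the CUDA build of PyTorch is active.
import torch

print(torch.__version__)          # expected: 1.9.1+cu111
print(torch.cuda.is_available())  # should print True on a machine with a CUDA GPU
```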
Install dependencies:

```bash
pip install -r requirements.txt
```
Install transformers (version v4.4.2):

```bash
pip install -e .
```
Install loralib:

```bash
pip install -e ../loralib/
```
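For reference, the sketch below shows how loralib is typically used in code. It is a minimal illustration only; the classes shipped in `../loralib/` (which also include the AdaLoRA variants) may differ slightly from upstream loralib:

```python
import torch
import torch.nn as nn
import loralib as lora

# Toy model: one dense layer swapped for its LoRA counterpart (r is the rank budget).
model = nn.Sequential(
    lora.Linear(768, 768, r=8),
    nn.ReLU(),
    nn.Linear(768, 2),
)

# Freeze everything except the LoRA parameters before training.
lora.mark_only_lora_as_trainable(model)

# Only the small set of LoRA weights needs to be checkpointed.
torch.save(lora.lora_state_dict(model), "lora_only.pt")
```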
Folder structure:

- `loralib/` contains the PEFT methods used for this task.
- `examples/` contains the training code.
- `scripts/` contains the training scripts for the different datasets.
Example: Adapting RoBERTa-base with BitFit on the MNLI Dataset
```bash
python -m torch.distributed.launch --master_port=8679 --nproc_per_node=1 \
examples/text-classification/run_glue.py \
--model_name_or_path roberta-base \
--task_name mnli \
--apply_bitfit \
--do_train --do_eval \
--max_seq_length 256 \
--per_device_train_batch_size 32 --learning_rate 5e-4 --num_train_epochs 7 \
--warmup_steps 1000 \
--cls_dropout 0.15 --weight_decay 0 \
--evaluation_strategy steps --eval_steps 3000 \
--save_strategy steps --save_steps 30000 \
--logging_steps 500 \
--seed 42 \
--root_output_dir ./output/glue/mnli \
--overwrite_output_dir
```
Hyper-parameter Setup
- `model_name_or_path`: the pre-trained model to load; `roberta-base` selects RoBERTa-base (125M).
- `apply_bitfit`: apply BitFit to the model.
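For intuition, BitFit fine-tunes only the bias terms (plus the task head) while keeping all other weights frozen. The snippet below is a simplified sketch of that idea, not the flag's actual implementation in `run_glue.py`:

```python
from transformers import AutoModelForSequenceClassification

# MNLI has three labels (entailment / neutral / contradiction).
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)

# Train only bias terms and the classification head; freeze everything else.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")  # a tiny fraction of the ~125M total
```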
Example: Adapting DeBERTaV3-base with Adapter on the CoLA Dataset
```bash
python -m torch.distributed.launch --master_port=8679 --nproc_per_node=1 \
examples/text-classification/run_glue.py \
--model_name_or_path microsoft/deberta-v3-base \
--task_name cola \
--apply_adapter --adapter_type houlsby --adapter_size 64 \
--do_train --do_eval --max_seq_length 64 \
--per_device_train_batch_size 32 --learning_rate 8e-4 \
--num_train_epochs 25 --warmup_steps 100 \
--cls_dropout 0.10 --weight_decay 0.00 \
--evaluation_strategy steps --eval_steps 100 \
--save_strategy steps --save_steps 10000 \
--logging_steps 10 \
--tb_writter_loginterval 100 \
--report_to tensorboard \
--seed 6 \
--root_output_dir ./output/glue/cola \
--overwrite_output_dir
```
Hyper-parameter Setup
- `model_name_or_path`: the pre-trained model to load; `microsoft/deberta-v3-base` selects DeBERTaV3-base (184M).
- `apply_adapter`: apply Adapter to the model.
- `adapter_type`: the type of Adapter; `houlsby` for the Houlsby Adapter and `pfeiffer` for the Pfeiffer Adapter.
- `adapter_size`: the bottleneck size of the Adapter.
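For intuition, an adapter is a small bottleneck module inserted into each transformer block while the pre-trained weights stay frozen. The sketch below is a simplified stand-in, not the repository's implementation (details such as initialization and layer-norm placement differ):

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, add a residual.
    The bottleneck width corresponds to --adapter_size."""

    def __init__(self, hidden_size: int = 768, adapter_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, adapter_size)
        self.up = nn.Linear(adapter_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual keeps the frozen path intact

# Houlsby adapters insert this module after both the attention and the feed-forward
# sub-layers of each block; Pfeiffer adapters insert it only once per block.
out = BottleneckAdapter()(torch.randn(2, 16, 768))
```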
Example: Adapting RoBERTa-Large with AdaLoRA on the MRPC Dataset
```bash
python -m torch.distributed.launch --master_port=8679 --nproc_per_node=1 \
examples/text-classification/run_glue.py \
--model_name_or_path roberta-large \
--task_name mrpc \
--apply_lora --apply_adalora --lora_type svd \
--target_rank 1 --lora_r 2 \
--reg_orth_coef 0.1 \
--init_warmup 600 --final_warmup 1800 --mask_interval 1 \
--beta1 0.85 --beta2 0.85 \
--lora_module query,key,value,intermediate,layer.output,attention.output \
--lora_alpha 32 \
--do_train --do_eval --max_seq_length 320 \
--per_device_train_batch_size 32 --learning_rate 1e-3 \
--num_train_epochs 30 --warmup_ratio 0.1 \
--cls_dropout 0.0 --weight_decay 0.01 \
--evaluation_strategy steps --eval_steps 300 \
--save_strategy steps --save_steps 3000 \
--logging_steps 100 \
--report_to tensorboard \
--seed 6 \
--root_output_dir ./output/glue/mrpc \
--overwrite_output_dir
```
Hyper-parameter Setup
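The AdaLoRA-specific flags roughly map to quantities in the AdaLoRA paper: `lora_type svd` selects the SVD-style parameterization, `lora_r` is the initial rank of every update and `target_rank` the average rank kept after pruning, `reg_orth_coef` weights the orthogonality penalty on the factor matrices, `init_warmup`/`final_warmup`/`mask_interval` schedule the rank-allocation steps, and `beta1`/`beta2` smooth the importance scores. The sketch below illustrates the parameterization only; it is a hypothetical simplification, not the loralib implementation:

```python
import torch
import torch.nn as nn

class SVDLoRALinear(nn.Module):
    """Sketch of an SVD-style low-rank update: y = x @ (W + (alpha/r) * P diag(lam) Q).T.
    AdaLoRA prunes entries of `lam` during training to meet the target rank budget."""

    def __init__(self, in_features: int, out_features: int, r: int = 2, alpha: int = 32):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)             # frozen pre-trained weight
        self.P = nn.Parameter(torch.randn(out_features, r) * 0.01)  # left singular directions
        self.lam = nn.Parameter(torch.zeros(r))                     # "singular values"
        self.Q = nn.Parameter(torch.randn(r, in_features) * 0.01)   # right singular directions
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.P @ torch.diag(self.lam) @ self.Q
        return x @ (self.weight + self.scale * delta).T

def orth_penalty(layer: SVDLoRALinear) -> torch.Tensor:
    # Keeps P and Q near-orthogonal; weighted by --reg_orth_coef in the training loss.
    eye = torch.eye(layer.lam.numel())
    return ((layer.P.T @ layer.P - eye) ** 2).sum() + ((layer.Q @ layer.Q.T - eye) ** 2).sum()
```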
Example: Adapting RoBERTa-Large with Other Methods on the RTE Dataset
Here we take LoRA as an example.

```bash
python -m torch.distributed.launch --master_port=8679 --nproc_per_node=1 \
examples/text-classification/run_glue.py \
--model_name_or_path roberta-large \
--task_name rte \
--apply_lora --lora_type frd \
--lora_r 2 \
--lora_module query,key,value,intermediate,layer.output,attention.output \
--lora_alpha 4 \
--do_train --do_eval --max_seq_length 320 \
--per_device_train_batch_size 32 --learning_rate 1.2e-3 \
--num_train_epochs 50 --warmup_steps 200 \
--cls_dropout 0.20 --weight_decay 0.01 \
--evaluation_strategy steps --eval_steps 100 \
--save_strategy steps --save_steps 10000 \
--logging_steps 10 --report_to tensorboard \
--seed 6 \
--root_output_dir ./output/glue/rte \
--overwrite_output_dir
```
Hyper-parameter Setup
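Here `lora_type frd` selects the standard fixed-rank LoRA update, where each weight matrix listed in `lora_module` gets a low-rank delta B A scaled by `lora_alpha / lora_r`. A small numeric illustration with toy tensors (not the repository code):

```python
import torch

# Toy shapes matching --lora_r 2 and --lora_alpha 4 on a 768x768 projection.
d, r, alpha = 768, 2, 4
W = torch.randn(d, d)          # frozen pre-trained weight
A = torch.randn(r, d) * 0.01   # trainable down-projection
B = torch.zeros(d, r)          # trainable up-projection, zero-initialized so training starts from W

x = torch.randn(1, d)
y = x @ (W + (alpha / r) * (B @ A)).T   # effective weight is W plus a scaled rank-r update
```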
Since your computing resources may differ from ours, the table below lists the GPU requirements for the different models.
| Model Name | GPU Requirement |
|---|---|
| RoBERTa-base | RTX 3090 (24 GB) |
| DeBERTaV3-base | RTX 3090 (24 GB) & A100 (80 GB) |
| RoBERTa-Large | A100 (80 GB) |
| DeBERTa-XXL | A100 (80 GB) |
| More... | ... |
If you encounter any issues, please first refer to these two resources: LoRA and AdaLoRA. They cover the majority of problems you may run into. You are also welcome to contact us by submitting an issue or via email.
This directory is adapted from LoRA and AdaLoRA. We greatly appreciate their remarkable work.