Large variance in convergence between T5 and T5.1 #960

pablogranolabar · 2022-01-08T21:24:13Z


Hello!

I am fine-tuning a T5 model for multilabel text classification and recently integrated the T5.1 checkpoints into the training pipeline. The initial loss values and convergence rates are night and day: T5 starts to converge after a single epoch, while T5.1 starts with high loss values and takes much longer to converge, if it converges at all. My model params are:

train_batch_size 2
eval_batch_size 16
fp16 false
learning rate 3e-4 (and 1e-4)
max_seq_length 512
max_source_length 512
max_target_length 96
model_class: str = "T5Model"
dataset_class: Dataset = None
do_sample: bool = False
early_stopping: bool = True
evaluate_generated_text: bool = False
length_penalty: float = 2.0
max_length: int = 20
max_steps: int = -1
num_beams: int = 1
num_return_sequences: int = 1
preprocess_inputs: bool = True
repetition_penalty: float = 1.0
scheduler: str = "constant_schedule_with_warmup"
adafactor_relative_step: bool = False
adafactor_scale_parameter: bool = False
adafactor_warmup_init: bool = False
learning_rate: float = 1e-3
optimizer: str = "Adafactor"
special_tokens_list: list = field(default_factory=list)
top_k: float = None
top_p: float = None
use_multiprocessed_decoding: bool = True
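
For anyone comparing runs, the settings above can be collected into two plain dictionaries to see which values are actually in effect. This is only a rough sketch: the names `overrides`, `dumped_defaults`, `effective_args` and `conflicts` are hypothetical, and it assumes that the typed lines (such as `learning_rate: float = 1e-3`) are library defaults while the untyped lines at the top are the values set by hand.

```python
# Values set by hand (the untyped lines at the top of the list).
# Assumption: these override the typed defaults further down.
overrides = {
    "train_batch_size": 2,
    "eval_batch_size": 16,
    "fp16": False,
    "learning_rate": 3e-4,  # 1e-4 was also tried
    "max_seq_length": 512,
    "max_source_length": 512,
    "max_target_length": 96,
}

# A subset of the typed lines, assumed here to be defaults.
dumped_defaults = {
    "learning_rate": 1e-3,
    "optimizer": "Adafactor",
    "scheduler": "constant_schedule_with_warmup",
    "adafactor_relative_step": False,
    "adafactor_scale_parameter": False,
    "adafactor_warmup_init": False,
}


def effective_args(defaults, overrides):
    """Merge two dicts; hand-set overrides win over defaults."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def conflicts(defaults, overrides):
    """Return keys present in both dicts whose values differ."""
    return {
        key: (defaults[key], overrides[key])
        for key in overrides
        if key in defaults and defaults[key] != overrides[key]
    }


args = effective_args(dumped_defaults, overrides)
mismatches = conflicts(dumped_defaults, overrides)
print(mismatches)
```

Here the only key that appears with two different values is `learning_rate`, so it is worth confirming which value each run really used before comparing the T5 and T5.1 loss curves.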

Any thoughts would be greatly appreciated.

TIA

Answered by coreyfournier

Dec 16, 2022

This page lists some differences between the versions:
https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md
It also says you should re-enable dropout when fine-tuning T5.1.
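
As a rough illustration of re-enabling dropout, here is a minimal sketch that works on any config object carrying a `dropout_rate` attribute. The helper name `with_dropout`, the stand-in config built with `SimpleNamespace`, and the rate of 0.1 are all assumptions for illustration, not something taken from the linked page or from this library.

```python
from types import SimpleNamespace


def with_dropout(config, rate=0.1):
    """Set config.dropout_rate to `rate` if it is currently zero or missing.

    A non-zero existing value is left untouched.
    """
    current = getattr(config, "dropout_rate", 0.0)
    if not current:
        config.dropout_rate = rate
    return config


# Stand-in for the config of a checkpoint whose dropout is turned off.
t5_1_config = SimpleNamespace(dropout_rate=0.0)
with_dropout(t5_1_config)

# A config that already has dropout enabled keeps its own value.
other_config = with_dropout(SimpleNamespace(dropout_rate=0.2))

print(t5_1_config.dropout_rate, other_config.dropout_rate)
```

In Hugging Face `transformers`, `T5Config` has a `dropout_rate` field, so the same check could be run on a loaded config before building the model. Whether your training wrapper passes that config through unchanged is something to verify in your own setup.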
