Large variance in convergence between T5 and T5.1 #960
Hi! I am fine-tuning a T5 model for multilabel text classification and recently integrated the T5.1 checkpoints into the training pipeline. The initial loss values and convergence rates are night and day: T5 starts to converge after a single epoch, but T5.1 starts with high loss values and takes considerably longer to converge, if it converges at all. My model params are: train_batch_size 2. Any thoughts would be greatly appreciated. TIA
Replies: 1 comment
This lists some differences between the versions.
https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md
It also says that dropout should be re-enabled when fine-tuning T5.1.
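In case it helps, here is a rough sketch of what "re-enable dropout" could look like. It assumes your fine-tuning settings live in a plain dict with a `dropout_rate` key (that key name mirrors the field on Hugging Face's `T5Config`; your pipeline may call it something else). The helper name `with_dropout_reenabled`, the `model` value, and the `0.1` default are illustrative assumptions, not something taken from the linked page.

```python
import copy


def with_dropout_reenabled(config, rate=0.1):
    """Return a copy of `config` with a non-zero dropout_rate.

    Assumption: `config` is a plain dict and uses the key "dropout_rate".
    The default of 0.1 is an illustrative choice; tune it for your task.
    """
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    patched = copy.deepcopy(config)  # leave the caller's dict untouched
    # Only override when dropout is missing or switched off.
    if patched.get("dropout_rate", 0.0) == 0.0:
        patched["dropout_rate"] = rate
    return patched


# Hypothetical settings for a checkpoint that was pre-trained without dropout.
pretrain_config = {
    "model": "t5.1.1-base",   # placeholder name, not a real identifier
    "dropout_rate": 0.0,
    "train_batch_size": 2,
}

finetune_config = with_dropout_reenabled(pretrain_config)
print(finetune_config["dropout_rate"])  # prints 0.1
```

If you load the model through a library, check how that library exposes dropout and what value its config already carries before overriding anything; some packaged configs may already have it set.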