Larger 13B model underperforms BASE model, any idea why? #22
Comments
I tried to evaluate both unicamp-dl/mt5-base-en-msmarco and unicamp-dl/mt5-13b-mmarco-100k, but the performance of the 13B model is lower than that of the base model. Here is a simple comparison of reranking results on BM25 top-100 candidates, measured in nDCG@10. Did you observe a similar trend, or could there be any underlying reasons? @rodrigonogueira4

Hi @vjeronymo2 @lhbonifacio, do you maybe have a hint of what is going on here?

Hi @cramraj8

Hi @lhbonifacio, yes, I am evaluating on Mr. TyDi. I am a bit confused here. If I interpret your reply correctly, in summary: in the context of multilingual re-ranking, when the model size increases (580M --> 13B), should we also increase the number of training iterations or the training sample size?
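For anyone trying to reproduce the comparison in the original post, here is a minimal sketch of rescoring a BM25 candidate list with these checkpoints via Hugging Face transformers. It assumes the monoT5-style input template "Query: {q} Document: {d} Relevant:" and single-token relevance targets; the original monoT5 used "true"/"false", while the mT5 mMARCO rerankers are often applied with "yes"/"no", so check the model cards before relying on this. The `rerank` helper and constant names are illustrative, not from this repo.

```python
# Sketch of monoT5-style reranking with Hugging Face transformers.
# ASSUMPTIONS (verify against the model cards): the input template
# "Query: ... Document: ... Relevant:" and the "yes"/"no" target words.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "unicamp-dl/mt5-base-en-msmarco"  # or "unicamp-dl/mt5-13b-mmarco-100k"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).eval()

# Relevance target words are an assumption; monoT5 originally used "true"/"false".
POS_WORD, NEG_WORD = "yes", "no"
POS_ID = tokenizer.encode(POS_WORD, add_special_tokens=False)[0]
NEG_ID = tokenizer.encode(NEG_WORD, add_special_tokens=False)[0]


def rerank(query: str, passages: list[str]) -> list[tuple[float, str]]:
    """Return (score, passage) pairs sorted from most to least relevant."""
    results = []
    for passage in passages:
        inputs = tokenizer(
            f"Query: {query} Document: {passage} Relevant:",
            return_tensors="pt",
            truncation=True,
            max_length=512,
        )
        with torch.no_grad():
            # One decoder step: read the logits of the first generated token.
            out = model(
                **inputs,
                decoder_input_ids=torch.tensor(
                    [[model.config.decoder_start_token_id]]
                ),
            )
        logits = out.logits[0, 0, [POS_ID, NEG_ID]]
        # Log-probability of the positive word under a softmax over {pos, neg}.
        score = torch.log_softmax(logits, dim=0)[0].item()
        results.append((score, passage))
    return sorted(results, key=lambda pair: pair[0], reverse=True)
```

The reranked runs for both checkpoints can then be scored with a standard trec_eval-style tool to obtain the nDCG@10 numbers being compared here.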