Larger 13B model underperforms base model, any idea why? #22

Open
cramraj8 opened this issue Jun 2, 2024 · 3 comments

@cramraj8

cramraj8 commented Jun 2, 2024

I evaluated both `unicamp-dl/mt5-base-en-msmarco` and `unicamp-dl/mt5-13b-mmarco-100k`, but the 13B model performs worse than the base model. Below is a simple comparison of the two models reranking the BM25 top-100 results, measured in nDCG@10. Did you observe a similar trend, or could there be an underlying reason? @rodrigonogueira4

[Image: nDCG@10 comparison of the two models reranking BM25 top-100, per language]
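For anyone reproducing this comparison, here is a minimal sketch of how nDCG@10 can be computed for a reranked BM25 top-100 list. It assumes linear gains with a log2(rank + 1) discount (one common formulation; evaluation toolkits can differ in details) and qrels stored as plain dicts. The function names are illustrative and not part of any toolkit used in this thread.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    # Ranks are 1-based; each gain is discounted by log2(rank + 1).
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_doc_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    # Gains of the reranked list; unjudged documents count as non-relevant.
    gains = [max(qrels.get(doc_id, 0), 0) for doc_id in ranked_doc_ids]
    # Ideal ranking: all judged relevant documents sorted by relevance.
    ideal = sorted((r for r in qrels.values() if r > 0), reverse=True)
    idcg = dcg_at_k(ideal, k)
    if idcg == 0:
        return 0.0
    return dcg_at_k(gains, k) / idcg


def mean_ndcg_at_k(
    run: Dict[str, List[str]], qrels: Dict[str, Dict[str, int]], k: int = 10
) -> float:
    # Average over judged queries; a query missing from the run scores 0.
    scores = [ndcg_at_k(run.get(qid, []), rels, k) for qid, rels in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

Here `run` maps each query ID to the 100 BM25 candidates in the order produced by the reranker, so the base and 13B models can be compared on exactly the same candidate sets.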

@rodrigonogueira4
Contributor

Hi @vjeronymo2 @lhbonifacio, do you have any hint about what is going on here?

@lhbonifacio
Copy link
Collaborator

Hi @cramraj8
Judging from the languages in your results, I guess you are using Mr. TyDi, right?
I would say the main issues here are the jump from 580M parameters (mT5-base) to 13B, combined with the multilingual setting.
As a hint, we observed similar results when we tried fine-tuning mT5 models for 10k steps (since that number gave better results for the English monoT5). However, 10k steps in a multilingual setting were simply not enough for the model to learn the reranking task, which is why you won't find any multilingual model fine-tuned for only 10k steps on our Hugging Face Hub. You are scaling up the number of parameters without scaling up the training data accordingly, and I would say that's the reason here.
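To make the "more parameters, same training data" argument concrete, here is a back-of-the-envelope sketch of how many training tokens each parameter sees under a fixed step budget. The batch size (128) and tokens per example (512) are hypothetical placeholders, not the settings used to train these checkpoints; only the step count (100k) and the parameter counts (580M, 13B) come from this thread. Because the budget is held fixed, tokens per parameter drop by the same factor the model grows (about 22x), whatever placeholder values are plugged in.

```python
from dataclasses import dataclass


@dataclass
class TrainingBudget:
    steps: int
    batch_size: int          # HYPOTHETICAL: examples per optimizer step
    tokens_per_example: int  # HYPOTHETICAL: average input length in tokens

    def examples_seen(self) -> int:
        return self.steps * self.batch_size

    def tokens_seen(self) -> int:
        return self.examples_seen() * self.tokens_per_example


def tokens_per_parameter(budget: TrainingBudget, n_params: float) -> float:
    # How much training signal each parameter receives under this budget.
    return budget.tokens_seen() / n_params


budget = TrainingBudget(steps=100_000, batch_size=128, tokens_per_example=512)
base = tokens_per_parameter(budget, 580e6)  # mT5-base, ~580M parameters
big = tokens_per_parameter(budget, 13e9)    # mT5-13B, ~13B parameters
print(f"mT5-base: {base:.2f} tokens/param, mT5-13B: {big:.2f} tokens/param")
print(f"the 13B model sees {base / big:.1f}x fewer tokens per parameter")
```

Keeping the same tokens-per-parameter ratio for the 13B model would require proportionally more steps or more (or longer) training examples, which is the trade-off raised in the follow-up question.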

@cramraj8
Author

cramraj8 commented Jun 6, 2024

Hi @lhbonifacio, yes, I am evaluating on Mr. TyDi. I am a bit confused here.

If I understand your reply correctly: the English monoT5 reaches its best performance after only 10k training steps, but mT5-base is not yet at its best at 10k steps, so you had to train it for 100k steps to see improvements. However, mT5-13B trained for 100k steps is still not at its best, because the model has grown from base to 13B and should therefore be trained on even more data. Is that accurate?

In summary, for multilingual reranking, when the model size increases (580M → 13B), should we also increase the number of training iterations or the training sample size?
