Is there any data you can share on how long it took to train the student models with the recommended setup of 4 GPUs with 12GB of memory each? (What GPU series and model are we talking about here?)
Months, weeks, days?
I'm interested in potentially contributing in the future, but I'd need to know what to expect before buying hardware for it.
Cheers :)
It really depends on the size of the data used for distillation, because generating the n-best candidates takes a significant share of the total time. If you are distilling from a single transformer-big in a mid-size language (5M to 40M sentences), I would say about 1 week with a 12GB GPU. I trained some models with a 2080 Ti 12GB and it's affordable, unless you distill from a 2x or 4x ensemble of transformer-bigs for English-French.
EDIT: but if you are asking about GPU models because you want to buy one, I'd suggest going for one of the newest generation with more RAM than that. With a 4090 you could probably do all the work I mentioned in half a week.
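To make the "it depends on data size" point concrete, here is a back-of-envelope sketch. All throughput numbers, the n-best size, and the epoch count are illustrative assumptions, not measurements from this project; the point is only that teacher-side n-best decoding tends to dominate the wall-clock time as the corpus grows:

```python
def estimate_days(num_sentences, nbest=8,
                  decode_sps=300,   # assumed teacher decoding speed (sentences/sec)
                  train_sps=500,    # assumed student training speed (sentences/sec)
                  epochs=5):        # assumed student training epochs
    """Rough wall-clock estimate (in days) for distillation on one GPU.

    Decoding cost scales with num_sentences * nbest (the teacher scores
    n candidates per source sentence); training cost scales with
    num_sentences * epochs. Every constant here is a placeholder.
    """
    decode_seconds = num_sentences * nbest / decode_sps
    train_seconds = num_sentences * epochs / train_sps
    return (decode_seconds + train_seconds) / 86_400

# A 20M-sentence corpus lands in the "about a week" ballpark
# under these assumptions.
print(f"{estimate_days(20_000_000):.1f} days")
```

Plugging in different `decode_sps` (e.g. halving it for a 2x ensemble teacher) shows why ensemble distillation gets expensive quickly.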
Awesome, thank you very much for that info. I can't currently justify spending that kind of money on a 4090 or anything close to it, but hopefully that will change in the next 6 months or so. I'll be back.