
Training speed Teacher to Student #71

Open
Godnoken opened this issue Jan 12, 2023 · 2 comments

Comments

@Godnoken
Contributor

Is there any data you can share on how long it took to train the student models with the recommended setup of four GPUs with 12 GB of memory each? (Which GPU series and model are we talking about here?)

Months, weeks, days?

I'm interested in potentially contributing in the future, but I'd need to know what to expect before getting the hardware to do so.

Cheers :)

@ZJaume
Contributor

ZJaume commented May 18, 2023

It really depends on the size of the data used for distillation, because generating the n-best candidates takes a significant share of the time. If you are distilling from a single transformer-big model for a mid-size language pair (5M to 40M sentences), I would say about one week with a 12 GB GPU. I trained some models on a 2080 Ti with 12 GB and it's affordable, unless you are training from a 2x or 4x ensemble of transformer-bigs for a pair like English-French.

EDIT: But if you are asking about GPU models because you want to buy one, I'd suggest going for one of the newest generations with more RAM than that. With a 4090 you could probably do all the work I mentioned in half a week.
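
For anyone budgeting hardware, the numbers in this comment can be turned into a rough back-of-envelope estimate. The sketch below is not from this repository; the corpus size, throughput figures, and epoch count are all assumptions, chosen only to be roughly consistent with the one-week figure quoted above, and you would replace them with measurements from a small benchmark on your own GPU.

```python
# Back-of-envelope estimate of teacher-to-student distillation wall time.
# Every constant below is an ASSUMPTION for illustration only; benchmark a
# small sample on your own GPU before trusting any of these numbers.

CORPUS_SENTENCES = 20_000_000   # a mid-size pair, within the 5M-40M range above
NBEST_SENTS_PER_SEC = 50        # assumed teacher beam-search (n-best) decoding speed
STUDENT_SENTS_PER_SEC = 500     # assumed student training throughput
STUDENT_EPOCHS = 5              # assumed passes over the distilled corpus

# Decoding touches every sentence once; training touches it once per epoch.
decode_days = CORPUS_SENTENCES / NBEST_SENTS_PER_SEC / 3600 / 24
train_days = CORPUS_SENTENCES * STUDENT_EPOCHS / STUDENT_SENTS_PER_SEC / 3600 / 24

print(f"n-best decoding:  ~{decode_days:.1f} days")                # ~4.6 days
print(f"student training: ~{train_days:.1f} days")                 # ~2.3 days
print(f"total:            ~{decode_days + train_days:.1f} days")   # ~6.9 days
```

With these made-up throughputs, generating the n-best candidates dominates the total, which matches the comment above; doubling the decoding speed (say, with a newer GPU) would cut the estimate to roughly half a week, in line with the 4090 figure.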

@Godnoken
Contributor Author

Awesome, thank you very much for that info. At the moment I don't have the funds to spend that kind of money on a 4090 or anything close to it, but hopefully that will change in the next six months or so. I'll be back.
