
Training speed Teacher to Student #71

Open
Godnoken opened this issue Jan 12, 2023 · 2 comments

Comments

@Godnoken
Contributor

Is there any data you can share on how long it took to train the student models with the recommended setup of four GPUs with 12 GB of memory each? (Which GPU series and model are we talking about here?)

Months, weeks, days?

I'm interested in potentially contributing in the future, but I'd need to know what to expect before getting the hardware to do so.

Cheers :)

@ZJaume
Contributor

ZJaume commented May 18, 2023

It really depends on the size of the data used for distillation, because generating the n-best candidates takes a significant share of the time. If you are distilling from a single transformer-big model for a mid-size language pair (5M to 40M sentences), I would say about one week with a 12 GB GPU. I trained some models on a 2080 Ti with 12 GB and it's affordable, unless you are training from a 2x or 4x ensemble of transformer-bigs for a pair like English-French.

EDIT: But if you are asking about GPU models because you want to buy one, I'd suggest going for one of the newest generations with more RAM than that. With a 4090 you could probably do all the work I mentioned in half a week.
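
For anyone budgeting hardware, the numbers in this comment can be turned into a rough back-of-envelope estimate. The sketch below is not from this repository; the corpus size, throughput figures, and epoch count are all assumptions, chosen only to be roughly consistent with the one-week figure quoted above, and you would replace them with measurements from a small benchmark on your own GPU.

```python
# Back-of-envelope estimate of teacher-to-student distillation wall time.
# Every constant below is an ASSUMPTION for illustration only; benchmark a
# small sample on your own GPU before trusting any of these numbers.

CORPUS_SENTENCES = 20_000_000   # a mid-size pair, within the 5M-40M range above
NBEST_SENTS_PER_SEC = 50        # assumed teacher beam-search (n-best) decoding speed
STUDENT_SENTS_PER_SEC = 500     # assumed student training throughput
STUDENT_EPOCHS = 5              # assumed passes over the distilled corpus

# Decoding touches every sentence once; training touches it once per epoch.
decode_days = CORPUS_SENTENCES / NBEST_SENTS_PER_SEC / 3600 / 24
train_days = CORPUS_SENTENCES * STUDENT_EPOCHS / STUDENT_SENTS_PER_SEC / 3600 / 24

print(f"n-best decoding:  ~{decode_days:.1f} days")                # ~4.6 days
print(f"student training: ~{train_days:.1f} days")                 # ~2.3 days
print(f"total:            ~{decode_days + train_days:.1f} days")   # ~6.9 days
```

With these made-up throughputs, generating the n-best candidates dominates the total, which matches the comment above; doubling the decoding speed (say, with a newer GPU) would cut the estimate to roughly half a week, in line with the 4090 figure.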

@Godnoken
Contributor Author

Awesome, thank you very much for that info. At the moment I don't have the funds to spend that kind of money on a 4090 or anything close to it, but hopefully that will change in the next six months or so. I'll be back.
