[Benchmarking] Thorough benchmarking for Transformers! #96

Open · Vaibhavs10 opened this issue Dec 1, 2023 · 1 comment · May be fixed by #97

Vaibhavs10 (Owner) commented Dec 1, 2023

I am starting this issue to do more thorough benchmarking than what the notebooks in the repo currently cover.

What we should measure (a rough sketch follows the list):

  1. Time for generation
  2. Max GPU VRAM
  3. Accuracy
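
A rough sketch of how these three numbers could be collected in one pass, assuming PyTorch, the `evaluate` library (with `jiwer`) for WER, and a placeholder `audio_array` / `reference_text` pair supplied by whatever dataset we settle on; this is an illustration, not the final harness:

```python
import time

import torch
from evaluate import load
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "openai/whisper-large-v3"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
wer_metric = load("wer")


def benchmark(audio_array, sampling_rate, reference_text):
    input_features = processor(
        audio_array, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features.to("cuda", torch.float16)

    # 1. Time for generation (synchronize so we time the GPU work, not just the kernel launches)
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    generated_ids = model.generate(input_features)
    torch.cuda.synchronize()
    generation_time = time.perf_counter() - start

    # 2. Max GPU VRAM, i.e. peak memory allocated since the reset above
    peak_vram_gb = torch.cuda.max_memory_allocated() / 1024**3

    # 3. Accuracy, measured here as word error rate against a reference transcript
    prediction = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    wer = wer_metric.compute(predictions=[prediction], references=[reference_text])

    return generation_time, peak_vram_gb, wer
```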

Hardware (this would give the best of both worlds IMO):

  1. Consumer (T4)
  2. A100s

Tricks that we should measure (illustrative snippets follow the list):

  1. scaled_dot_product_attention via the BetterTransformer API in Optimum.
  2. Flash Attention 2
  3. Chunked batching via the pipeline API in Transformers
  4. Speculative Decoding (sketched after the models list below)
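
Roughly how tricks 1–3 could be switched on in Transformers, assuming a recent version that accepts `attn_implementation`, plus `optimum` installed for BetterTransformer and `flash-attn` for Flash Attention 2 (which needs an Ampere-class GPU such as the A100, not the T4). This is a sketch of the configurations to compare, not the harness itself; speculative decoding is sketched after the models list:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, pipeline

model_id = "openai/whisper-large-v3"
dtype = torch.float16

# 1. scaled_dot_product_attention via the BetterTransformer API in Optimum
sdpa_model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=dtype)
sdpa_model = sdpa_model.to_bettertransformer().to("cuda")

# 2. Flash Attention 2 (requires the flash-attn package and an Ampere+ GPU)
fa2_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=dtype, attn_implementation="flash_attention_2"
).to("cuda")

# 3. Chunked batching via the pipeline API: long audio is cut into 30 s chunks
#    that are then transcribed as batches of 16
asr = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    torch_dtype=dtype,
    device="cuda",
    chunk_length_s=30,
    batch_size=16,
)
```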

Models that we should test:

  1. openai/whisper-large-v3
  2. distil-whisper/distil-large-v2
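
For speculative decoding, `generate` in Transformers can take an `assistant_model` that drafts tokens for the main model to verify. A minimal sketch pairing the two checkpoints listed here; whether distil-large-v2 is a valid draft model for large-v3 is an assumption (in practice the assistant should be distilled from the same Whisper release so that the feature extractor and tokenizer match):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

dtype = torch.float16
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")

# Main model to benchmark
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=dtype
).to("cuda")

# Draft model: proposes tokens that the main model verifies in a single forward pass.
# Assumed pairing; the assistant should come from the same Whisper release as the main model.
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-large-v2", torch_dtype=dtype
).to("cuda")


def transcribe_speculative(input_features):
    generated_ids = model.generate(input_features, assistant_model=assistant_model)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
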
Vaibhavs10 linked a pull request Dec 1, 2023 that will close this issue

BBC-Esq commented Dec 28, 2023

Has this been finalized yet, just out of curiosity?
