[Benchmarking] Thorough benchmarking for Transformers! #96

Open · Vaibhavs10 opened this issue Dec 1, 2023 · 1 comment · May be fixed by #97

Vaibhavs10 (Owner) commented Dec 1, 2023

I am starting this issue to do more thorough benchmarking than what the notebooks in the repo currently cover.

What we should measure (a rough sketch follows the list):

  1. Time for generation
  2. Max GPU VRAM
  3. Accuracy
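
A rough sketch of how these three numbers could be collected in one pass, assuming PyTorch, the `evaluate` library (with `jiwer`) for WER, and a placeholder `audio_array` / `reference_text` pair supplied by whatever dataset we settle on; this is an illustration, not the final harness:

```python
import time

import torch
from evaluate import load
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "openai/whisper-large-v3"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
wer_metric = load("wer")


def benchmark(audio_array, sampling_rate, reference_text):
    input_features = processor(
        audio_array, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features.to("cuda", torch.float16)

    # 1. Time for generation (synchronize so we time the GPU work, not just the kernel launches)
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    generated_ids = model.generate(input_features)
    torch.cuda.synchronize()
    generation_time = time.perf_counter() - start

    # 2. Max GPU VRAM, i.e. peak memory allocated since the reset above
    peak_vram_gb = torch.cuda.max_memory_allocated() / 1024**3

    # 3. Accuracy, measured here as word error rate against a reference transcript
    prediction = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    wer = wer_metric.compute(predictions=[prediction], references=[reference_text])

    return generation_time, peak_vram_gb, wer
```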

Hardware (this would give the best of both worlds IMO):

  1. Consumer (T4)
  2. A100s

Tricks that we should measure (illustrative snippets follow the list):

  1. scaled_dot_product_attention via the BetterTransformer API in Optimum.
  2. Flash Attention 2
  3. Chunked batching via the pipeline API in Transformers
  4. Speculative Decoding (sketched after the models list below)
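
Roughly how tricks 1–3 could be switched on in Transformers, assuming a recent version that accepts `attn_implementation`, plus `optimum` installed for BetterTransformer and `flash-attn` for Flash Attention 2 (which needs an Ampere-class GPU such as the A100, not the T4). This is a sketch of the configurations to compare, not the harness itself; speculative decoding is sketched after the models list:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, pipeline

model_id = "openai/whisper-large-v3"
dtype = torch.float16

# 1. scaled_dot_product_attention via the BetterTransformer API in Optimum
sdpa_model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=dtype)
sdpa_model = sdpa_model.to_bettertransformer().to("cuda")

# 2. Flash Attention 2 (requires the flash-attn package and an Ampere+ GPU)
fa2_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=dtype, attn_implementation="flash_attention_2"
).to("cuda")

# 3. Chunked batching via the pipeline API: long audio is cut into 30 s chunks
#    that are then transcribed as batches of 16
asr = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    torch_dtype=dtype,
    device="cuda",
    chunk_length_s=30,
    batch_size=16,
)
```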

Models that we should test:

  1. openai/whisper-large-v3
  2. distil-whisper/distil-large-v2
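
For speculative decoding, `generate` in Transformers can take an `assistant_model` that drafts tokens for the main model to verify. A minimal sketch pairing the two checkpoints listed here; whether distil-large-v2 is a valid draft model for large-v3 is an assumption (in practice the assistant should be distilled from the same Whisper release so that the feature extractor and tokenizer match):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

dtype = torch.float16
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")

# Main model to benchmark
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=dtype
).to("cuda")

# Draft model: proposes tokens that the main model verifies in a single forward pass.
# Assumed pairing; the assistant should come from the same Whisper release as the main model.
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-large-v2", torch_dtype=dtype
).to("cuda")


def transcribe_speculative(input_features):
    generated_ids = model.generate(input_features, assistant_model=assistant_model)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
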
Vaibhavs10 linked a pull request Dec 1, 2023 that will close this issue

BBC-Esq commented Dec 28, 2023

Has this been finalized yet, just out of curiosity?
