Add Transformers benchmarks. #97

Vaibhavs10 · 2023-12-01T18:17:29Z

Closes: #96

(Preliminary script for now)

Vaibhavs10 · 2023-12-01T18:24:54Z

Maybe I should test for different chunk lengths too?

Vaibhavs10 · 2023-12-01T18:39:39Z

Steps to build the environment:

python -m venv benchmark
source benchmark/bin/activate
pip install --no-cache-dir torch transformers optimum accelerate wheel
pip install --no-cache-dir flash-attn --no-build-isolation

Vaibhavs10 · 2023-12-01T19:13:39Z

Cool I think we're done here! 🚀

You should be able to run this script via python scripts/benchmark.py

Here's what the output looks like:

Running Model: openai/whisper-large-v3
Flash Attention: True
Batch Size: 1
Total time: 7.989270925521851
Total memory: 3608.0

All the results are saved to a benchmark.json file.
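For illustration, collecting one entry per run and dumping the list to benchmark.json could look roughly like the sketch below. The field names and the record_result / save_results helpers are assumptions made for this example, not the names used in the script.

```python
import json
from pathlib import Path


def record_result(results, model_id, flash_attention, batch_size, seconds, memory_mb):
    """Append one benchmark run to the results list (field names are hypothetical)."""
    results.append(
        {
            "model": model_id,
            "flash_attention": flash_attention,
            "batch_size": batch_size,
            "total_time_s": seconds,
            "total_memory_mb": memory_mb,
        }
    )
    return results


def save_results(results, path="benchmark.json"):
    """Write all recorded runs to a JSON file."""
    Path(path).write_text(json.dumps(results, indent=2))


if __name__ == "__main__":
    # Values copied from the sample output above.
    runs = record_result([], "openai/whisper-large-v3", True, 1, 7.989270925521851, 3608.0)
    save_results(runs)
```

Keeping every run as a flat dict makes it easy to load the file later into a table and compare configurations.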

Next step: Let this run on an A100 and a T4 (I'll probably only get to this tomorrow!)

sanchit-gandhi

Feel free to have a look at what we used for Distil-Whisper benchmarking as an example of evaluating over a dataset and computing the WER: https://github.com/huggingface/distil-whisper/blob/914dcdf3919552d5a3826a9d5db99b059ddcc16e/training/flax/run_speed_pt.py#L595

It's a bit messy, but it largely follows the same structure as the short-form script. You can just pull out the logic we use for the pipeline here (rather than the logic for original Whisper or model + processor).

sanchit-gandhi · 2023-12-05T11:30:18Z

scripts/benchmark.py

+            start = time.time()
+            outputs = pipe(
+                file_name,
+                chunk_length_s=30,


The configuration is a bit different for Whisper vs Distil-Whisper:

Whisper: chunk length 30s, with timestamps

Distil-Whisper: chunk length 15s, without timestamps
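A minimal sketch of how the two configurations could be selected per checkpoint. The helper name pipeline_kwargs_for and the rule of detecting Distil-Whisper by "distil" in the model id are assumptions for illustration.

```python
def pipeline_kwargs_for(model_id: str, batch_size: int) -> dict:
    """Return pipeline call kwargs following the settings listed above.

    Assumption: Distil-Whisper checkpoints are recognised by "distil"
    appearing in the model id.
    """
    is_distil = "distil" in model_id.lower()
    return {
        "chunk_length_s": 15 if is_distil else 30,  # 15s for Distil-Whisper, 30s for Whisper
        "batch_size": batch_size,
        "return_timestamps": not is_distil,  # timestamps only for Whisper
    }
```

It would then be used as pipe(file_name, **pipeline_kwargs_for(model_id, batch_size)).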

sanchit-gandhi · 2023-12-05T11:31:23Z

scripts/benchmark.py

+                chunk_length_s=30,
+                batch_size=batch_size,
+                return_timestamps=True,
+            )


We should also force the language/task for multilingual Whisper/Distil-Whisper checkpoints
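Something along these lines could work, passing the result through the pipeline's generate_kwargs argument. The helper name and the ".en" suffix check for English-only checkpoints are assumptions for this sketch.

```python
def generate_kwargs_for(model_id: str, language: str = "english", task: str = "transcribe") -> dict:
    """Build generate_kwargs that pin language and task for multilingual checkpoints.

    Assumption: English-only checkpoints are identified by an ".en" suffix
    (e.g. "openai/whisper-small.en") and should not receive these arguments.
    """
    if model_id.endswith(".en"):
        return {}
    return {"language": language, "task": task}
```

In the benchmark loop this would be pipe(file_name, generate_kwargs=generate_kwargs_for(model_id)), so the model does not spend a step on language detection and all checkpoints are timed on the same task.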

sanchit-gandhi · 2023-12-05T11:33:44Z

scripts/benchmark.py

+
+            max_mem = torch.cuda.max_memory_reserved()
+            max_mem_mb = max_mem / (1024 * 1024)
+            print(f"Total memory: {max_mem_mb}")


As mentioned offline, "relative memory" is probably the best estimate we can give for memory, if we decide to provide one.
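One possible reading of "relative memory", assumed here rather than taken from the discussion, is each configuration's peak reserved memory divided by that of a baseline configuration measured on the same device. A rough sketch with hypothetical helper names:

```python
def bytes_to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (same conversion as in the diff above)."""
    return num_bytes / (1024 * 1024)


def relative_memory(peak_bytes: dict, baseline: str) -> dict:
    """Express each configuration's peak memory as a ratio to the baseline.

    peak_bytes maps a configuration name to its peak reserved bytes, e.g. the
    value of torch.cuda.max_memory_reserved() read after that run. Calling
    torch.cuda.reset_peak_memory_stats() before each run keeps the peaks
    from leaking between configurations.
    """
    base = peak_bytes[baseline]
    if base <= 0:
        raise ValueError("baseline peak memory must be positive")
    return {name: value / base for name, value in peak_bytes.items()}
```

Reporting ratios rather than absolute megabytes sidesteps the fact that reserved memory depends on the allocator and the GPU, which makes raw numbers hard to compare across machines.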

Add Transformers benchmarks.

cbd052e

Vaibhavs10 marked this pull request as draft December 1, 2023 18:17

Add memory measurements.

5c7d6da

Vaibhavs10 added 6 commits December 1, 2023 19:45

let there be print statements.

4feef9c

fix the borked conditional statement.

9d41592

Convert bytes to MB & record results in a dict.

b1958fd

Fix recording the benchmarks deets.

d8e6c7a

Remove them pesky warnings.

1c05bce

Fix logging.

44e66a7

sanchit-gandhi reviewed Dec 5, 2023

Vaibhavs10 added 5 commits December 12, 2023 14:59

add explicit language and task.

54c9a3c

Remove timestamps.

d207f67

debugging: memory usage.

1433ba3

fix chunk length for distil-whisper.

3bef857

debug: memory usage.

810a316
