[misc] benchmark_throughput : Add LoRA #11267

varun-sundar-rabindranath · 2024-12-17T19:52:42Z

Update benchmark_throughput.py to support LoRA benchmarking.

Examples:
Machine : 1xA100

python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts 1000

Throughput: 11.45 requests/s, 5509.93 total tokens/s, 2693.62 output tokens/s

python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts ${num_prompts} --max-loras 1 --max-lora-rank 8 --enable-lora --lora-path "yard1/llama-2-7b-sql-lora-test"

Throughput: 7.60 requests/s, 3656.98 total tokens/s, 1787.78 output tokens/s

python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts ${num_prompts} --max-loras 4 --max-lora-rank 8 --enable-lora --lora-path "yard1/llama-2-7b-sql-lora-test"

Throughput: 7.61 requests/s, 3664.67 total tokens/s, 1791.53 output tokens/s

python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts ${num_prompts} --async-engine

Throughput: 11.31 requests/s, 5441.21 total tokens/s, 2660.02 output tokens/s

python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts ${num_prompts} --async-engine --max-loras 1 --max-lora-rank 8 --enable-lora --lora-path "yard1/llama-2-7b-sql-lora-test"

Throughput: 7.59 requests/s, 3654.39 total tokens/s, 1786.51 output tokens/s

python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts ${num_prompts} --async-engine --max-loras 4 --max-lora-rank 8 --enable-lora --lora-path "yard1/llama-2-7b-sql-lora-test"

Throughput: 7.55 requests/s, 3634.68 total tokens/s, 1776.87 output tokens/s
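
To put the numbers above in perspective, here is a minimal sketch that computes the LoRA runs' throughput relative to the matching base-model run. The requests/s figures are copied from the results above. The helper names (`relative_throughput`, `summarize`) are illustrative only, and the comparison assumes that all runs used the same number of prompts, which the commands do not state explicitly.

```python
# Throughput figures (requests/s) copied from the runs reported above.
# Keys of LORA are (engine mode, --max-loras value).
BASELINE = {"sync": 11.45, "async": 11.31}
LORA = {
    ("sync", 1): 7.60,
    ("sync", 4): 7.61,
    ("async", 1): 7.59,
    ("async", 4): 7.55,
}


def relative_throughput(lora_rps: float, base_rps: float) -> float:
    """Return LoRA throughput as a fraction of the base-model throughput."""
    return lora_rps / base_rps


def summarize() -> dict:
    """Map each (engine, max_loras) run to its fraction of the baseline."""
    return {
        key: relative_throughput(rps, BASELINE[key[0]])
        for key, rps in LORA.items()
    }


if __name__ == "__main__":
    for (engine, max_loras), frac in summarize().items():
        print(f"{engine:5s} max_loras={max_loras}: {frac:.1%} of baseline")
```

On these figures, every LoRA run lands at roughly two thirds of the base-model requests/s, and going from --max-loras 1 to --max-loras 4 changes the result very little.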

cc @mgoin @robertgshaw2-neuralmagic @jeejeelee

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

github-actions · 2024-12-17T19:52:53Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add the ready label to the PR
Enable auto-merge.

🚀

jeejeelee · 2024-12-18T01:22:23Z

benchmarks/benchmark_throughput.py

+    # LoRA
+    parser.add_argument("--lora-path",
+                        type=str,
+                        default='yard1/llama-2-7b-sql-lora-test')


Using a default here is not recommended, since a LoRA adapter is tightly coupled to its base model. We also need to add help text describing this argument.

You are correct. I have removed the default 👍 Thanks!
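
A minimal sketch of what the review comment asks for: no default for `--lora-path`, plus help text. This is not the merged code. The `--enable-lora` flag here is a local stand-in (the benchmark script takes it from the engine arguments), the help wording is invented for illustration, and the check that `--lora-path` is present when LoRA is enabled is an assumption rather than something the PR confirms.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="LoRA options (sketch)")
    # Stand-in for the engine-level flag; hypothetical in this sketch.
    parser.add_argument("--enable-lora",
                        action="store_true",
                        help="Enable LoRA adapters for the benchmark run.")
    # No default: a LoRA adapter only makes sense for its base model.
    parser.add_argument("--lora-path",
                        type=str,
                        default=None,
                        help="Path or Hugging Face repo id of the LoRA "
                        "adapter. It must match the model given by --model.")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation: fail early instead of silently using no adapter.
    if args.enable_lora and args.lora_path is None:
        parser.error("--lora-path is required when --enable-lora is set")
    return args


if __name__ == "__main__":
    args = parse_args(
        ["--enable-lora", "--lora-path", "yard1/llama-2-7b-sql-lora-test"])
    print(args.lora_path)
```

Leaving the default as `None` makes a mismatched adapter an explicit user choice rather than a silent fallback.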

jeejeelee · 2024-12-18T01:36:55Z

benchmarks/benchmark_throughput.py

+        request_tokenizer = tokenizer
+        if args.enable_lora:
+            lora_request, request_tokenizer = get_random_lora_request(args)
+


Many LoRA weights don't include tokenizer files, which leads to request_tokenizer being None. We need to handle this case.

Nice catch! Added a fallback to use the base tokenizer.
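
The fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `pick_request_tokenizer` and the `load_lora_tokenizer` callable are hypothetical names, and the loader is assumed to return `None` when the adapter ships no tokenizer files.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def pick_request_tokenizer(
    base_tokenizer: T,
    load_lora_tokenizer: Callable[[], Optional[T]],
) -> T:
    """Use the LoRA adapter's tokenizer if it has one, else the base one."""
    lora_tokenizer = load_lora_tokenizer()
    # Assumption: the loader signals "no tokenizer files" by returning None.
    if lora_tokenizer is None:
        return base_tokenizer
    return lora_tokenizer


if __name__ == "__main__":
    # Strings stand in for real tokenizer objects in this sketch.
    print(pick_request_tokenizer("base-tokenizer", lambda: None))
    print(pick_request_tokenizer("base-tokenizer", lambda: "lora-tokenizer"))
```

Comparing against `None` explicitly, rather than relying on truthiness, avoids discarding a valid tokenizer object that happens to evaluate as falsy.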

jeejeelee

overall lgtm, thanks

Varun Sundar Rabindranath added 3 commits December 17, 2024 17:37

Add lora to benchmark throughput

d13f83d

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

fix caching

b52404e

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

format

7003b0e

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

Varun Sundar Rabindranath added 3 commits December 17, 2024 20:08

fix dataset none case

969396e

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

refactor

8ff64da

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

format

29d4002

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

jeejeelee reviewed Dec 18, 2024

jeejeelee reviewed Dec 18, 2024

Varun Sundar Rabindranath added 3 commits December 18, 2024 13:46

review comments

f640144

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

format

7d1db7f

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

fix to use all loras

1b3adda

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

varun-sundar-rabindranath requested a review from jeejeelee December 18, 2024 20:06

jeejeelee approved these changes Dec 19, 2024

jeejeelee added the ready label Dec 19, 2024

DarkLight1337 merged commit 9835673 into vllm-project:main Dec 19, 2024
45 of 47 checks passed

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[misc] benchmark_throughput : Add LoRA (vllm-project#11267)

28eecca

Signed-off-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]>
