I got access to an Azure server with an NVIDIA H100 GPU, ran some very quick benchmarks, and wanted to share the results.
Bench params:
Model: HuggingFaceH4/zephyr-7b-beta
Backend: vLLM
Backend already started, model preloaded into GPU memory
System prompt: "You are a very succinct assistant and only do what you're told to do"
User prompts: "Just repeat the following number: [REQUEST_NUMBER]"
API path: /v1/chat/completions
Stream: true
LocalAI does the chat templating
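For concreteness, here's roughly what each request body looked like, as a minimal Python sketch (the payload follows the standard OpenAI-compatible chat format that LocalAI exposes on /v1/chat/completions; substituting the request's sequential index for [REQUEST_NUMBER] is my assumption):

```python
# Sketch of the per-request payload (OpenAI-compatible chat completion).
# request_number stands in for the [REQUEST_NUMBER] placeholder above.
def build_payload(request_number: int) -> dict:
    return {
        "model": "HuggingFaceH4/zephyr-7b-beta",
        "stream": True,
        "messages": [
            {
                "role": "system",
                "content": "You are a very succinct assistant and only do what you're told to do",
            },
            {
                "role": "user",
                "content": f"Just repeat the following number: {request_number}",
            },
        ],
    }
```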
I didn't spend much time on it and don't have tons of stats, but I ran 10K queries with some variation in the number of parallel requests and recorded the maximum times between request submission and result:
FT = longest time to first token (the longest time it took for LocalAI to start streaming the model answer)
LT = longest time to last token (the longest time it took for LocalAI to finish streaming the model answer)
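For anyone wanting to reproduce something similar, here's a minimal sketch of how FT and LT can be measured for a batch of parallel streaming requests. This is not the exact harness used here; it assumes a LocalAI endpoint at http://localhost:8080 and reuses the hypothetical `build_payload` helper from above:

```python
import asyncio
import time
import aiohttp

API_URL = "http://localhost:8080/v1/chat/completions"  # assumed endpoint

async def timed_request(session: aiohttp.ClientSession, request_number: int) -> tuple[float, float]:
    """Return (time to first token, time to last token) in seconds for one streaming request."""
    start = time.monotonic()
    first_token_at = None
    async with session.post(API_URL, json=build_payload(request_number)) as resp:
        # The streamed answer arrives as SSE chunks; we only care about when they arrive.
        async for _chunk in resp.content:
            if first_token_at is None:
                first_token_at = time.monotonic()
    end = time.monotonic()
    if first_token_at is None:  # nothing streamed back; count the whole request
        first_token_at = end
    return first_token_at - start, end - start

async def run_batch(parallel: int) -> tuple[float, float]:
    """Fire `parallel` requests at once and report the worst FT and LT in the batch."""
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(timed_request(session, i) for i in range(parallel)))
    return max(ft for ft, _ in results), max(lt for _, lt in results)

if __name__ == "__main__":
    worst_ft, worst_lt = asyncio.run(run_batch(parallel=100))
    print(f"FT={worst_ft:.2f}s LT={worst_lt:.2f}s")
```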
So yes, this is an extremely beefy GPU, a small model, and an easy prompt, and the performance degrades way past usability... but LocalAI handled 10K parallel requests and absolutely never errored out!
You've built an amazing thing @mudler!