Why does GPU memory increase when setting num-gpu-blocks? #10338
hughesadam87 started this conversation in General
I've set --num-gpu-blocks-override directly so that vLLM preallocates 90% of my GPU memory on startup (model weights plus KV cache). In addition, I've set --gpu-memory-utilization to 0.9. However, under load the system keeps allocating more and more memory and eventually hits an OOM. How is this possible? Shouldn't the preallocated memory be fixed after startup?
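For context, here is a rough back-of-the-envelope sketch of how much GPU memory a given block count reserves for KV cache. All the model dimensions below (block size, layer count, head count, head dim, dtype width) are illustrative assumptions for a generic transformer, not values read from vLLM or from your model config:

```python
# Rough estimate of the KV-cache memory that a fixed number of GPU
# blocks would reserve. Dimensions below are hypothetical defaults,
# not queried from vLLM.
def kv_cache_bytes(num_blocks, block_size=16, num_layers=32,
                   num_kv_heads=32, head_dim=128, dtype_bytes=2):
    # Each token stores one key and one value vector (factor 2)
    # per layer; each block holds `block_size` tokens.
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return num_blocks * block_size * per_token

# Example: 2048 blocks at these assumed sizes -> memory in GiB.
gib = kv_cache_bytes(2048) / 2**30
print(f"{gib:.1f} GiB")  # prints "16.0 GiB" for these assumed dimensions
```

If the blocks reserved this way plus the model weights already account for ~90% of the card, any *additional* allocations under load (activation workspace, CUDA graph capture, sampler buffers, fragmentation in the caching allocator) come on top of that, which is one way the total can still grow past the preallocated amount.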
Finally, is there a way to dynamically free up GPU memory? After all requests are processed, the additional memory is never released by vLLM, even when the system is idle for long periods.