vLLM not using all CPUs available on my machine #14133

bartmch · 2025-03-03T12:23:07Z

bartmch
Mar 3, 2025

I am running vLLM as a Docker container, specifying only the model repository and setting tensor-parallel-size to 2. The model loads successfully and I can use the /completion endpoint, and both GPUs are used by vLLM. The issue is that only 2 CPU cores are being used, each maxing out at 100% (200% in total). My system has 24 CPU cores. How can I make sure vLLM uses more than 2 of them?
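One thing that may be worth ruling out first is whether the container itself can see all 24 cores. The sketch below is only a diagnostic, not a vLLM setting: it uses the Python standard library to print how many logical CPUs exist and which ones the current process is allowed to run on. The helper name `visible_cpus` is made up for illustration. It assumes a Linux container with Python available, and it would only reveal a CPU pinning restriction (such as one set with `docker run --cpuset-cpus`), not a CPU time quota (such as `--cpus`).

```python
import os


def visible_cpus():
    """Return (logical CPU count, sorted list of CPUs this process may run on)."""
    total = os.cpu_count() or 1
    # sched_getaffinity is only available on some platforms (e.g. Linux);
    # elsewhere, assume the process may run on every logical CPU.
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = list(range(total))
    return total, allowed


if __name__ == "__main__":
    total, allowed = visible_cpus()
    print(f"logical CPUs: {total}")
    print(f"CPUs usable by this process: {len(allowed)} -> {allowed}")
```

If this reports all 24 cores when run inside the container, the container is probably not the limiting factor, and the two busy cores may simply correspond to one worker process per GPU with tensor-parallel-size set to 2. That is an assumption that would need to be confirmed against the vLLM documentation.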

Replies: 0 comments
