
Setting "CUDA_VISIBLE_DEVICES" has no effect after deploying with vLLM #359

Open
Jasper-LittleBrotherHeart opened this issue Feb 12, 2025 · 0 comments

The code is as follows:
```python
import os
import json
from vllm import LLM, SamplingParams

def get_completion(prompts, model, tokenizer=None, max_tokens=512, temperature=0.8, top_p=0.95, max_model_len=2048):
    stop_token_ids = [151329, 151336, 151338]
    # Create the sampling parameters: temperature controls the diversity of the generated text, top_p controls the nucleus-sampling probability
    sampling_params = SamplingParams(temperature=temperature, top_p=top_p, max_tokens=max_tokens, stop_token_ids=stop_token_ids)
    # Initialize the vLLM inference engine
    os.environ["CUDA_VISIBLE_DEVICES"] = "4,5"
    print(os.environ["CUDA_VISIBLE_DEVICES"])
    llm = LLM(model=model, tokenizer=tokenizer, max_model_len=max_model_len, trust_remote_code=True)
    outputs = llm.generate(prompts, sampling_params)
    return outputs

def load_model_paths(file_path):
    with open(file_path, 'r') as f:
        model_paths = f.readlines()
    return [path.strip() for path in model_paths]

def load_prompts(file_path):
    with open(file_path, 'r') as f:
        prompts = json.load(f)
    return prompts
```

It still fails, and the traceback shows it is running on GPU 0:

```
Traceback (most recent call last):
File "test.py", line 76, in
responses = get_completion(prompts_to_process, model_path)
File "test.py", line 13, in get_completion
llm = LLM(model=model, tokenizer=tokenizer, max_model_len=max_model_len, trust_remote_code=True)
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/entrypoints/llm.py", line 112, in init
self.llm_engine = LLMEngine.from_engine_args(
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 196, in from_engine_args
engine = cls(
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 110, in init
self.model_executor = executor_class(model_config, cache_config,
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/executor/gpu_executor.py", line 37, in init
self._init_worker()
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/executor/gpu_executor.py", line 66, in _init_worker
self.driver_worker.load_model()
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/worker/worker.py", line 107, in load_model
self.model_runner.load_model()
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/worker/model_runner.py", line 95, in load_model
self.model = get_model(
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/model_executor/model_loader.py", line 81, in get_model
model = model_class(model_config.hf_config, linear_method,
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/model_executor/models/qwen2.py", line 298, in init
self.model = Qwen2Model(config, linear_method)
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/model_executor/models/qwen2.py", line 237, in init
self.layers = nn.ModuleList([
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/model_executor/models/qwen2.py", line 238, in
Qwen2DecoderLayer(config, layer_idx, linear_method)
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/model_executor/models/qwen2.py", line 181, in init
self.mlp = Qwen2MLP(
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/model_executor/models/qwen2.py", line 62, in init
self.gate_up_proj = MergedColumnParallelLinear(
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/model_executor/layers/linear.py", line 260, in init
super().init(input_size, sum(output_sizes), bias, gather_output,
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/model_executor/layers/linear.py", line 181, in init
self.linear_weights = self.linear_method.create_weights(
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/vllm/model_executor/layers/linear.py", line 63, in create_weights
weight = Parameter(torch.empty(output_size_per_partition,
File "/n/work3/jzhao/miniconda3/envs/muser/lib/python3.8/site-packages/torch/utils/_device.py", line 77, in torch_function
return func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 540.00 MiB. GPU 0 has a total capacty of 47.54 GiB of which 59.75 MiB is free. Including non-PyTorch memory, this process has 47.47 GiB memory in use. Of the allocated memory 47.01 GiB is allocated by PyTorch, and 14.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF`
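
For reference, a minimal sketch of the usual workaround, assuming the cause is the common one: `CUDA_VISIBLE_DEVICES` is only read when the CUDA context is first created, so assigning it inside `get_completion`, after the `torch`/`vllm` imports may already have initialized CUDA, comes too late. The `tensor_parallel_size=2` below is an assumption for sharding the model across the two visible GPUs, not something from the original report.

```python
# Hedged sketch: set CUDA_VISIBLE_DEVICES before any import that can
# initialize CUDA; the driver reads it only when the context is created.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5"

from vllm import LLM, SamplingParams  # imported only after the variable is set

def get_completion(prompts, model, tokenizer=None, max_tokens=512,
                   temperature=0.8, top_p=0.95, max_model_len=2048):
    sampling_params = SamplingParams(temperature=temperature, top_p=top_p,
                                     max_tokens=max_tokens,
                                     stop_token_ids=[151329, 151336, 151338])
    # tensor_parallel_size=2 (an assumption) shards the weights over both
    # visible GPUs instead of loading the full model onto one device.
    llm = LLM(model=model, tokenizer=tokenizer, max_model_len=max_model_len,
              tensor_parallel_size=2, trust_remote_code=True)
    return llm.generate(prompts, sampling_params)
```

Alternatively, exporting the variable in the shell guarantees it is in place before the interpreter starts: `CUDA_VISIBLE_DEVICES=4,5 python test.py`.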
