@uRENu I think this issue is resolved with the fix in #6758 to ignore lm_head.weight when tie_word_embeddings is set to true.
Do you still see this error when using the latest vLLM version?
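For context, here is a minimal sketch of the kind of guard that fix describes, written as a standalone helper rather than the actual #6758 diff (the function name and the use of model.config are illustrative): when tie_word_embeddings is true, the checkpoint's lm_head.weight has no matching model parameter, so it is skipped instead of triggering the KeyError shown in the log below.

# Illustrative sketch only -- not the actual vLLM change from #6758.
from vllm.model_executor.model_loader.weight_utils import default_weight_loader

def load_weights_skipping_tied_head(model, weights):
    """Hypothetical helper: load (name, tensor) pairs into model, skipping
    lm_head.weight when the config ties the word embeddings."""
    params_dict = dict(model.named_parameters())
    for name, loaded_weight in weights:
        if name == "lm_head.weight" and getattr(model.config, "tie_word_embeddings", False):
            # The output head reuses the embedding matrix, so there is no
            # separate lm_head.weight parameter to load this tensor into.
            continue
        param = params_dict[name]  # this lookup is what raised KeyError: 'lm_head.weight'
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)

Upgrading to a vLLM release that includes that change, as the comment suggests, is the actual fix; the sketch only shows where the original KeyError came from.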
Your current environment
vllm-0.5.0
vllm-flash-attn-2.5.9
transformers-4.42.3
torch-2.3.0
xformers-0.0.26.post1
flash-attn-2.5.6
cuda 11.6
🐛 Describe the bug
2024-07-02 14:26:25,987 INFO worker.py:1771 -- Started a local Ray instance.
INFO 07-02 14:26:26 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='/mnt/data/user/luca_model/klara/models/unified_ai_platform_sft_white/v20240702122118/train-model', speculative_config=None, tokenizer='/mnt/data/user/luca_model/klara/models/unified_ai_platform_sft_white/v20240702122118/train-model', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=42, served_model_name=/mnt/data/user/luca_model/klara/models/unified_ai_platform_sft_white/v20240702122118/train-model)
/home/jeeves/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/jeeves/.local/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
(RayWorkerWrapper pid=1231) /home/jeeves/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/jeeves/.local/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
(RayWorkerWrapper pid=1231) warn(
NCCL version 2.20.5+cuda12.4
INFO 07-02 14:26:32 utils.py:623] Found nccl from library libnccl.so.2
INFO 07-02 14:26:32 pynccl.py:65] vLLM is using nccl==2.20.5
(RayWorkerWrapper pid=1231) INFO 07-02 14:26:32 utils.py:623] Found nccl from library libnccl.so.2
(RayWorkerWrapper pid=1231) INFO 07-02 14:26:32 pynccl.py:65] vLLM is using nccl==2.20.5
INFO 07-02 14:26:32 custom_all_reduce_utils.py:169] generating GPU P2P access cache in /home/jeeves/.config/vllm/gpu_p2p_access_cache_for_0,1.json
/local/apps/large_model_predict/util/file_utils.py:24: FutureWarning: The Hdfs is deprecated, use UnionStore instead.
warnings.warn(
/local/apps/large_model_predict/util/file_utils.py:24: FutureWarning: The Hdfs is deprecated, use UnionStore instead.
warnings.warn(
/local/apps/large_model_predict/util/file_utils.py:24: FutureWarning: The Hdfs is deprecated, use UnionStore instead.
warnings.warn(
/local/apps/large_model_predict/util/file_utils.py:24: FutureWarning: The Hdfs is deprecated, use UnionStore instead.
warnings.warn(
/local/apps/large_model_predict/util/file_utils.py:24: FutureWarning: The Hdfs is deprecated, use UnionStore instead.
warnings.warn(
/local/apps/large_model_predict/util/file_utils.py:24: FutureWarning: The Hdfs is deprecated, use UnionStore instead.
warnings.warn(
/local/apps/large_model_predict/util/file_utils.py:24: FutureWarning: The Hdfs is deprecated, use UnionStore instead.
warnings.warn(
/local/apps/large_model_predict/util/file_utils.py:24: FutureWarning: The Hdfs is deprecated, use UnionStore instead.
warnings.warn(
INFO 07-02 14:26:47 custom_all_reduce_utils.py:179] reading GPU P2P access cache from /home/jeeves/.config/vllm/gpu_p2p_access_cache_for_0,1.json
(RayWorkerWrapper pid=1231) INFO 07-02 14:26:47 custom_all_reduce_utils.py:179] reading GPU P2P access cache from /home/jeeves/.config/vllm/gpu_p2p_access_cache_for_0,1.json
ERROR 07-02 14:27:30 worker_base.py:148] Error executing method load_model. This might cause deadlock in distributed execution.
ERROR 07-02 14:27:30 worker_base.py:148] Traceback (most recent call last):
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 140, in execute_method
ERROR 07-02 14:27:30 worker_base.py:148] return executor(*args, **kwargs)
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 121, in load_model
ERROR 07-02 14:27:30 worker_base.py:148] self.model_runner.load_model()
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 147, in load_model
ERROR 07-02 14:27:30 worker_base.py:148] self.model = get_model(
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
ERROR 07-02 14:27:30 worker_base.py:148] return loader.load_model(model_config=model_config,
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 249, in load_model
ERROR 07-02 14:27:30 worker_base.py:148] model.load_weights(
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/models/minicpm.py", line 536, in load_weights
ERROR 07-02 14:27:30 worker_base.py:148] param = params_dict[name]
ERROR 07-02 14:27:30 worker_base.py:148] KeyError: 'lm_head.weight'
[rank0]: Traceback (most recent call last):
[rank0]: File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]: return _run_code(code, main_globals, None,
[rank0]: File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/local/apps/large_model_predict/open_llm_infer/llm_infer_vllm.py", line 224, in <module>
[rank0]: llm_infer(sft_args)
[rank0]: File "/local/apps/large_model_predict/open_llm_infer/llm_infer_vllm.py", line 113, in llm_infer
[rank0]: llm = LLM(model=args.ckpt_dir, trust_remote_code=True, seed=42, tensor_parallel_size=torch.cuda.device_count())
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 144, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 359, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 222, in __init__
[rank0]: self.model_executor = executor_class(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
[rank0]: self._init_executor()
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 40, in _init_executor
[rank0]: self._init_workers_ray(placement_group)
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 172, in _init_workers_ray
[rank0]: self._run_workers("load_model",
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 246, in _run_workers
[rank0]: driver_worker_output = self.driver_worker.execute_method(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 149, in execute_method
[rank0]: raise e
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 140, in execute_method
[rank0]: return executor(*args, **kwargs)
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 121, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 147, in load_model
[rank0]: self.model = get_model(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
[rank0]: return loader.load_model(model_config=model_config,
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 249, in load_model
[rank0]: model.load_weights(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/models/minicpm.py", line 536, in load_weights
[rank0]: param = params_dict[name]
[rank0]: KeyError: 'lm_head.weight'
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] Error executing method load_model. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] Traceback (most recent call last):
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 140, in execute_method
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] return executor(*args, **kwargs)
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 121, in load_model
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] self.model_runner.load_model()
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 147, in load_model
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] self.model = get_model(
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] return loader.load_model(model_config=model_config,
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 249, in load_model
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] model.load_weights(
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/models/minicpm.py", line 536, in load_weights
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] param = params_dict[name]
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] KeyError: 'lm_head.weight'
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Inferring failed, please check log
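As a quick sanity check against the log above, the snippet below (paths are placeholders, and it assumes a sharded safetensors checkpoint with an index file) prints whether the SFT output sets tie_word_embeddings and whether it still ships an lm_head.weight tensor:

# Diagnostic sketch (hypothetical paths): check whether the checkpoint ties its
# word embeddings and whether an lm_head.weight entry exists in the weight index.
import json
import os

from transformers import AutoConfig

ckpt_dir = "/path/to/train-model"  # placeholder for the SFT output directory
config = AutoConfig.from_pretrained(ckpt_dir, trust_remote_code=True)
print("tie_word_embeddings:", getattr(config, "tie_word_embeddings", None))

index_file = os.path.join(ckpt_dir, "model.safetensors.index.json")
if os.path.exists(index_file):
    with open(index_file) as f:
        weight_map = json.load(f)["weight_map"]
    print("checkpoint contains lm_head.weight:", "lm_head.weight" in weight_map)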