[Bug]: load minicpm model, then get KeyError: 'lm_head.weight' #6058

Closed
uRENu opened this issue Jul 2, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@uRENu

uRENu commented Jul 2, 2024

Your current environment

vllm-0.5.0
vllm-flash-attn-2.5.9
transformers-4.42.3
torch-2.3.0
xformers-0.0.26.post1
flash-attn-2.5.6

cuda 11.6

🐛 Describe the bug

2024-07-02 14:26:25,987 INFO worker.py:1771 -- Started a local Ray instance.
INFO 07-02 14:26:26 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='/mnt/data/user/luca_model/klara/models/unified_ai_platform_sft_white/v20240702122118/train-model', speculative_config=None, tokenizer='/mnt/data/user/luca_model/klara/models/unified_ai_platform_sft_white/v20240702122118/train-model', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=42, served_model_name=/mnt/data/user/luca_model/klara/models/unified_ai_platform_sft_white/v20240702122118/train-model)
/home/jeeves/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/jeeves/.local/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
(RayWorkerWrapper pid=1231) /home/jeeves/.local/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/home/jeeves/.local/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
(RayWorkerWrapper pid=1231) warn(
NCCL version 2.20.5+cuda12.4
INFO 07-02 14:26:32 utils.py:623] Found nccl from library libnccl.so.2
INFO 07-02 14:26:32 pynccl.py:65] vLLM is using nccl==2.20.5
(RayWorkerWrapper pid=1231) INFO 07-02 14:26:32 utils.py:623] Found nccl from library libnccl.so.2
(RayWorkerWrapper pid=1231) INFO 07-02 14:26:32 pynccl.py:65] vLLM is using nccl==2.20.5
INFO 07-02 14:26:32 custom_all_reduce_utils.py:169] generating GPU P2P access cache in /home/jeeves/.config/vllm/gpu_p2p_access_cache_for_0,1.json
/local/apps/large_model_predict/util/file_utils.py:24: FutureWarning: The Hdfs is deprecated, use UnionStore instead.
warnings.warn(
INFO 07-02 14:26:47 custom_all_reduce_utils.py:179] reading GPU P2P access cache from /home/jeeves/.config/vllm/gpu_p2p_access_cache_for_0,1.json
(RayWorkerWrapper pid=1231) INFO 07-02 14:26:47 custom_all_reduce_utils.py:179] reading GPU P2P access cache from /home/jeeves/.config/vllm/gpu_p2p_access_cache_for_0,1.json
ERROR 07-02 14:27:30 worker_base.py:148] Error executing method load_model. This might cause deadlock in distributed execution.
ERROR 07-02 14:27:30 worker_base.py:148] Traceback (most recent call last):
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 140, in execute_method
ERROR 07-02 14:27:30 worker_base.py:148] return executor(*args, **kwargs)
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 121, in load_model
ERROR 07-02 14:27:30 worker_base.py:148] self.model_runner.load_model()
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 147, in load_model
ERROR 07-02 14:27:30 worker_base.py:148] self.model = get_model(
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
ERROR 07-02 14:27:30 worker_base.py:148] return loader.load_model(model_config=model_config,
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 249, in load_model
ERROR 07-02 14:27:30 worker_base.py:148] model.load_weights(
ERROR 07-02 14:27:30 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/models/minicpm.py", line 536, in load_weights
ERROR 07-02 14:27:30 worker_base.py:148] param = params_dict[name]
ERROR 07-02 14:27:30 worker_base.py:148] KeyError: 'lm_head.weight'
[rank0]: Traceback (most recent call last):
[rank0]: File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]: return _run_code(code, main_globals, None,
[rank0]: File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]: exec(code, run_globals)
[rank0]: File "/local/apps/large_model_predict/open_llm_infer/llm_infer_vllm.py", line 224, in
[rank0]: llm_infer(sft_args)
[rank0]: File "/local/apps/large_model_predict/open_llm_infer/llm_infer_vllm.py", line 113, in llm_infer
[rank0]: llm = LLM(model=args.ckpt_dir, trust_remote_code=True, seed=42, tensor_parallel_size=torch.cuda.device_count())
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 144, in init
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 359, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 222, in init
[rank0]: self.model_executor = executor_class(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 25, in init
[rank0]: super().init(*args, **kwargs)
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in init
[rank0]: self._init_executor()
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 40, in _init_executor
[rank0]: self._init_workers_ray(placement_group)
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 172, in _init_workers_ray
[rank0]: self._run_workers("load_model",
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 246, in _run_workers
[rank0]: driver_worker_output = self.driver_worker.execute_method(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 149, in execute_method
[rank0]: raise e
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 140, in execute_method
[rank0]: return executor(*args, **kwargs)
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 121, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 147, in load_model
[rank0]: self.model = get_model(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/init.py", line 21, in get_model
[rank0]: return loader.load_model(model_config=model_config,
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 249, in load_model
[rank0]: model.load_weights(
[rank0]: File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/models/minicpm.py", line 536, in load_weights
[rank0]: param = params_dict[name]
[rank0]: KeyError: 'lm_head.weight'
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] Error executing method load_model. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] Traceback (most recent call last):
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 140, in execute_method
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] return executor(*args, **kwargs)
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 121, in load_model
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] self.model_runner.load_model()
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 147, in load_model
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] self.model = get_model(
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] return loader.load_model(model_config=model_config,
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 249, in load_model
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] model.load_weights(
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] File "/home/jeeves/.local/lib/python3.10/site-packages/vllm/model_executor/models/minicpm.py", line 536, in load_weights
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] param = params_dict[name]
(RayWorkerWrapper pid=1231) ERROR 07-02 14:27:31 worker_base.py:148] KeyError: 'lm_head.weight'
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Inferring failed, please check log
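
For context, the invocation that produced this traceback is roughly the following (reconstructed from the log above; the checkpoint path is a placeholder for the reporter's locally fine-tuned MiniCPM model):

```python
# Minimal reproduction sketch, reconstructed from the traceback above.
# The checkpoint path is a placeholder for a locally fine-tuned MiniCPM model.
import torch
from vllm import LLM

llm = LLM(
    model="/path/to/minicpm-sft-checkpoint",  # fine-tuned MiniCPM checkpoint dir
    trust_remote_code=True,
    seed=42,
    tensor_parallel_size=torch.cuda.device_count(),
)
# With vLLM 0.5.0, weight loading fails here with:
#   KeyError: 'lm_head.weight'
# The checkpoint contains an lm_head.weight entry, but the MiniCPM loader's
# parameter dict has no matching key when word embeddings are tied.
```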

uRENu added the bug label on Jul 2, 2024
@tjohnson31415
Contributor

@uRENu I think this issue is resolved with the fix in #6758 to ignore lm_head.weight when tie_word_embeddings is set to true.
Do you still see this error when using the latest vLLM version?
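
For reference, the change described in #6758 amounts to skipping the tied lm_head.weight entry during weight loading. A simplified sketch of that idea (not the exact patch; the real vLLM loaders dispatch through per-parameter weight_loader functions):

```python
# Simplified sketch of the skip logic described above, not the exact #6758 patch.
from vllm.model_executor.model_loader.weight_utils import default_weight_loader

def load_weights(self, weights):
    params_dict = dict(self.named_parameters())
    for name, loaded_weight in weights:
        # When tie_word_embeddings is true, the checkpoint's lm_head.weight
        # duplicates the input embedding and has no corresponding parameter
        # here, so skip it instead of raising KeyError.
        if name == "lm_head.weight" and getattr(self.config, "tie_word_embeddings", False):
            continue
        param = params_dict[name]
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)
```

Upgrading to a vLLM release that includes the fix should therefore resolve the error without modifying the checkpoint, which matches the reporter's follow-up below.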

@uRENu
Author

uRENu commented Aug 19, 2024

> @uRENu I think this issue is resolved with the fix in #6758 to ignore lm_head.weight when tie_word_embeddings is set to true. Do you still see this error when using the latest vLLM version?

it's ok

uRENu closed this as not planned on Aug 19, 2024