[Bug]: Issue with Pixtral Model: Unsupported Vision Configuration in vLLM (AMD Radeon 7900 XTX) #9069

Closed
1 task done
matrix1233 opened this issue Oct 4, 2024 · 1 comment · Fixed by #9036
Labels
bug Something isn't working


@matrix1233

Your current environment

Issue with Pixtral Model: Unsupported Vision Configuration in vLLM (AMD Radeon 7900 XTX)

I am trying to load the Pixtral model from Hugging Face (specifically, mistral-community/pixtral-12b) using vllm serve, but the engine fails while initializing the model's vision tower. According to the traceback in the log below, vLLM routes this checkpoint to its LLaVA implementation (vllm/model_executor/models/llava.py), which does not recognize the checkpoint's PixtralVisionConfig:

NotImplementedError: Unsupported vision config: <class 'transformers.models.pixtral.configuration_pixtral.PixtralVisionConfig'>

From the error, vLLM cannot handle PixtralVisionConfig, which is required to initialize the model's vision tower. It seems that this vLLM build (0.6.3.dev62+g22f5851b) does not yet support the Hugging Face format of Pixtral.
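For readers unfamiliar with this failure mode, the traceback points at a type dispatch in _init_vision_tower (vllm/model_executor/models/llava.py, line 194). Below is a minimal sketch of that pattern, not vLLM's actual source: the config classes are empty stand-ins, the SUPPORTED_TOWERS registry is hypothetical, and the assumption that only CLIP- and SigLIP-style configs are recognized is mine.

```python
# Illustrative stand-ins for the transformers config classes (assumption:
# the real loader recognizes only CLIP- and SigLIP-style vision configs).
class CLIPVisionConfig:
    pass


class SiglipVisionConfig:
    pass


class PixtralVisionConfig:
    pass


# Hypothetical registry mapping a config type to the tower it would build.
SUPPORTED_TOWERS = {
    CLIPVisionConfig: "clip",
    SiglipVisionConfig: "siglip",
}


def init_vision_tower(vision_config):
    """Return the tower name for a known config type, else raise."""
    for config_cls, tower_name in SUPPORTED_TOWERS.items():
        if isinstance(vision_config, config_cls):
            return tower_name
    # Any config type missing from the registry ends up here, which is the
    # shape of the error reported in this issue.
    msg = f"Unsupported vision config: {type(vision_config)}"
    raise NotImplementedError(msg)
```

Under this sketch, the fix is to teach the dispatch about the new config type rather than to change anything in the ROCm setup.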

Environment: Ubuntu, AMD Radeon 7900 XTX (ROCm).

Steps to Reproduce:

1- Build the ROCm image:
DOCKER_BUILDKIT=1 docker build --build-arg BUILD_FA="0" -f Dockerfile.rocm -t vllm-rocm .

2- Start a container from that image and open a shell in it:
docker run -it \
  --network=host \
  --group-add=video \
  --ipc=host \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --device /dev/kfd \
  --device /dev/dri \
  -v <path/to/model>:/app/model \
  vllm-rocm \
  bash

3- Inside the container, run vllm serve "mistral-community/pixtral-12b". The engine process fails to start with the NotImplementedError shown above.

Full log:

root@gpt:/vllm-workspace# vllm serve "mistral-community/pixtral-12b"

WARNING 10-03 21:42:39 rocm.py:13] fork method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to spawn instead.
INFO 10-03 21:42:42 api_server.py:526] vLLM API server version 0.6.3.dev62+g22f5851b.d20241003
INFO 10-03 21:42:42 api_server.py:527] args: Namespace(model_tag='mistral-community/pixtral-12b', config='', host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_auto_tool_choice=False, tool_call_parser=None, model='mistral-community/pixtral-12b', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', config_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, 
max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=False, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, override_neuron_config=None, scheduling_policy='fcfs', disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, dispatch_function=<function serve at 0x78dc87884430>)
INFO 10-03 21:42:42 api_server.py:164] Multiprocessing frontend to use ipc:///tmp/01061816-0b9e-4ad9-980d-f4eae3168498 for IPC Path.
INFO 10-03 21:42:42 api_server.py:177] Started engine process with PID 50
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 997/997 [00:00<00:00, 92.1kB/s]
preprocessor_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 483/483 [00:00<00:00, 128kB/s]
INFO 10-03 21:42:43 config.py:1659] Downcasting torch.float32 to torch.float16.
INFO 10-03 21:42:43 config.py:928] Disabled the custom all-reduce kernel because it is not supported on AMD GPUs.
WARNING 10-03 21:42:43 arg_utils.py:951] The model has a long context length (1024000). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 177k/177k [00:00<00:00, 1.07MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.26M/9.26M [00:00<00:00, 22.3MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 414/414 [00:00<00:00, 1.42MB/s]
INFO 10-03 21:42:46 config.py:1659] Downcasting torch.float32 to torch.float16.
INFO 10-03 21:42:46 config.py:928] Disabled the custom all-reduce kernel because it is not supported on AMD GPUs.
WARNING 10-03 21:42:46 arg_utils.py:951] The model has a long context length (1024000). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
INFO 10-03 21:42:46 llm_engine.py:237] Initializing an LLM engine (v0.6.3.dev62+g22f5851b.d20241003) with config: model='mistral-community/pixtral-12b', speculative_config=None, tokenizer='mistral-community/pixtral-12b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1024000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=mistral-community/pixtral-12b, use_v2_block_manager=False, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=False, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=True, mm_processor_kwargs=None)
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 119kB/s]
INFO 10-03 21:42:46 selector.py:121] Using ROCmFlashAttention backend.
INFO 10-03 21:42:46 model_runner.py:1022] Starting to load model mistral-community/pixtral-12b...
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/py_3.9/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/vllm-workspace/vllm/engine/multiprocessing/engine.py", line 388, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/vllm-workspace/vllm/engine/multiprocessing/engine.py", line 138, in from_engine_args
    return cls(
  File "/vllm-workspace/vllm/engine/multiprocessing/engine.py", line 78, in __init__
    self.engine = LLMEngine(*args,
  File "/vllm-workspace/vllm/engine/llm_engine.py", line 338, in __init__
    self.model_executor = executor_class(
  File "/vllm-workspace/vllm/executor/executor_base.py", line 47, in __init__
    self._init_executor()
  File "/vllm-workspace/vllm/executor/gpu_executor.py", line 40, in _init_executor
    self.driver_worker.load_model()
  File "/vllm-workspace/vllm/worker/worker.py", line 183, in load_model
    self.model_runner.load_model()
  File "/vllm-workspace/vllm/worker/model_runner.py", line 1024, in load_model
    self.model = get_model(model_config=self.model_config,
  File "/vllm-workspace/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
    return loader.load_model(model_config=model_config,
  File "/vllm-workspace/vllm/model_executor/model_loader/loader.py", line 399, in load_model
    model = _initialize_model(model_config, self.load_config,
  File "/vllm-workspace/vllm/model_executor/model_loader/loader.py", line 176, in _initialize_model
    return build_model(
  File "/vllm-workspace/vllm/model_executor/model_loader/loader.py", line 161, in build_model
    return model_class(config=hf_config,
  File "/vllm-workspace/vllm/model_executor/models/llava.py", line 214, in __init__
    self.vision_tower = _init_vision_tower(config)
  File "/vllm-workspace/vllm/model_executor/models/llava.py", line 194, in _init_vision_tower
    raise NotImplementedError(msg)
NotImplementedError: Unsupported vision config: <class 'transformers.models.pixtral.configuration_pixtral.PixtralVisionConfig'>
[rank0]:[W1003 21:42:47.095511690 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())

Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/bin/vllm", line 33, in <module>
    sys.exit(load_entry_point('vllm', 'console_scripts', 'vllm')())
  File "/vllm-workspace/vllm/scripts.py", line 191, in main
    args.dispatch_function(args)
  File "/vllm-workspace/vllm/scripts.py", line 40, in serve
    uvloop.run(run_server(args))
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/vllm-workspace/vllm/entrypoints/openai/api_server.py", line 538, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/opt/conda/envs/py_3.9/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/vllm-workspace/vllm/entrypoints/openai/api_server.py", line 105, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/opt/conda/envs/py_3.9/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/vllm-workspace/vllm/entrypoints/openai/api_server.py", line 192, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start
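
The log above contains two tracebacks: the engine subprocess fails first, and the API server then reports that the engine did not start. When triaging a log like this, it can help to list the exception lines in order, since the first one is usually the root cause. A small sketch of that idea follows; find_exceptions and its regular expression are hypothetical helpers written for this note, not part of vLLM, and the pattern only catches class names ending in Error or Exception.

```python
import re

# Matches a line that consists of an exception class name, a colon, and a
# message, e.g. "RuntimeError: Engine process failed to start".
_EXC_LINE = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<msg>.*)$"
)


def find_exceptions(log_text):
    """Return (exception_type, message) pairs in the order they appear."""
    found = []
    for line in log_text.splitlines():
        match = _EXC_LINE.match(line.strip())
        if match:
            found.append((match.group("type"), match.group("msg")))
    return found


# Abbreviated excerpt of the log from this issue.
sample_log = """\
INFO 10-03 21:42:46 selector.py:121] Using ROCmFlashAttention backend.
    raise NotImplementedError(msg)
NotImplementedError: Unsupported vision config: <class 'transformers.models.pixtral.configuration_pixtral.PixtralVisionConfig'>
    raise RuntimeError(
RuntimeError: Engine process failed to start
"""

exceptions = find_exceptions(sample_log)
root_cause = exceptions[0]   # first exception raised: the one to investigate
final_error = exceptions[-1]  # last exception: the symptom the CLI reports
```

Here root_cause is the NotImplementedError from the engine process, while final_error is the RuntimeError that the vllm serve command exits with.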

Model Input Dumps

No response


Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@matrix1233 matrix1233 added the bug Something isn't working label Oct 4, 2024
@DarkLight1337
Member

The HuggingFace version of Pixtral isn't supported yet. It is currently WIP, see #9036
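
Until the Hugging Face format is supported, one possible workaround is to serve the Mistral-format checkpoint instead. The sketch below only assembles the command line; build_serve_command is a hypothetical helper. It assumes the Mistral-format repository is mistralai/Pixtral-12B-2409 and that it is loaded with --tokenizer-mode mistral, as that model's documentation describes. None of this has been verified on ROCm or on a 24 GB card, and the 8192 value for --max-model-len (the flag suggested by the warning in the log) is an arbitrary example.

```python
import shlex


def build_serve_command(model, tokenizer_mode="mistral", max_model_len=None):
    """Assemble a `vllm serve` argument list (this does not launch anything)."""
    cmd = ["vllm", "serve", model, "--tokenizer-mode", tokenizer_mode]
    if max_model_len is not None:
        # A smaller context window reduces the KV cache space vLLM reserves.
        cmd += ["--max-model-len", str(max_model_len)]
    return cmd


# Assumed Mistral-format checkpoint name; adjust if your setup differs.
command = build_serve_command("mistralai/Pixtral-12B-2409", max_model_len=8192)
command_line = shlex.join(command)
print(command_line)
```

The resulting list can be passed to subprocess.run inside the container, or the printed line can be pasted into the shell.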
