
[1/N] pass the complete config from engine to executor #9933

Merged

3 commits merged into vllm-project:main from the single_config branch on Nov 1, 2024

Conversation

youkaichao
Member

No description provided.

Signed-off-by: youkaichao <[email protected]>

github-actions bot commented Nov 1, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR.
  • Enable auto-merge.

🚀

@youkaichao
Member Author

youkaichao commented Nov 1, 2024

The path is:

Engine --> Executor --> Worker --> ModelRunner --> Model

All of them will get the complete engine config object.

This way, if we have a config field that is only needed by one component, we don't need to change the function signature for all these classes.
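
As a rough illustration of this pattern, here is a minimal sketch; the class and field names (VllmConfig, ModelConfig, ParallelConfig, and the simplified Executor/Worker/ModelRunner constructors) are illustrative stand-ins, not the actual vLLM signatures:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    model: str


@dataclass
class ParallelConfig:
    tensor_parallel_size: int = 1


@dataclass
class VllmConfig:
    """The complete engine config that is handed down the whole chain."""
    model_config: ModelConfig
    parallel_config: ParallelConfig


class ModelRunner:
    def __init__(self, vllm_config: VllmConfig):
        # A field added to VllmConfig reaches this point without touching
        # the Executor or Worker constructors.
        self.vllm_config = vllm_config


class Worker:
    def __init__(self, vllm_config: VllmConfig):
        self.vllm_config = vllm_config
        self.model_runner = ModelRunner(vllm_config)


class Executor:
    def __init__(self, vllm_config: VllmConfig):
        # Every component takes the whole config object instead of a growing
        # list of individual config arguments.
        self.vllm_config = vllm_config
        self.worker = Worker(vllm_config)


# The engine builds the config once and passes it down as a single object.
config = VllmConfig(model_config=ModelConfig(model="some-model"),
                    parallel_config=ParallelConfig())
executor = Executor(config)
```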

@youkaichao youkaichao changed the title [draft] pass the config as a whole [1/N] pass the complete config from engine to executor Nov 1, 2024
@youkaichao youkaichao marked this pull request as ready for review November 1, 2024 19:56
Collaborator

@russellb russellb left a comment

This looks like a nice step in refactoring. I had one suggestion about leaving a TODO note in the code.

vllm/engine/llm_engine.py (resolved)
vllm/v1/engine/llm_engine.py (resolved)
@robertgshaw2-neuralmagic
Collaborator

+1 this is a nice change

@youkaichao
Member Author

> +1 this is a nice change

Thanks for the appreciation! The goal is:

> if we have a config field that is only needed by one component, we don't need to change the function signature for all these classes.

(Context: I'm going to add a compilation config for torch.compile, and it is only needed by the final model.)

@youkaichao
Member Author

Each component can still keep a local copy of the configs it uses, like self.model_config = vllm_config.model_config, so that it doesn't always have to go through self.vllm_config. But the point is that all of them will have self.vllm_config and pass it as a whole to the others.
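
A minimal sketch of that convention, expanding the hypothetical Worker from the sketch above (the attribute names here are illustrative, not the exact vLLM ones):

```python
class Worker:
    def __init__(self, vllm_config: VllmConfig):
        # Keep the complete config so it can be passed on as a whole ...
        self.vllm_config = vllm_config
        # ... but also keep local handles to the pieces this component uses,
        # so code does not always have to spell out self.vllm_config.<field>.
        self.model_config = vllm_config.model_config
        self.parallel_config = vllm_config.parallel_config
        self.model_runner = ModelRunner(vllm_config)

    def describe(self) -> str:
        return (f"{self.model_config.model} "
                f"(tp={self.parallel_config.tensor_parallel_size})")
```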

Signed-off-by: youkaichao <[email protected]>
@robertgshaw2-neuralmagic robertgshaw2-neuralmagic enabled auto-merge (squash) November 1, 2024 20:33
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Nov 1, 2024
Signed-off-by: youkaichao <[email protected]>
@youkaichao youkaichao disabled auto-merge November 1, 2024 20:46
@youkaichao youkaichao merged commit 18bd758 into vllm-project:main Nov 1, 2024
22 of 26 checks passed
@youkaichao youkaichao deleted the single_config branch November 1, 2024 20:51
DarkLight1337 pushed a commit that referenced this pull request Nov 2, 2024
@comaniac
Collaborator

comaniac commented Nov 3, 2024

After this PR, the following command for v1 hangs during startup:

VLLM_USE_V1=1 vllm serve neuralmagic/Meta-Llama-3-8B-Instruct-FP8 --disable-log-requests

Log:

(base) ray@ip-10-0-54-152:~/default/vllm$ VLLM_USE_V1=1 vllm serve neuralmagic/Meta-Llama-3-8B-Instruct-FP8 --disable-log-requests
/home/ray/anaconda3/lib/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
INFO 11-03 22:37:01 api_server.py:551] vLLM API server version dev
INFO 11-03 22:37:01 api_server.py:552] args: Namespace(subparser='serve', model_tag='neuralmagic/Meta-Llama-3-8B-Instruct-FP8', config='', host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='neuralmagic/Meta-Llama-3-8B-Instruct-FP8', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', chat_template_text_format='string', trust_remote_code=False, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, override_neuron_config=None, scheduling_policy='fcfs', pooling_type=None, pooling_norm=None, pooling_softmax=None, pooling_step_tag_id=None, pooling_returned_token_ids=None, disable_log_requests=True, max_log_len=None, disable_fastapi_docs=False, dispatch_function=<function serve at 0x7ef3e97a8540>)
INFO 11-03 22:37:01 api_server.py:166] Multiprocessing frontend to use ipc:///tmp/6bd30e7c-ebe7-47a7-9a63-b75add142a3b for IPC Path.
INFO 11-03 22:37:01 api_server.py:181] Started engine process with PID 14128
/home/ray/anaconda3/lib/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
INFO 11-03 22:37:06 config.py:323] This model supports multiple tasks: {'embedding', 'generate'}. Defaulting to 'generate'.
WARNING 11-03 22:37:06 arg_utils.py:1103] [DEPRECATED] Block manager v1 has been removed, and setting --use-v2-block-manager to True or False has no effect on vLLM behavior. Please remove --use-v2-block-manager in your engine argument. If your use case is not supported by SelfAttnBlockSpaceManager (i.e. block manager v2), please file an issue with detailed information.
INFO 11-03 22:37:10 config.py:323] This model supports multiple tasks: {'generate', 'embedding'}. Defaulting to 'generate'.
WARNING 11-03 22:37:10 arg_utils.py:1103] [DEPRECATED] Block manager v1 has been removed, and setting --use-v2-block-manager to True or False has no effect on vLLM behavior. Please remove --use-v2-block-manager in your engine argument. If your use case is not supported by SelfAttnBlockSpaceManager (i.e. block manager v2), please file an issue with detailed information.
INFO 11-03 22:37:10 llm_engine.py:69] Initializing an LLM engine (vdev) with config: model='neuralmagic/Meta-Llama-3-8B-Instruct-FP8', speculative_config=None, tokenizer='neuralmagic/Meta-Llama-3-8B-Instruct-FP8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=neuralmagic/Meta-Llama-3-8B-Instruct-FP8, num_scheduler_steps=1, enable_prefix_caching=False, use_async_output_proc=True, mm_processor_kwargs=None)
/home/ray/anaconda3/lib/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
# Hanging for minutes

Between Nov 4 and Dec 13, 2024, commits referencing this pull request were also pushed to this repository and to the lk-chen, richardsliu, bigPYJ1151, ROCm, JC1DA, sumitd2, KuntaiDu, mfournioux, neuralmagic, and sleepwalker2017 forks of vllm.
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)

4 participants