Pull requests: vllm-project/vllm
[Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM
v1
#18817
opened May 28, 2025 by
googs1025
Fix tpu model runner testcase failure
tpu
Related to Google TPUs
v1
#18810
opened May 28, 2025 by
CAROLZXYZXY
Draft
[Bug] fix the structure of decoder_prompt
frontend
#18809
opened May 28, 2025 by
sangbumlikeagod
Respect passed in device overrides in engine args
#18808
opened May 28, 2025 by
Adolfo-Karim
[Frontend] add run batch to CLI
documentation
Improvements or additions to documentation
frontend
#18804
opened May 28, 2025 by
reidliu41
[Deprecation] Disallow pos-args other than model when initializing LLM
frontend
ready
ONLY add when PR is ready to merge/full CI is needed
#18802
opened May 28, 2025 by
DarkLight1337
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed
documentation
Improvements or additions to documentation
frontend
ready
ONLY add when PR is ready to merge/full CI is needed
structured-output
v1
#18800
opened May 28, 2025 by
DarkLight1337
[Deprecation] Remove inputs arg fallback in Engine classes
ready
ONLY add when PR is ready to merge/full CI is needed
#18799
opened May 28, 2025 by
DarkLight1337
[Model] Add support for normalized Transformer (nGPT) from NVIDIA
documentation
Improvements or additions to documentation
#18798
opened May 28, 2025 by
shan18
[Bugfix] handle attn_metadata=None in calculate_kv_scales branch of attn forward
#18788
opened May 28, 2025 by
llllvvuu
decrement server_load on listen for disconnect
frontend
ready
ONLY add when PR is ready to merge/full CI is needed
#18784
opened May 28, 2025 by
daniel-salib
[Docs] Add developer doc about CI failures
documentation
Improvements or additions to documentation
#18782
opened May 27, 2025 by
russellb
[V1] Support DP with Ray
frontend
needs-rebase
v1
#18779
opened May 27, 2025 by
ruisearch42
Export NaNs in logits to scheduler_stats if output is corrupted
tpu
Related to Google TPUs
v1
#18777
opened May 27, 2025 by
vladmihailescu
[V1] Allocate kv_cache with stride order for V1
v1
#18775
opened May 27, 2025 by
NickLucche
[Feature] A calibration-free RTN-based quantization for accurate and accelerated INT4/INT8 inference
#18768
opened May 27, 2025 by
sakogan
[WIP] Add a metric to track request failures
documentation
Improvements or additions to documentation
frontend
[Platform][Dist] Make torch distributed process group extendable
ready
ONLY add when PR is ready to merge/full CI is needed
#18763
opened May 27, 2025 by
MengqingCao