Issues: vllm-project/vllm
[Feature]: Add reasoning token usage · feature request · #14472 · opened Mar 8, 2025 by Superskyyy
[Bug]: pythonic tool parser only accepts alphabetical tool names · bug · #14470 · opened Mar 8, 2025 by bjj
[Bug]: Mistral-Small-24B-Instruct-2501 on V1 fails to start with Mistral tokenizer since V1 enabled guided decoding · bug · #14465 · opened Mar 7, 2025 by sjuxax
[Bugfix]: Suggesting an update in the vllm source code to fix the error "Unable to assign 256 multimodal tokens to 0 placeholders" · bug · #14463 · opened Mar 7, 2025 by ameyanjarlekar
[Usage]: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet. · usage · #14458 · opened Mar 7, 2025 by Quinn-Meyer-Sustainment
[Bug]: No Cuda GPUs are available when running vLLM on Ray (Qwen 2.5 VL AWQ) · bug · #14456 · opened Mar 7, 2025 by Fmak95
[Doc]: Steps to run vLLM on your RTX5080 or 5090! · documentation · #14452 · opened Mar 7, 2025 by pavanimajety
[Performance]: LoRA is not taken into account when determining the number of KV cache blocks · performance · #14450 · opened Mar 7, 2025 by chenhongyu2048
[Bug]: vllm serve Qwen/QwQ-32B-AWQ --tensor-parallel-size 2 hangs with both RTX A6000 GPUs at max utilization · bug · #14449 · opened Mar 7, 2025 by ubergarm
[Usage]: After starting the QwQ-32B model normally, it was found that the model could not output the thought tag normally · usage · #14446 · opened Mar 7, 2025 by shatang123
[Feature]: Run/Debug vllm in pycharm · feature request · #14444 · opened Mar 7, 2025 by maobaolong
[Bug]: External Launcher producing NaN outputs on Large Models when Collocating with Model Training · bug · #14443 · opened Mar 7, 2025 by fabianlim
[Usage]: Question about Multimodal token ids on offloaded tokenization · usage · #14441 · opened Mar 7, 2025 by miguelalba96
[RFC]: Configurable multi-modal data for profiling · multi-modality (#4194), RFC · #14438 · opened Mar 7, 2025 by DarkLight1337
[Usage]: VLLM Inference - 2x slower with LoRA rank=256 vs none. · usage · #14435 · opened Mar 7, 2025 by rtx-8000
[Bug]: Docker GPU image is unnecessarily fat due to two (mismatching) copies of CUDA runtime libraries · bug · #14433 · opened Mar 7, 2025 by sisp
[Usage]: Cuda out of memory while loading the quantized model · usage · #14432 · opened Mar 7, 2025 by Bakht-Ullah
[Feature]: support tool and reasoning together · feature request · #14429 · opened Mar 7, 2025 by NiuBlibing
[Usage]: LLM.beam_search is much slower in vLLM 0.7.3 compared to 0.5.4 · usage · #14426 · opened Mar 7, 2025 by upayuryeva
[Feature]: Prefill. How to support 1M prompt tokens input? · feature request · #14425 · opened Mar 7, 2025 by whitezhang
[Bug]: Unexpected content when selecting the choice tool · bug · #14424 · opened Mar 7, 2025 by liuwwang
[Bug]: size mismatch when loading MixtralForCausalLM GGUF model · bug · #14423 · opened Mar 7, 2025 by K-e-t-i
[Bug]: low quality of deepseek-vl2 when using vllm · documentation · #14421 · opened Mar 7, 2025 by chenyzh28
[Usage]: How to use the image datasets sharegpt4v provided in benchmark_serving? · usage · #14418 · opened Mar 7, 2025 by DK-DARKmatter
[Bug]: terminate called after throwing an instance of 'std::system_error' what(): Operation not permitted · bug · #14416 · opened Mar 7, 2025 by gl443