Issues: vllm-project/vllm
[Bug]: TypeError in benchmark_serving.py when using --model parameter
bug (Something isn't working)
#6069 opened Jul 2, 2024 by Arthur-g-p
[Usage]: How to initialize gemma2-27b with 4-bit quantization?
usage (How to use vllm)
#6068 opened Jul 2, 2024 by maxin9966
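For questions like #6068, vLLM's offline API takes a `quantization` argument matching the quantization method of a pre-quantized checkpoint. A minimal sketch, assuming a hypothetical pre-quantized 4-bit GPTQ repo name (substitute a real checkpoint):

```python
from vllm import LLM, SamplingParams

# "some-org/gemma-2-27b-gptq-4bit" is a hypothetical placeholder;
# point `model` at a real 4-bit GPTQ (or AWQ) checkpoint.
llm = LLM(
    model="some-org/gemma-2-27b-gptq-4bit",
    quantization="gptq",   # must match how the checkpoint was quantized
    max_model_len=4096,    # optional: cap context length to save memory
)

outputs = llm.generate(
    ["Explain 4-bit quantization in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```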
[New Model]: LoRA for Qwen/Qwen2-57B-A14B
new model (Requests for new models)
#6067 opened Jul 2, 2024 by H-Simpson123
[Bug]: benchmark_serving.py cannot calculate Median TTFT correctly
bug (Something isn't working)
#6064 opened Jul 2, 2024 by Sekri0
[Installation]: How to disable NCCL support on Jetson devices
installation (Installation problems)
#6063 opened Jul 2, 2024 by thunder95
[Bug]: ValidationError using langchain_community.llms.VLLM
bug (Something isn't working)
#6062 opened Jul 2, 2024 by santurini
[Bug]: Garbled tokens appear in vLLM generation results every time a new LLM model is loaded (Qwen)
bug (Something isn't working)
#6060 opened Jul 2, 2024 by Jason-csc
[Bug][CI/Build]: Missing attribute 'nvmlDeviceGetHandleByIndex' in AMD tests
bug (Something isn't working), rocm
#6059 opened Jul 2, 2024 by DarkLight1337
[Bug]: Loading a MiniCPM model raises KeyError: 'lm_head.weight'
bug (Something isn't working)
#6058 opened Jul 2, 2024 by uRENu
[Usage]: How to use beam search when requesting the OpenAI Completions API
usage (How to use vllm)
#6057 opened Jul 2, 2024 by nguyenhoanganh2002
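For #6057-style questions: vLLM's OpenAI-compatible server accepts extra, non-standard sampling fields in the request body, which the official openai client can pass through via `extra_body`. A minimal sketch, assuming a server already running at localhost:8000 (URL and model name are placeholders):

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is already running, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-hf
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",  # placeholder model name
    prompt="San Francisco is a",
    max_tokens=32,
    temperature=0,                     # beam search requires greedy temperature
    extra_body={                       # vLLM-specific sampling extensions
        "use_beam_search": True,
        "best_of": 4,                  # beam width
    },
)
print(completion.choices[0].text)
```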
[Bug]: debugging guide for device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp"
bug (Something isn't working)
#6056 opened Jul 2, 2024 by youkaichao
[Usage]: How to use --pipeline-parallel-size
usage (How to use vllm)
#6054 opened Jul 2, 2024 by XiaoYu2022
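For #6054: --pipeline-parallel-size is a server-launch flag that splits the model's layers across GPUs, and in the vLLM versions around this snapshot it applied to online serving (often combined with --tensor-parallel-size). A hedged sketch; model name and port are placeholders:

```python
# Sketch: launch the OpenAI-compatible server with pipeline parallelism
# across 2 GPUs (shell command shown as a comment):
#
#   python -m vllm.entrypoints.openai.api_server \
#       --model meta-llama/Meta-Llama-3-70B-Instruct \
#       --pipeline-parallel-size 2 \
#       --tensor-parallel-size 2
#
# Then query it like any OpenAI endpoint:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model name
    prompt="Hello",
    max_tokens=8,
)
print(resp.choices[0].text)
```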
[Usage]: Is there a way to turn off fast attention (a parameter, maybe)? My model deployment takes 30 min to complete
usage (How to use vllm)
#6053 opened Jul 2, 2024 by bzr1
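Regarding #6053: assuming "fast attention" refers to the FlashAttention backend, vLLM chooses its attention backend from the VLLM_ATTENTION_BACKEND environment variable, so forcing XFORMERS is one way to bypass FlashAttention. A sketch (whether this addresses the 30-minute startup is an assumption):

```python
import os

# Assumption: "fast attention" means the FlashAttention backend.
# The variable must be set before vLLM is imported.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # small placeholder model
print(llm.generate(["Hello"])[0].outputs[0].text)
```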
[Bug]: call for stack trace for "Watchdog caught collective operation timeout"
bug (Something isn't working)
#6042 opened Jul 1, 2024 by youkaichao
[Bug]: Speculative decoding does not respect per-request seed
bug (Something isn't working)
#6038 opened Jul 1, 2024 by tdoublep
[Bug]: identical branches in csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu
bug (Something isn't working)
#6030 opened Jul 1, 2024 by stevegrubb
[Bug]: When running inference with a 1B model, TP=2 latency is greater than TP=1
bug (Something isn't working)
#6027 opened Jul 1, 2024 by sitabulaixizawaluduo
[Bug]: Producer process has been terminated before all shared CUDA tensors released (v0.5.0.post1, v0.4.3)
bug (Something isn't working)
#6025 opened Jul 1, 2024 by yaronr
[Bug]: The same prompt produces different output between vLLM offline and online calls
bug (Something isn't working)
#6021 opened Jul 1, 2024 by ArlanCooper
[New Model]: facebook/seamless-m4t-v2-large
new model (Requests for new models)
#6017 opened Jul 1, 2024 by frittentheke
[Usage]: Load local model from local path
usage (How to use vllm)
#6012 opened Jul 1, 2024 by xiaoyu-work
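For #6012: vLLM accepts a filesystem path anywhere a Hugging Face repo ID is accepted, as long as the directory holds the usual config, tokenizer, and weight files. A minimal sketch; the path is a placeholder:

```python
from vllm import LLM, SamplingParams

# The directory should contain config.json, tokenizer files, and
# safetensors/bin weights, e.g. a `huggingface-cli download` target.
llm = LLM(model="/data/models/my-llm")  # placeholder local path

out = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```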
[Bug]: Segmentation fault (core dumped) while loading deepseek coder v2 lite model
bug (Something isn't working)
#6011 opened Jul 1, 2024 by zxdvd