Issues: vllm-project/vllm
[Bug]: TypeError in benchmark_serving.py when using --model parameter
bug (Something isn't working)
#6069 opened Jul 2, 2024 by Arthur-g-p
[Usage]: How to initialize gemma2-27b with 4-bit quantization?
usage (How to use vllm)
#6068 opened Jul 2, 2024 by maxin9966
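For questions like #6068, vLLM's offline API takes a `quantization` argument matching the quantization method of a pre-quantized checkpoint. A minimal sketch, assuming a hypothetical pre-quantized 4-bit GPTQ repo name (substitute a real checkpoint):

```python
from vllm import LLM, SamplingParams

# "some-org/gemma-2-27b-gptq-4bit" is a hypothetical placeholder;
# point `model` at a real 4-bit GPTQ (or AWQ) checkpoint.
llm = LLM(
    model="some-org/gemma-2-27b-gptq-4bit",
    quantization="gptq",   # must match how the checkpoint was quantized
    max_model_len=4096,    # optional: cap context length to save memory
)

outputs = llm.generate(
    ["Explain 4-bit quantization in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```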
[New Model]: LoRA for Qwen/Qwen2-57B-A14B
new model (Requests for new models)
#6067 opened Jul 2, 2024 by H-Simpson123
[Bug]: benchmark_serving.py cannot calculate Median TTFT correctly
bug (Something isn't working)
#6064 opened Jul 2, 2024 by Sekri0
[Installation]: How to disable NCCL support on Jetson devices
installation (Installation problems)
#6063 opened Jul 2, 2024 by thunder95
[Bug]: ValidationError using langchain_community.llms.VLLM
bug (Something isn't working)
#6062 opened Jul 2, 2024 by santurini
[Bug]: Garbled tokens appear in vLLM generation results every time a new LLM model is loaded (Qwen)
bug (Something isn't working)
#6060 opened Jul 2, 2024 by Jason-csc
[Bug][CI/Build]: Missing attribute 'nvmlDeviceGetHandleByIndex' in AMD tests
bug (Something isn't working), rocm
#6059 opened Jul 2, 2024 by DarkLight1337
[Bug]: Loading a MiniCPM model raises KeyError: 'lm_head.weight'
bug (Something isn't working)
#6058 opened Jul 2, 2024 by uRENu
[Usage]: How to use beam search when requesting the OpenAI Completions API
usage (How to use vllm)
#6057 opened Jul 2, 2024 by nguyenhoanganh2002
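For #6057-style questions: vLLM's OpenAI-compatible server accepts extra, non-standard sampling fields in the request body, which the official openai client can pass through via `extra_body`. A minimal sketch, assuming a server already running at localhost:8000 (URL and model name are placeholders):

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is already running, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-hf
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",  # placeholder model name
    prompt="San Francisco is a",
    max_tokens=32,
    temperature=0,                     # beam search requires greedy temperature
    extra_body={                       # vLLM-specific sampling extensions
        "use_beam_search": True,
        "best_of": 4,                  # beam width
    },
)
print(completion.choices[0].text)
```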
[Bug]: debugging guide for device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp"
bug (Something isn't working)
#6056 opened Jul 2, 2024 by youkaichao
[Usage]: How to use --pipeline-parallel-size
usage (How to use vllm)
#6054 opened Jul 2, 2024 by XiaoYu2022
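For #6054: --pipeline-parallel-size is a server-launch flag that splits the model's layers across GPUs, and in the vLLM versions around this snapshot it applied to online serving (often combined with --tensor-parallel-size). A hedged sketch; model name and port are placeholders:

```python
# Sketch: launch the OpenAI-compatible server with pipeline parallelism
# across 2 GPUs (shell command shown as a comment):
#
#   python -m vllm.entrypoints.openai.api_server \
#       --model meta-llama/Meta-Llama-3-70B-Instruct \
#       --pipeline-parallel-size 2 \
#       --tensor-parallel-size 2
#
# Then query it like any OpenAI endpoint:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model name
    prompt="Hello",
    max_tokens=8,
)
print(resp.choices[0].text)
```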
[Usage]: Is there a way to turn off fast attention (a parameter, maybe)? My model deployment takes 30 min to complete
usage (How to use vllm)
#6053 opened Jul 2, 2024 by bzr1
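Regarding #6053: assuming "fast attention" refers to the FlashAttention backend, vLLM chooses its attention backend from the VLLM_ATTENTION_BACKEND environment variable, so forcing XFORMERS is one way to bypass FlashAttention. A sketch (whether this addresses the 30-minute startup is an assumption):

```python
import os

# Assumption: "fast attention" means the FlashAttention backend.
# The variable must be set before vLLM is imported.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # small placeholder model
print(llm.generate(["Hello"])[0].outputs[0].text)
```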
[Bug]: call for stack trace for "Watchdog caught collective operation timeout"
bug (Something isn't working)
#6042 opened Jul 1, 2024 by youkaichao
[Bug]: Speculative decoding does not respect per-request seed
bug (Something isn't working)
#6038 opened Jul 1, 2024 by tdoublep
[Bug]: identical branches in csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu
bug (Something isn't working)
#6030 opened Jul 1, 2024 by stevegrubb
[Bug]: When running inference with a 1B model, TP=2 latency is greater than TP=1
bug (Something isn't working)
#6027 opened Jul 1, 2024 by sitabulaixizawaluduo
[Bug]: Producer process has been terminated before all shared CUDA tensors released (v0.5.0.post1, v0.4.3)
bug (Something isn't working)
#6025 opened Jul 1, 2024 by yaronr
[Bug]: The same prompt produces different output between vLLM offline and online calls
bug (Something isn't working)
#6021 opened Jul 1, 2024 by ArlanCooper
[New Model]: facebook/seamless-m4t-v2-large
new model (Requests for new models)
#6017 opened Jul 1, 2024 by frittentheke
[Usage]: Load local model from local path
usage (How to use vllm)
#6012 opened Jul 1, 2024 by xiaoyu-work
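For #6012: vLLM accepts a filesystem path anywhere a Hugging Face repo ID is accepted, as long as the directory holds the usual config, tokenizer, and weight files. A minimal sketch; the path is a placeholder:

```python
from vllm import LLM, SamplingParams

# The directory should contain config.json, tokenizer files, and
# safetensors/bin weights, e.g. a `huggingface-cli download` target.
llm = LLM(model="/data/models/my-llm")  # placeholder local path

out = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```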
[Bug]: Segmentation fault (core dumped) while loading deepseek coder v2 lite model
bug (Something isn't working)
#6011 opened Jul 1, 2024 by zxdvd