Pull requests: vllm-project/vllm
- [WIP][Kernel] Unify the kernel used in flash attention backend (#6052) - opened Jul 2, 2024 by LiuXiaoxuanPKU (Draft)
- [Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051) - opened Jul 1, 2024 by LiuXiaoxuanPKU
- [ROCm][AMD][Model] Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043) - opened Jul 1, 2024 by gshtras
- [Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation (#6036) - opened Jul 1, 2024 by LeiWang1999 (Draft, 2 tasks)
- [wip][Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032) - opened Jul 1, 2024 by ruisearch42 (Draft)
- Support for quantized kv cache (compressed-tensors) (#6028) - opened Jul 1, 2024 by dbogunowicz
- [Distributed][Hardware][Intel CPU][Intel GPU] Fix TP issues on XPU and CPU (#6013) - opened Jul 1, 2024 by jikunshang [x86 CPU]
- [ci][distributed] Add distributed test gptq_marlin with tp = 2 (#6010) - opened Jul 1, 2024 by llmpros
- [Hardware][Intel CPU] Adding Intel OpenMP tunings in Docker file (#6008) - opened Jul 1, 2024 by zhouyuan [x86 CPU]
- [Feature][Hardware][AMD] Enable Scaled FP8 GEMM on ROCm (#6006) - opened Jun 30, 2024 by HaiShaw [rocm]
- [Doc] Update description of vLLM support for CPUs (#6003) - opened Jun 30, 2024 by DamonFool [x86 CPU]
- [wip][Core] Introduce SPMD worker execution using Ray accelerated DAG (#5980) - opened Jun 29, 2024 by stephanie-wang (Draft, 3 tasks)
- [Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975) - opened Jun 28, 2024 by mgoin
- [Misc] Refactor MoE to isolate Fp8 from Mixtral (#5970) - opened Jun 28, 2024 by robertgshaw2-neuralmagic
- [misc][optimization] Optimize data structure in allocator (#5968) - opened Jun 28, 2024 by youkaichao