Upgrade to the latest vLLM version 09/18 (#4)
Merged
Jeffwan merged 247 commits into aibrix:main from vllm-project:main on Sep 19, 2024
+39,289 −10,397
Commits
Commits on Aug 26, 2024
Commits on Aug 27, 2024
Commits on Aug 28, 2024
- [Bugfix] Allow ScalarType to be compiled with pytorch 2.3 and add checks for registering FakeScalarType and dynamo support. (#7886)
- [Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ (#7386)
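The AWQ commit above adds Triton kernels that dequantize 4-bit weights on the fly. As a rough, hedged illustration of what an `awq_dequantize`-style routine computes, here is a pure-Python sketch; it assumes a simplified sequential nibble packing, whereas the real AWQ layout is interleaved and the real kernel runs in Triton on GPU.

```python
def pack_int4(values):
    """Pack eight 4-bit unsigned values (0..15) into one 32-bit word.

    Simplified sequential packing; real AWQ uses an interleaved order.
    """
    assert len(values) == 8 and all(0 <= v <= 15 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)
    return word


def unpack_int4(word):
    """Recover the eight 4-bit values from a packed 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize(packed_word, scale, zero):
    """AWQ-style affine dequantization: w = (q - zero) * scale."""
    return [(q - zero) * scale for q in unpack_int4(packed_word)]


packed = pack_int4([3, 7, 0, 15, 8, 1, 12, 5])
weights = dequantize(packed, scale=0.1, zero=8)
```

Packing eight weights per 32-bit word is what gives 4-bit quantization its 4x memory reduction over fp16; the per-group `scale` and `zero` are looked up per weight group in the real kernel.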
Commits on Aug 29, 2024
- [Core][Kernels] Enable FP8 KV Cache with Flashinfer backend. + BugFix for kv_cache_dtype=auto (#7985)
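An FP8 KV cache halves KV-cache memory by storing keys and values in 8 bits instead of 16. As a toy illustration only (not vLLM's actual FP8 path, which uses hardware formats and a scale factor), an e5m2-style value keeps two explicit mantissa bits, so cached values round to three significant binary digits:

```python
import math


def round_e5m2(x: float) -> float:
    """Round x to the nearest value with an e5m2-style mantissa
    (1 implicit + 2 explicit bits = 3 significant binary digits).

    Toy model: ignores e5m2's exponent range, subnormals, and
    saturation; real FP8 KV-cache code also applies scaling.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)            # x = m * 2**e, with 0.5 <= |m| < 1
    return math.ldexp(round(m * 8) / 8, e)
```

Nearby values collapse onto a coarse grid (1.0, 1.25, 1.5, 1.75, ...), which is the precision cost traded for the memory savings.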
Commits on Aug 30, 2024
Commits on Aug 31, 2024
Commits on Sep 1, 2024
Commits on Sep 2, 2024
Commits on Sep 3, 2024
Commits on Sep 4, 2024
Commits on Sep 5, 2024
- [Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962)
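The lossless guarantee in speculative decoding comes from modified rejection sampling: a draft token t is accepted with probability min(1, p(t)/q(t)), where p is the target model's distribution and q the draft model's, and on rejection a token is resampled from the normalized residual max(0, p − q). A hedged sketch of that rule (function names are illustrative, not vLLM's API):

```python
import random


def accept_draft_token(token: int, p: list, q: list, rng=random) -> bool:
    """Accept a draft token with probability min(1, p[token] / q[token]).

    p: target-model probabilities, q: draft-model probabilities.
    This acceptance rule makes accepted tokens distributed exactly
    as p, which is what makes speculative decoding lossless.
    """
    ratio = p[token] / q[token]
    return ratio >= 1.0 or rng.random() < ratio


def residual_distribution(p: list, q: list) -> list:
    """On rejection, resample from the normalized residual max(0, p - q)."""
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    return [r / total for r in residual]
```

Note that a draft token the target model likes at least as much as the draft model (ratio ≥ 1) is always accepted, so a well-matched draft model gives high acceptance rates at zero quality cost.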
Commits on Sep 6, 2024
Commits on Sep 7, 2024
Commits on Sep 8, 2024
Commits on Sep 9, 2024
Commits on Sep 10, 2024
- [MISC] Keep chunked prefill enabled by default with long context when prefix caching is enabled (#8342)
Commits on Sep 11, 2024
Commits on Sep 12, 2024
- [Hotfix][Core][VLM] Disable chunked prefill by default and prefix caching for multimodal models (#8425)
Commits on Sep 13, 2024
Commits on Sep 14, 2024
Commits on Sep 16, 2024
Commits on Sep 17, 2024
Commits on Sep 18, 2024