Update ROCm vLLM to 0.4.3 #40
Merged
mawong-amd merged 393 commits into main from main_upstream_candidate_531_fp8 on Jun 6, 2024
+54,524 -18,615
Commits
This pull request is big! We're only showing the most recent 250 commits
Commits on Apr 29, 2024
Commits on Apr 30, 2024
Commits on May 1, 2024
[Bugfix] Fix the fp8 kv_cache check error that occurs when failing to obtain the CUDA version. (vllm-project#4173)
Commits on May 2, 2024
[MISC] Rework logger to enable pythonic custom logging configuration to be provided (vllm-project#4273)
authored by Danny Guinther
[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (vllm-project#4451)
Commits on May 3, 2024
[Bugfix] Allow "None" or "" to be passed to CLI for string args that default to None (vllm-project#4586)
Commits on May 4, 2024
[Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations) (vllm-project#4527)
Commits on May 6, 2024
Commits on May 7, 2024
Commits on May 8, 2024
Commits on May 9, 2024
Commits on May 10, 2024
Commits on May 11, 2024
Commits on May 12, 2024
Commits on May 13, 2024
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (vllm-project#4208)
Commits on May 14, 2024
[Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies (vllm-project#4696)
Commits on May 15, 2024
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (vllm-project#4681)
Commits on May 16, 2024
Commits on May 17, 2024
[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests (vllm-project#4797)
Commits on May 19, 2024
Commits on May 20, 2024
Commits on May 21, 2024
Commits on May 22, 2024
Commits on May 23, 2024
Commits on May 24, 2024
Commits on May 25, 2024
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (vllm-project#4799)
Commits on May 28, 2024
Commits on May 29, 2024
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (vllm-project#4837)
Commits on May 30, 2024
Commits on May 31, 2024
[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (vllm-project#5136)
Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (vllm-project#5149)
Commits on Jun 1, 2024
Commits on Jun 4, 2024
Commits on Jun 5, 2024
Commits on Jun 6, 2024