
Update ROCm vLLM to 0.4.3 #40

Merged 393 commits into main on Jun 6, 2024

Conversation

mawong-amd

Bring in upstream vLLM changes as of May 31 into ROCm/vllm

Updates ROCm vLLM to 0.4.3

njhill and others added 30 commits April 27, 2024 11:17
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: Philipp Moritz <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
okakarpa and others added 27 commits May 30, 2024 03:27
…#5112)

Co-authored-by: Alexey Kondratiev <[email protected]>
Co-authored-by: Alexei-V-Ivanov-AMD <[email protected]>
Co-authored-by: Alexei V. Ivanov <[email protected]>
Co-authored-by: omkarkakarparthi <okakarpa>
…e ::ordered_metadata modifier (introduced with PTX 8.5)" (vllm-project#5149)
Updated the base docker to ROCm 6.1.1
Updated the RCCL pin to a new one with performance improvements
Partially reverts [Core][Distributed] use cpu group to broadcast
metadata in cpu (vllm-project#4444)
initial commit for v0.4.0 with paged attn optimization

update the integration code

updates to custom attention kernel

update unit test case for custom

update conditions to pick paged attn v2 vs custom

update env condition

enable more parameters in custom unit testing

update conditions for custom vs v2

update gqa ratio condition for using custom kernel

updated docs, cleanup and enabled it by default

fixes imports for custom paged attn

update the custom paged attn with latest data

update conditions of max-context-len
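The commits above gate the custom ROCm paged-attention kernel on shape conditions (GQA ratio, max context length) and fall back to paged attention v2 otherwise. A minimal sketch of that dispatch decision, with hypothetical function name and illustrative thresholds (the real conditions live in vLLM's attention backend and depend on the GPU architecture):

```python
def use_custom_paged_attention(gqa_ratio: int, max_context_len: int,
                               head_size: int) -> bool:
    """Pick the custom ROCm paged-attention kernel over paged attention v2
    when the problem shape falls inside the range the custom kernel supports.
    Thresholds here are illustrative, not vLLM's exact values."""
    return (1 <= gqa_ratio <= 16            # grouped-query ratio the kernel handles
            and max_context_len <= 32768    # context-length ceiling
            and head_size in (64, 128))     # supported head sizes
```

A caller would evaluate this once per forward pass and route to the custom kernel or the v2 kernel accordingly.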
Fix bias handling with tgemm

Don't use custom matvec kernel for bf16
fp8 computation

Using convert_fp8 kernel

delete convert.cu

clean up

clean up

remove extra kernels

remove int8 -> fp8 convert

fix naming

fix typo

clean up

add compilation guard

add convert_fp8 in cache_ops

clean up

adding missing quant config back

fix the convert_fp8 issue

convert_fp8 fix

fix
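The fp8 commits route conversion through a single `convert_fp8` kernel exposed in `cache_ops` instead of separate per-type kernels. A pure-Python sketch of the core scale-and-saturate step such a conversion performs (the function name and the omission of the final 8-bit rounding are simplifications; the real kernel runs on-device):

```python
def convert_to_fp8_sim(values, scale):
    """Simulate the scale-then-saturate stage of an fp8 (e4m3) conversion:
    divide by the scale and clamp to the e4m3 representable range (±448).
    A real kernel would additionally round the result to 8 bits."""
    FP8_E4M3_MAX = 448.0
    out = []
    for v in values:
        s = v / scale
        s = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, s))
        out.append(s)
    return out
```

Keeping one conversion entry point lets the cache code handle fp8 KV-cache reads and writes without duplicating kernels per source dtype, which matches the "remove extra kernels" cleanup above.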
Restore use of FA Triton as default

update base docker image

remove apply_custom

Use inp_view for out = F.linear() in TunedGemm (#36)

* use inp_view for out = F.linear()

* add missing control path

fix
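The TunedGemm change (#36) passes a flattened 2-D view of the input to the GEMM so a `[batch, seq, hidden]` activation becomes one large `[batch*seq, hidden]` matmul, then restores the leading dimensions on the output. A dependency-free sketch of that reshaping idea (names are illustrative; the PR applies it inside TunedGemm around `F.linear`):

```python
def linear_with_view(inp, weight):
    """Flatten a [batch, seq, hidden] input into a 2-D [batch*seq, hidden]
    "view", run one matmul against weight (shape [out_features, hidden],
    used transposed as in F.linear), then reshape the result back to
    [batch, seq, out_features]."""
    batch, seq, hidden = len(inp), len(inp[0]), len(inp[0][0])
    flat = [row for mat in inp for row in mat]          # the flattened view
    out_flat = [[sum(r[k] * w[k] for k in range(hidden)) for w in weight]
                for r in flat]                          # out = view @ weight.T
    return [out_flat[b * seq:(b + 1) * seq] for b in range(batch)]
```

Collapsing the batch and sequence dimensions this way feeds the tuned GEMM a single well-shaped matmul instead of many small ones.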
@mawong-amd mawong-amd changed the title Main upstream candidate 531 fp8 Update ROCm vLLM to 0.4.3 Jun 6, 2024
@mawong-amd mawong-amd merged commit e4af60b into main Jun 6, 2024
0 of 13 checks passed
@gshtras gshtras deleted the main_upstream_candidate_531_fp8 branch August 20, 2024 18:35