
Update ROCm vLLM to 0.4.3 #40

Merged 393 commits into main on Jun 6, 2024

Conversation

mawong-amd

Bring in upstream vLLM changes as of May 31 into ROCm/vllm

Updates ROCm vLLM to 0.4.3

njhill and others added 30 commits April 27, 2024 11:17
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: Philipp Moritz <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
okakarpa and others added 27 commits May 30, 2024 03:27
…#5112)

Co-authored-by: Alexey Kondratiev <[email protected]>
Co-authored-by: Alexei-V-Ivanov-AMD <[email protected]>
Co-authored-by: Alexei V. Ivanov <[email protected]>
Co-authored-by: omkarkakarparthi <okakarpa>
…e ::ordered_metadata modifier (introduced with PTX 8.5)" (vllm-project#5149)
Updated the base docker to ROCm 6.1.1
Updated the RCCL pin to a new one with performance improvements
Partially reverts [Core][Distributed] use cpu group to broadcast
metadata in cpu (vllm-project#4444)
initial commit for v0.4.0 with paged attn optimization

update the integration code

updates to custom attention kernel

update unit test case for custom

update conditions to pick paged attn v2 vs custom

update env condition

enable more parameters in custom unit testing

update conditions for custom vs v2

update gqa ratio condition for using custom kernel

updated docs, cleanup and enabled it by default

fixes imports for custom paged attn

update the custom paged attn with latest data

update conditions of max-context-len
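The commits above gate the custom ROCm paged-attention kernel on shape conditions (GQA ratio, max context length) and fall back to paged attention v2 otherwise. A minimal sketch of that dispatch decision, with hypothetical function name and illustrative thresholds (the real conditions live in vLLM's attention backend and depend on the GPU architecture):

```python
def use_custom_paged_attention(gqa_ratio: int, max_context_len: int,
                               head_size: int) -> bool:
    """Pick the custom ROCm paged-attention kernel over paged attention v2
    when the problem shape falls inside the range the custom kernel supports.
    Thresholds here are illustrative, not vLLM's exact values."""
    return (1 <= gqa_ratio <= 16            # grouped-query ratio the kernel handles
            and max_context_len <= 32768    # context-length ceiling
            and head_size in (64, 128))     # supported head sizes
```

A caller would evaluate this once per forward pass and route to the custom kernel or the v2 kernel accordingly.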
Fix bias handling with tgemm

Don't use custom matvec kernel for bf16
fp8 computation

Using convert_fp8 kernel

delete convert.cu

clean up

clean up

remove extra kernels

remove int8 -> fp8 convert

fix naming

fix typo

clean up

add compilation guard

add convert_fp8 in cache_ops

clean up

adding missing quant config back

fix the convert_fp8 issue

convert_fp8 fix

fix
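The fp8 commits route conversion through a single `convert_fp8` kernel exposed in `cache_ops` instead of separate per-type kernels. A pure-Python sketch of the core scale-and-saturate step such a conversion performs (the function name and the omission of the final 8-bit rounding are simplifications; the real kernel runs on-device):

```python
def convert_to_fp8_sim(values, scale):
    """Simulate the scale-then-saturate stage of an fp8 (e4m3) conversion:
    divide by the scale and clamp to the e4m3 representable range (±448).
    A real kernel would additionally round the result to 8 bits."""
    FP8_E4M3_MAX = 448.0
    out = []
    for v in values:
        s = v / scale
        s = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, s))
        out.append(s)
    return out
```

Keeping one conversion entry point lets the cache code handle fp8 KV-cache reads and writes without duplicating kernels per source dtype, which matches the "remove extra kernels" cleanup above.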
Restore use of FA Triton as default

update base docker image

remove apply_custom

Use inp_view for out = F.linear() in TunedGemm (#36)

* use inp_view for out = F.linear()

* add missing control path

fix
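The TunedGemm change (#36) passes a flattened 2-D view of the input to the GEMM so a `[batch, seq, hidden]` activation becomes one large `[batch*seq, hidden]` matmul, then restores the leading dimensions on the output. A dependency-free sketch of that reshaping idea (names are illustrative; the PR applies it inside TunedGemm around `F.linear`):

```python
def linear_with_view(inp, weight):
    """Flatten a [batch, seq, hidden] input into a 2-D [batch*seq, hidden]
    "view", run one matmul against weight (shape [out_features, hidden],
    used transposed as in F.linear), then reshape the result back to
    [batch, seq, out_features]."""
    batch, seq, hidden = len(inp), len(inp[0]), len(inp[0][0])
    flat = [row for mat in inp for row in mat]          # the flattened view
    out_flat = [[sum(r[k] * w[k] for k in range(hidden)) for w in weight]
                for r in flat]                          # out = view @ weight.T
    return [out_flat[b * seq:(b + 1) * seq] for b in range(batch)]
```

Collapsing the batch and sequence dimensions this way feeds the tuned GEMM a single well-shaped matmul instead of many small ones.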
@mawong-amd mawong-amd changed the title Main upstream candidate 531 fp8 Update ROCm vLLM to 0.4.3 Jun 6, 2024
@mawong-amd mawong-amd merged commit e4af60b into main Jun 6, 2024
0 of 13 checks passed
@gshtras gshtras deleted the main_upstream_candidate_531_fp8 branch August 20, 2024 18:35