Upstream merge 24/09/16 #187

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

Minor post merge fixes

shajrawi

Great work!

kylesayrs and others added 30 commits September 9, 2024 16:27

[Misc] GPTQ Activation Ordering (vllm-project#8135)

c7cb5c3

[Misc] Fused MoE Marlin support for GPTQ (vllm-project#8217)

6cd5e5b

Add NVIDIA Meetup slides, announce AMD meetup, and add contact info (vllm-project#8319)

a1d8742

[Bugfix] Fix missing post_layernorm in CLIP (vllm-project#8155)

da1a844

[CI/Build] enable ccache/sccache for HIP builds (vllm-project#8327)

6234385

[Frontend] Clean up type annotations for mistral tokenizer (vllm-project#8314)

8c054b7

[CI/Build] Enabling kernels tests for AMD, ignoring some of them that fail (vllm-project#8130)

f421f3c

Fix ppc64le buildkite job (vllm-project#8309)

02751a7

[Spec Decode] Move ops.advance_step to flash attn advance_step (vllm-project#8224)

5faedf1

[Misc] remove peft as dependency for prompt models (vllm-project#8162)

04e7c4e

[MISC] Keep chunked prefill enabled by default with long context when prefix caching is enabled (vllm-project#8342)

b1f3e18

[Bugfix] lookahead block table with cuda graph max capture (vllm-project#8340)

22f3a4b

[Bugfix] Ensure multistep lookahead allocation is compatible with cuda graph max capture (vllm-project#8340)

[Core/Bugfix] pass VLLM_ATTENTION_BACKEND to ray workers (vllm-project#8172)

1d5e397

[CI/Build][Kernel] Update CUTLASS to 3.5.1 tag (vllm-project#8043)

94144e7

[Misc] Skip loading extra bias for Qwen2-MOE GPTQ models (vllm-project#8329)

e497b8a

[Bugfix] Fix InternVL2 vision embeddings process with pipeline parallel (vllm-project#8299)

1230263

[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (vllm-project#6112)

efcf946

[model] Support for Llava-Next-Video model (vllm-project#7559)

6a512a0

Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>

[Frontend] Create ErrorResponse instead of raising exceptions in run_batch (vllm-project#8347)

cea95df

[Model][VLM] Add Qwen2-VL model support (vllm-project#7905)

3b7fea7

Co-authored-by: Roger Wang <[email protected]> Co-authored-by: DarkLight1337 <[email protected]>

[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (vllm-project#7257)

0b952af

[CI/Build] Excluding test_moe.py from AMD Kernels tests for investigation (vllm-project#8373)

aea02f3

[Bugfix] Add missing attributes in mistral tokenizer (vllm-project#8364)

7015417

[Kernel][Misc] register ops to prevent graph breaks (vllm-project#6917)

73202db

Co-authored-by: Sage Moore <[email protected]>

[Misc] Move device options to a single place (vllm-project#8322)

8baa454

[Speculative Decoding] Test refactor (vllm-project#8317)

775f00f

Co-authored-by: youkaichao <[email protected]>

Pixtral (vllm-project#8377)

d394787

Co-authored-by: Roger Wang <[email protected]>

Bump version to v0.6.1 (vllm-project#8379)

3fd2b0d

[MISC] Dump model runner inputs when crashing (vllm-project#8305)

a65cb16

[misc] remove engine_use_ray (vllm-project#8126)

f842a7a

jeejeelee and others added 21 commits September 13, 2024 07:58

[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 (vllm-project#8442)

06311e2

[misc][ci] fix quant test (vllm-project#8449)

a246912

[Installation] Gate FastAPI version for Python 3.8 (vllm-project#8456)

ecd7a1d

[plugin][torch.compile] allow to add custom compile backend (vllm-project#8445)

0a4806f

[CI/Build] Reorganize models tests (vllm-project#7820)

a84e598

[Doc] Add oneDNN installation to CPU backend documentation (vllm-project#8467)

f57092c

[HotFix] Fix final output truncation with stop string + streaming (vllm-project#8468)

18e9e1f

bump version to v0.6.1.post2 (vllm-project#8473)

9ba0817

[bugfix] add multi-step advance_step to ROCmFlashAttentionMetadata

daddc14

add rocm to MULTI_STEP_ATTENTION_BACKENDS

306f21f

[Hardware][intel GPU] bump up ipex version to 2.3 (vllm-project#8365)

8517252

Co-authored-by: Yan Ma <[email protected]>

[Kernel][Hardware][Amd] Custom paged attention kernel for rocm (vllm-project#8310)

1ef0d2e

[Model] support minicpm3 (vllm-project#8297)

8a0cf1d

Co-authored-by: DarkLight1337 <[email protected]>

[torch.compile] fix functionalization (vllm-project#8480)

a36e070

[torch.compile] add a flag to disable custom op (vllm-project#8488)

47790f3

[TPU] Implement multi-step scheduling (vllm-project#8489)

50e9ec4

[Bugfix][Model] Fix Python 3.8 compatibility in Pixtral model by updating type annotations (vllm-project#8490)

3724d5f

[Bugfix][Kernel] Add IQ1_M quantization implementation to GGUF kernel (vllm-project#8357)

fc990f9

Merge remote-tracking branch 'upstream/main'

0f397c3

New llm_engine output format

b0a39a4

Merge remote-tracking branch 'st/ms-rocm-advance-step' into upstream_merge_24_09_16

30a9875

gshtras force-pushed the upstream_merge_24_09_16 branch 3 times, most recently from 533f64b to 6ed41b8, on September 16, 2024 21:23

Fix tests - disable marlin_fiest_moe; fix rocm_paged attention

c27753d

Minor post merge fixes

gshtras force-pushed the upstream_merge_24_09_16 branch from 6ed41b8 to c27753d on September 16, 2024 22:25

shajrawi approved these changes Sep 16, 2024

gshtras merged commit ad9026c into main Sep 16, 2024
16 checks passed

gshtras deleted the upstream_merge_24_09_16 branch September 16, 2024 22:29

gshtras commented Sep 16, 2024

github-actions bot commented Sep 16, 2024

shajrawi left a comment
