Update MoE examples #192
Merged
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.
dsikka approved these changes on Sep 23, 2024
Slightly unrelated, but once your FP8 MoE PR in vLLM lands, we should add an e2e test case: https://github.com/vllm-project/llm-compressor/blob/main/tests/e2e/vLLM/test_vllm.py
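A minimal sketch of what such an e2e case could look like, using vLLM's offline `LLM` API; the model ID and prompt below are placeholders, not values taken from this PR or from test_vllm.py:

```python
# Hypothetical e2e smoke test: load an FP8-compressed MoE checkpoint in vLLM
# and confirm it generates a non-empty completion. The model ID is a
# placeholder, not part of this PR.
from vllm import LLM, SamplingParams

MODEL_ID = "nm-testing/Mixtral-8x7B-Instruct-v0.1-FP8"  # placeholder checkpoint


def test_fp8_moe_generates():
    llm = LLM(model=MODEL_ID)
    params = SamplingParams(max_tokens=32, temperature=0.0)
    outputs = llm.generate(["The capital of France is"], params)
    # Basic smoke check: the model produced some text.
    assert outputs and outputs[0].outputs[0].text.strip()
```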
dbarbuzzi added a commit to dbarbuzzi/llm-compressor that referenced this pull request on Sep 23, 2024
mgoin pushed a commit that referenced this pull request on Sep 27, 2024
* Add tests for examples
* Ignore examples tests by default
* Trailing comma
* Add test for "quantizing_moe_fp8" example
* Update "quantizing_moe" example tests
* Add test for "compressed_inference" example folder
* Remove unused import
* Add new dependency
* Different approach for "flash_attn"
* Add comment about flash_attn requirement
* Test additional quantizing_moe example script
* Add decorator to skip based on available VRAM
* Limit GPU usage in 'cpu_offloading' example
* Add optional pytest-xdist parallelization
* Reduce persistent /tmp usage
* Fix parametrization in big_models
* Add pytest mark for GPU count requirement
* Add 'multi_gpu' pytest marker
* Skip 'deepseek_moe_w4a16.py' by default
* Fix skip mark
* Mark 'ex_trl_distillation.py' as multi_gpu
* Abstract command copy/run to helper functions
* Update for MoE examples PR #192
* Reduce test marker/decorator redundancy
* style fixes
* Rip out unused run parallelization
* Exclude 'deepseek_moe_w8a8_fp8' from multi-GPU
* Use variable for repeated string literal
* Use `requires_gpu_count` over `requires_gpu`
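The commit above mentions a VRAM-based skip decorator and a `requires_gpu_count` marker. A hypothetical sketch of such pytest helpers (illustrative only, not the repository's actual implementation) might look like:

```python
# Illustrative pytest skip helpers: one keyed on the number of visible CUDA
# devices, one keyed on the memory of the first device. Names mirror the
# commit message; the implementation is an assumption.
import pytest
import torch


def requires_gpu_count(num_required: int):
    """Skip a test unless at least `num_required` CUDA devices are visible."""
    available = torch.cuda.device_count() if torch.cuda.is_available() else 0
    return pytest.mark.skipif(
        available < num_required,
        reason=f"requires {num_required} GPUs, found {available}",
    )


def requires_gpu_mem(min_gib: float):
    """Skip a test unless the first GPU reports at least `min_gib` GiB of memory."""
    if not torch.cuda.is_available():
        return pytest.mark.skip(reason="requires a CUDA device")
    total_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
    return pytest.mark.skipif(
        total_gib < min_gib,
        reason=f"requires {min_gib} GiB of GPU memory, found {total_gib:.1f}",
    )


@requires_gpu_count(2)
def test_multi_gpu_example():
    assert torch.cuda.device_count() >= 2
```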
markmc pushed a commit to markmc/llm-compressor that referenced this pull request on Nov 13, 2024
Co-authored-by: dhuangnm <[email protected]>
Update the FP8 Mixtral example to not use GPTQ and add it to the examples list.
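For reference, a minimal sketch of a data-free (non-GPTQ) FP8 flow for Mixtral with llm-compressor, assuming the `QuantizationModifier` with the `FP8_DYNAMIC` scheme; the exact model ID, ignore patterns, import paths, and save arguments in the updated example may differ:

```python
# Illustrative FP8 quantization of Mixtral without GPTQ: a one-shot,
# calibration-free pass using the FP8_DYNAMIC scheme. Details are assumptions,
# not copied from the updated example.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"
SAVE_DIR = "Mixtral-8x7B-Instruct-v0.1-FP8-Dynamic"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 dynamic quantization needs no calibration data; the MoE router gates
# and the LM head are typically kept in higher precision (the ignore pattern
# here is an assumption about Mixtral's module names).
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head", "re:.*block_sparse_moe.gate"],
)

oneshot(model=model, recipe=recipe)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```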