
[Quantization] Channel wise output activation quantization for QKV Attention layers #270


Draft
horheynm wants to merge 5 commits into main

Conversation

@horheynm (Contributor) commented Mar 7, 2025

SUMMARY:
Initialize the Parameter objects for channel-wise output activation quantization on the Q/K/V attention projections. The O/Up/Down projections are not quantized.
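The PR text doesn't include the implementation, so the following is only a rough sketch of what registering per-channel output-quantization parameters for the Q/K/V projections could look like. The helper name `initialize_qkv_output_quant`, the `q_proj`/`k_proj`/`v_proj` module names (common Llama-style layouts), and the parameter shapes are all assumptions, not this PR's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical sketch — not this PR's actual code. Registers per-channel
# scale/zero-point Parameters for output activation quantization on the
# Q/K/V projections only; o_proj/up_proj/down_proj are deliberately skipped.
QKV_SUFFIXES = ("q_proj", "k_proj", "v_proj")

def initialize_qkv_output_quant(model: nn.Module) -> None:
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and name.endswith(QKV_SUFFIXES):
            # channel-wise: one (scale, zero_point) pair per output channel
            channels = module.out_features
            module.register_parameter(
                "output_scale",
                nn.Parameter(torch.ones(channels, 1), requires_grad=False),
            )
            module.register_parameter(
                "output_zero_point",
                nn.Parameter(
                    torch.zeros(channels, 1, dtype=torch.int8),
                    requires_grad=False,  # integer Parameters must not require grad
                ),
            )
```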

TEST PLAN:

  • Pass tests
  • Check parameter shapes manually (see the sketch after this list)
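Building on the hypothetical sketch above, a manual shape check along the lines of the second bullet might look like this; the `ToyAttention` stand-in is invented purely for illustration:

```python
import torch.nn as nn

class ToyAttention(nn.Module):
    """Minimal stand-in for an attention block, just for the shape check."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(hidden, hidden)
        self.k_proj = nn.Linear(hidden, hidden)
        self.v_proj = nn.Linear(hidden, hidden)
        self.o_proj = nn.Linear(hidden, hidden)  # should stay unquantized

model = ToyAttention()
initialize_qkv_output_quant(model)

for name, module in model.named_modules():
    if name.endswith(QKV_SUFFIXES):
        # channel-wise output quantization: one scale per output channel
        assert module.output_scale.shape == (module.out_features, 1)
    elif name.endswith("o_proj"):
        assert not hasattr(module, "output_scale")
```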

@horheynm enabled auto-merge (squash) March 7, 2025 04:35
@horheynm changed the title from "Attn quant" to "[Quantization] Channel wise quantization for output activation on QKV" Mar 7, 2025
@horheynm changed the title from "[Quantization] Channel wise quantization for output activation on QKV" to "[Quantization] Channel wise quantization for output activation on QKV Attention layers" Mar 7, 2025
@horheynm changed the title from "[Quantization] Channel wise quantization for output activation on QKV Attention layers" to "[Quantization] Channel wise output activation quantization for QKV Attention layers" Mar 7, 2025
@dsikka marked this pull request as draft March 7, 2025 14:27
auto-merge was automatically disabled March 7, 2025 14:27 (pull request was converted to draft)
