enable softcap and gemma2 #288

hliuca · 2024-11-20T21:57:43Z

The Gemma2 model needs the softcap feature from flash attention.

gshtras

The linter warnings occur because attn_func is always called with the softcap parameter, but not all implementations support it.
Per the other comments, please pass this parameter only for the models that require it.
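One way to act on this, shown as a minimal sketch rather than the code in this PR: forward the soft cap only when a model actually sets it, so implementations without that keyword are never called with it. The names `call_attn`, `legacy_attn`, and `capped_attn` are hypothetical and exist only for illustration; the keyword name `softcap` is taken from the comment above.

```python
from typing import Any, Callable, Optional


def call_attn(attn_func: Callable[..., Any], *args: Any,
              logits_soft_cap: Optional[float] = None, **kwargs: Any) -> Any:
    """Call attn_func, adding the softcap keyword only when it is requested."""
    # None and 0.0 are both treated as "no soft cap requested".
    if logits_soft_cap:
        kwargs["softcap"] = logits_soft_cap
    return attn_func(*args, **kwargs)


def legacy_attn(q, k, v):
    # Hypothetical implementation that does not accept a softcap keyword.
    return ("legacy", q, k, v)


def capped_attn(q, k, v, softcap=0.0):
    # Hypothetical implementation that does accept a softcap keyword.
    return ("capped", softcap)
```

With this shape, `call_attn(legacy_attn, q, k, v)` works unchanged, while a Gemma2-style caller can pass `logits_soft_cap=...` to an implementation that supports it.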

gshtras · 2024-12-02T22:03:09Z

vllm/attention/backends/rocm_flash_attn.py

@@ -218,12 +218,6 @@ def decode_metadata(self) -> Optional["ROCmFlashAttentionMetadata"]:
            max_encoder_seq_len=self.max_encoder_seq_len,
            cross_slot_mapping=self.cross_slot_mapping,
            cross_block_tables=self.cross_block_tables)
-        # Batch may be composed of prefill|decodes, adjust query start indices


Why is this section being removed?

Sorry, this was an accident. It has been restored.

gshtras · 2024-12-02T22:03:56Z

vllm/attention/backends/rocm_flash_attn.py

+
+        if logits_soft_cap is None:
+            # In flash-attn, setting logits_soft_cap as 0 means no soft cap.
+            logits_soft_cap = 0


This effectively enables logits_soft_cap for any model, unconditionally.

I think it is 0 by default in flash attention: https://github.com/ROCm/flash-attention/blob/main/flash_attn/flash_attn_interface.py#L1334
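For readers unfamiliar with the convention discussed here, the following is a scalar sketch of tanh-based logit soft capping, assuming the usual `cap * tanh(score / cap)` form and treating a cap of 0 as "disabled". The function name `soft_cap` is hypothetical; this is not the flash-attention kernel, only an illustration of why 0 can serve as a "no soft cap" default.

```python
import math


def soft_cap(score: float, cap: float = 0.0) -> float:
    """Squash an attention logit into (-cap, cap); cap <= 0 leaves it unchanged."""
    if cap <= 0.0:
        # A cap of 0 is treated as "soft capping off", so the score passes through.
        return score
    return cap * math.tanh(score / cap)
```

Small scores are nearly unchanged, while large scores saturate toward the cap, so a default of 0 keeps the behavior identical for models that never set the parameter.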

gshtras · 2024-12-02T22:04:30Z

vllm/attention/backends/rocm_flash_attn.py

@@ -511,6 +515,11 @@ def __init__(
                    self.use_naive_attn = True

            if self.use_naive_attn:
+                if logits_soft_cap is not None:


You make sure it's not None in the constructor, so naive flash attention can never be used now.

Fixed, thanks.
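The problem raised in this thread can be sketched as follows. This is an assumption about one possible fix, not the code that was merged: the naive-attention check looks at the caller's original argument, and only afterwards is `None` normalized to `0.0` for the flash-attention path. The class name `AttnImplSketch` and its parameters are hypothetical.

```python
from typing import Optional


class AttnImplSketch:
    """Hypothetical stand-in for an attention backend constructor."""

    def __init__(self, logits_soft_cap: Optional[float] = None,
                 use_naive_attn: bool = False) -> None:
        # Check the caller's original value: None means no soft cap was
        # requested, so the naive path remains usable for such models.
        if use_naive_attn and logits_soft_cap is not None:
            raise ValueError("soft cap is not supported with naive attention")
        self.use_naive_attn = use_naive_attn
        # Normalize only after the check; 0.0 means "no soft cap" downstream.
        self.logits_soft_cap = 0.0 if logits_soft_cap is None else logits_soft_cap
```

Ordering the check before the normalization avoids the situation described above, where a value that is never `None` would rule out the naive path for every model.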

enable softcap for gemma2

84d1387

hliuca requested a review from gshtras November 20, 2024 21:57

hliuca added 4 commits November 20, 2024 14:02

fix lint

1348c23

restore fa

e5cf3da

Merge branch 'develop' into softcap_fix

f0ce486

fix layernorm_kernels conflict

b6a8200

hliuca changed the title ~~enable softcap for gemma2~~ enable softcap and gemma2 Dec 2, 2024

gshtras requested changes Dec 2, 2024

hliuca added 2 commits December 2, 2024 14:14

restore accidental deletion

9242621

fix logits_soft_cap constructor

8cdb96f

hliuca requested a review from gshtras December 3, 2024 22:51

hliuca added 2 commits December 3, 2024 15:36

use 0.0 instead of 0

566ebdb

Merge branch 'develop' into softcap_fix

0a77cd7

gshtras approved these changes Dec 3, 2024

gshtras merged commit 18ef0a0 into develop Dec 4, 2024
7 of 8 checks passed

gshtras deleted the softcap_fix branch December 4, 2024 01:59
