Commit 719fe5a

Clarify enable_gqa support in fx_importer.py
Removed the TODO notes for grouped query attention support from the docstring and comments.
1 parent 4470978 commit 719fe5a

File tree

1 file changed: +1 / -2 lines changed

python/torch_mlir/extras/fx_importer.py

Lines changed: 1 addition & 2 deletions
@@ -1915,7 +1915,7 @@ def _import_hop_flex_attention(
         - score_mod: Optional submodule/callable for score modification (imported as function)
         - block_mask: Optional BlockMask tuple containing mask_mod function and runtime tensors
         - scale: Optional float for attention score scaling
-        - enable_gqa: Boolean for grouped query attention support (TODO: NYI)
+        - enable_gqa: Boolean for grouped query attention support
         - kernel_options: Dict of performance tuning options (TODO: NYI)

     This creates a call to aten.flex_attention with function symbol references for
@@ -1932,7 +1932,6 @@ def _import_hop_flex_attention(
             node.args[:6]
         )

-        # TODO: Add support for enable_gqa (grouped query attention)
         # This is a boolean flag that enables GQA optimization
         enable_gqa = node.args[6] if len(node.args) > 6 else False

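For context on what the enable_gqa flag means, below is a minimal sketch of grouped query attention with PyTorch's flex_attention at the eager level. It assumes a PyTorch build that provides torch.nn.attention.flex_attention; the tensor shapes and head counts are illustrative and not taken from this repository.

# Minimal sketch of grouped query attention (GQA) with flex_attention.
# Assumes a PyTorch build that ships torch.nn.attention.flex_attention;
# shapes and head counts below are illustrative only.
import torch
from torch.nn.attention.flex_attention import flex_attention

B, Hq, Hkv, S, D = 2, 8, 2, 128, 64  # 8 query heads share 2 key/value heads

query = torch.randn(B, Hq, S, D)
key = torch.randn(B, Hkv, S, D)
value = torch.randn(B, Hkv, S, D)

# enable_gqa=True lets attention broadcast each key/value head across its
# group of query heads instead of requiring matching head counts.
out = flex_attention(query, key, value, enable_gqa=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])

This is the same boolean that the importer reads from node.args[6] in the hunk above.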
