Create KV cache input tensor only if cache len > 0 for that layer (#15042)

sxu · facebook-github-bot · commit 6d3025ddec3d · 2025-10-13T09:16:20.000-07:00
Summary:

The MHA branch already creates a KV cache input tensor only for layers whose cache length is greater than 0. Add the same check to the other branch.

Differential Revision: D84471388
diff --git a/examples/models/llama/static_attention.py b/examples/models/llama/static_attention.py
@@ -297,6 +297,7 @@ def __init__(
                     dtype=dtype,
                 )
                 for layer_id in range(config.n_layers)
+                if cache_lens[layer_id] > 0
             }
             self.v_caches = {
                 StaticKVCache.calculate_cache_key(layer_id, 0): torch.zeros(
@@ -307,6 +308,7 @@ def __init__(
                     dtype=dtype,
                 )
                 for layer_id in range(config.n_layers)
+                if cache_lens[layer_id] > 0
             }
 
         self.config = config
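
A minimal sketch of the filtering pattern this change adds, kept free of torch so it runs standalone. The names `cache_key` and `build_cache_inputs` are hypothetical stand-ins (the real code uses `StaticKVCache.calculate_cache_key` and `torch.zeros`, and the real key format may differ). Nested lists of zeros stand in for tensors.

```python
from typing import Dict, List


def cache_key(layer_id: int, head_id: int) -> str:
    # Hypothetical stand-in for StaticKVCache.calculate_cache_key;
    # the actual key format in the repository may differ.
    return f"l{layer_id},h{head_id}"


def build_cache_inputs(
    cache_lens: List[int], head_dim: int
) -> Dict[str, List[List[float]]]:
    # Build one zero-filled cache per layer, skipping layers whose
    # cache length is 0 so that no empty input is created for them.
    return {
        cache_key(layer_id, 0): [
            [0.0] * head_dim for _ in range(cache_lens[layer_id])
        ]
        for layer_id in range(len(cache_lens))
        if cache_lens[layer_id] > 0
    }


# Layer 1 has cache length 0, so it gets no entry.
caches = build_cache_inputs([4, 0, 2], head_dim=8)
print(sorted(caches))
```

With the trailing `if` clause in the comprehension, a layer with cache length 0 has no key in the resulting dict, so callers must not assume every layer id is present.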
