
Commit b8d1a6d

add 405b maverick perf data and note
Signed-off-by: zpatel <[email protected]>
1 parent ab941ca commit b8d1a6d


1 file changed, +17 -1 lines changed

docs/source/performance/perf-overview.md

Lines changed: 17 additions & 1 deletion
@@ -132,7 +132,23 @@ nvidia/Llama-4-Maverick-17B-128E-Instruct-FP8
 | 20000, 2000 | | 363.27 | 509.87 |
 
 #### Llama 4 Maverick FP8
-TODO
+
+*Performance for Llama 4 on sequence lengths shorter than 8,192 tokens is affected by an issue introduced in v0.21. To reproduce the Llama 4 performance noted here, please use v0.20.*
+
+| | GPU | H200 141GB HBM3 | H100 80GB HBM3 |
+|:-----------------------------|:---|:------------------|:-----------------|
+| | TP Size | 8 | 8 |
+| ISL, OSL | | | |
+| | | | |
+| 128, 2048 | | 27,543.87 | |
+| 128, 4096 | | 18,541.01 | 11,163.12 |
+| 500, 2000 | | 21,117.34 | |
+| 1000, 2000 | | | 10,556.00 |
+| 1024, 2048 | | 16,859.45 | 11,584.33 |
+| 2048, 128 | | 4,364.06 | 3,832.38 |
+| 2048, 2048 | | 12,800.89 | |
+| 5000, 500 | | 5,128.60 | |
+| 20000, 2000 | | 1,764.27 | 1,400.79 |
 
 ## Reproducing Benchmarked Results
 

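A minimal Python sketch for reading the new Maverick table: it parses the nine added rows and computes the H200/H100 ratio for the ISL/OSL pairs where both GPUs have a reported value (both columns use TP size 8). The column order is assumed from the header row, the helper names `parse_rows` and `h200_over_h100` are hypothetical, and the throughput unit is whatever the rest of perf-overview.md specifies, since this hunk does not state it.

```python
from typing import Dict, Optional, Tuple

# Rows copied verbatim from the Llama 4 Maverick FP8 table added in this commit.
# Column order (assumed from the header row): ISL/OSL, (blank), H200 141GB HBM3, H100 80GB HBM3.
TABLE = """\
| 128, 2048 | | 27,543.87 | |
| 128, 4096 | | 18,541.01 | 11,163.12 |
| 500, 2000 | | 21,117.34 | |
| 1000, 2000 | | | 10,556.00 |
| 1024, 2048 | | 16,859.45 | 11,584.33 |
| 2048, 128 | | 4,364.06 | 3,832.38 |
| 2048, 2048 | | 12,800.89 | |
| 5000, 500 | | 5,128.60 | |
| 20000, 2000 | | 1,764.27 | 1,400.79 |
"""

Key = Tuple[int, int]
Row = Tuple[Optional[float], Optional[float]]


def to_float(cell: str) -> Optional[float]:
    """Convert '27,543.87' to 27543.87; an empty cell means 'not benchmarked'."""
    return float(cell.replace(",", "")) if cell else None


def parse_rows(table: str) -> Dict[Key, Row]:
    """Map (ISL, OSL) -> (H200 value, H100 value) for each markdown table row (hypothetical helper)."""
    rows: Dict[Key, Row] = {}
    for line in table.strip().splitlines():
        # Drop the outer pipes, split on the inner ones, and trim each cell.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        isl, osl = (int(x) for x in cells[0].split(","))
        rows[(isl, osl)] = (to_float(cells[2]), to_float(cells[3]))
    return rows


def h200_over_h100(rows: Dict[Key, Row]) -> Dict[Key, float]:
    """H200/H100 ratio, only for rows where both GPUs have a value (hypothetical helper)."""
    return {
        key: h200 / h100
        for key, (h200, h100) in rows.items()
        if h200 is not None and h100 is not None
    }


rows = parse_rows(TABLE)
ratios = h200_over_h100(rows)

if __name__ == "__main__":
    for (isl, osl), ratio in sorted(ratios.items()):
        print(f"ISL={isl}, OSL={osl}: H200/H100 = {ratio:.2f}x")
```

Empty cells are treated as "not benchmarked" rather than zero, so single-GPU rows such as `1000, 2000` (H100 only) drop out of the comparison instead of causing a division by zero.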