forked from vllm-project/vllm
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Updated library versions
* Simple num_stages fix without re-tuning for performance
* Tuning script adaptation for the new triton
* navi lib versions
* Update MI300X fused_moe configs for Triton 3.2 (#344)

---------

Co-authored-by: Jeremy Arnold <[email protected]>
1 parent 1dcd9fe, commit ca4d670
Showing 58 changed files with 3,576 additions and 1,320 deletions.
164 changes: 164 additions & 0 deletions
.../layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json
@@ -0,0 +1,164 @@
{
    "1": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "2": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "4": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 32,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 2,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "8": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 1,
        "num_warps": 2,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "16": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 1,
        "num_warps": 2,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "24": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "32": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 4,
        "num_warps": 4,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "48": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "64": {
        "BLOCK_SIZE_M": 32,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 4,
        "num_warps": 2,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "96": {
        "BLOCK_SIZE_M": 32,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 1,
        "num_warps": 2,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "128": {
        "BLOCK_SIZE_M": 64,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 4,
        "num_warps": 4,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "256": {
        "BLOCK_SIZE_M": 128,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 4,
        "num_warps": 8,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "512": {
        "BLOCK_SIZE_M": 256,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 4,
        "num_warps": 8,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "1024": {
        "BLOCK_SIZE_M": 128,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 1,
        "num_warps": 8,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "1536": {
        "BLOCK_SIZE_M": 128,
        "BLOCK_SIZE_N": 256,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 8,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "2048": {
        "BLOCK_SIZE_M": 128,
        "BLOCK_SIZE_N": 256,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 8,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "3072": {
        "BLOCK_SIZE_M": 128,
        "BLOCK_SIZE_N": 256,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 8,
        "num_stages": 2,
        "waves_per_eu": 0
    },
    "4096": {
        "BLOCK_SIZE_M": 256,
        "BLOCK_SIZE_N": 256,
        "BLOCK_SIZE_K": 64,
        "GROUP_SIZE_M": 1,
        "num_warps": 8,
        "num_stages": 2,
        "waves_per_eu": 0
    }
}
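
The top-level keys in the file above ("1", "2", ... "4096") look like token-batch sizes, each mapping to a set of Triton launch parameters. The sketch below illustrates one plausible way such a file could be consumed: parse the JSON, convert the keys to integers, and pick the entry whose key is closest to the current batch size. The helper names `load_configs` and `select_config` are hypothetical, the nearest-key rule is an assumption rather than a confirmed description of vLLM's lookup, and only three of the 18 entries are reproduced to keep the example short.

```python
import json

# Abbreviated excerpt of the config above (3 of its 18 batch-size keys).
CONFIG_JSON = """
{
    "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 256,
          "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 2, "waves_per_eu": 0},
    "64": {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 256,
           "GROUP_SIZE_M": 4, "num_warps": 2, "num_stages": 2, "waves_per_eu": 0},
    "4096": {"BLOCK_SIZE_M": 256, "BLOCK_SIZE_N": 256, "BLOCK_SIZE_K": 64,
             "GROUP_SIZE_M": 1, "num_warps": 8, "num_stages": 2, "waves_per_eu": 0}
}
"""


def load_configs(text):
    """Parse the JSON text and convert its string keys to integer batch sizes.

    Hypothetical helper, not part of vLLM.
    """
    return {int(key): params for key, params in json.loads(text).items()}


def select_config(configs, num_tokens):
    """Return the parameters tuned for the batch size closest to num_tokens.

    Assumption: nearest-key selection. The real lookup may use another rule.
    """
    nearest = min(configs, key=lambda batch_size: abs(batch_size - num_tokens))
    return configs[nearest]


if __name__ == "__main__":
    configs = load_configs(CONFIG_JSON)
    for num_tokens in (1, 70, 3000):
        print(num_tokens, select_config(configs, num_tokens))
```

With this excerpt, a batch of 70 tokens resolves to the "64" entry and a batch of 3000 tokens to the "4096" entry. With the full file, 3000 would instead land on "3072", which is why the tuned keys are spaced more densely at small batch sizes.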