Always use 64 as the block size of moe_align kernel to avoid lds out of limit #303

charlifu · 2024-12-04T21:08:20Z

The moe_align kernel uses the larger of the number of experts and the warp size as the GPU block size. For models with a large number of experts, this pushes the kernel's LDS (shared memory) usage over the limit.

This PR fixes the block size at the warp size (64) and changes the kernel to handle the case where the number of threads is smaller than the number of experts.
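To make the LDS pressure concrete, here is a rough host-side sketch of the arithmetic. It rests on assumptions that are not stated in this PR: that the kernel keeps a `(block_size + 1) x num_experts` table of `int32` counters plus a `num_experts + 1` cumulative-sum array in LDS, and that the per-block LDS limit is 64 KiB. The names `shared_bytes`, `fits_in_lds`, `old_block_size`, `new_block_size`, and `kAssumedLdsLimitBytes` are hypothetical helpers for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMPTION: per-block LDS limit of 64 KiB (not stated in the PR).
constexpr std::size_t kAssumedLdsLimitBytes = 64 * 1024;
// Warp (wavefront) size used as the fixed block size in this PR.
constexpr int kWarpSize = 64;

// ASSUMPTION: the LDS layout is a (block_size + 1) x num_experts counter
// table plus a (num_experts + 1) cumulative-sum array, all int32.
std::size_t shared_bytes(int block_size, int num_experts) {
  std::size_t counters =
      static_cast<std::size_t>(block_size + 1) * num_experts;
  std::size_t cumsum = static_cast<std::size_t>(num_experts) + 1;
  return (counters + cumsum) * sizeof(std::int32_t);
}

// True when the assumed layout fits within the assumed LDS limit.
bool fits_in_lds(int block_size, int num_experts) {
  return shared_bytes(block_size, num_experts) <= kAssumedLdsLimitBytes;
}

// Block size before this PR: the larger of num_experts and the warp size.
int old_block_size(int num_experts) {
  return std::max(num_experts, kWarpSize);
}

// Block size after this PR: always the warp size.
int new_block_size(int /*num_experts*/) { return kWarpSize; }
```

Under these assumptions, 128 experts need 66,564 bytes with the old block size of 128 but only 33,796 bytes with a block size of 64, so only the latter fits within 64 KiB.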

shajrawi

Kudos Charlie!

gshtras · 2024-12-04T21:26:52Z

csrc/moe/moe_align_sum_kernels.cu

-  if (threadIdx.x < num_experts) {
-    tokens_cnts[index(num_experts, 0, threadIdx.x)] = 0;
+  for (int eid = threadIdx.x; eid < num_experts; eid += blockDim.x) {
+    tokens_cnts[index(num_experts, 0, eid)] = 0;
    for (int i = 1; i <= blockDim.x; ++i) {


Does this have to be a nested loop?
Could a reduce operation be more efficient here?

Each thread performs its own reduction over an independent data array, so a warp-reduce primitive does not apply here.
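The point above can be illustrated with a small host-side emulation of the strided loop from the diff. This is only a sketch, not the kernel itself: the outer `tid` loop stands in for the GPU threads of one block, `table_index` is a hypothetical stand-in for the `index(num_experts, row, col)` helper, and the row-major layout is an assumption. Each emulated thread owns whole columns (experts) and accumulates down its own column, so no values are combined across threads, which is why a cross-lane warp reduce does not fit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ASSUMPTION: row-major layout, standing in for index(num_experts, row, col).
inline int table_index(int num_experts, int row, int col) {
  return row * num_experts + col;
}

// Emulates one block of `block_size` threads. "Thread" tid owns experts
// tid, tid + block_size, tid + 2 * block_size, ... and runs an inclusive
// prefix sum down each owned column of the (block_size + 1)-row table.
std::vector<std::int32_t> column_prefix_sums(
    std::vector<std::int32_t> tokens_cnts, int num_experts, int block_size) {
  for (int tid = 0; tid < block_size; ++tid) {  // one pass per emulated thread
    for (int eid = tid; eid < num_experts; eid += block_size) {
      tokens_cnts[table_index(num_experts, 0, eid)] = 0;
      for (int i = 1; i <= block_size; ++i) {
        tokens_cnts[table_index(num_experts, i, eid)] +=
            tokens_cnts[table_index(num_experts, i - 1, eid)];
      }
    }
  }
  return tokens_cnts;
}
```

With 2 emulated threads and 5 experts, every column is still processed because each thread strides over several experts, and the last row ends up holding the per-expert totals.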

always use 64 as the block size to avoid lds out of limit

72eb399

charlifu requested review from gshtras and shajrawi December 4, 2024 21:08

lint

74ab645

shajrawi approved these changes Dec 4, 2024

charlifu changed the title ~~Always use 64 as the block size to avoid lds out of limit~~ Always use 64 as the block size of moe_align kernel to avoid lds out of limit Dec 4, 2024

gshtras reviewed Dec 4, 2024

Merge branch 'develop' into charlifu/fix_moe_align_expert_num_too_big

8e9f28a

charlifu merged commit b414ae9 into develop Dec 5, 2024
7 checks passed

gshtras deleted the charlifu/fix_moe_align_expert_num_too_big branch December 7, 2024 03:21
