multi-gpu fused_moe tuning support #143
base: main
Conversation
- todo: add fp8 support
- todo: add comments and documentation
Force-pushed from cd68e07 to 37fc500
retest
# For now we see better perf with num_stages=0 for all gemm configs we care about,
# but keep this explicit so that we do not forget we may need to set it to
# other values in the future.
num_stage_range = [0]
So this PR comes at a tricky time, unfortunately.
See triton-lang/triton#4845 (review)
We are changing the software pipelining so that it is more aligned with the NVIDIA side. After the above PR is merged (imminently), num_stages=0 will actually be num_stages=2. If it is kept at 0, it will fail with an error, as mentioned in the above link.
So we have two options:
- Hold this PR, if possible, until the Triton PR is submitted.
- Submit this and then submit another once things break. As mentioned, the error message should be clear and should say what to do.
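For illustration only (this is not part of the PR or the review thread): one way to keep the tuner working across both pipeliner behaviours would be to pick the num_stages range from the installed Triton version. The cut-over version used below is an assumption, not something stated in the thread.

    import triton
    from packaging import version

    def get_num_stage_range():
        # Assumption: the reworked pipeliner ships in Triton >= 3.1; on older
        # builds, num_stages=0 still selects the previous AMD pipelining path.
        if version.parse(triton.__version__) >= version.parse("3.1.0"):
            return [2]  # new semantics: 2 corresponds to the old 0
        return [0]

    num_stage_range = get_num_stage_range()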
@shajrawi FYI
tuning_utils.py
torchrun benchmark_mixtral_moe_rocm.py --model 8x7B --modelTP 8 --numGPU 8
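As a rough sketch of what the torchrun launch implies (the function and variable names below are illustrative, not the actual tuning_utils.py API): each rank can take a slice of the config search space, benchmark it on its own GPU, and rank 0 can later gather the timings and keep the best config per shape.

    import os
    import torch

    def shard_configs(all_configs):
        # Under torchrun, RANK and WORLD_SIZE are set for every process.
        rank = int(os.environ.get("RANK", "0"))
        world_size = int(os.environ.get("WORLD_SIZE", "1"))
        if torch.cuda.is_available():
            # Pin each rank to its own GPU before benchmarking.
            torch.cuda.set_device(rank % torch.cuda.device_count())
        # Each rank benchmarks every world_size-th candidate config.
        return all_configs[rank::world_size]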