multi-gpu fused_moe tuning support #143

Open · wants to merge 20 commits into base: main from distributed_fmoe_tuning
Conversation


@divakar-amd divakar-amd commented Aug 16, 2024

  • Put each batch on a different GPU and tune concurrently (a sketch follows the command below)
  • Moved the Triton search space to a new file, tuning_utils.py
  • Updated tqdm so progress bars display neatly under multiprocessing
  • Added fp8 support

torchrun benchmark_mixtral_moe_rocm.py --model 8x7B --modelTP 8 --numGPU 8
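Below is a minimal sketch of the per-GPU concurrent tuning idea described above, assuming one worker process per GPU and tqdm's `position` argument to keep each process's bar on its own line. The names (`tune_one_batch`, `worker`) and the batch-size list are illustrative, not the PR's actual code, and the real script is launched via torchrun rather than `torch.multiprocessing.spawn`:

```python
# Illustrative sketch only: distribute batch sizes across GPUs and tune concurrently.
import torch
import torch.multiprocessing as mp
from tqdm import tqdm


def tune_one_batch(batch_size: int, device: torch.device) -> dict:
    # Placeholder for the real Triton config search for this batch size.
    return {"batch_size": batch_size, "best_config": None}


def worker(rank: int, num_gpus: int, batch_sizes: list):
    device = torch.device(f"cuda:{rank}")
    torch.cuda.set_device(device)
    # Each process tunes every num_gpus-th batch size; position=rank keeps the
    # tqdm bars on separate lines instead of overwriting each other.
    shard = batch_sizes[rank::num_gpus]
    results = []
    for bs in tqdm(shard, desc=f"GPU {rank}", position=rank, leave=True):
        results.append(tune_one_batch(bs, device))
    return results


if __name__ == "__main__":
    num_gpus = torch.cuda.device_count()
    batch_sizes = [1, 2, 4, 8, 16, 24, 32, 48, 64, 96, 128, 256, 512, 1024]
    mp.spawn(worker, args=(num_gpus, batch_sizes), nprocs=num_gpus, join=True)
```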


@divakar-amd divakar-amd changed the title add multi-gpu tuning support with tqdm progress bar multi-gpu fused_moe tuning support Aug 16, 2024
@divakar-amd divakar-amd self-assigned this Aug 16, 2024
- todo: add fp8 support
- todo: add comments and documentation
@divakar-amd divakar-amd force-pushed the distributed_fmoe_tuning branch from cd68e07 to 37fc500 Compare September 17, 2024 15:56
@okakarpa

retest

@divakar-amd divakar-amd requested a review from gshtras October 16, 2024 20:35
@divakar-amd divakar-amd requested a review from vgokhale October 17, 2024 21:45
# For now we see better perf with num_stages=0 for all gemm configs we care
# about, but keep this explicit so that we do not forget we may need to set
# it to other values in the future.
num_stage_range = [0]

So this PR comes at a tricky time unfortunately.

See triton-lang/triton#4845 (review)

We are changing the software pipelining so that it is more closely aligned with the NVIDIA side. Once the above PR is merged (imminently), the old num_stages=0 behavior will correspond to num_stages=2. If it is kept at 0, it will fail with an error, as mentioned in the above link.

So we have two options

  1. Hold this PR, if possible, until the Triton PR is submitted.

  2. Submit this now and submit a follow-up once things break. As mentioned, the error message should be clear and say what to do.
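For reference, here is an illustrative sketch (not the PR's actual tuning_utils.py; the block/warp ranges below are placeholders) of keeping num_stages explicit in the search space so it becomes a one-line switch once the Triton pipelining change lands:

```python
# Illustrative sketch only: keep num_stages explicit so it can be switched to
# the new Triton semantics (where the old num_stages=0 behavior becomes 2)
# without hunting through the tuner.
import itertools

# Old AMD backend semantics: 0 disables software pipelining.
# After triton-lang/triton#4845, the equivalent setting is expected to be 2.
num_stage_range = [0]  # switch to [2] once the Triton PR lands

# Placeholder tuning ranges for the remaining kernel parameters.
block_m_range = [16, 32, 64, 128, 256]
block_n_range = [32, 64, 128, 256]
block_k_range = [64, 128, 256]
num_warps_range = [1, 2, 4, 8]


def get_search_space():
    # Cartesian product of all tunable parameters.
    return [
        {
            "BLOCK_SIZE_M": bm,
            "BLOCK_SIZE_N": bn,
            "BLOCK_SIZE_K": bk,
            "num_warps": nw,
            "num_stages": ns,
        }
        for bm, bn, bk, nw, ns in itertools.product(
            block_m_range, block_n_range, block_k_range,
            num_warps_range, num_stage_range,
        )
    ]
```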

divakar-amd (Author)
@shajrawi FYI
