
Optimize Workgroup and Thread Allocation for Metal’s GPU Architecture #10

Open
moven0831 opened this issue Oct 25, 2024 · 0 comments

Problem

Inefficient threadgroup sizes and thread allocations lead to increased overhead and underutilization of Metal’s GPU capabilities, resulting in suboptimal MSM performance.

Details

Fine-tune threadgroup sizes and thread allocations to align with Metal's GPU architecture. The aim is to minimize overhead and maximize parallelism by determining the optimal threadgroup configuration for each stage of the MSM process.

We are hitting the GPU Hang Error on mobile devices with the current Metal MSM implementation, and we suspect the cause is related to the maximum threadgroup memory allocation.
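To make the suspected memory limit concrete, here is a minimal sketch that computes the largest threadgroup size fitting a threadgroup-memory budget. It is not the project's code: the function name is hypothetical, and all four inputs are assumptions that would be read from the real device and pipeline (for example `MTLDevice.maxThreadgroupMemoryLength`, and the pipeline state's `threadExecutionWidth` and `maxTotalThreadsPerThreadgroup`).

```rust
/// Largest threadgroup size (in threads) that fits a threadgroup-memory budget.
///
/// All inputs are assumptions to be filled in from the real device and pipeline:
/// - `memory_limit_bytes`: e.g. the value of `MTLDevice.maxThreadgroupMemoryLength`
/// - `bytes_per_thread`: threadgroup memory each thread of the kernel uses
/// - `simd_width`: e.g. the pipeline state's `threadExecutionWidth`
/// - `pipeline_max_threads`: e.g. the pipeline state's `maxTotalThreadsPerThreadgroup`
///
/// Returns `None` when not even one SIMD group fits.
pub fn max_threads_within_memory(
    memory_limit_bytes: usize,
    bytes_per_thread: usize,
    simd_width: usize,
    pipeline_max_threads: usize,
) -> Option<usize> {
    if simd_width == 0 {
        return None;
    }
    // Threads allowed by memory alone; a kernel that uses no threadgroup
    // memory is only limited by the pipeline maximum.
    let by_memory = if bytes_per_thread == 0 {
        pipeline_max_threads
    } else {
        memory_limit_bytes / bytes_per_thread
    };
    let cap = by_memory.min(pipeline_max_threads);
    // Round down to a whole number of SIMD groups.
    let rounded = (cap / simd_width) * simd_width;
    if rounded == 0 {
        None
    } else {
        Some(rounded)
    }
}
```

With illustrative numbers (a 32 KiB limit, 96 bytes of threadgroup memory per thread, a SIMD width of 32, and a pipeline maximum of 1024), the function returns `Some(320)`. A kernel dispatched with more threads per group than this would exceed the budget, which is one thing to rule out when investigating the hang.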

Acceptance criteria

  • Successfully run MSMs of size 2^20 to 2^22 without encountering the GPU Hang Error on an iOS device.
  • Determine optimal workgroup sizes for various stages of the MSM process within Metal.
  • Implement dynamic thread dispatching strategies to balance the load across Metal’s GPU cores.
  • Test multiple configurations to identify the most performant setup on target iOS devices.
  • Achieve measurable performance improvements compared to the baseline thread allocation strategy.
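The sizing and dispatching criteria above could be explored with helpers along these lines. This is a sketch under stated assumptions, not the repository's implementation: all three function names are hypothetical, the SIMD width and per-pipeline thread maximum are assumed to come from the pipeline state (`threadExecutionWidth`, `maxTotalThreadsPerThreadgroup`), and splitting an MSM into several smaller dispatches is just one possible strategy to benchmark.

```rust
/// Candidate threadgroup widths to benchmark: the SIMD width doubled
/// repeatedly, up to the pipeline's maximum threads per threadgroup.
pub fn candidate_threadgroup_sizes(simd_width: usize, max_threads: usize) -> Vec<usize> {
    let mut sizes = Vec::new();
    let mut size = simd_width;
    while size != 0 && size <= max_threads {
        sizes.push(size);
        // Stop cleanly instead of overflowing on very large inputs.
        match size.checked_mul(2) {
            Some(next) => size = next,
            None => break,
        }
    }
    sizes
}

/// Number of threadgroups needed to cover `work_items` with one thread per item.
/// The last group may be partly idle, so the kernel needs a bounds check.
pub fn threadgroups_for(work_items: usize, threads_per_group: usize) -> usize {
    assert!(threads_per_group > 0, "threadgroup size must be non-zero");
    (work_items + threads_per_group - 1) / threads_per_group
}

/// Split `work_items` into dispatches of at most `max_items_per_dispatch`,
/// returned as (offset, length) pairs.
pub fn split_into_dispatches(
    work_items: usize,
    max_items_per_dispatch: usize,
) -> Vec<(usize, usize)> {
    assert!(max_items_per_dispatch > 0, "dispatch size must be non-zero");
    let mut dispatches = Vec::new();
    let mut offset = 0;
    while offset < work_items {
        let len = max_items_per_dispatch.min(work_items - offset);
        dispatches.push((offset, len));
        offset += len;
    }
    dispatches
}
```

For a SIMD width of 32 and a maximum of 1024 threads, the candidates are 32, 64, 128, 256, 512, and 1024. A 2^20-element stage at 256 threads per group needs 4096 threadgroups, and a 2^22-element MSM capped at 2^20 items per dispatch becomes four dispatches. Timing each candidate per MSM stage on the target devices would then give the comparison against the baseline allocation.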

Reference

@FoodChain1028 self-assigned this Nov 1, 2024