Optimize Workgroup and Thread Allocation for Metal’s GPU Architecture #10

moven0831 · 2024-10-25T11:50:43Z

Problem

Inefficient threadgroup sizes and thread allocations lead to increased overhead and underutilization of Metal’s GPU capabilities, resulting in suboptimal MSM performance.

Details

Fine-tune threadgroup sizes and thread allocations to align with Metal's GPU architecture. The aim is to minimize overhead and maximize parallelism by determining the optimal threadgroup configuration for each stage of the MSM process.
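As a starting point for this tuning, here is a minimal Rust sketch of how a 1D dispatch could be sized. It assumes the two limits are read from the Metal compute pipeline state (`threadExecutionWidth` and `maxTotalThreadsPerThreadgroup`). The `PipelineLimits`, `Dispatch1D`, and `plan_dispatch` names are hypothetical and not taken from the current codebase.

```rust
/// Limits as reported by a Metal compute pipeline state.
/// (Hypothetical struct; the values come from `threadExecutionWidth`
/// and `maxTotalThreadsPerThreadgroup`.)
#[derive(Debug, Clone, Copy)]
pub struct PipelineLimits {
    pub thread_execution_width: u64,
    pub max_total_threads_per_threadgroup: u64,
}

/// A 1D dispatch plan: threads per threadgroup and number of threadgroups.
#[derive(Debug, PartialEq, Eq)]
pub struct Dispatch1D {
    pub threads_per_threadgroup: u64,
    pub threadgroups: u64,
}

/// Clamp the preferred threadgroup size to the pipeline limits, round it
/// down to a multiple of the execution width, and cover all work items.
pub fn plan_dispatch(limits: PipelineLimits, work_items: u64, preferred: u64) -> Dispatch1D {
    let simd = limits.thread_execution_width.max(1);
    let cap = limits.max_total_threads_per_threadgroup.max(simd);

    // Keep the size within [simd, cap], then align it to the execution width.
    let clamped = preferred.clamp(simd, cap);
    let threads_per_threadgroup = (clamped / simd) * simd;

    // Ceiling division so the last, partially filled threadgroup is included.
    let threadgroups = (work_items + threads_per_threadgroup - 1) / threads_per_threadgroup;

    Dispatch1D { threads_per_threadgroup, threadgroups }
}
```

For example, with an execution width of 32, a cap of 1024 threads, and a preferred size of 256, a stage with 2^20 work items comes out as 4096 threadgroups of 256 threads. Because the last threadgroup may be only partially filled, the kernel still needs its own bounds check.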

The current Metal MSM hits the GPU Hang Error on mobile devices. We suspect the cause is related to the maximum threadgroup memory allocation.
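To help test this suspicion, the sketch below checks how many threads fit into a given threadgroup memory budget. It assumes the budget is read from the device (`maxThreadgroupMemoryLength` on `MTLDevice`) and that each thread needs a fixed amount of threadgroup memory. The function name and the byte counts in the usage note are hypothetical, not measured values from the MSM kernels.

```rust
/// Largest threadgroup size (a multiple of `simd_width`) whose threadgroup
/// memory use stays within `limit_bytes`.
///
/// * `limit_bytes`: device threadgroup memory limit, in bytes.
/// * `bytes_per_thread`: threadgroup memory each thread needs (assumed fixed).
/// * `fixed_bytes`: threadgroup memory shared by the whole group.
///
/// Returns `None` if not even one full SIMD group fits.
pub fn max_threads_for_memory(
    limit_bytes: u64,
    bytes_per_thread: u64,
    fixed_bytes: u64,
    simd_width: u64,
) -> Option<u64> {
    if bytes_per_thread == 0 || simd_width == 0 {
        return None;
    }
    // Memory left after the shared, per-group allocation.
    let available = limit_bytes.checked_sub(fixed_bytes)?;
    // Threads that fit, rounded down to whole SIMD groups.
    let raw = available / bytes_per_thread;
    let rounded = (raw / simd_width) * simd_width;
    if rounded == 0 { None } else { Some(rounded) }
}
```

With an assumed 32 KiB limit, 96 bytes per thread, and an execution width of 32, this gives 320 threads per threadgroup. A result smaller than the size currently dispatched would support the threadgroup memory hypothesis, while a larger one would suggest looking elsewhere.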

Acceptance criteria

Successfully run MSMs of size 2^20 to 2^22 without encountering the GPU Hang Error on an iOS device.
Determine optimal workgroup sizes for various stages of the MSM process within Metal.
Implement dynamic thread dispatching strategies to balance the load across Metal’s GPU cores.
Test multiple configurations to identify the most performant setup on target iOS devices.
Achieve measurable performance improvements compared to the baseline thread allocation strategy.
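For the configuration sweep in the criteria above, one possible approach is to enumerate candidate threadgroup sizes and pick the fastest from measured timings. The sketch below assumes power-of-two multiples of the execution width are the candidates worth testing and that timings are collected separately on the device. Both helper names are hypothetical.

```rust
/// Candidate threadgroup sizes: the execution width doubled repeatedly
/// until the pipeline's per-threadgroup maximum is exceeded.
pub fn candidate_threadgroup_sizes(simd_width: u64, max_threads: u64) -> Vec<u64> {
    let mut sizes = Vec::new();
    if simd_width == 0 {
        return sizes;
    }
    let mut size = simd_width;
    while size <= max_threads {
        sizes.push(size);
        // Stop cleanly instead of overflowing on very large limits.
        match size.checked_mul(2) {
            Some(next) => size = next,
            None => break,
        }
    }
    sizes
}

/// Pick the threadgroup size with the lowest measured time.
/// `results` holds `(threadgroup_size, elapsed_ms)` pairs; non-finite
/// timings (for example from failed runs) are ignored.
pub fn fastest(results: &[(u64, f64)]) -> Option<u64> {
    results
        .iter()
        .copied()
        .filter(|(_, ms)| ms.is_finite())
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(size, _)| size)
}
```

Running the sweep once per MSM stage and per input size (2^20 to 2^22) keeps the comparison against the baseline allocation strategy like for like.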

Reference

FoodChain1028 self-assigned this Nov 1, 2024
