LeiWang1999 changed the title "[Feature Request] Parallel Primitive Should be enhanced to improve the performance for small shapes." to "[Feature Request] Parallel Primitive Should be enhanced to improve the performance for irregular shapes." on Oct 2, 2024
LeiWang1999 changed the title "[Feature Request] Parallel Primitive Should be enhanced to improve the performance for irregular shapes." to "[Feature Request] Parallel Primitive Should be enhanced to improve the performance for irregular shapes" on Oct 2, 2024
To reproduce a case with worse performance:
The output is:
The problem lies in the parallel primitives:
Here the thread count is 96, but the 16x128 tile (2048 elements) is not divisible by 96.
However, this tile is the most efficient of the candidate tile configurations, and it can be tensorized with our raw BitBlas TIR backend.
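To make the arithmetic concrete: 16x128 = 2048 elements, and 2048 = 21 × 96 + 32, so an even split over 96 threads leaves 32 elements over. The sketch below is plain Python, not code from the BitBlas backend; the function `partition_tile` is hypothetical. It shows one common way a parallel loop can cover a non-divisible extent: round the per-thread iteration count up and guard the ragged tail with a bounds check. It is only an illustration of the technique, not a claim about how the parallel primitive is or should be implemented.

```python
import math


def partition_tile(rows, cols, num_threads):
    """Hypothetical mapping of a rows x cols tile onto num_threads threads.

    Each thread runs ceil(total / num_threads) iterations with a strided
    index; a bounds check masks out indices past the end of the tile.
    Returns a dict mapping thread id -> list of (row, col) elements it owns.
    """
    total = rows * cols
    iters = math.ceil(total / num_threads)  # per-thread iterations, rounded up
    assignment = {}
    for tid in range(num_threads):
        owned = []
        for i in range(iters):
            flat = i * num_threads + tid  # strided (grid-stride style) index
            if flat < total:  # predicate for the tail when total % num_threads != 0
                owned.append(divmod(flat, cols))  # flat index -> (row, col)
        assignment[tid] = owned
    return assignment


if __name__ == "__main__":
    a = partition_tile(16, 128, 96)
    counts = [len(v) for v in a.values()]
    print((16 * 128) % 96, min(counts), max(counts))  # 32 21 22
```

With 96 threads, threads 0 to 31 each handle 22 elements and threads 32 to 95 handle 21, so the last iteration is partially masked. The cost of this approach is the per-element predicate and the resulting load imbalance, which is one reason a non-divisible thread count can be awkward for a parallel primitive even when the tile itself is the best choice.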