Thanks for sharing this great resource.
I'm trying to run some benchmarks with `test_mask` from `examples/flex_attn.ipynb` on one RTX 4090. When I set `B=1, H=16, S=2048, D=128`, it triggers an error:
triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 131074, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.
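To put the numbers in that message in perspective, here is a quick back-of-the-envelope check. It is only a sketch: the two byte counts are copied from the error above, while the idea that shared-memory use scales roughly linearly with `num_stages` or with one block dimension is an assumption, not something the error message states.

```python
# Figures copied from the Triton error message (bytes).
required = 131074
limit = 101376

KIB = 1024
print(f"required: {required / KIB:.1f} KiB")
print(f"limit:    {limit / KIB:.1f} KiB")
print(f"over by:  {required - limit} bytes ({required / limit:.2f}x the limit)")

# Assumption (not confirmed by the error message): shared-memory use scales
# roughly linearly with num_stages or with one tile dimension, so halving
# either one would roughly halve the requirement.
halved = required / 2
fits_after_halving = halved <= limit
print(f"fits after halving: {fits_after_halving}")
```

If that assumption holds, a single halving of either a block size or `num_stages` would be enough to get under the limit reported for this card.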
Thanks for opening this issue! Admittedly, we don't have much CI testing for PyTorch on the 4090. Would you mind creating a minimal repro and posting it on the PyTorch repo? Feel free to tag me in the issue.