[BUG] Tmem tiled copy with non power-of-2 size fails to compile #2094
Comments
Hmmmm, this is a really interesting bug, thanks Tri. The core issue seems to be in my lazy implementation of left_inverse. Indeed, when I hack in the correct left-inverse layouts to
{
  Layout tmem_layout = make_layout(make_shape(_128{}, _128{}), make_stride(Int<65536>{}, _1{}));
  Tensor A = make_tensor(make_tmem_ptr<float>(0), tmem_layout);
  TiledCopy load = make_tmem_copy(SM100_TMEM_LOAD_32dp32b1x{}, A);
  print(tmem_layout); print("\n");
  print(complement(tmem_layout)); print("\n");
  print(left_inverse(tmem_layout)); print("\n");
  print(load);
  print(load.get_slice(0).partition_S(A)); print("\n");
}
{
  Layout tmem_layout = make_layout(make_shape(_128{}, _160{}), make_stride(Int<65536>{}, _1{}));
  Tensor A = make_tensor(make_tmem_ptr<float>(0), tmem_layout);
  TiledCopy load = make_tmem_copy(SM100_TMEM_LOAD_32dp32b1x{}, A);
  print(tmem_layout); print("\n");
  print(complement(tmem_layout)); print("\n");
  print(left_inverse(tmem_layout)); print("\n"); // XXX Wrong
  print(load);
  print(load.get_slice(0).partition_S(A)); print("\n");
}
prints
Give me a little time to correct the implementation of left_inverse.
In the meantime, this also suggests that an immediate workaround is to build the partitioner with a very similar TMEM tensor and then use it on your actual TMEM tensor.
Yup, that's the workaround I'm using right now: creating a fake tmem tensor of size (128, 128) to get the tiled tmem_copy.
This fails to compile with a shape_div error. This is because make_tmem_copy calls make_cotiled_copy, which calls left_inverse on the data layout. In this case left_inverse gives shape (65440, 1) to make it divisible by 160, but that leads to a layout mismatch.
Is there another way I should be constructing the tmem tiled copy with a non-power-of-2 size?
Cc @thakkarV
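The divisibility problem described above can be checked with plain integer arithmetic, without CuTe. The sketch below is only one reading of the numbers in this issue, not a description of CuTe's internals: it models the layout (128, N):(65536, 1) as a function, shows that 65536 is a multiple of 128 but not of 160 (rounding down to a multiple of 160 gives the 65440 reported above), and brute-force checks a hypothetical candidate left inverse, (65536, 128):(128, 1). All function names here are made up for illustration.

```cpp
#include <cassert>

// Offset of logical coordinate (i, j) under the TMEM layout
// (128, N):(65536, 1) from this issue.
constexpr long tmem_offset(long i, long j) { return i * 65536 + j; }

// Column-major linear index of (i, j) in a (128, N) shape. This is the
// value a left inverse has to recover from the offset.
constexpr long linear_index(long i, long j) { return i + 128 * j; }

// Largest multiple of `extent` that does not exceed `stride`.
constexpr long round_down_to_multiple(long stride, long extent) {
  return (stride / extent) * extent;
}

// Hypothetical candidate left inverse (an assumption, not CuTe's code):
// the layout (65536, 128):(128, 1) evaluated at a raw offset.
constexpr long candidate_left_inverse(long offset) {
  return (offset % 65536) * 128 + (offset / 65536);
}

// Brute-force check that candidate(layout(c)) == c for every coordinate
// of a (128, num_cols) tensor.
bool inverts(long num_cols) {
  for (long j = 0; j < num_cols; ++j) {
    for (long i = 0; i < 128; ++i) {
      if (candidate_left_inverse(tmem_offset(i, j)) != linear_index(i, j)) {
        return false;
      }
    }
  }
  return true;
}

// Collects the checks discussed in the text.
void check_issue_arithmetic() {
  assert(65536 % 128 == 0);  // power-of-2 extent divides the row stride
  assert(65536 % 160 != 0);  // 160 does not
  assert(round_down_to_multiple(65536, 160) == 65440);
  assert(inverts(128));
  assert(inverts(160));
}
```

Under these assumptions, the row stride 65536 leaves a remainder of 96 when divided by 160, so any construction that needs the column extent to divide the stride has to round, and rounding to 65440 no longer matches the real stride. The candidate inverse avoids this by taking the offset modulo 65536 directly.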