[BUG] Tmem tiled copy with non power-of-2 size fails to compile #2094

tridao · 2025-02-10T02:58:27Z

using namespace cute;
auto tmem_layout = make_layout(make_shape(_128{}, _160{}), make_stride(Int<65536>{}, _1{}));
Tensor A = make_tensor(make_tmem_ptr<float>(0), tmem_layout);
auto load = make_tmem_copy(SM100_TMEM_LOAD_32dp32b1x{}, A);

This fails to compile with a shape_div error. The cause is that make_tmem_copy calls make_cotiled_copy, which calls left_inverse on the data layout. In this case left_inverse produces a layout of size 65440 (chosen so that it is divisible by 160), and that leads to a layout mismatch.

Is there another way I should be constructing the tmem tiled copy with non power-of-2 size?

Cc @thakkarV


ccecka · 2025-02-10T03:44:48Z

Hmmmm, this is a really interesting bug, thanks Tri.

The core issue seems to be in my lazy implementation of left_inverse, which is simply incorrect in that case.

Indeed, when I hack the correct left-inverse layouts into make_cotiled_copy, a valid partitioner (the same partitioner) is returned that performs exactly as we would expect:

  {
  Layout tmem_layout = make_layout(make_shape(_128{}, _128{}), make_stride(Int<65536>{}, _1{}));
  Tensor A = make_tensor(make_tmem_ptr<float>(0), tmem_layout);
  TiledCopy load = make_tmem_copy(SM100_TMEM_LOAD_32dp32b1x{}, A);

  print(tmem_layout); print("\n");
  print(complement(tmem_layout)); print("\n");
  print(left_inverse(tmem_layout)); print("\n");
  print(load);
  print(load.get_slice(0).partition_S(A));  print("\n");
  }

  {
  Layout tmem_layout = make_layout(make_shape(_128{}, _160{}), make_stride(Int<65536>{}, _1{}));
  Tensor A = make_tensor(make_tmem_ptr<float>(0), tmem_layout);
  TiledCopy load = make_tmem_copy(SM100_TMEM_LOAD_32dp32b1x{}, A);

  print(tmem_layout); print("\n");
  print(complement(tmem_layout)); print("\n");
  print(left_inverse(tmem_layout)); print("\n");  // XXX Wrong
  print(load);
  print(load.get_slice(0).partition_S(A));   print("\n");
  }

prints

(_128,_128):(_65536,_1)
_512:_128
(_65536,_128):(_128,_1)
TiledCopy
  Tiler_MN:       ((_4,_32):(_32,_1),_1:_0)
  TiledLayout_TV: ((_32,_4),_32):((_0,_1),_4)
Copy_Atom
  ThrID:        _32:_1
  ValLayoutSrc: (_32,_32):(_0,_1)
  ValLayoutDst: (_32,_1):(_1,_1)
  ValLayoutRef: (_32,_32):(_0,_1)
  ValueType:    32b
tmem_[32b](0x0000.0000) o ((_32,_1),_1,_128):((_65536,_0),_0,_1)
(_128,_160):(_65536,_1)
_409:_160
_65440:_128
TiledCopy
  Tiler_MN:       ((_4,_32):(_32,_1),_1:_0)
  TiledLayout_TV: ((_32,_4),_32):((_0,_1),_4)
Copy_Atom
  ThrID:        _32:_1
  ValLayoutSrc: (_32,_32):(_0,_1)
  ValLayoutDst: (_32,_1):(_1,_1)
  ValLayoutRef: (_32,_32):(_0,_1)
  ValueType:    32b
tmem_[32b](0x0000.0000) o ((_32,_1),_1,_160):((_65536,_0),_0,_1)

Give me a little time to correct the implementation of left_inverse -- I'll put the patch here and get it merged asap.
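For readers who want to see why `_65440:_128` cannot be a left inverse here, the following is a minimal sketch in plain C++ (no CuTe headers) that models the layouts as integer functions and checks the defining property, inverse(layout(i)) == i. All function names are hypothetical and introduced only for illustration. It assumes CuTe's column-major coordinate ordering and assumes the single-mode layout `65440:128` evaluates as the plain linear map offset -> 128 * offset; it is not the library's implementation of left_inverse.

```cpp
#include <cstdint>

// Offset that the layout (128, N):(65536, 1) assigns to coordinate (m, n).
constexpr std::int64_t tmem_offset(std::int64_t m, std::int64_t n) {
  return 65536 * m + n;
}

// Column-major linear index of (m, n) in shape (128, N).
constexpr std::int64_t linear_index(std::int64_t m, std::int64_t n) {
  return m + 128 * n;
}

// Models (65536, 128):(128, 1), the left inverse printed for the 128-column tensor.
// Index k splits into coordinate (k % 65536, k / 65536).
constexpr std::int64_t inverse_two_mode(std::int64_t offset) {
  return 128 * (offset % 65536) + (offset / 65536);
}

// Models 65440:128, the layout printed for the 160-column tensor.
// Assumption: a single-mode layout is evaluated as offset * stride.
constexpr std::int64_t inverse_one_mode(std::int64_t offset) {
  return 128 * offset;
}

// Counts coordinates (m, n) for which inverse(layout(m, n)) differs from
// the linear index of (m, n), i.e. where the left-inverse property fails.
inline int count_mismatches(std::int64_t (*inverse)(std::int64_t), int cols) {
  int mismatches = 0;
  for (int n = 0; n < cols; ++n) {
    for (int m = 0; m < 128; ++m) {
      if (inverse(tmem_offset(m, n)) != linear_index(m, n)) {
        ++mismatches;
      }
    }
  }
  return mismatches;
}
```

Under these assumptions, `count_mismatches(inverse_two_mode, 160)` returns 0, while `count_mismatches(inverse_one_mode, 160)` returns 20320: the single-mode layout only recovers the index on row m = 0 and fails on the other 127 rows of all 160 columns.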

ccecka · 2025-02-10T03:56:03Z

In the meantime, this also suggests that an immediate workaround is to build the partitioner with a very similar TMEM tensor and then use it on your actual TMEM tensor.

tridao · 2025-02-10T03:57:26Z

> In the meantime, this also suggests that an immediate workaround is to build the partitioner with a very similar TMEM tensor and then use it on your actual TMEM tensor.

Yup, that's the workaround I'm using right now: creating a fake tmem tensor of size (128, 128) to get the tiled tmem_copy.

tridao added the "? - Needs Triage" and "bug" labels on Feb 10, 2025
