TLDR: Defining array sizes with unsigned integers and then tiling the resulting kernel breaks the per-thread check that tests whether the thread is inside the loop domain. The check has the form `if (N - something >= 0) {`, which is always true if `N` is unsigned (at least in my testing).
Hefty MVP to reproduce (including PyCUDA boilerplate):
This specifically breaks in the `if` statement. Expressions like `-1 + -32 * bIdx(y) + -1 * tIdx(y) + nyc >= 0` are always true in unsigned arithmetic, so the kernel goes past the correct array bounds.
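To make the wraparound concrete, here is a minimal host-side sketch in Python that models the guard expression quoted above. It is not the generated device code: the 32-bit width, the tile size of 32 taken from the `-32 * bIdx(y)` term, and the function names `guard_unsigned` and `guard_signed` are all assumptions made for illustration.

```python
# Hypothetical model of the guard -1 + -32*bIdx(y) + -1*tIdx(y) + nyc >= 0.
# Assumption: the unsigned type is 32 bits wide, so results wrap modulo 2**32.
MASK32 = 0xFFFFFFFF


def guard_unsigned(nyc, b_idx_y, t_idx_y):
    """Evaluate the guard the way unsigned arithmetic would."""
    value = (-1 - 32 * b_idx_y - t_idx_y + nyc) & MASK32  # wraps, never negative
    return value >= 0  # true for every input


def guard_signed(nyc, b_idx_y, t_idx_y):
    """Evaluate the same guard with ordinary signed integers."""
    value = -1 - 32 * b_idx_y - t_idx_y + nyc
    return value >= 0


# With nyc = 40, block 1 / thread 8 maps to index 32*1 + 8 = 40, one past the end.
print(guard_signed(40, 1, 8))    # False: the thread is correctly masked out
print(guard_unsigned(40, 1, 8))  # True: -1 wrapped to 4294967295
```

In the signed version the out-of-range thread sees `-1 >= 0` and skips the body; in the unsigned model the same `-1` becomes `2**32 - 1`, so the guard never rejects a thread.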
An easy fix is to have the code generator cast the limits to signed ... (`(signed) Nxc` and `(signed) Nyc`) ... in the `if` statement. That seems to work on my end.
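The effect of such a cast can be sketched with a small Python model. This is only an illustration under assumptions, not loopy's code generator: the 32-bit width, the launch shape of 2 blocks of 32 threads, and the helper names `as_signed32`, `passes_unsigned`, `passes_with_cast`, and `count_active` are hypothetical.

```python
# Hypothetical model of casting an unsigned limit to signed before the guard.
# Assumption: 32-bit integers; the cast is only safe for sizes below 2**31.
MASK32 = 0xFFFFFFFF


def as_signed32(u):
    """Reinterpret a 32-bit unsigned value as signed, like a (signed) cast."""
    u &= MASK32
    return u - (1 << 32) if u & 0x80000000 else u


def passes_unsigned(nyc, b, t):
    # Unsigned evaluation: the result wraps modulo 2**32 and is never negative.
    return ((-1 - 32 * b - t + nyc) & MASK32) >= 0


def passes_with_cast(nyc, b, t):
    # The limit is cast to signed first, so the comparison can fail.
    return -1 - 32 * b - t + as_signed32(nyc) >= 0


def count_active(guard, nyc, blocks=2, threads=32):
    """Count threads of the (assumed) launch grid that pass the guard."""
    return sum(guard(nyc, b, t) for b in range(blocks) for t in range(threads))


print(count_active(passes_unsigned, 40))   # 64: every launched thread runs
print(count_active(passes_with_cast, 40))  # 40: only in-bounds threads run
```

For an array of 40 elements covered by two tiles of 32, the unsigned guard lets all 64 threads through, while the guard with the cast admits exactly the 40 in-bounds ones.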
Index arithmetic in loopy is always assumed to be signed, but we're not doing a good job checking that. I agree that casting to signed in integer expressions is probably the cleanest way out of this.
For the unsigned case, I get the following device code:
Output I get with the unsigned kernel:
versus what I (correctly) get with a signed version, which is a ring of 0's around the exterior: