For context, in a sumpy P2P kernel I have a temporary of size
local_isrc[5, 45]
which results in 5 memory loads/stores, but it could be split into
local_isrc_s0[2, 45]
local_isrc_s1[2, 45]
local_isrc_s2[1, 45]
which results in only 3 memory loads/stores.
One way that I can achieve this is to do
lp.split_array_axes(knl, "local_isrc", 0, 2)
lp.tag_array_axes(knl, "local_isrc", "C,vec,C")
However, this results in 6*45 elements being allocated in shared memory instead of 5*45, because the split axis is padded up to a multiple of the vector width. (Sometimes the compiler optimizes this into 5, 45, sometimes not.)
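The numbers above can be checked with a small sketch. It does not use loopy; `vector_split` and `padded_elements` are hypothetical helper names, and it assumes (as in the description above) that each vector-sized chunk costs one load/store and that the split axis is padded up to a multiple of the vector width.

```python
import math


def vector_split(axis_len, vec_width):
    """Chunk sizes when an axis of length axis_len is split into
    pieces of at most vec_width elements (hypothetical helper)."""
    full, rem = divmod(axis_len, vec_width)
    return [vec_width] * full + ([rem] if rem else [])


def padded_elements(axis_len, vec_width, inner_len):
    """Elements allocated if the split axis is padded up to a
    multiple of vec_width (assumption: no trimming of the last chunk)."""
    n_vec = math.ceil(axis_len / vec_width)
    return n_vec * vec_width * inner_len


chunks = vector_split(5, 2)
print(chunks)                      # chunk sizes along the first axis
print(len(chunks))                 # loads/stores, one per chunk
print(padded_elements(5, 2, 45))   # padded allocation
print(5 * 45)                      # unpadded allocation
```

With an axis of length 5 and vector width 2, this gives the chunks 2, 2, 1 (three loads/stores instead of five) and a padded allocation of 270 elements against 225 unpadded.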
I tried
lp.split_array_axes(knl, "local_isrc", 0, 2)
lp.tag_array_axes(knl, "local_isrc", "sep,vec,C")
which does not work.
> Sometimes the compiler optimizes this into 5, 45, sometimes not

It turns out the compiler does optimize it predictably. I was looking at the wrong source code.