KOKKOS_FUNCTION-annotated for_each and transform_reduce #708
@tpadioleau can you take a look at the two issues I mention in the first message? The second one just seems to be an annoying bug with the support of
I don't know if I will have time to take a closer look this week. You could look for the differences with the implementation in the first PR; I think the tests were passing.
I think I have the same implementation as the closed branch. But you don't have a 2D test for nested for_each, and I think the bug with nvcc appears only in 2D+ cases.
Did you test?
I just tested it, and you were right to ask because the 1D case does not work either. I tried your branch and I don't get the bug. This looks very weird to me: either I am missing something obvious, or something broke somewhere since March. If you have any idea I'll take it; otherwise I will take my time on this because it does not look easy to debug.
2c9f15f to ce927d9
include/ddc/for_each.hpp
Outdated
if constexpr (I == N) {
    f(RetType(is...));
} else {
    for (Element ii = static_cast<int>(begin[I]); ii < end[I]; ++ii) {
This `static_cast<int>` is a nonsense workaround since `Element` aliases `std::size_t`. Without it the loop does not iterate with `nvcc` in the new tests.
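For reference, the recursion pattern in the excerpt above can be sketched on the host without the cast. This is only a minimal serial illustration, not the DDC implementation: the names `for_each_serial_sketch` and `count_visits_2d` are hypothetical, plain `std::array` stands in for the real index types, and no Kokkos annotation is involved, so it says nothing about the `nvcc` behaviour itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: mirrors the review comment, where Element aliases std::size_t.
using Element = std::size_t;

// Hypothetical serial sketch: visit every index tuple in [begin, end),
// one dimension per recursion level. The number of indices accumulated
// so far (sizeof...(Is)) is the dimension currently being looped over.
template <std::size_t N, class Functor, class... Is>
void for_each_serial_sketch(
        std::array<Element, N> const& begin,
        std::array<Element, N> const& end,
        Functor const& f,
        Is const&... is)
{
    constexpr std::size_t I = sizeof...(Is);
    if constexpr (I == N) {
        // All N indices are known: call the functor on the full tuple.
        f(std::array<Element, N> {is...});
    } else {
        // No cast here: ii, begin[I] and end[I] all have type Element.
        for (Element ii = begin[I]; ii < end[I]; ++ii) {
            for_each_serial_sketch(begin, end, f, is..., ii);
        }
    }
}

// Hypothetical helper: count the visits over the 2D range [0, n0) x [0, n1).
inline std::size_t count_visits_2d(Element n0, Element n1)
{
    std::size_t count = 0;
    for_each_serial_sketch(
            std::array<Element, 2> {0, 0},
            std::array<Element, 2> {n0, n1},
            [&](std::array<Element, 2> const&) { ++count; });
    assert(count == n0 * n1); // every index pair is visited exactly once
    return count;
}
```

On the host, `count_visits_2d(3, 4)` visits 12 index pairs, which is the behaviour the uncast loop is expected to have.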
What does it print if you remove the cast?
(if I print something inside the loop it appears only once)
If it can print something inside the loop, does that mean it iterates once?
What is the value of `end[I]`?
Have you tried running in debug?
Sorry, what do you want me to print inside the loop? In my tests I was just printing a string like "test", with no "%i" involved.
You could print `ii`?
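A minimal sketch of what such a trace line could look like, assuming the index has type `std::size_t` as stated earlier in this thread. The helper name `format_iteration` is hypothetical and the code is host-only; `%zu` is the standard `printf` conversion for `std::size_t`, whereas `%i` expects an `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: build one trace line for a loop iteration.
// %zu matches std::size_t, so no cast to int is needed to print the index.
inline std::string format_iteration(std::size_t ii, std::size_t end)
{
    char buffer[64];
    [[maybe_unused]] int const written
            = std::snprintf(buffer, sizeof(buffer), "ii=%zu end=%zu", ii, end);
    // snprintf returns the length it wanted to write; check it fit the buffer.
    assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
    return std::string(buffer);
}
```

Calling `std::puts(format_iteration(ii, end[I]).c_str())` inside a host loop would show both the current index and the bound it is compared against.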
I do reproduce with `g++-12`, but not with `-G`, so it seems to be an optimization issue indeed.
I don't see anything with compute-sanitizer (tried every `--tool`)
I have found another bug. It is not visible in the tests (should I add a test that reproduces it?). I have a use-case where the
Must be replaced with:
(I don't understand what the problem is as ddc/include/ddc/transform_reduce.hpp Line 84 in 73ee089
All of that is weird. Can you provide more information about your environment?
Also, do all the other DDC tests pass? It feels like the compiler optimized away some function calls.
All DDC tests pass. I don't know how to compile the Kokkos tests. I will try quite soon on a 1070Ti with mostly the same software configuration. I will try to dig in
From the table at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#system-requirements you are one minor version above the maximum tested host compiler version, i.e. 13.2. Can you try with a different compiler?
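One way to report the compiler versions actually seen by a translation unit is to read the predefined version macros. The sketch below is an illustration only: the function names are hypothetical, and it relies on the `__GNUC__`/`__GNUC_MINOR__`, `__clang_major__`/`__clang_minor__` and `__CUDACC_VER_MAJOR__`/`__CUDACC_VER_MINOR__` macros defined by GCC, Clang and nvcc respectively.

```cpp
#include <string>

// Hypothetical helper: describe the host compiler from its predefined macros.
// Clang is tested first because it also defines __GNUC__.
inline std::string host_compiler_description()
{
#if defined(__clang__)
    return "clang " + std::to_string(__clang_major__) + "."
           + std::to_string(__clang_minor__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__) + "."
           + std::to_string(__GNUC_MINOR__);
#else
    return "unknown host compiler";
#endif
}

// Hypothetical helper: describe nvcc when this file is compiled through it.
inline std::string nvcc_description()
{
#if defined(__CUDACC_VER_MAJOR__)
    return "nvcc " + std::to_string(__CUDACC_VER_MAJOR__) + "."
           + std::to_string(__CUDACC_VER_MINOR__);
#else
    return "not compiled with nvcc";
#endif
}
```

Printing both strings from a small test executable built with the same toolchain as the failing tests gives the pair of versions to compare against the table.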
Closes #172