Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add switches for some functions to compile differently for host vs device #434

Merged
merged 16 commits into from
Nov 27, 2024

Conversation

Yurlungur
Copy link
Collaborator

@Yurlungur Yurlungur commented Nov 22, 2024

PR Summary

This MR is motivated by two things:

  • EOSPAC currently can't be built on HIP because EOSPAC functions are not device-capable, but we need PORTABLE_INLINE_FUNCTION markings.
  • The Evaluate method in the base class needs to work on both host and device, but the constexpr method did not appear to fully function as desired and we were seeing segfualts with tables in riot.

Here we use the macro

#if defined(__CUDACC__) || defined(__HIPCC__)
/* My device code */
#else
/* My host code */
#endif // ON DEVICE

to create switches for separate implementations or prototypes depending on whether a function is called on either host or device. This unfortunately breaks the portability layer in ports-of-call but I don't see a way to hide #if blah inside an abstraction layer. So this is what we get.

@rbberger at your convenience, please try enabling EOSPAC for the RZadams CI for this MR and see if it kills it. There's a WIP PR on re-git you can use.

@pdmullen plase try to reproduce the segfault in riot that you experienced. Lets see if it still happens.

PR Checklist

  • Adds a test for any bugs fixed. Adds tests for new features.
  • Format your changes by using the make format command after configuring with cmake.
  • Document any new features, update documentation for changes made.
  • Make sure the copyright notice on any files you modified is up to date.
  • After creating a pull request, note it in the CHANGELOG.md file.
  • LANL employees: make sure tests pass both on the github CI and on the Darwin CI

If preparing for a new release, in addition please check the following:

  • Update the version in cmake.
  • Move the changes in the CHANGELOG.md file under a new header for the new release, and reset the categories.
  • Ensure that any when='@main' dependencies are updated to the release version in the package.py

@Yurlungur Yurlungur added bug Something isn't working build Something to do with the build system clean-up Changes that don't affect code execution but make the code cleaner labels Nov 22, 2024
@Yurlungur
Copy link
Collaborator Author

Correction, unfortunately this alternate implementation trick does not work for function prototypes. I had to split Evaluate into two functions, an EvaluateHost and an EvaluateDevice.

@Yurlungur Yurlungur linked an issue Nov 25, 2024 that may be closed by this pull request
@Yurlungur
Copy link
Collaborator Author

@rbberger @pdmullen this is now ready for review. EvaluateDevice now works and EOSPAC should how build for CUDA and HIP.

@@ -148,8 +148,13 @@ class EosBase {

// Generic evaluator
template <typename Functor_t>
constexpr void Evaluate(Functor_t &f) const {
CRTP copy = *(static_cast<CRTP const *>(this));
PORTABLE_INLINE_FUNCTION void EvaluateDevice(const Functor_t f) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall the exact error, but I recall making this a forwarding reference fixed some issues I had seen. Probably worth trying.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the current version works, so if you're happy with the split function signature (as discussed below) lets keep it as is. However, if you really dislike the split function signature I can take a look and try games with forwarding references, etc.

@dholladay00
Copy link
Collaborator

Correction, unfortunately this alternate implementation trick does not work for function prototypes. I had to split Evaluate into two functions, an EvaluateHost and an EvaluateDevice.

So does this mean codes that use evaluate are going to have a bunch of #if stuff to call the right thing? Something doesn't seem right.

@Yurlungur
Copy link
Collaborator Author

Correction, unfortunately this alternate implementation trick does not work for function prototypes. I had to split Evaluate into two functions, an EvaluateHost and an EvaluateDevice.

So does this mean codes that use evaluate are going to have a bunch of #if stuff to call the right thing? Something doesn't seem right.

I'm not sure I understand what you mean. EvaluateHost and EvaluateDevice both work regardless of the target architecture you compiled for. But the model I'm imagining now is that EvaluateHost is called when you want to call your own portableFor and EvaluateDevice is called from within a portableFor

CMakeLists.txt Show resolved Hide resolved
doc/sphinx/src/using-eos.rst Outdated Show resolved Hide resolved
singularity-eos/eos/eos_base.hpp Show resolved Hide resolved
singularity-eos/eos/eos_eospac.hpp Outdated Show resolved Hide resolved
test/test_eos_modifiers.cpp Show resolved Hide resolved
@Yurlungur
Copy link
Collaborator Author

The tests we expect to pass are all now passing. I'd like to merge this, with a note that we still have a segfault on HIP which needs to be resolved.

@Yurlungur
Copy link
Collaborator Author

@jhp-lanl @dholladay00 any objections?

Copy link
Collaborator

@pdmullen pdmullen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! My only comment is just to document here for future self that the associated symptom was not a segfault, but rather corrputed data. This should close Issue #393 (which I think is already linked to this PR).

@jhp-lanl
Copy link
Collaborator

@jhp-lanl @dholladay00 any objections?

Looks good 👍

@Yurlungur Yurlungur merged commit 14bc298 into main Nov 27, 2024
6 checks passed
@Yurlungur Yurlungur deleted the jmm/fix-warnings branch November 27, 2024 00:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working build Something to do with the build system clean-up Changes that don't affect code execution but make the code cleaner
Projects
None yet
5 participants