(LLVM compiler bug) NV GPU Offload errors due to misaligned addresses #5138

Open
prckent opened this issue Aug 21, 2024 · 5 comments

@prckent
Contributor

prckent commented Aug 21, 2024

Describe the bug

A whole variety of periodic Gaussian tests are failing with LLVM offload. The restart tests are also failing.

These failures occur in the nightly builds when offloading to a V100.

See: https://cdash.qmcpack.org/viewTest.php?onlyfailed&buildid=7342

e.g.
deterministic-diamondC_2x1x1_pp-vmcbatch_gaussian_sdj-1-1
https://cdash.qmcpack.org/tests/2182646
"PluginInterface" error: Failure to synchronize stream (nil): Error in cuStreamSynchronize: misaligned address
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
SoaAtomicBasisSet.h:875:7: omptarget fatal error 1: failure of target construct while offloading is mandatory
[sulfur:1226856] *** Process received signal ***

removed redundant
deterministic-restart-1-16
https://cdash.qmcpack.org/tests/2181794
Anonymous Buffer size per walker : 19280 Bytes.
MEMORY increase 0 MB VMC::resetRun
"PluginInterface" error: Faliure to copy data from device to host. Pointers: host = 0x00007f0df17bf3e4, device = 0x00007f0df20a9c00, size = 8: Error in cuMemcpyDtoHAsync: misaligned address
omptarget error: Copying data from device failed.
omptarget error: Call to targetDataEnd failed, abort target.
omptarget error: Failed to process data after launching the kernel.
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
"PluginInterface" error: ompBLAS.cpp:649:3: omptarget fatal error 1: failure of target construct while offloading is mandatory
Failure to synchronize stream (nil): Error in cuStreamSynchronize: misaligned address
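
As a side note on the restart log above, the failing copy reports a host pointer of 0x00007f0df17bf3e4 with size = 8. The sketch below only does the arithmetic on those logged values: the host address is a multiple of 4 but not of 8, while the device address is 8-byte aligned. The helper names `misalignment` and `is_aligned` are hypothetical, introduced here for illustration. CUDA often reports a fault from an earlier kernel at a later API call, so this observation does not by itself identify the cause.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of QMCPACK or LLVM): remainder of an
// address modulo a power-of-two alignment, and the matching predicate.
constexpr std::uint64_t misalignment(std::uint64_t address, std::uint64_t alignment)
{
  return address & (alignment - 1);
}

constexpr bool is_aligned(std::uint64_t address, std::uint64_t alignment)
{
  return misalignment(address, alignment) == 0;
}

// Values copied verbatim from the cuMemcpyDtoHAsync message in the log.
constexpr std::uint64_t host_address   = 0x00007f0df17bf3e4ULL;
constexpr std::uint64_t device_address = 0x00007f0df20a9c00ULL;
constexpr std::uint64_t transfer_size  = 8;

// The host pointer is 4 bytes past an 8-byte boundary.
static_assert(misalignment(host_address, transfer_size) == 4, "host pointer is off by 4");
static_assert(is_aligned(host_address, 4), "host pointer is 4-byte aligned");
// The device pointer satisfies 8-byte alignment.
static_assert(is_aligned(device_address, transfer_size), "device pointer is 8-byte aligned");
```

The checks are evaluated at compile time, so the snippet compiling cleanly is the whole result.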

To Reproduce

Ask for latest software versions if not clear on cdash

Expected behavior
Tests should pass

System:
sulfur

@prckent prckent added the bug label Aug 21, 2024
@prckent
Contributor Author

prckent commented Aug 21, 2024

Using LLVM 18.1.8

@ye-luo
Contributor

ye-luo commented Aug 21, 2024

With clang, -DCMAKE_BUILD_TYPE=Debug does not add any -Ox optimization flag, so the build uses the default -O0.
I can reproduce the issue. After adding -O3 via -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS=-O3, the error disappears, so this is a compiler issue, not a QMCPACK source code issue.

@prckent
Contributor Author

prckent commented Aug 21, 2024

Any chance for a small reproducer? Can you make an issue on the relevant repo and link it here?

@ye-luo
Contributor

ye-luo commented Aug 21, 2024

> Any chance for a small reproducer? Can you make an issue on the relevant repo and link it here?

Unfortunately, it will be very very low priority for me.

@prckent
Contributor Author

prckent commented Aug 22, 2024

No worries.

@prckent prckent changed the title from "NV GPU Offload errors due to misaligned addresses" to "(LLVM compiler bug) NV GPU Offload errors due to misaligned addresses" Aug 27, 2024