rocmPackages_[56].miopen tests fail to build, rocmPackages_6.miopen is broken #299156

Open
dwf opened this issue Mar 26, 2024 · 4 comments
Labels
0.kind: bug Something is broken

dwf commented Mar 26, 2024

Describe the bug

I ran into

/nix/store/schyg38lhnay87rm3080fnzzw2hiq3d8-rocm6-packages-joined/lib/libMIOpen.so.1: undefined symbol: LLVMInitializeX86TargetInfo

while working on getting jaxlib building with ROCm support, so I decided to build and run the MIOpen tests. At the moment they don't build.

One obvious problem is that gtest should probably be in buildInputs, but adding it was not enough to get the build going.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Build rocmPackages_6.miopen with buildTests = true
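For reference, the step above can be sketched as a small Nix expression. This is a minimal sketch, assuming a nixpkgs checkout or channel is reachable as `<nixpkgs>`; the file name `repro.nix` is hypothetical. `buildTests` is the package argument shown in `miopen/default.nix`, so it can be flipped with `.override` instead of editing the file.

```nix
# repro.nix (hypothetical file name)
# Assumes <nixpkgs> resolves to the nixpkgs revision under test.
with import <nixpkgs> { };

# buildTests defaults to false in miopen/default.nix; enable it here.
rocmPackages_6.miopen.override { buildTests = true; }
```

Building it with `nix-build repro.nix` should reproduce the failure. Swapping in `rocmPackages_5` exercises the 5.x package the same way.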

Expected behavior

The build succeeds.


Notify maintainers

@mschwaig @Madouura @Flakebi

@ScatteredRay I know you were working with ROCm 6 too, FYI

Cc #197885

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.64, NixOS, 23.11 (Tapir)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - channels(root): `"nixos-21.11.334247.573095944e7"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`


@dwf dwf added the 0.kind: bug Something is broken label Mar 26, 2024

dwf commented Mar 26, 2024

More information: synced to c77e28a with buildTests = true:

In file included from /build/source/test/gtest/log.cpp:26:
/build/source/test/gtest/log.hpp:28:10: fatal error: 'gtest/gtest.h' file not found
#include <gtest/gtest.h>
         ^~~~~~~~~~~~~~~
In file included from /build/source/test/gtest/tuna_net.cpp:1:
In file included from /build/source/test/gtest/../gtest/ai_heuristics.hpp:29:
/build/source/test/gtest/../gtest/cba.hpp:30:10: fatal error: 'gtest/gtest.h' file not found
#include <gtest/gtest.h>
         ^~~~~~~~~~~~~~~
1 error generated when compiling for gfx906.

Putting gtest in buildInputs:

diff --git a/pkgs/development/rocm-modules/6/miopen/default.nix b/pkgs/development/rocm-modules/6/miopen/default.nix
index f78bcb602e69..63cc9c847441 100644
--- a/pkgs/development/rocm-modules/6/miopen/default.nix
+++ b/pkgs/development/rocm-modules/6/miopen/default.nix
@@ -30,7 +30,7 @@
 , roctracer
 , python3Packages
 , buildDocs ? false # Needs internet because of rocm-docs-core
-, buildTests ? false
+, buildTests ? true
 }:

 let
@@ -155,6 +155,7 @@ in stdenv.mkDerivation (finalAttrs: {
     python3Packages.breathe
     python3Packages.myst-parser
   ] ++ lib.optionals buildTests [
+    gtest
     zlib
   ];

With gtest in buildInputs, the missing-header errors are replaced by undefined-symbol errors at link time, e.g.:

[ 79%] Built target test_gpu_reference_kernel
[ 79%] Building CXX object test/CMakeFiles/test_tensor_copy.dir/tensor_copy.cpp.o
[ 79%] Linking CXX executable ../../bin/test_tuna_net
ld.lld: error: undefined symbol: testing::AssertionSuccess()
>>> referenced by platform.cpp
>>>               CMakeFiles/test_tuna_net.dir/platform.cpp.o:(DevMem::~DevMem())
>>> referenced by log.cpp
>>>               CMakeFiles/test_tuna_net.dir/log.cpp.o:(setEnvironmentVariable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>
>>> referenced by log.cpp
>>>               CMakeFiles/test_tuna_net.dir/log.cpp.o:(unSetEnvironmentVariable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
>>> referenced 8 more times

ld.lld: error: undefined symbol: testing::Message::Message()
>>> referenced by platform.cpp
>>>               CMakeFiles/test_tuna_net.dir/platform.cpp.o:(DevMem::~DevMem())
>>> referenced by log.cpp
>>>               CMakeFiles/test_tuna_net.dir/log.cpp.o:(TestLogFun(std::function<void (miopenTensorDescriptor* const&, miopenTensorDescriptor* const&, miopen>
>>> referenced by log.cpp
>>>               CMakeFiles/test_tuna_net.dir/log.cpp.o:(TestLogFun(std::function<void (miopenTensorDescriptor* const&, miopenTensorDescriptor* const&, miopen>
>>> referenced 22 more times


dwf commented Mar 26, 2024

Confirming that rocmPackages_5.miopen appears to have the same issue.

@dwf dwf changed the title rocmPackages_6.miopen tests fail to build, otherwise broken rocmPackages_[56].miopen tests fail to build, otherwise broken Mar 26, 2024

dwf commented Mar 27, 2024

I got most tests to build with dwf@60d972d but a few problems persist.

For miopen 5.7.1, I think I either don't have compatible hardware or need to set some environment variable.

[ 96%] Linking CXX executable ../../bin/test_api_convbiasactiv
terminate called after throwing an instance of 'miopen::Exception'
  what():  /build/source/src/hip/handlehip.cpp:133: No device
CMake Error at /nix/store/17r6ld906midfv8y7997fd56s7a87vrg-cmake-3.28.3/share/cmake-3.28/Modules/GoogleTestAddTests.cmake:112 (message):
  Error running test executable.

    Path: '/build/source/build/bin/test_api_convbiasactiv'
    Result: Subprocess aborted
    Output:


Call Stack (most recent call first):
  /nix/store/17r6ld906midfv8y7997fd56s7a87vrg-cmake-3.28.3/share/cmake-3.28/Modules/GoogleTestAddTests.cmake:225 (gtest_discover_tests_impl)


make[3]: *** [test/gtest/CMakeFiles/test_api_convbiasactiv.dir/build.make:141: bin/test_api_convbiasactiv] Error 1
make[3]: *** Deleting file 'bin/test_api_convbiasactiv'
make[2]: *** [CMakeFiles/Makefile2:15718: test/gtest/CMakeFiles/test_api_convbiasactiv.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
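One thing that might be worth trying (untested): the error above is raised from CMake's GoogleTestAddTests.cmake, i.e. by `gtest_discover_tests`, which by default runs each test executable right after linking to enumerate its tests. CMake 3.18 and later accept `CMAKE_GTEST_DISCOVER_TESTS_DISCOVERY_MODE=PRE_TEST` to postpone that step until ctest runs. The following is a minimal sketch, assuming MIOpen's CMake files do not set `DISCOVERY_MODE` explicitly and that `<nixpkgs>` points at the revision under test.

```nix
# Sketch only, not verified against MIOpen's build.
with import <nixpkgs> { };

(rocmPackages_5.miopen.override { buildTests = true; }).overrideAttrs (old: {
  # Defer GoogleTest discovery from build time to ctest time, so test
  # binaries are not executed during the build, where no device is visible.
  cmakeFlags = (old.cmakeFlags or [ ]) ++ [
    "-DCMAKE_GTEST_DISCOVER_TESTS_DISCOVERY_MODE=PRE_TEST"
  ];
})
```

If this works at all, it would only let the test binaries finish building. Running them would still need a supported GPU.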

For miopen 6.0.2, it's the error I encountered before:

[ 78%] Linking CXX executable ../../bin/test_tuna_net
/build/source/build/bin/test_tuna_net: symbol lookup error: /build/source/build/lib/libMIOpen.so.1: undefined symbol: LLVMInitializeX86TargetInfo
CMake Error at /nix/store/17r6ld906midfv8y7997fd56s7a87vrg-cmake-3.28.3/share/cmake-3.28/Modules/GoogleTestAddTests.cmake:112 (message):
  Error running test executable.

    Path: '/build/source/build/bin/test_tuna_net'
    Result: 127
    Output:


Call Stack (most recent call first):
  /nix/store/17r6ld906midfv8y7997fd56s7a87vrg-cmake-3.28.3/share/cmake-3.28/Modules/GoogleTestAddTests.cmake:225 (gtest_discover_tests_impl)


make[3]: *** [test/gtest/CMakeFiles/test_tuna_net.dir/build.make:140: bin/test_tuna_net] Error 1
make[3]: *** Deleting file 'bin/test_tuna_net'
make[2]: *** [CMakeFiles/Makefile2:16810: test/gtest/CMakeFiles/test_tuna_net.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....

@dwf dwf changed the title rocmPackages_[56].miopen tests fail to build, otherwise broken rocmPackages_[56].miopen tests fail to build, rocmPackages_6.miopen is broken Mar 29, 2024

dwf commented Mar 29, 2024

I've sent a PR addressing the test build issue.

For the LLVM issue with miopen 6.0.2, I'm basically out of ideas, but https://github.com/dwf/nixpkgs/tree/miopen_wip contains some attempts at throwing stuff at the wall, in case someone else wants to take a run at it.
