[BUG] linking failed using custom cc_toolchain with platform_cpu, cannot detect multiplatform cuda library. #265

Open
ZhenshengLee opened this issue Aug 8, 2024 · 3 comments
Labels
enhancement New feature or request

ZhenshengLee commented Aug 8, 2024

brief

NOTE: On the default platform, with the x86_64 (k8) toolchain, compiling and linking both work.
Is this a bug, or am I misconfiguring this repo?

environment

bazel: version 7.0.2
cc_toolchain: //bazel/toolchains/v5l (a custom cc toolchain for cross-compiling to aarch64, like https://github.com/f0rmiga/gcc-toolchain/blob/main/toolchain/cc_toolchain_config.bzl)

├── toolchains
│   └── v5l
│       ├── BUILD
│       ├── v5l.BUILD
│       └── v5l_cc_toolchain_config.bzl

repro steps

Simply build the basic example with cu_library; the build reports the following errors.
NOTE: On the default platform, with the x86_64 (k8) toolchain, the build works.

cc_binary(
    name = "module_cuda_main",
    srcs = ["tool/module_cuda_main.cpp"],
    includes = ["include"],
    tags = ["tool"],
    visibility = ["//main:__pkg__"],
    deps = [
        ":module_cuda"
    ]
)

(03:11:14) INFO: Current date is 2024-08-08
(03:11:14) INFO: Analyzed 323 targets (0 packages loaded, 21 targets configured).
(03:11:14) ERROR: /gw_demo/modules/team_demo/module_demo/BUILD:57:15: Linking modules/team_demo/module_demo/module_cuda_main failed: (Exit 1): aarch64-buildroot-linux-gnu-gcc failed: error executing CppLink command (from target //modules/team_demo/module_demo:module_cuda_main) 
  (cd /home/zs/.cache/bazel/_bazel_zs/2c098eac6c684e1fabebb74f5f4483bd/execroot/gaos && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:/usr/lib/x86_64-linux-gnu:/opt/rti.com/rti_connext_dds-6.0.1/lib/x64Linux4gcc7.3.0:/opt/ros/humble/opt/rviz_ogre_vendor/lib:/opt/ros/humble/lib/x86_64-linux-gnu:/opt/ros/humble/lib \
    PATH=/usr/local/cuda/bin:/opt/rti.com/rti_connext_dds-6.0.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/ros/humble/bin \
    PWD=/proc/self/cwd \
  external/v5l_cc_toolchain/bin/aarch64-buildroot-linux-gnu-gcc -o bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/module_cuda_main -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccudart___Ulib' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccudart___Ulib' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccuda___Ulib_Sstubs' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccuda___Ulib_Sstubs' -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64 -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64 -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64 
-Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccudart___Ulib -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccuda___Ulib_Sstubs bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/_objs/module_cuda_main/module_cuda_main.pic.o bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/libmodule_cuda.a external/local_cuda/cuda/lib64/libcudadevrt.a -lcudart -l:libcudart.so.11.0 -l:libcudart.so.11.4.409 -lcudart -lcuda -pie -ldl -lpthread -lrt -Wl,-rpath,lib/ -L/drive/drive-linux/lib-target/ -L/drive/drive-linux/filesystem/targetfs/usr/lib/aarch64-linux-gnu/ -Wl,-rpath-link,/drive/drive-linux/filesystem/targetfs/usr/lib/aarch64-linux-gnu/ -Wl,-rpath-link,/drive/drive-linux/lib-target/ -Wl,-rpath-link,/usr/lib/aarch64-linux-gnu -Wl,-rpath-link,/usr/aarch64-linux-gnu -Wl,-rpath-link,/drive/drive-linux/filesystem/targetfs/lib/aarch64-linux-gnu -lgcov -lstdc++ -no-canonical-prefixes)
# Configuration: 93bfd7653555f545157f5fbb9812135069a379b953233ab0eef19c8f88c3340d
# Execution platform: @@local_config_platform//:host
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64/libcudart.so when searching for -lcudart
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64/libcudart.so.11.0 when searching for -l:libcudart.so.11.0
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64/libcudart.so.11.0 when searching for -l:libcudart.so.11.0
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: cannot find -l:libcudart.so.11.0
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64/libcudart.so.11.4.409 when searching for -l:libcudart.so.11.4.409
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64/libcudart.so.11.4.409 when searching for -l:libcudart.so.11.4.409
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: cannot find -l:libcudart.so.11.4.409
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64/libcudart.so when searching for -lcudart
collect2: error: ld returned 1 exit status
(03:11:14) INFO: Elapsed time: 0.602s, Critical Path: 0.08s
(03:11:14) INFO: 2 processes: 2 internal.
(03:11:14) ERROR: Build did NOT complete successfully

considerations

skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64/libcudart.so.11.0 when searching for -l:libcudart.so.11.0

This means the linker is still being handed the host CUDA libraries from /usr/local/cuda/lib64, which the aarch64 linker rejects as incompatible.
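
One way to confirm which architecture a given libcudart.so was built for is to read the e_machine field of its ELF header. The following is only a minimal Python sketch for diagnosis; the function name elf_machine is hypothetical and not part of rules_cuda, and it only recognizes the two architectures relevant here.

```python
import struct

# e_machine values from the ELF specification.
ELF_MACHINES = {62: "x86_64", 183: "aarch64"}

def elf_machine(path):
    """Return the architecture name recorded in an ELF file's header."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("%s is not an ELF file" % path)
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is the 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown (%d)" % machine)

if __name__ == "__main__":
    import sys
    # Usage: python check_elf.py <library> [<library> ...]
    for lib in sys.argv[1:]:
        print(lib, "->", elf_machine(lib))
```

Running it against the library under /usr/local/cuda/lib64 and the one under /usr/local/cuda/targets/aarch64-linux/lib should show whether the two directories hold different architectures.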

In practice, the CUDA libraries may be installed in other directories, and one installation may contain libraries for several architectures, especially in NVIDIA AGX setups.

  • CUDA: Should be installed at /usr/local, CUDA for various platforms should be in the target directory of /usr/local/cuda-X
  • e.g. aarch64-linux CUDA 10.1 should be located at /usr/local/cuda-10.1/targets/aarch64-linux
  • CUDA-X DL Libs (i.e. TensorRT and cuDNN): Should be located at /usr/local/cuda-X/dl/targets/<PLATFORM>/{include, lib}
  • Other system dependencies: Dependencies should be located in /usr/local/{include, lib} for x86_64, /usr/aarch64-linux-gnu/ for aarch64-linux and /usr/aarch64-unknown-nto-qnx/aarch64le for aarch64-qnx

https://github.com/NVIDIA/DL4AGX/blob/9a4f60c2847d32e81372b9a2165299a3b65eabf1/CONTRIBUTING.md?plain=1#L201-L205

related info

There is an older CUDA toolchain config that supports multi-platform (multi-CPU) builds in Bazel, but it relies on CROSSTOOL, which is outdated and not available in the latest versions of Bazel.

https://github.com/NVIDIA/DL4AGX/tree/master

EDIT: There is already an issue about resolving multiple versions of the CUDA libraries, but I don't think that issue has been resolved by design: #113

workaround (works)

Adding the library path manually lets the binary link successfully.

linkopts = [
        "-L/usr/local/cuda/targets/aarch64-linux/lib",
    ],
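
To keep the x86_64 build from picking up the aarch64 path, the same workaround could be keyed on the target CPU. This is only a sketch under assumptions: the helper name cuda_linkopts_by_cpu and a .bzl file to hold it are hypothetical, the x86_64-linux directory name is assumed from the usual CUDA "targets" layout, and the target platform is assumed to carry the @platforms//cpu constraints. It is written so that it is valid both as Python and as Starlark.

```python
# Assumed mapping from Bazel CPU constraints to CUDA "targets" subdirectories.
CUDA_TARGET_DIRS = {
    "@platforms//cpu:x86_64": "x86_64-linux",
    "@platforms//cpu:aarch64": "aarch64-linux",
}

def cuda_linkopts_by_cpu(cuda_root = "/usr/local/cuda"):
    """Returns a dict for select(): CPU constraint label -> linkopts list."""
    opts = {
        constraint: ["-L%s/targets/%s/lib" % (cuda_root, target_dir)]
        for constraint, target_dir in CUDA_TARGET_DIRS.items()
    }

    # Fall back to no extra linkopts on any other platform.
    opts["//conditions:default"] = []
    return opts
```

After loading the helper in the BUILD file, the cc_binary would use linkopts = select(cuda_linkopts_by_cpu()). This is still a non-hermetic workaround like the hard-coded path above, not a fix in the rule itself.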

@ZhenshengLee
Author

I found this on the doc page:

rules_cuda_dependencies(toolkit_path)
Populate the dependencies for rules_cuda. This will setup workspace dependencies (other bazel rules) and local toolchains.
Name Description Default Value
toolkit_path Optionally specify the path to CUDA toolkit. If not specified, it will be detected automatically.

Is there an example to show how to use it correctly?

@cloudhan
Collaborator

cloudhan commented Aug 8, 2024

I think you are configuring it correctly. The root problem is the cross compiling is not addressed in this rule at the moment. The rule currently assumes that exec_compatible_with (for the tools) and target_compatible_with (for the runtime libraries) are the same. That assumption is not enforced, though, so it can be worked around.

@ZhenshengLee
Author

I think you are configuring it correctly. The root problem is the cross compiling is not addressed in this rule at the moment.

OK, I will keep the issue open.

ZhenshengLee changed the title from "[BUG] linking failed using custom cc_toolchain with platform_cpu, cannot detect multiplatform cuda library," to "[BUG] linking failed using custom cc_toolchain with platform_cpu, cannot detect multiplatform cuda library." on Aug 27, 2024
cloudhan added the enhancement (New feature or request) label on Sep 5, 2024