[BUG] linking failed using custom cc_toolchain with platform_cpu, cannot detect multiplatform cuda library. #265

ZhenshengLee · 2024-08-08T03:25:19Z

brief

NOTE: with the default platform, i.e. the x86_64 (k8) toolchain, both compiling and linking work.
I wonder whether this is a bug or just a misconfiguration on my side when using this repo?

environment

bazel: version 7.0.2
cc toolchain: //bazel/toolchains/v5l (a custom cc toolchain for cross-compiling to aarch64, similar to https://github.com/f0rmiga/gcc-toolchain/blob/main/toolchain/cc_toolchain_config.bzl)

├── toolchains
│   └── v5l
│       ├── BUILD
│       ├── v5l.BUILD
│       └── v5l_cc_toolchain_config.bzl

repro steps

Simply build the basic example with cu_library; the build reports the following errors.
NOTE: with the default platform, i.e. the x86_64 (k8) toolchain, the build works.

cc_binary(
    name = "module_cuda_main",
    srcs = ["tool/module_cuda_main.cpp"],
    includes = ["include"],
    tags = ["tool"],
    visibility = ["//main:__pkg__"],
    deps = [
        ":module_cuda",
    ],
)

(03:11:14) INFO: Current date is 2024-08-08
(03:11:14) INFO: Analyzed 323 targets (0 packages loaded, 21 targets configured).
(03:11:14) ERROR: /gw_demo/modules/team_demo/module_demo/BUILD:57:15: Linking modules/team_demo/module_demo/module_cuda_main failed: (Exit 1): aarch64-buildroot-linux-gnu-gcc failed: error executing CppLink command (from target //modules/team_demo/module_demo:module_cuda_main) 
  (cd /home/zs/.cache/bazel/_bazel_zs/2c098eac6c684e1fabebb74f5f4483bd/execroot/gaos && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:/usr/lib/x86_64-linux-gnu:/opt/rti.com/rti_connext_dds-6.0.1/lib/x64Linux4gcc7.3.0:/opt/ros/humble/opt/rviz_ogre_vendor/lib:/opt/ros/humble/lib/x86_64-linux-gnu:/opt/ros/humble/lib \
    PATH=/usr/local/cuda/bin:/opt/rti.com/rti_connext_dds-6.0.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/ros/humble/bin \
    PWD=/proc/self/cwd \
  external/v5l_cc_toolchain/bin/aarch64-buildroot-linux-gnu-gcc -o bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/module_cuda_main -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccudart___Ulib' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccudart___Ulib' -Xlinker -rpath -Xlinker '$ORIGIN/../../../_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccuda___Ulib_Sstubs' -Xlinker -rpath -Xlinker '$ORIGIN/module_cuda_main.runfiles/gaos/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccuda___Ulib_Sstubs' -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64 -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64 -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64 
-Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccudart___Ulib -Lbazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@cuda_Uaarch64_Ulinux_S_S_Ccuda___Ulib_Sstubs bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/_objs/module_cuda_main/module_cuda_main.pic.o bazel-out/aarch64-dbg/bin/modules/team_demo/module_demo/libmodule_cuda.a external/local_cuda/cuda/lib64/libcudadevrt.a -lcudart -l:libcudart.so.11.0 -l:libcudart.so.11.4.409 -lcudart -lcuda -pie -ldl -lpthread -lrt -Wl,-rpath,lib/ -L/drive/drive-linux/lib-target/ -L/drive/drive-linux/filesystem/targetfs/usr/lib/aarch64-linux-gnu/ -Wl,-rpath-link,/drive/drive-linux/filesystem/targetfs/usr/lib/aarch64-linux-gnu/ -Wl,-rpath-link,/drive/drive-linux/lib-target/ -Wl,-rpath-link,/usr/lib/aarch64-linux-gnu -Wl,-rpath-link,/usr/aarch64-linux-gnu -Wl,-rpath-link,/drive/drive-linux/filesystem/targetfs/lib/aarch64-linux-gnu -lgcov -lstdc++ -no-canonical-prefixes)
# Configuration: 93bfd7653555f545157f5fbb9812135069a379b953233ab0eef19c8f88c3340d
# Execution platform: @@local_config_platform//:host
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64/libcudart.so when searching for -lcudart
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64/libcudart.so.11.0 when searching for -l:libcudart.so.11.0
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64/libcudart.so.11.0 when searching for -l:libcudart.so.11.0
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: cannot find -l:libcudart.so.11.0
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64/libcudart.so.11.4.409 when searching for -l:libcudart.so.11.4.409
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.4.409___Ucuda_Slib64/libcudart.so.11.4.409 when searching for -l:libcudart.so.11.4.409
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: cannot find -l:libcudart.so.11.4.409
/drive/toolchains/aarch64--glibc--stable-2022.03-1/bin/../lib/gcc/aarch64-buildroot-linux-gnu/9.3.0/../../../../aarch64-buildroot-linux-gnu/bin/ld: skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so___Ucuda_Slib64/libcudart.so when searching for -lcudart
collect2: error: ld returned 1 exit status
(03:11:14) INFO: Elapsed time: 0.602s, Critical Path: 0.08s
(03:11:14) INFO: 2 processes: 2 internal.
(03:11:14) ERROR: Build did NOT complete successfully

considerations

skipping incompatible bazel-out/aarch64-dbg/bin/_solib_aarch64-buildroot-linux-gnu/_U@@local_Ucuda_S_S_Clibcudart.so.11.0___Ucuda_Slib64/libcudart.so.11.0 when searching for -l:libcudart.so.11.0

This means the CUDA libraries handed to the linker are still the host (lib64) versions from /usr/local/cuda/lib64.
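One way to check this diagnosis is to read the architecture recorded in the ELF header of the library that ld skipped. The following is a minimal Python sketch, not part of rules_cuda; the function names `elf_machine` and `elf_machine_of_file` are hypothetical, and only a small subset of `e_machine` values is mapped.

```python
import struct

# Subset of e_machine values defined by the ELF specification.
EM_NAMES = {0x03: "x86", 0x28: "arm", 0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return EM_NAMES.get(machine, hex(machine))


def elf_machine_of_file(path: str) -> str:
    """Convenience wrapper (hypothetical helper): inspect a library on disk."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Calling `elf_machine_of_file` on the `libcudart.so` under /usr/local/cuda/lib64 and on the one under /usr/local/cuda/targets/aarch64-linux/lib should show which directory holds the aarch64 build; if the first reports x86_64, that would be consistent with the "skipping incompatible" messages.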

In practice the CUDA libraries may be installed in other directories and may include builds for multiple architectures, especially on NVIDIA AGX machines:

CUDA: Should be installed at /usr/local, CUDA for various platforms should be in the target directory of /usr/local/cuda-X

e.g. aarch64-linux CUDA 10.1 should be located at /usr/local/cuda-10.1/targets/aarch64-linux

CUDA-X DL Libs (i.e. TensorRT and cuDNN): Should be located at /usr/local/cuda-X/dl/targets/<PLATFORM>/{include, lib}

Other system dependencies: Dependencies should be located in /usr/local/{include, lib} for x86_64, /usr/aarch64-linux-gnu/ for aarch64-linux and /usr/aarch64-unknown-nto-qnx/aarch64le for aarch64-qnx

https://github.com/NVIDIA/DL4AGX/blob/9a4f60c2847d32e81372b9a2165299a3b65eabf1/CONTRIBUTING.md?plain=1#L201-L205
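The layout quoted above suggests that the library directory could be derived from the target CPU rather than hard-coded to lib64. Below is a minimal Python sketch of that lookup; the mapping table and the names `cuda_target_lib_dir` and `link_flag` are assumptions for illustration and are not part of rules_cuda or DL4AGX.

```python
import posixpath

# Hypothetical mapping from (cpu, os) to the directory name under <cuda_root>/targets.
CUDA_TARGETS = {
    ("x86_64", "linux"): "x86_64-linux",
    ("aarch64", "linux"): "aarch64-linux",
}


def cuda_target_lib_dir(cuda_root: str, cpu: str, os_name: str = "linux") -> str:
    """Return <cuda_root>/targets/<target>/lib for the requested target platform."""
    try:
        target = CUDA_TARGETS[(cpu, os_name)]
    except KeyError:
        raise ValueError(f"no known CUDA target for {cpu}-{os_name}") from None
    return posixpath.join(cuda_root, "targets", target, "lib")


def link_flag(cuda_root: str, cpu: str, os_name: str = "linux") -> str:
    """Build the -L linker flag that points at the target-specific CUDA libraries."""
    return "-L" + cuda_target_lib_dir(cuda_root, cpu, os_name)
```

For example, `link_flag("/usr/local/cuda", "aarch64")` yields `-L/usr/local/cuda/targets/aarch64-linux/lib`, the same path used in the manual workaround in this issue.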

related info

There is an older CUDA toolchain configuration that supports multi-platform CPU builds in Bazel, but it relies on CROSSTOOL, which is outdated and not available in the latest versions of Bazel.

https://github.com/NVIDIA/DL4AGX/tree/master

EDIT: there is already an issue about resolving multiple versions of the CUDA libraries, but I don't think that issue is resolved by design: #113

workaround(works)

Adding the library path manually lets the binary link successfully.

linkopts = [
        "-L/usr/local/cuda/targets/aarch64-linux/lib",
    ],


ZhenshengLee · 2024-08-08T12:44:02Z

I found this on the doc page:

rules_cuda_dependencies(toolkit_path)
Populate the dependencies for rules_cuda. This will setup workspace dependencies (other bazel rules) and local toolchains.
Name Description Default Value
toolkit_path Optionally specify the path to CUDA toolkit. If not specified, it will be detected automatically.

Is there an example to show how to use it correctly?

cloudhan · 2024-08-08T13:23:53Z

I think you are configuring it correctly. The root problem is that cross-compiling is not addressed in these rules at the moment. As a result, exec_compatible_with for the tools and target_compatible_with for the runtime are assumed to be the same, but this is not enforced, so it can be worked around.

ZhenshengLee · 2024-08-09T01:17:48Z

I think you are configuring it correctly. The root problem is the cross compiling is not addressed in this rule at the moment.

OK, I will keep the issue open.

ZhenshengLee changed the title ~~[BUG] linking failed using custom cc_toolchain with platform_cpu, cannot detect multiplatform cuda library,~~ [BUG] linking failed using custom cc_toolchain with platform_cpu, cannot detect multiplatform cuda library. Aug 27, 2024

cloudhan added the enhancement New feature or request label Sep 5, 2024
