[BUG]: build libcugraph/pylibcugraph error #4056

Closed
2 tasks done
leo4183 opened this issue Dec 10, 2023 · 5 comments

leo4183 commented Dec 10, 2023

Version

23.12.00

Which installation method(s) does this occur on?

Pip, Source

Describe the bug.

Cannot build libcugraph/pylibcugraph, either from source or via pip.

Several observations:

  • On an sm_89 machine (e.g. 4090), building libcugraph from source fails with "ptxas error: Value of threads per SM for entry xxxxxx is out of range". (Legacy releases such as 22.12.00 would misidentify sm_8x cards as sm_5x.)
  • On an sm_80 machine (e.g. A100), building from source fails because /usr/include/gdeflate is missing. This might be a bug specific to Red Hat/CentOS/Rocky machines, since nvcomp cannot be installed through dnf/yum (the repository only contains nvidia-driver, cuda-1x-x, etc.).
  • After manually adding nvcomp (including gdeflate) to the system development environment, the missing-gdeflate complaint went away. However, the libcugraph build still does not seem to get its dependencies installed: as the log output below shows, the CMake configure step cannot locate them.

PS: no $CONDA_PREFIX, $PREFIX, $CUGRAPH_HOME, or any related environment variables are defined in the user session.
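
The last observation can be checked independently of the build. The sketch below is a minimal, hypothetical helper (the name `find_cmake_configs` and the example prefixes are assumptions, not part of cuGraph or RAPIDS) that looks under given install prefixes for the two config file names CMake accepts for a package, such as `raftConfig.cmake` and `raft-config.cmake`. It is only a recursive file-name search and does not reproduce CMake's actual `find_package` search procedure.

```python
from pathlib import Path


def find_cmake_configs(package, prefixes):
    """Return paths of <package>Config.cmake / <package>-config.cmake files.

    This is only a recursive file-name search under each prefix; it does not
    reproduce CMake's real find_package() search procedure.
    """
    wanted = {f"{package}Config.cmake", f"{package.lower()}-config.cmake"}
    found = []
    for prefix in prefixes:
        root = Path(prefix)
        if not root.is_dir():
            continue  # skip prefixes that do not exist on this machine
        for path in root.rglob("*.cmake"):
            if path.name in wanted:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical prefixes; adjust to wherever the dependencies were installed.
    prefixes = ["/usr/lib64/cmake", "/usr/local/lib/cmake"]
    for pkg in ("rmm", "raft", "cuco", "cugraph"):
        hits = find_cmake_configs(pkg, prefixes)
        print(pkg, "->", [str(h) for h in hits] or "not found")
```

A package reported as "not found" here would be a candidate for passing its location explicitly, for example through `CMAKE_PREFIX_PATH` as the CMake error message itself suggests.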

Minimum reproducible example

./build.sh --without_cugraphops

pip install cugraph-cu12 --extra-index-url=https://pypi.ngc.nvidia.com
(or pip install cugraph-cu11 --extra-index-url=https://pypi.nvidia.com)

Relevant log output

CMake Error at /usr/lib64/rapids/cmake/libcudacxx/libcudacxx-config-version.cmake:4 (file):
  file failed to open for reading (No such file or directory):

  /dev_admin/Downloads/cugraph/python/pylibcugraph/_libcudacxx_VERSION_INCLUDE_DIR-NOTFOUND/cuda/std/detail/__config
Call Stack (most recent call first):
  /usr/lib64/cmake/rmm/rmm-dependencies.cmake:29 (find_package)
  /usr/lib64/cmake/rmm/rmm-config.cmake:74 (include)
  /usr/lib64/cmake/cugraph/cugraph-dependencies.cmake:21 (find_package)
  /usr/lib64/cmake/cugraph/cugraph-config.cmake:72 (include)
  CMakeLists.txt:50 (find_package)    
      
CMake Error at /usr/lib64/rapids/cmake/libcudacxx/libcudacxx-config-version.cmake:14 (math):
  math cannot parse the expression: " / 1000000": syntax error, unexpected
  exp_DIVIDE (2).
Call Stack (most recent call first):
  /usr/lib64/cmake/rmm/rmm-dependencies.cmake:29 (find_package)
  /usr/lib64/cmake/rmm/rmm-config.cmake:74 (include)
  /usr/lib64/cmake/cugraph/cugraph-dependencies.cmake:21 (find_package)
  /usr/lib64/cmake/cugraph/cugraph-config.cmake:72 (include)
  CMakeLists.txt:50 (find_package)

......

CMake Error at /usr/lib64/rapids/cmake/libcudacxx/libcudacxx-config-version.cmake:15 (math):
  math cannot parse the expression: "( / 1000) % 1000": syntax error,
  unexpected exp_DIVIDE (3).
Call Stack (most recent call first):
  /usr/lib64/cmake/cuco/cuco-dependencies.cmake:22 (find_package)
  /usr/lib64/cmake/cuco/cuco-config.cmake:78 (include)
  /usr/local/cmake-3.28.0/share/cmake-3.28/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
  /usr/lib64/cmake/cugraph/cugraph-dependencies.cmake:25 (find_dependency)
  /usr/lib64/cmake/cugraph/cugraph-config.cmake:72 (include)
  CMakeLists.txt:50 (find_package)


CMake Error at /usr/lib64/rapids/cmake/libcudacxx/libcudacxx-config-version.cmake:16 (math):
  math cannot parse the expression: " % 1000": syntax error, unexpected
  exp_MOD (2).
Call Stack (most recent call first):
  /usr/lib64/cmake/cuco/cuco-dependencies.cmake:22 (find_package)
  /usr/lib64/cmake/cuco/cuco-config.cmake:78 (include)
  /usr/local/cmake-3.28.0/share/cmake-3.28/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
  /usr/lib64/cmake/cugraph/cugraph-dependencies.cmake:25 (find_dependency)
  /usr/lib64/cmake/cugraph/cugraph-config.cmake:72 (include)
  CMakeLists.txt:50 (find_package)

......

-- Found cuco: /usr/lib64/cmake/cuco/cuco-config.cmake (found version "0.0.1")
CMake Error at /usr/local/cmake-3.28.0/share/cmake-3.28/Modules/CMakeFindDependencyMacro.cmake:76 (find_package):
  By not providing "Findraft.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "raft", but
  CMake did not find one.

  Could not find a package configuration file provided by "raft" with any of
  the following names:

    raftConfig.cmake
    raft-config.cmake

  Add the installation prefix of "raft" to CMAKE_PREFIX_PATH or set
  "raft_DIR" to a directory containing one of the above files.  If "raft"
  provides a separate development package or SDK, be sure it has been
  installed.
Call Stack (most recent call first):
  /usr/lib64/cmake/cugraph/cugraph-dependencies.cmake:28 (find_dependency)
  /usr/lib64/cmake/cugraph/cugraph-config.cmake:72 (include)
  CMakeLists.txt:50 (find_package)


-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
  File "/usr/local/Python-3.11.6/lib/python3.11/site-packages/skbuild/setuptools_wrap.py", line 666, in setup
    env = cmkr.configure(
          ^^^^^^^^^^^^^^^
  File "/usr/local/Python-3.11.6/lib/python3.11/site-packages/skbuild/cmaker.py", line 357, in configure
    raise SKBuildError(msg)

An error occurred while configuring with CMake.
  Command:
    /usr/local/bin/cmake /dev_admin/Downloads/cugraph/python/pylibcugraph -G Ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/dev_admin/Downloads/cugraph/python/pylibcugraph/_skbuild/linux-x86_64-3.11/cmake-install -DPYTHON_VERSION_STRING:STRING=3.11.6 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/usr/local/Python-3.11.6/lib/python3.11/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/usr/local/bin/python -DPYTHON_INCLUDE_DIR:PATH=/usr/local/Python-3.11.6/include/python3.11 -DPYTHON_LIBRARY:PATH=/usr/local/Python-3.11.6/lib/libpython3.11.a -DPython_EXECUTABLE:PATH=/usr/local/bin/python -DPython_ROOT_DIR:PATH=/usr/local/Python-3.11.6 -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/usr/local/Python-3.11.6/include/python3.11 -DPython_NumPy_INCLUDE_DIRS:PATH=/usr/local/Python-3.11.6/lib/python3.11/site-packages/numpy/core/include -DPython3_EXECUTABLE:PATH=/usr/local/bin/python -DPython3_ROOT_DIR:PATH=/usr/local/Python-3.11.6 -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/usr/local/Python-3.11.6/include/python3.11 -DPython3_NumPy_INCLUDE_DIRS:PATH=/usr/local/Python-3.11.6/lib/python3.11/site-packages/numpy/core/include -DCMAKE_BUILD_TYPE:STRING=Release -DFIND_CUGRAPH_CPP=ON -DUSE_CUGRAPH_OPS=OFF
  Source directory:
    /dev_admin/Downloads/cugraph/python/pylibcugraph
  Working directory:
    /dev_admin/Downloads/cugraph/python/pylibcugraph/_skbuild/linux-x86_64-3.11/cmake-build
Please see CMake's output for more information.

[end of output]
  
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pylibcugraph
Failed to build pylibcugraph
ERROR: Could not build wheels for pylibcugraph, which is required to install pyproject.toml-based projects
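
The repeated `math cannot parse the expression` errors in the log above appear to follow from the first error: each failing expression (`" / 1000000"`, `"( / 1000) % 1000"`, `" % 1000"`) has an empty left-hand operand, which suggests no version number was read because the `__config` header could not be opened. The sketch below mirrors that arithmetic in Python, assuming the version is a single integer packed as major * 1000000 + minor * 1000 + patch. The function name and the sample value are hypothetical and are not taken from libcudacxx.

```python
def decode_packed_version(raw):
    """Split a packed integer version into (major, minor, patch).

    Mirrors the three expressions seen in the CMake errors:
    raw / 1000000, (raw / 1000) % 1000, raw % 1000.
    """
    text = raw.strip()
    if not text:
        # Corresponds to the failing case in the log: nothing was read from
        # the header, so there is no number to do arithmetic on.
        raise ValueError("empty version string: header was probably not found")
    value = int(text)
    return value // 1000000, (value // 1000) % 1000, value % 1000


# Hypothetical packed value, used only to illustrate the arithmetic.
print(decode_packed_version("2001000"))  # (2, 1, 0)
```

Read this way, the `math` errors are a symptom rather than a separate problem: resolving the `_libcudacxx_VERSION_INCLUDE_DIR-NOTFOUND` path would likely make them disappear as well.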

Environment details

***OS Information***
NAME="Rocky Linux"
VERSION="9.3 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.3 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.3"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"
Rocky Linux release 9.3 (Blue Onyx)

***CMake***
/usr/local/bin/cmake
cmake version 3.28.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).

***g++***
/usr/bin/g++
g++ (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


***nvcc***
/usr/local/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0

***Python***
/usr/local/bin/python
Python 3.11.6

***Environment Variables***
PATH                            : /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
LD_LIBRARY_PATH                 : /usr/local/Python-3.11.6/lib/python3.11/site-packages/nvidia/cudnn/lib
NUMBAPRO_NVVM                   :
NUMBAPRO_LIBDEVICE              :
CONDA_PREFIX                    :
PYTHON_PATH                     :

conda not found
***pip packages***
/usr/local/bin/pip
Package                           Version
--------------------------------- ------------
......
Cython                            3.0.6
......
numpy                             1.25.2
......
nvidia-cublas-cu12                12.1.3.1
nvidia-cuda-cupti-cu12            12.1.105
nvidia-cuda-nvcc-cu12             12.2.140
nvidia-cuda-nvrtc-cu12            12.1.105
nvidia-cuda-runtime-cu12          12.1.105
nvidia-cudnn-cu12                 8.9.2.26
nvidia-cufft-cu12                 11.0.2.54
nvidia-curand-cu12                10.3.2.106
nvidia-cusolver-cu12              11.4.5.107
nvidia-cusparse-cu12              12.1.0.106
nvidia-nccl-cu12                  2.18.1
nvidia-nvjitlink-cu12             12.2.140
nvidia-nvtx-cu12                  12.1.105
......
scikit-build                      0.17.6
......

Other/Misc.

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
leo4183 added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels on Dec 10, 2023
leo4183 changed the title from "[BUG]: build pylibcugraph error" to "[BUG]: build libcugraph/pylibcugraph error" on Dec 10, 2023
BradReesWork removed the "? - Needs Triage" label on Dec 12, 2023
BradReesWork added this to the 24.02 milestone on Dec 12, 2023
@rlratzel
Contributor

Hi @leo4183, sorry for the delay in responding.

I suspect some of the missing files CMake is reporting are normally provided by rapids-cmake. Do you know if that's being installed? I believe it gets installed via CMake's FetchContent, as described here.

@robertmaynard may be able to provide more information about that as well as the other observations you made above.

@robertmaynard
Contributor

@vyasr Have you seen these issues before with wheel builds? Are our wheel builds ever done on non-Debian/Ubuntu machines?

@vyasr
Contributor

vyasr commented Apr 23, 2024

We build wheels on Rocky Linux 8, but I don't think that's materially different here.

I'm a bit confused by some of the output above. Are you running build.sh and then running the pip install command? If so, is the build.sh command completing successfully? You say that libcugraph didn't build successfully for you in a couple of scenarios. Is the output above what you get if libcugraph does build but you then try to build pylibcugraph with pip?

I notice that you have a number of NVIDIA wheels installed. Do you also have the CTK installed on your system, or just the compiler? Your LD_LIBRARY_PATH is pointing into a cuDNN wheel directory rather than a system library. Is that coming from an installation of PyTorch or something else?
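
One way to answer the LD_LIBRARY_PATH question is to list which entries point into a Python `site-packages` tree (libraries that came from wheels) rather than a system location. The following is a minimal sketch, assuming a colon-separated value as on Linux; the helper name `split_library_path` is hypothetical and the `site-packages` check is only a heuristic.

```python
import os


def split_library_path(value):
    """Partition LD_LIBRARY_PATH entries into (wheel_dirs, other_dirs).

    An entry is treated as coming from a wheel if "site-packages" is one of
    its path components; this is a heuristic, not an authoritative check.
    """
    wheel_dirs, other_dirs = [], []
    for entry in value.split(":"):
        if not entry:
            continue  # ignore empty entries such as a trailing ":"
        parts = entry.split("/")
        (wheel_dirs if "site-packages" in parts else other_dirs).append(entry)
    return wheel_dirs, other_dirs


if __name__ == "__main__":
    wheels, others = split_library_path(os.environ.get("LD_LIBRARY_PATH", ""))
    print("from wheels:", wheels)
    print("other:", others)
```

With the value reported in the environment details above, the single cuDNN entry would land in the first list.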

@rlratzel
Contributor

@leo4183 I'll close this issue, but please re-open or file a new issue if you're still having problems.

@leo4183
Author

leo4183 commented May 17, 2024

> We build wheels on Rocky Linux 8, but I don't think that's materially different here.
>
> I'm a bit confused by some of the output above. Are you running build.sh and then running the pip install command? If so, is the build.sh command completing successfully? You say that libcugraph didn't build successfully for you in a couple of scenarios. Is the output above what you get if libcugraph does build but you then try to build pylibcugraph with pip?
>
> I notice that you have a number of NVIDIA wheels installed. Do you also have the CTK installed on your system, or just the compiler? Your LD_LIBRARY_PATH is pointing into a cuDNN wheel directory rather than a system library. Is that coming from an installation of PyTorch or something else?

Sorry for the confusion. I tried both build.sh and the pip build; both returned the same error mentioned above.

I do have a customized LD_LIBRARY_PATH, which points to a pip-installed cuDNN library.
