Update dockerfile.gpu #6452

Closed
wants to merge 2 commits into from
11 changes: 6 additions & 5 deletions docker/gpu/dockerfile.gpu
@@ -1,4 +1,4 @@
-FROM nvidia/cuda:8.0-cudnn5-devel
+FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

#################################################################################################################
# Global
@@ -53,11 +53,12 @@ libboost-dev \
libboost-system-dev \
libboost-filesystem-dev \
gcc \
-g++
+g++ \
+nvidia-driver-550

-# Add OpenCL ICD files for LightGBM
+# Add OpenCL ICD files for LightGBM (or you can use `find / -name libnvidia-opencl.so.1` command to get the actual path to the file)
RUN mkdir -p /etc/OpenCL/vendors && \
-echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
+echo "/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

#################################################################################################################
# CONDA
@@ -88,7 +89,7 @@ RUN cd /usr/local/src && mkdir lightgbm && cd lightgbm && \

ENV PATH /usr/local/src/lightgbm/LightGBM:${PATH}

-RUN /bin/bash -c "source activate py3 && cd /usr/local/src/lightgbm/LightGBM && sh ./build-python.sh install --precompile && source deactivate"
+RUN /bin/bash -c "source activate py3 && cd /usr/local/src/lightgbm/LightGBM && sh ./build-python.sh install --gpu && source deactivate"

Collaborator

This is not correct. lib_lightgbm.so has already been compiled a few lines up (the line running cmake --build build), so --precompile is necessary to build a Python package bundling it in.

Using --gpu makes that previous compilation unnecessary... and will not use the same OpenCL library and headers that were passed there.

This should be reverted.

Author

Got it. I'll try to work on it today if I have spare time and check everything one more time.

Author

If I switch back to --precompile, I get these errors:

[LightGBM] [Warning] Using sparse features with CUDA is currently not supported.
[LightGBM] [Fatal] CUDA Tree Learner was not enabled in this build.
Please recompile with CMake option -DUSE_CUDA=1

Collaborator

That error suggests to me that you're passing {"device": "cuda"} through parameters. That isn't appropriate for this image, where the library hasn't been built with -DUSE_CUDA=1.

In this Dockerfile, lib_lightgbm is being built only with -DUSE_GPU=1, which means you'd need to pass {"device": "gpu"} through params.
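To make the pairing between build flag and device parameter concrete, here is a minimal sketch. The helper `params_for_build`, the mapping `BUILD_FLAG_TO_DEVICE`, and the function `smoke_test` are hypothetical names introduced only for illustration; they are not part of LightGBM. The sketch assumes `lightgbm` and `numpy` are installed in the image, and only `smoke_test` needs them.

```python
# Hypothetical helper, not part of LightGBM: maps the CMake option used to
# build lib_lightgbm to the matching value of the "device" parameter.
BUILD_FLAG_TO_DEVICE = {
    "USE_GPU": "gpu",    # OpenCL-based build (what this Dockerfile produces)
    "USE_CUDA": "cuda",  # CUDA-kernel build (a different build entirely)
}


def params_for_build(build_flag, **extra):
    """Return training params whose "device" matches the given build flag."""
    try:
        device = BUILD_FLAG_TO_DEVICE[build_flag]
    except KeyError:
        raise ValueError(f"unknown build flag: {build_flag!r}") from None
    return {"device": device, **extra}


def smoke_test(build_flag="USE_GPU"):
    """Train a tiny model on random data; needs lightgbm, numpy, and a GPU."""
    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(0)
    X = rng.random((500, 10))
    y = rng.random(500)
    params = params_for_build(build_flag, objective="regression", verbose=-1)
    return lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=5)
```

Inside a container built from this Dockerfile, calling `smoke_test("USE_GPU")` would be the matching combination; `smoke_test("USE_CUDA")` would be expected to fail with the "CUDA Tree Learner was not enabled in this build" error quoted above.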

Author

@NisuSan NisuSan May 21, 2024

I've tried different versions of building like cmake -DUSE_GPU=1 or cmake -DUSE_CUDA=1, and then in the installation command, I also tried all possible variants: sh ./build-python.sh install --gpu, sh ./build-python.sh install --cuda, and sh ./build-python.sh install --precompile as well. I even found your reply on StackOverflow and tried to change some installation steps, but it still didn't work.

The good news is that I fixed the missing files and driver in the Docker image, so now we just need to figure out how to install it properly :)

Author

@jameslamb, @shiyu1994, today I decided to install it with a simple pip command, pip install --no-binary lightgbm --config-settings=cmake.define.USE_CUDA=ON 'lightgbm>=4.0.0', and after running code with device: cuda, I got the already known error from this issue. This makes me think that the problem with installing from source may be inside the build-python.sh or CMakeLists.txt files. Could you take a look at this if you can?

Collaborator

It's difficult for me to help you because you're reporting error messages but not showing the code you ran that led to them.

This Dockerfile is about the -DUSE_GPU version of LightGBM (OpenCL-based), not the -DUSE_CUDA version (CUDA kernels). Please keep it that way.

Stop passing -DUSE_CUDA or using {"device": "cuda"} with images built from this Dockerfile.

Collaborator

@NisuSan are you still interested in working on this?

If you don't have the time / interest right now please tell us, so we can close this and someone else can work on fixing this Dockerfile.

#################################################################################################################
# System CleanUp