ci_gpu docker image build #94

Closed
YukeWang96 opened this issue Mar 10, 2023 · 7 comments

@YukeWang96

It seems that the ci_gpu Docker image cannot be built. The build runs into errors like:

 => CACHED [21/67] COPY install/ubuntu_install_sphinx.sh /install/ubuntu_install_sphinx.sh                                                    0.0s
 => CACHED [22/67] RUN bash /install/ubuntu_install_sphinx.sh                                                                                 0.0s
 => CACHED [23/67] RUN apt-get update && apt-install-and-clear -y doxygen libprotobuf-dev protobuf-compiler                                   0.0s
 => CACHED [24/67] COPY install/ubuntu_install_java.sh /install/ubuntu_install_java.sh                                                        0.0s
 => CACHED [25/67] RUN bash /install/ubuntu_install_java.sh                                                                                   0.0s
 => CACHED [26/67] COPY install/ubuntu_install_nodejs.sh /install/ubuntu_install_nodejs.sh                                                    0.0s
 => CACHED [27/67] RUN bash /install/ubuntu_install_nodejs.sh                                                                                 0.0s
 => CACHED [28/67] COPY install/ubuntu_install_rocm.sh /install/ubuntu_install_rocm.sh                                                        0.0s
 => CACHED [29/67] RUN bash /install/ubuntu_install_rocm.sh                                                                                   0.0s
 => CACHED [30/67] COPY install/ubuntu_install_mxnet.sh /install/ubuntu_install_mxnet.sh                                                      0.0s
 => CACHED [31/67] RUN bash /install/ubuntu_install_mxnet.sh                                                                                  0.0s
 => CACHED [32/67] COPY install/ubuntu_install_gluoncv.sh /install/ubuntu_install_gluoncv.sh                                                  0.0s
 => CACHED [33/67] RUN bash /install/ubuntu_install_gluoncv.sh                                                                                0.0s
 => CACHED [34/67] COPY install/ubuntu_install_coreml.sh /install/ubuntu_install_coreml.sh                                                    0.0s
 => CACHED [35/67] RUN bash /install/ubuntu_install_coreml.sh                                                                                 0.0s
 => CACHED [36/67] COPY install/ubuntu_install_tensorflow.sh /install/ubuntu_install_tensorflow.sh                                            0.0s
 => CACHED [37/67] RUN bash /install/ubuntu_install_tensorflow.sh                                                                             0.0s
 => CACHED [38/67] COPY install/ubuntu_install_darknet.sh /install/ubuntu_install_darknet.sh                                                  0.0s
 => CACHED [39/67] RUN bash /install/ubuntu_install_darknet.sh                                                                                0.0s
 => CACHED [40/67] COPY install/ubuntu_install_onnx.sh /install/ubuntu_install_onnx.sh                                                        0.0s
 => CACHED [41/67] RUN bash /install/ubuntu_install_onnx.sh                                                                                   0.0s
 => ERROR [42/67] COPY install/ubuntu_install_libtorch.sh /install/ubuntu_install_libtorch.sh                                                 0.0s
------
 > [ 2/67] COPY utils/apt-install-and-clear.sh /usr/local/bin/apt-install-and-clear:
------
------
 > [ 8/67] COPY install/ubuntu_install_googletest.sh /install/ubuntu_install_googletest.sh:
------
------
 > [42/67] COPY install/ubuntu_install_libtorch.sh /install/ubuntu_install_libtorch.sh:
------
Dockerfile.ci_gpu:90
--------------------
  88 |     RUN bash /install/ubuntu_install_onnx.sh
  89 |     
  90 | >>> COPY install/ubuntu_install_libtorch.sh /install/ubuntu_install_libtorch.sh
  91 |     RUN bash /install/ubuntu_install_libtorch.sh
  92 |     
--------------------
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::h1q96mex7f0l3at58zfksx485: "/install/ubuntu_install_libtorch.sh": not found
ERROR: docker build failed.
@yzh119
Member

yzh119 commented Mar 11, 2023

We are not using Dockerfile.ci_gpu for SparseTIR. Our CI depends on this file: https://github.com/uwsampl/SparseTIR/blob/ca59cbe4e81de959b798f4e5cdd9c5ab6ea7f801/docker/Dockerfile.ci_sparsetir_gpu

@YukeWang96
Author

I just tried ci_sparsetir_gpu, and it runs into errors like:

------
 > [21/27] COPY docker/install/ubuntu_install_rat.sh /install/ubuntu_install_rat.sh:
------
Dockerfile.ci_sparsetir_gpu:57
--------------------
  55 |     RUN bash /install/ubuntu_install_torch.sh
  56 |     
  57 | >>> COPY docker/install/ubuntu_install_rat.sh /install/ubuntu_install_rat.sh
  58 |     RUN bash /install/ubuntu_install_rat.sh
  59 |     
--------------------
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::zgrsg9ttv2xlo576h717i052s: "/docker/install/ubuntu_install_rat.sh": not found
ERROR: docker build failed.

@yzh119
Member

yzh119 commented Mar 11, 2023

Are you building the Docker image from the project root? Dockerfile.ci_sparsetir_gpu is intended for CI, and our CI configuration builds it from the project root directory.
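
Editor's sketch: the "not found" error above suggests that the COPY source docker/install/ubuntu_install_rat.sh was resolved against a build context other than the project root. The helper below is a minimal way to check this before building. `check_copy_sources` is a hypothetical function, not part of SparseTIR, and it assumes simple single-source `COPY <src> <dest>` lines without flags such as `--from` or `--chown`.

```shell
# Hypothetical helper (not part of SparseTIR): print every COPY source in a
# Dockerfile that does not exist under the given build-context directory.
# Assumption: only plain "COPY <src> <dest>" lines are handled.
check_copy_sources() {
  local dockerfile="$1" context="$2" missing=0
  local keyword src rest
  # Split each line into its first word, second word, and the remainder.
  while read -r keyword src rest || [ -n "$keyword" ]; do
    [ "$keyword" = "COPY" ] || continue
    if [ ! -e "$context/$src" ]; then
      echo "missing in context: $src"
      missing=1
    fi
  done < "$dockerfile"
  # Non-zero exit status when at least one source is missing.
  return "$missing"
}
```

Run from the project root, `check_copy_sources docker/Dockerfile.ci_sparsetir_gpu .` should report nothing if the root is the right context, while passing `docker` as the second argument should list the paths that fail to resolve there.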

@YukeWang96
Author

Got it. I previously followed the instructions in README.md and ran docker/build.sh ci_sparsetir_gpu, which reported the error above. It works now using:

docker build \
  . \
  --file docker/Dockerfile.ci_sparsetir_gpu \
  --tag ${{ steps.generate-tag.outputs.tag }}

Thanks!

@YukeWang96
Author

I just encountered another problem when building the image:

#0 3.123 CMake Error at /usr/share/cmake-3.16/Modules/ExternalProject.cmake:2630 (message):
#0 3.123   No download info given for 'project_libbacktrace' and its source directory:
#0 3.123 
#0 3.123    /root/sparsetir/cmake/libs/../../3rdparty/libbacktrace
#0 3.123 
#0 3.123   is not an existing non-empty directory.  Please specify one of:
#0 3.123 
#0 3.123    * SOURCE_DIR with an existing non-empty directory
#0 3.123    * DOWNLOAD_COMMAND
#0 3.123    * URL
#0 3.123    * GIT_REPOSITORY
#0 3.123    * SVN_REPOSITORY
#0 3.123    * HG_REPOSITORY
#0 3.123    * CVS_REPOSITORY and CVS_MODULE
#0 3.123 Call Stack (most recent call first):
#0 3.123   /usr/share/cmake-3.16/Modules/ExternalProject.cmake:3236 (_ep_add_download_command)
#0 3.123   cmake/libs/Libbacktrace.cmake:36 (ExternalProject_Add)
#0 3.123   cmake/modules/Logging.cmake:41 (include)
#0 3.123   CMakeLists.txt:547 (include)
#0 3.123 
#0 3.123 
#0 3.135 -- Building with TVM Map...
#0 3.135 -- Build with thread support...
#0 3.136 -- Check if compiler accepts -pthread
#0 3.303 -- Check if compiler accepts -pthread - yes
#0 3.327 -- Configuring incomplete, errors occurred!
#0 3.327 See also "/root/sparsetir/build/CMakeFiles/CMakeOutput.log".
#0 3.327 See also "/root/sparsetir/build/CMakeFiles/CMakeError.log".
#0 3.342 make: *** No targets specified and no makefile found.  Stop.
#0 4.028 Obtaining file:///root/sparsetir/python
#0 4.028   Preparing metadata (setup.py): started
#0 4.254   Preparing metadata (setup.py): finished with status 'error'
#0 4.282   error: subprocess-exited-with-error
#0 4.282   
#0 4.282   × python setup.py egg_info did not run successfully.
#0 4.282   │ exit code: 1
#0 4.282   ╰─> [39 lines of output]
#0 4.282       Traceback (most recent call last):
#0 4.282         File "<string>", line 2, in <module>
#0 4.282         File "<pip-setuptools-caller>", line 34, in <module>
#0 4.282         File "/root/sparsetir/python/setup.py", line 100, in <module>
#0 4.282           LIB_LIST, __version__ = get_lib_path()
#0 4.282         File "/root/sparsetir/python/setup.py", line 51, in get_lib_path
#0 4.282           lib_path = libinfo["find_lib_path"]()
#0 4.282         File "/root/sparsetir/python/./tvm/_ffi/libinfo.py", line 146, in find_lib_path
#0 4.282           raise RuntimeError(message)
#0 4.282       RuntimeError: Cannot find libraries: ['libtvm.so', 'libtvm_runtime.so']
#0 4.282       List of candidates:
#0 4.282       /usr/lib/x86_64-linux-gnu/libtvm.so
#0 4.282       /usr/local/cuda-11.6/targets/x86_64-linux/lib/libtvm.so
#0 4.282       /usr/local/cuda-11.6/compat/libtvm.so
#0 4.282       /usr/local/cuda-11.6/bin/libtvm.so
#0 4.282       /usr/local/cuda-11.6/bin/libtvm.so
#0 4.282       /usr/local/sbin/libtvm.so
#0 4.282       /usr/local/bin/libtvm.so
#0 4.282       /usr/sbin/libtvm.so
#0 4.282       /usr/bin/libtvm.so
#0 4.282       /usr/sbin/libtvm.so
#0 4.282       /usr/bin/libtvm.so
#0 4.282       /root/sparsetir/python/tvm/libtvm.so
#0 4.282       /root/sparsetir/build/libtvm.so
#0 4.282       /root/libtvm.so
#0 4.282       /usr/lib/x86_64-linux-gnu/libtvm_runtime.so
#0 4.282       /usr/local/cuda-11.6/targets/x86_64-linux/lib/libtvm_runtime.so
#0 4.282       /usr/local/cuda-11.6/compat/libtvm_runtime.so
#0 4.282       /usr/local/cuda-11.6/bin/libtvm_runtime.so
#0 4.282       /usr/local/cuda-11.6/bin/libtvm_runtime.so
#0 4.282       /usr/local/sbin/libtvm_runtime.so
#0 4.282       /usr/local/bin/libtvm_runtime.so
#0 4.282       /usr/sbin/libtvm_runtime.so
#0 4.282       /usr/bin/libtvm_runtime.so
#0 4.282       /usr/sbin/libtvm_runtime.so
#0 4.282       /usr/bin/libtvm_runtime.so
#0 4.282       /root/sparsetir/python/tvm/libtvm_runtime.so
#0 4.282       /root/sparsetir/build/libtvm_runtime.so
#0 4.282       /root/libtvm_runtime.so
#0 4.282       [end of output]
#0 4.282   
#0 4.282   note: This error originates from a subprocess, and is likely not a problem with pip.
#0 4.286 error: metadata-generation-failed
#0 4.286 
#0 4.286 × Encountered error while generating package metadata.
#0 4.286 ╰─> See above for output.
#0 4.286 
#0 4.286 note: This is an issue with the package mentioned above, not pip.
#0 4.286 hint: See above for details.
#0 4.431 WARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.
#0 4.431 You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
------
Dockerfile.ci_sparsetir_gpu:77
--------------------
  75 |     WORKDIR /root/sparsetir
  76 |     ADD . .
  77 | >>> RUN bash docker/install/install_sparsetir_gpu.sh
  78 |     
  79 |     # Install dependencies required by lint
--------------------
ERROR: failed to solve: process "/bin/sh -c bash docker/install/install_sparsetir_gpu.sh" did not complete successfully: exit code: 1

@YukeWang96 YukeWang96 reopened this Mar 13, 2023
@yzh119
Member

yzh119 commented Mar 14, 2023

It seems that the submodules were not cloned correctly.
You can either re-clone the repo via git clone git@github.com:uwsampl/SparseTIR.git --recursive
or update the submodules via git submodule update --init --recursive
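
Editor's sketch: one way to confirm whether submodules are the problem is to inspect `git submodule status`, which prefixes each uninitialized submodule with `-`. The function below filters that output. `uninitialized_submodules` is a hypothetical name introduced here for illustration, not a git or SparseTIR command.

```shell
# Hypothetical helper: read `git submodule status` output on stdin and print
# the paths of submodules that are not initialized (lines starting with "-").
uninitialized_submodules() {
  local line rest
  while IFS= read -r line; do
    case "$line" in
      -*)
        rest="${line#-}"     # drop the leading "-"
        echo "${rest#* }"    # drop the commit hash, keep the path
        ;;
    esac
  done
}
```

Used as `git submodule status --recursive | uninitialized_submodules` from the repository root. If it prints a path such as 3rdparty/libbacktrace, that directory is still empty, which would match the CMake error above.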

By the way, ${{ steps.generate-tag.outputs.tag }} is GitHub workflow syntax and should not be used outside the workflow YAML file. You can use whatever name you want for the Docker tag.
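
Editor's sketch: for a local build, the workflow expression can be replaced with an ordinary tag. The function below only prints the build command and refuses a tag that still contains the `${{` placeholder. `sparsetir_build_cmd` and the default tag `sparsetir-gpu:local` are made-up names for illustration, not project conventions.

```shell
# Hypothetical wrapper: print the docker build command for a given tag.
# The default tag "sparsetir-gpu:local" is an arbitrary example.
sparsetir_build_cmd() {
  local tag="${1:-sparsetir-gpu:local}"
  case "$tag" in
    *'${{'*)
      # A leftover GitHub Actions expression is not a usable image tag.
      echo "error: tag still contains a GitHub workflow expression" >&2
      return 1
      ;;
  esac
  echo "docker build . --file docker/Dockerfile.ci_sparsetir_gpu --tag $tag"
}
```

Calling `sparsetir_build_cmd my-image:dev` prints a command that can then be run from the project root.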

@YukeWang96
Author

Thanks, @yzh119! The Docker image builds successfully now.
