Add docker image for icefall #1189
Conversation
Somehow the CUDA version inside the container shows an error for me. I am not sure what the difference is between the following ways of running the container:

teo@s64:~$ nvidia-smi
Thu Jul 27 20:43:08 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:18:00.0 Off | N/A |
| 0% 49C P8 30W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1758 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
teo@s64:~$ docker run --rm -it k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash
root@ae8acaf91ea0:/workspace/icefall# nvidia-smi
bash: nvidia-smi: command not found
root@ae8acaf91ea0:/workspace/icefall# exit
teo@s64:~$ docker run --rm -it --gpus=all k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash
root@c5f8c3bc2883:/workspace/icefall# nvidia-smi
Thu Jul 27 11:43:39 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: ERR! |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:18:00.0 Off | N/A |
| 0% 50C P8 42W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1758 G 4MiB |
+-----------------------------------------------------------------------------+
root@c5f8c3bc2883:/workspace/icefall# exit
teo@s64:~$ docker run --rm -it --runtime=nvidia k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash
root@1cb84e2f0eaa:/workspace/icefall# nvidia-smi
Thu Jul 27 11:43:51 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: ERR! |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:18:00.0 Off | N/A |
| 0% 49C P8 30W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1758 G 4MiB |
+-----------------------------------------------------------------------------+
root@1cb84e2f0eaa:/workspace/icefall# exit
teo@s64:~$ docker run --rm -it --runtime=nvidia --gpus=all k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash
root@bcc29879157e:/workspace/icefall# nvidia-smi
Thu Jul 27 11:44:02 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: ERR! |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:18:00.0 Off | N/A |
| 0% 49C P8 32W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1758 G 4MiB |
+-----------------------------------------------------------------------------+
root@bcc29879157e:/workspace/icefall# exit

With this error, I tried running a …
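(A diagnostic sketch, using standard glibc tooling rather than anything icefall-specific: ldconfig -p lists the libraries the container's dynamic linker can see, and a stub libcuda shadowing the driver's real library is one known cause of the ERR! shown above, as this thread later confirms.)

docker run --rm --gpus=all k2fsa/icefall:torch1.13.0-cuda11.6 \
  bash -c "ldconfig -p | grep libcuda"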
Maybe you need to run with nvidia-docker instead of docker if you plan to use the GPU?
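(For reference, a sketch of what that suggestion amounts to, assuming the NVIDIA Container Toolkit is installed on the host: nvidia-docker is essentially docker run --runtime=nvidia, and nvidia can also be made the default runtime via /etc/docker/daemon.json.)

docker run --runtime=nvidia --rm -it k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash

# Optional: make nvidia the default runtime so plain "docker run" gets the GPU.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker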
I tried with nvidia-docker. There is a size difference between my image and the new image. It is partly because I installed sherpa into my image, but I am not sure if the entire 4.2GB came from sherpa.

teo@s64:~$ nvidia-docker run --rm -it k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash
root@84b25aedbe2d:/workspace/icefall# nvidia-smi
Thu Jul 27 13:17:12 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: ERR! |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:18:00.0 Off | N/A |
| 0% 51C P8 31W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1758 G 4MiB |
+-----------------------------------------------------------------------------+
root@84b25aedbe2d:/workspace/icefall# exit
teo@s64:~$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
k2fsa/icefall torch1.13.0-cuda11.6 3b0e967ec78a 2 hours ago 12.1GB
icefall latest 3be62493eab5 5 days ago 16.3GB
...
[omitted]
...
teo@s64:~$ nvidia-docker run -it --rm icefall bash
root@2e812284fec7:/workspace/icefall# nvidia-smi
Thu Jul 27 13:17:37 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:18:00.0 Off | N/A |
| 0% 50C P8 30W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1758 G 4MiB |
+-----------------------------------------------------------------------------+
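(To see where the extra 4.2GB lives, docker history prints per-layer sizes; a sketch comparing the two images:)

teo@s64:~$ docker history icefall:latest
teo@s64:~$ docker history k2fsa/icefall:torch1.13.0-cuda11.6
# Compare the largest layers; if sherpa was installed in its own RUN step,
# it shows up as a separate layer.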
Thanks for testing it. The icefall docker image is based on a pytorch/pytorch image.

Could you test pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime? If not, could you try pytorch/pytorch:1.13.0-cuda11.6-cudnn8-devel? Also, are there warnings from python -m torch.utils.collect_env?
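(A minimal in-container sanity check along those lines; torch.cuda.is_available() and torch.utils.collect_env are standard PyTorch entry points, not anything specific to these images:)

nvidia-docker run --rm -it pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime /bin/bash
# inside the container:
nvidia-smi
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -m torch.utils.collect_env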
Yeah, in my environment both base images report the CUDA version correctly:

teo@s64:~$ nvidia-docker run -it --rm pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime
Unable to find image 'pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime' locally
1.13.0-cuda11.6-cudnn8-runtime: Pulling from pytorch/pytorch
a404e5416296: Already exists
d70bbcbd9fa5: Already exists
2f8d87f6e9b5: Already exists
f0869fc58250: Already exists
Digest: sha256:8711d55e2b5c42f3c070e1f2bacc2d1988c9b3b5b99694abc6691a852536efbe
Status: Downloaded newer image for pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime
root@9f5ea5b4a8de:/workspace# nvidia-smi
Thu Jul 27 15:40:50 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:18:00.0 Off | N/A |
| 0% 50C P8 30W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1758 G 4MiB |
+-----------------------------------------------------------------------------+
root@9f5ea5b4a8de:/workspace# exit
teo@s64:~$ nvidia-docker run -it --rm pytorch/pytorch:1.13.0-cuda11.6-cudnn8-devel
Unable to find image 'pytorch/pytorch:1.13.0-cuda11.6-cudnn8-devel' locally
1.13.0-cuda11.6-cudnn8-devel: Pulling from pytorch/pytorch
a404e5416296: Already exists
c58c079e9b17: Pull complete
e5b80b8bbe91: Pull complete
888240790290: Pull complete
515fe5e34eb4: Pull complete
4e4521f12f5a: Pull complete
f6e1a56cb32d: Pull complete
c29b96e36bd0: Pull complete
304d3c6c28d0: Pull complete
20f82224b265: Pull complete
031e73b7201f: Pull complete
80568f2c07b0: Pull complete
2ae0d162c09b: Pull complete
Digest: sha256:d98a1b1f61166875882e5a3ffa63bdef89c3349ceca1954dda415c5cd67e06a0
Status: Downloaded newer image for pytorch/pytorch:1.13.0-cuda11.6-cudnn8-devel
root@9a7a885b4cc9:/workspace# nvidia-smi
Thu Jul 27 15:45:42 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:18:00.0 Off | N/A |
| 0% 51C P8 42W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1758 G 4MiB |
+-----------------------------------------------------------------------------+
root@9a7a885b4cc9:/workspace# exit

This was returned for python -m torch.utils.collect_env in the icefall image:

teo@s64:~$ nvidia-docker run --rm -it k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash
root@18070e2baeb4:/workspace/icefall# python -m torch.utils.collect_env
Collecting environment information...
/opt/conda/lib/python3.9/site-packages/torch/cuda/__init__.py:88: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (Triggered internally at /opt/conda/conda-bld/pytorch_1666643016022/work/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 1.13.0
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.27
Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-150-generic-x86_64-with-glibc2.27
Is CUDA available: False
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 510.47.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] k2==1.24.3.dev20230725+cuda11.6.torch1.13.0
[pip3] kaldifeat==1.25.0.dev20230726+cuda11.6.torch1.13.0
[pip3] numpy==1.22.3
[pip3] torch==1.13.0
[pip3] torchaudio==0.13.0+cu116
[pip3] torchtext==0.14.0
[pip3] torchvision==0.14.0
[conda] blas 1.0 mkl
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] k2 1.24.3.dev20230725+cuda11.6.torch1.13.0 pypi_0 pypi
[conda] kaldifeat 1.25.0.dev20230726+cuda11.6.torch1.13.0 pypi_0 pypi
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.22.3 py39he7a7128_0
[conda] numpy-base 1.22.3 py39hf524024_0
[conda] pytorch 1.13.0 py3.9_cuda11.6_cudnn8.3.2_0 pytorch
[conda] pytorch-cuda 11.6 h867d48c_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.13.0+cu116 pypi_0 pypi
[conda] torchtext 0.14.0 py39 pytorch
[conda] torchvision 0.14.0 py39_cu116 pytorch
Could you remove the following line from the icefall docker image?

[omitted]

That line was added to fix a GitHub Actions test error. I think it may be the culprit here.

Note: You don't need to re-build the icefall docker image. Just start the container, delete that file from the container, and re-run nvidia-smi.
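(A sketch of that procedure; the exact file is the one named in the omitted Dockerfile line, so the path below is a placeholder, not the confirmed location. /usr/local/cuda/lib64/stubs/libcuda.so is where the CUDA toolkit usually keeps its stub.)

docker run --rm -it --gpus=all k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash
# inside the container:
ldconfig -p | grep libcuda   # see every libcuda copy the linker resolves
rm /path/to/stub/libcuda.so  # placeholder: delete the file named in the Dockerfile
ldconfig                     # rebuild the linker cache
nvidia-smi                   # should now report the real CUDA version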
You are right! I deleted that file and it works now.

root@d8cf0e860304:~# python -m k2.version
Collecting environment information...
k2 version: 1.24.3
Build type: Release
Git SHA1: 4c05309499a08454997adf500b56dcc629e35ae5
Git date: Tue Jul 25 16:23:36 2023
Cuda used to build k2: 11.6
cuDNN used to build k2: 8.3.2
Python version used to build k2: 3.9
OS used to build k2: CentOS Linux release 7.9.2009 (Core)
CMake version: 3.27.0
GCC version: 9.3.1
CMAKE_CUDA_FLAGS: -Wno-deprecated-gpu-targets -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_35,code=sm_35 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_50,code=sm_50 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_60,code=sm_60 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_61,code=sm_61 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_70,code=sm_70 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_75,code=sm_75 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_80,code=sm_80 -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w --expt-extended-lambda -gencode arch=compute_86,code=sm_86 -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-strict-overflow --compiler-options -Wno-unknown-pragmas
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-unused-variable -Wno-strict-overflow
PyTorch version used to build k2: 1.13.0+cu116
PyTorch is using Cuda: 11.6
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
Max cpu memory allocate: 214748364800 bytes (or 200.0 GB)
k2 abort: False
__file__: /opt/conda/lib/python3.9/site-packages/k2/version/version.py
_k2.__file__: /opt/conda/lib/python3.9/site-packages/_k2.cpython-39-x86_64-linux-gnu.so

I also ran a …
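(For completeness, a small k2-on-GPU smoke test; a sketch using k2's documented FSA string format, with an arbitrary two-arc acceptor:)

python -c "
import torch
import k2

# 'src dst label score' per arc; final state on its own line; label -1 enters it.
s = '''
0 1 1 0.1
1 2 -1 0.2
2
'''
fsa = k2.Fsa.from_str(s)
fsa = fsa.to(torch.device('cuda', 0))  # raises if CUDA is not usable
print(fsa.device)
"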
Thank you for testing it. I have fixed the Dockerfile and everything should work as expected.
Sorry to reuse this PR, but just a quick note that there is a typo in the link.
Thanks! Just fixed it.
Usage

It would be great if someone could test it. A sketch of the intended usage is below.
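(A hedged usage sketch, assembled from commands that appear earlier in this thread:)

docker pull k2fsa/icefall:torch1.13.0-cuda11.6
docker run --rm -it --gpus=all k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash
# inside the container:
nvidia-smi                         # should report CUDA Version: 11.6
python -m torch.utils.collect_env  # "Is CUDA available" should print True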