-
How did you install PyTorch? If you used Conda, can you try using pip [ref]?
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
-
What should I do if the newest version of CUDA my computer supports is 11.2?
-
If your hardware supports CUDA 11.2, it should support 11.6. Just follow the pip installation (it should install CUDA and all required libraries into your virtual environment).
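After the pip installation, a quick check like the one below can show which CUDA version the installed wheel was built for and whether a GPU is visible. This is a minimal sketch, not part of GaNDLF: the helper name `describe_torch_install` is hypothetical, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install():
    """Summarize the PyTorch install, or return None if torch is not installed.

    Hypothetical helper for illustration; not part of GaNDLF or PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    try:
        import torch
    except (ImportError, OSError) as exc:
        # A broken CUDA library setup can make the import itself fail.
        return {"import_error": repr(exc)}
    return {
        "torch_version": torch.__version__,    # should end in "+cu116" for the wheel above
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_install())
```

If `cuda_available` comes back `False` even though the wheel was built for CUDA, the NVIDIA driver on the host is the next thing to look at.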
-
So that fixed that, but now I'm getting the following:
Looping over training data for penalty calculation: 0%| | 0/2426 [00:00<?, ?it/s]
/cbica/projects/DBT_AI/.conda/envs/venv_gandlf_new/lib/python3.8/site-packages/torchio/data/io.py:36: UserWarning: Error loading image with SimpleITK: Trying NiBabel...
Looping over training data for penalty calculation: 0%| | 0/2426 [00:00<?, ?it/s]
-
Hmm, this seems like an interesting error. Can you please let us know your ITK version and the output of
-
subjectID,channel_0,label
ITK version: 3.8.0
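Since the warning above comes from an image failing to load, it may help to first rule out missing files in the training CSV. The sketch below assumes the header shown above and assumes that both `channel_0` and `label` hold file paths; the helper name `find_missing_files` is hypothetical.

```python
import csv
import os


def find_missing_files(csv_path, path_columns=("channel_0", "label")):
    """Return (subjectID, column, path) for every listed file that does not exist.

    Assumes the CSV header is subjectID,channel_0,label and that the
    columns named in path_columns contain file paths.
    """
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            for column in path_columns:
                path = (row.get(column) or "").strip()
                if not os.path.isfile(path):
                    missing.append((row.get("subjectID"), column, path))
    return missing


if __name__ == "__main__":
    # Path taken from the training command in this thread; adjust as needed.
    for subject, column, path in find_missing_files("/home/train.csv"):
        print(f"{subject}: {column} not found: {path}")
```

An empty result only shows that the files exist, not that they can be read.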
-
Can you mention the output of the following command:
# activate gandlf python environment
python -c "import SimpleITK as sitk;image=sitk.ReadImage('/cbica/projects/DBT_AI/Data/masks/75712684_PROC_LCC_RC_2/75712684_PROC_LCC_RC_mat.nii.gz');print(image.GetSize());mask=sitk.ReadImage('/cbica/projects/DBT_AI/Data/masks/75712684_PROC_LCC_RC_2/75712684_PROC_LCC_RC_mask.nii.gz');print(mask.GetSize())"
Also, it would be great if you can post at least the mask so that we can debug further.
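The one-liner is dense, so here is the same check written out as a small script that also compares the two sizes. It is only a sketch: `read_size` and `check_pair` are hypothetical names, and the only SimpleITK calls used are `ReadImage` and `GetSize`, as in the command above.

```python
def read_size(path):
    """Read an image with SimpleITK and return its size as a tuple."""
    import SimpleITK as sitk  # imported lazily so the helper below has no hard dependency

    return sitk.ReadImage(path).GetSize()


def check_pair(image_path, mask_path, reader=read_size):
    """Return (image_size, mask_size, sizes_match) for an image/mask pair."""
    image_size = reader(image_path)
    mask_size = reader(mask_path)
    return image_size, mask_size, image_size == mask_size


if __name__ == "__main__":
    base = "/cbica/projects/DBT_AI/Data/masks/75712684_PROC_LCC_RC_2/"
    image_size, mask_size, same = check_pair(
        base + "75712684_PROC_LCC_RC_mat.nii.gz",
        base + "75712684_PROC_LCC_RC_mask.nii.gz",
    )
    print("image:", image_size)
    print("mask: ", mask_size)
    print("sizes match:", same)
```

If either read raises an exception, the file itself is the problem; if both succeed, the IO layer is working for that pair.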
-
(1996, 2457, 73)
What do you mean by posting the mask?
-
This is also at the top of the error file:
-
Hmm, if the command I replied with gives this output, it means that the IO is working as expected.
By posting, I meant uploading the mask here for us to debug. But it doesn't matter, since the IO is working correctly (as seen from the output of the command I sent).
This is unrelated to GaNDLF and depends on the host machine.
-
I tried running the same job with 1/5 of the training data and it ran without an error. However, I'm now getting this:
-
GaNDLF Version
0.0.17-dev
Version information of the GaNDLF package in the virtual environment.
Desktop (please complete the following information):
How did you install GaNDLF
Please provide all steps followed during installation.
Dataset description
Describe the data (radiology/histology/so on, dimensions, etc.).
Radiology, 3D, breast images
Describe your question/problem
A clear and concise description of what issue you are facing.
I just pulled the newest code and am starting to have issues:
"python ./gandlf_verifyInstall" had no issues
CUDA version: 11.2
Command: python /home/GaNDLF/gandlf_run -c /home/config_20.yaml -i /home/train.csv -m /home/output -t True -d cuda
I am getting the following output when trying to train:
Traceback (most recent call last):
  File "/cbica/home/ahluwalv/GaNDLF/gandlf_run", line 11, in <module>
    from GANDLF.cli import main_run, copyrightMessage
  File "/gpfs/fs001/cbica/home/ahluwalv/GaNDLF/GANDLF/cli/__init__.py", line 1, in <module>
    from .patch_extraction import patch_extraction
  File "/gpfs/fs001/cbica/home/ahluwalv/GaNDLF/GANDLF/cli/patch_extraction.py", line 7, in <module>
    from GANDLF.data.patch_miner.opm.patch_manager import PatchManager
  File "/gpfs/fs001/cbica/home/ahluwalv/GaNDLF/GANDLF/data/__init__.py", line 1, in <module>
    from torch.utils.data import DataLoader
  File "/cbica/projects/DBT_AI/.conda/envs/venv_gandlf_new/lib/python3.8/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File "/cbica/projects/DBT_AI/.conda/envs/venv_gandlf_new/lib/python3.8/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File "/cbica/projects/DBT_AI/.conda/envs/venv_gandlf_new/lib/python3.8/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/cbica/projects/DBT_AI/.conda/envs/venv_gandlf_new/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /cbica/projects/DBT_AI/.conda/envs/venv_gandlf_new/lib/python3.8/site-packages/nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
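The OSError says that the pip-installed `libcublas.so.11` was linked against a `libcublasLt.so.11` that lacks the `cublasLtGetStatusString` symbol. One possible cause (an assumption, not something confirmed in this thread) is that an older system copy of `libcublasLt.so.11`, for example from a CUDA 11.2 install, sits on `LD_LIBRARY_PATH` and gets loaded first. The sketch below lists any such copies; `find_cublaslt_copies` is a hypothetical helper name.

```python
import glob
import os


def find_cublaslt_copies(search_dirs):
    """Return every libcublasLt.so.11* file found directly inside search_dirs."""
    hits = []
    for directory in search_dirs:
        if not directory:
            continue  # skip empty entries produced by splitting an unset or padded variable
        pattern = os.path.join(glob.escape(directory), "libcublasLt.so.11*")
        hits.extend(sorted(glob.glob(pattern)))
    return hits


if __name__ == "__main__":
    ld_dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    copies = find_cublaslt_copies(ld_dirs)
    if copies:
        print("libcublasLt.so.11 copies on LD_LIBRARY_PATH:")
        for path in copies:
            print("  " + path)
    else:
        print("No libcublasLt.so.11 found on LD_LIBRARY_PATH.")
```

If a system copy shows up, running the training command in a shell where that directory is removed from `LD_LIBRARY_PATH` is one way to test whether it is the source of the conflict.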