train error #19

Wohoholo · 2021-01-12T01:50:43Z

Hello!
I ran into a problem with the segmentation loss when training on my own dataset. My segmentation masks were converted to mode "L". In ori_big.py, the model predicts a segmentation output of size [x, 2, x, x], but training fails with an error at CrossEntropyLoss2d. Can you help? Thanks!

Error:
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:106: cunn_SpatialClassNLLCriterion_updateOutput_kernel: block: [4,0,0], thread: [189,0,0] Assertion t >= 0 && t < n_classes failed.
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED (createCuDNNHandle at /pytorch/aten/src/ATen/cudnn/Handle.cpp:9)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f81564a5536 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x10a0c28 (0x7f81579a1c28 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::native::getCudnnHandle() + 0xe54 (0x7f81579a3404 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xf19f4c (0x7f815781af4c in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0xf1afe1 (0x7f815781bfe1 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xf1f01b (0x7f815782001b in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: at::native::cudnn_convolution_backward_input(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0xb2 (0x7f8157820572 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xf86090 (0x7f8157887090 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0xfca928 (0x7f81578cb928 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x4fa (0x7f8157821c0a in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0xf863bb (0x7f81578873bb in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: + 0xfca984 (0x7f81578cb984 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: + 0x2c80736 (0x7f8191037736 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: + 0x2ccff44 (0x7f8191086f44 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocatorat::Tensor >&&) + 0x378 (0x7f8190c4f908 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: + 0x2d89705 (0x7f8191140705 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::evaluate_function(std::shared_ptrtorch::autograd::GraphTask&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f819113da03 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_main(std::shared_ptrtorch::autograd::GraphTask const&, bool) + 0x3d2 (0x7f819113e7e2 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f8191136e59 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f819da7e968 in /home/derek/anaconda3/envs/jim/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #20: + 0xc819d (0x7f81ac9d019d in /home/derek/anaconda3/envs/jim/bin/../lib/libstdc++.so.6)
frame #21: + 0x76db (0x7f81ae1696db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #22: clone + 0x3f (0x7f81ade9271f in /lib/x86_64-linux-gnu/libc.so.6)
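
Note on the trace: the first line is the informative one. The assertion `t >= 0 && t < n_classes` fires when a target label falls outside [0, n_classes), and n_classes is 2 here because the prediction has two channels. The cuDNN error that follows is probably a knock-on failure after the device-side assert rather than a separate problem, though that is an assumption. A minimal sketch of a range check to run on a target batch before the loss is shown below. `check_label_range` and the sample array are hypothetical and not part of this repository; the function only needs `.min()` and `.max()`, so it should accept either a NumPy array or a torch tensor.

```python
import numpy as np


def check_label_range(target, n_classes):
    """Raise ValueError if any label falls outside [0, n_classes)."""
    lo, hi = int(target.min()), int(target.max())
    if lo < 0 or hi >= n_classes:
        raise ValueError(
            f"labels span [{lo}, {hi}] but the loss expects [0, {n_classes - 1}]"
        )
    return lo, hi


# Hypothetical target: a mask that still stores the foreground as 255.
bad_target = np.array([[0, 255], [255, 0]], dtype=np.int64)

try:
    check_label_range(bad_target, n_classes=2)
    range_error = None
except ValueError as err:
    range_error = str(err)

print(range_error)
```

Running a check like this on the CPU, before the batch reaches the GPU, gives a readable Python exception instead of an asynchronous CUDA assert.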


Wohoholo · 2021-01-12T02:42:33Z

Target Segment size: torch.Size([6, 512, 680])
Pred_segment size: torch.Size([6, 2, 512, 680])
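
These shapes are compatible with a class-index cross-entropy loss (prediction [N, C, H, W], target [N, H, W]), so the label values are the more likely cause than the shapes. One common situation, assumed here and not confirmed in this thread, is that an "L" mode mask stores the foreground as 255, which is out of range for two classes. A minimal sketch of remapping such a mask to 0/1 class indices follows; `binarize_mask`, its `threshold` parameter, and the sample array are hypothetical.

```python
import numpy as np


def binarize_mask(mask, threshold=0):
    """Map a grayscale mask to class indices: 0 = background, 1 = foreground."""
    mask = np.asarray(mask)
    # Any pixel above the threshold is treated as foreground (assumption).
    return (mask > threshold).astype(np.int64)


# Hypothetical 2x3 mask as it might come out of an "L" mode image.
gray = np.array([[0, 255, 255], [0, 0, 255]], dtype=np.uint8)
labels = binarize_mask(gray)

print(np.unique(labels))
```

The int64 result converts to a long tensor with `torch.from_numpy(labels)`, which is the dtype a class-index target needs. If the masks are anti-aliased or contain more than two gray levels, a plain threshold may not be the right mapping.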

Wohoholo closed this as completed Jan 15, 2021
