Skip to content

RuntimeError: CUDA error: an illegal memory access was encountered #41

Open
@seohoiki3215

Description

@seohoiki3215

Hello, I was surprised by your work and tried to reproduce it with the code you've provided.
However, every time I tried to run the code, it always failed to run with the runtime error i mentioned on the title.

Traceback (most recent call last):
File "train.py", line 213, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint)
File "train.py", line 87, in training
loss = (1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
File "/home/seohoiki/Research/NeRF/gaussian-splatting/utils/loss_utils.py", line 38, in ssim
window = window.cuda(img1.get_device())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]

I tried all the methods you've told in other issues, but failed.
My system & settings:
RTX4090
Ubuntu 22.04 LTS
Exact environment with given .yml file

Strangely, my colleague who has system with RTX 3090 / Ubuntu 20.04 runs the code without any problem.(Except them, all the settings are exactly the same including CUDA SDK version)

I hope I can get some solution for this problem!

Thank you.

=====================================
Results with cuda-memcheck

========= CUDA-MEMCHECK
========= This tool is deprecated and will be removed in a future release of the CUDA toolkit
========= Please use the compute-sanitizer tool as a drop-in replacement
Optimizing
Output folder: ./output/54877260-0 [17/07 19:21:51]
Tensorboard not available: not logging progress [17/07 19:21:51]
Found transforms_train.json file, assuming Blender data set! [17/07 19:21:51]
Reading Training Transforms [17/07 19:21:51]
Reading Test Transforms [17/07 19:21:53]
Loading Training Cameras [17/07 19:21:56]
Loading Test Cameras [17/07 19:21:57]
Number of points at initialisation : 100000 [17/07 19:21:57]
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 213, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint)
File "train.py", line 87, in training
loss = (1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
File "/home/seohoiki/Research/NeRF/gaussian-splatting/utils/loss_utils.py", line 38, in ssim
window = window.cuda(img1.get_device())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Training progress: 0%| | 0/30000 [00:00<?, ?it/s]
========= ERROR SUMMARY: 0 errors

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions