[Bug] RuntimeError: CUDA error: out of memory Exception raised from ROIAlignRotatedForwardCUDAKernelLauncher #957

2597883929 · 2023-10-31T08:01:35Z

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
I have read the FAQ documentation but cannot get the expected help.
The bug has not been fixed in the latest version (master) or latest version (1.x).

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

1.x branch https://github.com/open-mmlab/mmrotate/tree/1.x

Environment

sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.0, V11.0.194
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.3.2 (built against CUDA 11.5)
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.13.1
OpenCV: 4.8.1
MMEngine: 0.9.0
MMRotate: 1.0.0rc1+fd60bef

Reproduces the problem - code sample

I just ran the demo from the tutorial.

# Copyright (c) OpenMMLab. All rights reserved.
from argparse import ArgumentParser

import mmcv
from mmdet.apis import inference_detector, init_detector
import torch
from mmrotate.registry import VISUALIZERS
from mmrotate.utils import register_all_modules
import os


def parse_args():
    parser = ArgumentParser()
    parser.add_argument('img', help='Image file')
    parser.add_argument('config', help='Config file')
    parser.add_argument('checkpoint', help='Checkpoint file')
    parser.add_argument('--out-file', default=None, help='Path to output file')
    parser.add_argument(
        '--device', default='cuda:6', help='Device used for inference')
    parser.add_argument(
        '--palette',
        default='dota',
        choices=['dota', 'sar', 'hrsc', 'random'],
        help='Color palette used for visualization')
    parser.add_argument(
        '--score-thr', type=float, default=0.3, help='bbox score threshold')
    args = parser.parse_args()
    return args


def main(args):
    # register all modules in mmrotate into the registries
    register_all_modules()

    # build the model from a config file and a checkpoint file
    model = init_detector(
        args.config, args.checkpoint, palette=args.palette, device=args.device)

    # init visualizer
    visualizer = VISUALIZERS.build(model.cfg.visualizer)
    # the dataset_meta is loaded from the checkpoint and
    # then pass to the model in init_detector
    visualizer.dataset_meta = model.dataset_meta

    # test a single image
    result = inference_detector(model, args.img)

    # show the results
    img = mmcv.imread(args.img)
    img = mmcv.imconvert(img, 'bgr', 'rgb')
    visualizer.add_datasample(
        'result',
        img,
        data_sample=result,
        draw_gt=False,
        show=args.out_file is None,
        wait_time=0,
        out_file=args.out_file,
        pred_score_thr=args.score_thr)


if __name__ == '__main__':
    os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
    torch.cuda._initialized = True
    args = parse_args()
    main(args)

Reproduces the problem - command or script

python demo/image_demo.py demo/demo.jpg oriented-rcnn-le90_r50_fpn_1x_dota.py oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth --out-file result.jpg
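
Since the script above defaults to `--device cuda:6` on an 8-GPU machine, one thing worth ruling out is that GPU 6 is simply occupied by another process. The following is a minimal sketch, not part of the original report, that parses the CSV output of `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` and picks the GPU with the most free memory. The helper names (`parse_gpu_memory`, `pick_freest_gpu`, `query_live`) and the sample numbers are hypothetical and chosen only for illustration.

```python
import csv
import io
import subprocess

# nvidia-smi query that prints one "index, used_MiB, total_MiB" row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Return {gpu_index: free_MiB} parsed from the CSV text."""
    free = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        index, used, total = (int(field.strip()) for field in row)
        free[index] = total - used
    return free


def pick_freest_gpu(csv_text):
    """Return the index of the GPU with the most free memory."""
    free = parse_gpu_memory(csv_text)
    return max(free, key=free.get)


def query_live():
    """Run nvidia-smi on this machine and return its CSV output."""
    return subprocess.run(
        QUERY, capture_output=True, text=True, check=True).stdout


# Hypothetical sample output (made-up numbers, not from the report).
sample = "0, 1024, 24268\n6, 24100, 24268\n7, 12, 24268\n"
free_by_gpu = parse_gpu_memory(sample)
best = pick_freest_gpu(sample)
print(f"free MiB per GPU: {free_by_gpu}; suggested --device cuda:{best}")
```

On the real machine, `pick_freest_gpu(query_live())` would give an index to pass as `--device cuda:<index>`. If the error persists on an idle GPU, free memory is probably not the cause.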

Reproduces the problem - error message

Loads checkpoint by local backend from path: oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth
/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the save_dir argument.
  warnings.warn(f'Failed to add {vis_backend.__class__}, '
Traceback (most recent call last):
  File "demo/image_demo.py", line 66, in <module>
    main(args)
  File "demo/image_demo.py", line 46, in main
    result = inference_detector(model, args.img)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/apis/inference.py", line 189, in inference_detector
    results = model.test_step(data_)[0]
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict') # type: ignore
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
    results = self(data, mode=mode)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/detectors/two_stage.py", line 238, in predict
    results_list = self.roi_head.predict(
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/roi_heads/base_roi_head.py", line 118, in predict
    results_list = self.predict_bbox(
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/roi_heads/standard_roi_head.py", line 335, in predict_bbox
    bbox_results = self._bbox_forward(x, rois)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/roi_heads/standard_roi_head.py", line 163, in _bbox_forward
    bbox_feats = self.bbox_roi_extractor(
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/xzf/model/mmrotate/mmrotate/models/roi_heads/roi_extractors/rotate_single_level_roi_extractor.py", line 128, in forward
    roi_feats_t = self.roi_layers[i](feats[i], rois)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/ops/roi_align_rotated.py", line 175, in forward
    return RoIAlignRotatedFunction.apply(input, rois, self.output_size,
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/ops/roi_align_rotated.py", line 65, in forward
    ext_module.roi_align_rotated_forward(
RuntimeError: CUDA error: out of memory
Exception raised from ROIAlignRotatedForwardCUDAKernelLauncher at /tmp/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_align_rotated_cuda.cu:24 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7ff7ee2db497 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::CUDAError::Error(c10::SourceLocation, std::string) + 0x30 (0x7ff7ad253dac in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #2: ROIAlignRotatedForwardCUDAKernelLauncher(at::Tensor, at::Tensor, float, int, bool, bool, int, int, int, int, int, int, at::Tensor) + 0x1a8 (0x7ff7ad34065e in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #3: roi_align_rotated_forward_cuda(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool) + 0x228 (0x7ff7ad296798 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #4: auto Dispatch<DeviceRegistry<void (*)(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool), &(roi_align_rotated_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool))>, at::Tensor&, at::Tensor&, at::Tensor&, int&, int&, float&, int&, bool&, bool&>(DeviceRegistry<void (*)(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool), &(roi_align_rotated_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool))> const&, char const*, at::Tensor&, at::Tensor&, at::Tensor&, int&, int&, float&, int&, bool&, bool&) + 0x11e (0x7ff7ad4a933e in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #5: roi_align_rotated_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool) + 0x8e (0x7ff7ad4a8dee in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #6: roi_align_rotated_forward(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool) + 0x7a (0x7ff7ad4a8eaa in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x378af5 (0x7ff7ad4a1af5 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #8: <unknown function> + 0x35a7f1 (0x7ff7ad4837f1 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #15: THPFunction_apply(_object*, _object*) + 0x5d6 (0x7ff82e206a06 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #20: python() [0x4f5154]
frame #26: python() [0x5ab487]
frame #31: python() [0x4f5154]
frame #37: python() [0x5ab487]
frame #40: python() [0x4f4ff6]
frame #43: python() [0x4f50db]
frame #46: python() [0x4f50db]
frame #49: python() [0x4f50db]
frame #53: python() [0x4f5154]
frame #60: python() [0x5ab487]

Additional information

No response


July-1024 · 2024-07-23T01:09:12Z

Hi, I'm running into the same problem. Did you solve it?

2597883929 · 2024-07-26T02:16:37Z

> Hi, I'm running into the same problem. Did you solve it?

I don't know why this works, but after I changed PyTorch to 1.13.1 and CUDA to 11.6, it worked. Maybe you can try that too.
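
One possible explanation, offered as an assumption since nothing in this thread confirms it: the environment report lists NVCC release 11.0 while PyTorch was built against CUDA 11.3, and the RTX 3090 has compute capability 8.6, which CUDA toolkits only support from 11.1 onward. A locally compiled mmcv extension built with that NVCC could then misbehave on this GPU. The sketch below compares those version strings; the function names and the small lookup table are hypothetical helpers written for illustration, not part of mmcv or PyTorch.

```python
import re

# Minimum CUDA toolkit that can target a given compute capability.
# Only the two entries relevant here are listed (an illustrative subset).
MIN_CUDA_FOR_ARCH = {
    (8, 0): (11, 0),  # A100
    (8, 6): (11, 1),  # RTX 30-series, e.g. RTX 3090
}


def parse_version(text):
    """Extract the first 'major.minor' pair from a version string."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_toolchain(nvcc_release, torch_cuda, capability):
    """Return a list of human-readable mismatch warnings (empty if none)."""
    problems = []
    nvcc = parse_version(nvcc_release)
    runtime = parse_version(torch_cuda)
    if nvcc != runtime:
        problems.append(
            f"NVCC {nvcc[0]}.{nvcc[1]} differs from the CUDA "
            f"{runtime[0]}.{runtime[1]} that PyTorch was built with")
    needed = MIN_CUDA_FOR_ARCH.get(capability)
    if needed is not None and nvcc < needed:
        problems.append(
            f"NVCC {nvcc[0]}.{nvcc[1]} predates CUDA {needed[0]}.{needed[1]}, "
            f"the first release targeting sm_{capability[0]}{capability[1]}")
    return problems


# Values taken from the environment report in this issue.
problems = check_toolchain("release 11.0, V11.0.194", "11.3", (8, 6))
for line in problems:
    print("WARNING:", line)

# A matched toolchain (assumed here to be 11.6 for both) gives no warnings.
matched = check_toolchain("release 11.6", "11.6", (8, 6))
```

On a live system the three inputs can be read from `nvcc --version`, `torch.version.cuda`, and `torch.cuda.get_device_capability(0)`, and `mmcv.ops.get_compiling_cuda_version()` reports which CUDA version the mmcv extension was actually compiled with.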
