RuntimeError: cuda runtime error (2) #69

Open
jacedang opened this issue Jul 24, 2021 · 5 comments

Comments

@jacedang

THCudaCheck FAIL file=c:\programdata\miniconda3\conda-bld\pytorch_1524546354046\work\aten\src\thc\generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "main.py", line 24, in <module>
main()
File "main.py", line 20, in main
t.test()
File "C:\Users\DIC\Meta-SR-Pytorch\trainer.py", line 218, in test
sr = self.model(lr, idx_scale,scale_coord_map)
File "D:\3rd_semester\anaconda3\envs\python35\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\DIC\Meta-SR-Pytorch\model\__init__.py", line 54, in forward
return self.model(x,pos_mat)
File "D:\3rd_semester\anaconda3\envs\python35\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\DIC\Meta-SR-Pytorch\model\metardn.py", line 134, in forward
local_weight = local_weight.contiguous().view(x.size(2),scale_int, x.size(3),scale_int,-1,3).permute(1,3,0,2,4,5).contiguous()
RuntimeError: cuda runtime error (2) : out of memory at c:\programdata\miniconda3\conda-bld\pytorch_1524546354046\work\aten\src\thc\generic/THCStorage.cu:58

Has anyone encountered this error while running the test demo, and is there a solution for it? Looking forward to your help. Thank you.

@dflateau

dflateau commented Sep 1, 2021

I also am stuck with this when I try to test. Any ideas?

PyTorch 0.4.0, CUDA 8 (have also tried CUDA 9 and 10, with the same results).

@jacedang
Author

jacedang commented Sep 2, 2021

> I also am stuck with this when I try to test. Any ideas?
>
> PyTorch 0.4.0, CUDA 8 (have also tried CUDA 9 and 10, with the same results).

Hi. This happens because there is not enough GPU memory. My solution was to add another GPU to my computer and reduce the batch size from 16 to 4. After that I could run both training and testing.

@XuecaiHu
Owner

XuecaiHu commented Sep 2, 2021

When you test on big images, you should feed the images to the model in patches to avoid this problem.

@XuecaiHu
Owner

XuecaiHu commented Sep 2, 2021

    with torch.no_grad():
        for idx_scale, scale in enumerate(self.scale):
            eval_acc = 0
            self.loader_test.dataset.set_scale(idx_scale)
            for idx_img, (lr, hr, filename, _) in enumerate(self.loader_test):
                filename = filename[0]
                # An hr tensor with a single element is treated as "no ground truth to evaluate against".
                no_eval = (hr.nelement() == 1)
                if not no_eval:
                    lr, hr = self.prepare(lr, hr)
                else:
                    lr, = self.prepare(lr)
                
                timer_test.tic()
                N, C, H, W = lr.size()
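                if self.args.patch_test:
                    # Canvas that the super-resolved patches are stitched into.
                    img_sr = torch.zeros(N,C,int(scale*H),int(scale*W))
                    img_sr = img_sr.cuda()
                    # Patch size in LR pixels, chosen per scale factor.
                    if scale < 2:
                        patch_size = 40
                    elif scale < 3:
                        patch_size = 100
                    else:
                        patch_size = 50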
                
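                    # Walk the LR image tile by tile. Each tile is extended by 10 px past
                    # its far edge; a trailing remainder of at most half a patch is merged
                    # into the last full tile instead of being run on its own.
                    for di in range(H//patch_size+1):
                        for dj in range(W//patch_size+1):
                            x_i = di*patch_size
                            y_j = dj*patch_size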
                            if di== (H//patch_size-1):
                                if (H%patch_size<= patch_size/2):
                                    x_2 = H
                                else:
                                    x_2 = (di+1)*patch_size + 10
                            elif di==(H//patch_size):
                                if (H%patch_size > patch_size/2):
                                    x_2 = H
                                else:
                                    continue
                            else:
                                x_2 = (di+1)*patch_size + 10
                            if dj == (W//patch_size-1):
                                if (W%patch_size <= patch_size/2):
                                    y_2 = W
                                else:
                                    y_2 = (dj+1)*patch_size + 10
                            elif dj==(W//patch_size):
                                if (W%patch_size >patch_size/2):
                                    y_2 =W
                                else:
                                    continue
                            else:
                                y_2 = (dj+1)*patch_size + 10
                            lr_patch = lr[:,:,x_i:x_2,y_j:y_2]

                            h = x_2 - x_i
                            w = y_2 - y_j
                            outH,outW = int(h*scale),int(w*scale)
                
                            sr = self.model(lr_patch, idx_scale)
                            # Top-left corner of this tile in the SR canvas.
                            sr_h = int(di*scale*patch_size)
                            sr_w = int(dj*scale*patch_size)
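                            # For every tile after the first along an axis, skip the leading
                            # 10 LR px: the previous tile's extension has already written them.
                            if not di==0: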
                                deta_h = int(10*scale)
                                sr_h = sr_h +deta_h
                                outH = outH - deta_h
                            else:
                                deta_h = 0
                            if not dj==0:
                                deta_w = int(10*scale)
                                outW = outW-deta_w
                                sr_w = sr_w + deta_w
                            else:
                                deta_w = 0
                            img_sr[:,:,sr_h:sr_h+outH,sr_w:sr_w+outW]=sr[:,:,deta_h:outH+deta_h,deta_w:deta_w+outW]
                else:
                    img_sr = self.model(lr, idx_scale)

Something like this.
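
The tiling rule in the snippet above can be looked at in isolation, one axis at a time. The following is only a minimal sketch of that rule, not code from the repository: `tile_spans` and `written_spans` are hypothetical helper names, and `overlap=10` stands in for the hard-coded `10` in the snippet.

```python
def tile_spans(length, patch_size, overlap=10):
    """Input (start, stop) spans along one axis, mirroring the di/dj branches above."""
    n_full = length // patch_size      # number of complete tiles
    remainder = length % patch_size    # leftover pixels after the last complete tile
    spans = []
    for d in range(n_full + 1):
        start = d * patch_size
        if d == n_full - 1:
            # Last complete tile: absorb a small remainder, otherwise extend by `overlap`.
            stop = length if remainder <= patch_size / 2 else (d + 1) * patch_size + overlap
        elif d == n_full:
            # Remainder tile: only run it when it is larger than half a patch.
            if remainder > patch_size / 2:
                stop = length
            else:
                continue
        else:
            stop = (d + 1) * patch_size + overlap
        spans.append((start, stop))
    return spans


def written_spans(spans, overlap=10):
    """LR ranges each tile actually contributes: all but the first drop their leading `overlap` px."""
    return [(start + (overlap if i else 0), stop) for i, (start, stop) in enumerate(spans)]


for H in (200, 250, 270):
    spans = tile_spans(H, 100)
    print(H, spans, written_spans(spans))
```

For `H = 270` and `patch_size = 100` the input spans are `(0, 110)`, `(100, 210)`, `(200, 270)`, and the contributed ranges `(0, 110)`, `(110, 210)`, `(210, 270)` cover the axis without gaps. Note that under this rule an axis shorter than or equal to half a patch yields no spans at all, so very small images would leave the canvas empty.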

@jacedang
Author

local_weight = local_weight.contiguous().view(x.size(2),scale_int, x.size(3),scale_int,-1,3).permute(1,3,0,2,4,5).contiguous()
RuntimeError: invalid argument 2: size '[110 x 2 x 110 x 2 x -1 x 3]' is invalid for input with 1665930240 elements at ..\aten\src\TH\THStorage.cpp:80

Hello, I still can't test on big images (from the Manga109 test set); I get the error above. I revised my code based on the snippet you provided, but it does not seem to work. Do you have any further suggestions?
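
A quick arithmetic check on the numbers in this error message may help narrow it down. `view` can only infer the `-1` dimension when the element count is an exact multiple of the other dimensions, and here it is not. The sketch below uses only the figures from the message; the value `3*3*64*3` for the number of predicted weights per output pixel is an assumption about the model configuration, not something stated in this thread.

```python
from math import prod

numel = 1665930240                # element count reported in the error message
requested = (110, 2, 110, 2, 3)   # known dims of the requested view; -1 is inferred

# Non-zero remainder: the -1 dimension cannot be inferred, so view() raises.
print(numel % prod(requested))

# ASSUMPTION: 3x3 kernel, 64 input channels, 3 output channels per output pixel.
weights_per_pixel = 3 * 3 * 64 * 3
print(numel % weights_per_pixel)    # 0 would mean a whole number of output pixels
print(numel // weights_per_pixel)   # how many output pixels the tensor covers
print(824 * 1170)                   # one possible full-image size with that pixel count

# What a 110 x 110 patch at scale 2 would need under the same assumption.
patch_numel = (110 * 2) * (110 * 2) * weights_per_pixel
print(patch_numel, patch_numel % prod(requested))

# Memory for the full-size tensor if it is float32 (4 bytes per element), in GiB.
print(numel * 4 / 2**30)
```

Under that assumption the tensor holds weights for 964080 output pixels (for example 824 × 1170), which would fit a full-page output rather than a 220 × 220 patch. One possible reading is that the position matrix passed to the model was still built for the whole image while `lr` had already been cropped to a patch; if so, it would need to be built per patch as well. This is a guess from the numbers alone and has not been checked against the repository.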
