Segmentation fault while training #70
Comments
Same problem.
I fixed this problem by porting the original CUDA code from C++ to Python (numba.cuda). Maybe that avoids some compilation bugs. (I tried to use pybind11, but it failed as well.)
@WEIIEW97 Actually, I was using this code because another work depends on it, and I use it to re-implement that work, so I didn't dive into the Python code of this project. I changed some code in the C++ CUDA extension, but I'm not sure whether it helps. I'd like to know whether you fixed this bug in your Python (numba.cuda) code, and it would be great if you could share your numba.cuda file. Looking forward to your reply. Thanks.
@ironheads Thanks for your response and for sharing! I didn't dig as deep as you did, so I really appreciate the information.
@WEIIEW97 Here is the relevant setup and the change:

```python
# LUT0 = Generator3DLUT_identity()
# LUT1 = Generator3DLUT_zero()
# LUT2 = Generator3DLUT_zero()
# ...
# img: a batch of images from the dataset with shape [batch_size, 3, width, height].
# Make sure the values are in the range [0, 1] (my segmentation fault came from this).

# The following code comes from image_adaptive_lut_train_paired.py:
pred = classifier(img).squeeze()  # img is still [batch_size, 3, width, height]

# Then you should modify the code as follows:
new_img = img.permute(1, 0, 2, 3).contiguous()
gen_A0 = LUT0(new_img)
gen_A1 = LUT1(new_img)
gen_A2 = LUT2(new_img)
combine_A = new_img.new(new_img.size())
for b in range(new_img.size(1)):
    combine_A[:, b, :, :] = (pred[b, 0] * gen_A0[:, b, :, :]
                             + pred[b, 1] * gen_A1[:, b, :, :]
                             + pred[b, 2] * gen_A2[:, b, :, :])
                             # + pred[b, 3] * gen_A3[:, b, :, :] + pred[b, 4] * gen_A4[:, b, :, :]
result_A = combine_A.permute(1, 0, 2, 3)  # back to the [batch_size, 3, width, height] combined image
```

The key is to make the LUT's input have shape [3, batch_size, width, height]. Some other code that uses the LUT may need similar changes; I don't list them all because I don't use all of the Python code in this work.
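The axis swap described above can be illustrated on its own. The sketch below is a NumPy analogue of the torch `permute(1, 0, 2, 3).contiguous()` call (the shapes and variable names are illustrative, not from the repository); it shows that moving the channel axis to the front and back again is lossless, so only the memory layout seen by the kernel changes.

```python
import numpy as np

# Hypothetical mini-batch: [batch_size, 3, width, height]
batch_size, width, height = 4, 8, 8
img = np.arange(batch_size * 3 * width * height, dtype=float).reshape(
    batch_size, 3, width, height)

# Move the channel axis first so the kernel sees [3, batch_size, width, height]
# (the torch equivalent is img.permute(1, 0, 2, 3).contiguous())
new_img = np.ascontiguousarray(np.transpose(img, (1, 0, 2, 3)))
assert new_img.shape == (3, batch_size, width, height)

# After the per-image LUT blend, permute back to [batch_size, 3, width, height]
result = np.transpose(new_img, (1, 0, 2, 3))
assert result.shape == (batch_size, 3, width, height)
assert np.array_equal(result, img)  # a pure axis swap loses no data
```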
Another important modification is `TrilinearInterpolationFunction`; it should be changed as follows:

```python
class TrilinearInterpolationFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, lut, x):
        x = x.contiguous()
        output = x.new(x.size())
        dim = lut.size()[-1]
        shift = dim ** 3
        binsize = 1.000001 / (dim - 1)
        W = x.size(2)
        H = x.size(3)
        batch = x.size(1)  # this changes: x is now [3, batch_size, W, H]
        assert 1 == trilinear.forward(lut, x, output,
                                      dim, shift, binsize, W, H, batch)
        int_package = torch.IntTensor([dim, shift, W, H, batch])
        float_package = torch.FloatTensor([binsize])
        variables = [lut, x, int_package, float_package]
        ctx.save_for_backward(*variables)
        return lut, output

    @staticmethod
    def backward(ctx, lut_grad, x_grad):
        lut, x, int_package, float_package = ctx.saved_variables
        dim, shift, W, H, batch = int_package
        dim, shift, W, H, batch = int(dim), int(shift), int(W), int(H), int(batch)
        binsize = float(float_package[0])
        assert 1 == trilinear.backward(x, x_grad, lut_grad,
                                       dim, shift, binsize, W, H, batch)
        return lut_grad, x_grad
```

All of these modifications aim to make the LUT's input have shape [3, batch_size, width, height] and then reshape the LUT's output back to [batch_size, 3, width, height]. Another important point is that the input values should be in the range [0, 1]. I don't know whether this will help if your code fails when batch_size > 1.
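To see why values outside [0, 1] can crash the raw C++/CUDA kernel, consider how the interpolation locates a LUT cell. The sketch below is an assumption about the indexing scheme, inferred from the `binsize = 1.000001 / (dim - 1)` formula above, and is not code from the repository: an input of exactly 1.0 still lands in the last valid cell, but anything above 1 or below 0 yields an out-of-range index, and in unchecked C/CUDA that becomes an out-of-bounds read, i.e. the segmentation fault.

```python
import math

def lut_cell_index(v, dim):
    """Index of the LUT cell a value v falls into, mirroring the kernel's
    presumed id = floor(v / binsize) with binsize = 1.000001 / (dim - 1)."""
    binsize = 1.000001 / (dim - 1)
    return math.floor(v / binsize)

dim = 33  # a common 33x33x33 LUT size (assumption, not from the repo)
assert lut_cell_index(0.0, dim) == 0         # first cell
assert lut_cell_index(1.0, dim) == dim - 2   # last valid cell (index 31)
assert lut_cell_index(1.5, dim) > dim - 2    # past the grid: OOB read in C/CUDA
assert lut_cell_index(-0.1, dim) < 0         # negative index: OOB as well
```

The tiny 1.000001 fudge factor keeps v = 1.0 strictly inside the last cell instead of landing exactly on its upper edge, which is why clamping inputs to [0, 1] (or normalizing the dataset) is enough to avoid the crash.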
@ironheads
It occurred as 'Segmentation fault (core dumped)' for the CPU version and 'cudaCheckError() failed : invalid device function. Segmentation fault (core dumped)' for the CUDA version every time I trained this network. How can it be solved? Thanks in advance.