Out of memory error #1

Open
ghost opened this issue Jun 7, 2016 · 1 comment

ghost commented Jun 7, 2016

When I execute

CUDA_VISIBLE_DEVICES=0 th feedforward_neural_doodle.lua -model_name skip_noise_4 -masks_hdf5 data/starry/gen_doodles.hdf5 -batch_size 4 -num_mask_noise_times 0 -num_noise_channels 0 -learning_rate 1e-1 -half false

I get the following result:

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded data/pretrained/VGG_ILSVRC_19_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
Setting up style layer 2 : relu1_1
Replacing max pooling at layer 5 with average pooling
Setting up style layer 7 : relu2_1
Replacing max pooling at layer 10 with average pooling
Setting up style layer 12 : relu3_1
Replacing max pooling at layer 19 with average pooling
Setting up style layer 21 : relu4_1
Replacing max pooling at layer 28 with average pooling
Setting up style layer 30 : relu5_1
Optimize
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-7288/cutorch/lib/THC/generic/THCStorage.cu line=41 error=2 : out of memory
/home/andrew/torch/install/bin/luajit: /home/andrew/torch/install/share/lua/5.1/nn/Container.lua:67:
In 3 module of nn.Sequential:
In 1 module of nn.Sequential:
/home/andrew/torch/install/share/lua/5.1/nn/THNN.lua:109: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7288/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
[C]: in function 'v'
/home/andrew/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'SpatialReplicationPadding_updateGradInput'
...h/install/share/lua/5.1/nn/SpatialReplicationPadding.lua:41: in function 'updateGradInput'
/home/andrew/torch/install/share/lua/5.1/nn/Module.lua:31: in function </home/andrew/torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/home/andrew/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/andrew/torch/install/share/lua/5.1/nn/Sequential.lua:88: in function </home/andrew/torch/install/share/lua/5.1/nn/Sequential.lua:78>
[C]: in function 'xpcall'
/home/andrew/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/andrew/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
feedforward_neural_doodle.lua:167: in function 'opfunc'
/home/andrew/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'optim_method'
feedforward_neural_doodle.lua:199: in main chunk
[C]: in function 'dofile'
...drew/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

I'm running with multiple GTX 980s, so GPU memory should not be an issue.

I have tried running with both -backend cudnn and -backend nn, with no difference in the outcome.

I have been able to run the fast-neural-doodle project with no problems on my machine, so prerequisites such as Python, Torch and CUDA appear to have been set up correctly.

Any idea of the cause of this problem?

DmitryUlyanov (Owner) commented Jun 7, 2016

Hello, I tested everything on a 12 GB card, so all the parameters are tuned to work in my setup. You can try decreasing batch_size to 1 to see whether it still fails, but training will be much worse with such a small batch size.
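
For example, the command from the report with only -batch_size lowered to 1 (assuming no other flags need to change):

CUDA_VISIBLE_DEVICES=0 th feedforward_neural_doodle.lua -model_name skip_noise_4 -masks_hdf5 data/starry/gen_doodles.hdf5 -batch_size 1 -num_mask_noise_times 0 -num_noise_channels 0 -learning_rate 1e-1 -half false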

You can also reduce the image size to decrease memory consumption. I used 512px images, but you can go to any dimensions that are a multiple of 32; try 384px, for example, with the same batch_size.
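
As a rough estimate: 384 = 12 × 32, so it satisfies the multiple-of-32 constraint, and since activation memory grows roughly with the number of pixels, a 384x384 input needs about (384/512)² ≈ 56% of the memory of a 512x512 input at the same batch size.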
