multi_gpu error #6

Open
gattia opened this issue Jan 31, 2017 · 6 comments
Comments


gattia commented Jan 31, 2017

Hi,

I've tried to use the multi_gpu utility to parallelize a convolutional neural network based on U-Net. I'm running on a g2.8xlarge to take advantage of its 4 GPUs.

Anyway, when I try to run the code I get an error. Below are the full traceback and the function used to define the model and call the multi_gpu function (make_parallel); the call to make_parallel is near the very end of the script/this post.

This may be something simple, but I have no experience with tf and am just starting with keras. Any suggestions would be greatly appreciated.

Thanks,

Anthony.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-b2175f08a7a9> in <module>()
----> 1 modelTest = get_unet()
      2 print(modelTest.summary())

<ipython-input-18-7318b7dd3672> in get_unet()
     40 
     41     model = Model(input=inputs, output=conv10)
---> 42     model = make_parallel(model,3)
     43     model.compile(optimizer=Adam(lr=1e-5), loss=dice_coef_loss, metrics=[dice_coef])
     44 

/vol/programs/keras-extras/utils/multi_gpu.pyc in make_parallel(model, gpu_count)
     29                     inputs.append(slice_n)
     30 
---> 31                 outputs = model(inputs)
     32 
     33                 if not isinstance(outputs, list):

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in __call__(self, x, mask)
    483         if inbound_layers:
    484             # this will call layer.build() if necessary
--> 485             self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
    486             input_added = True
    487 

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in add_inbound_node(self, inbound_layers, node_indices, tensor_indices)
    541         # creating the node automatically updates self.inbound_nodes
    542         # as well as outbound_nodes on inbound layers.
--> 543         Node.create_node(self, inbound_layers, node_indices, tensor_indices)
    544 
    545     def get_output_shape_for(self, input_shape):

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in create_node(cls, outbound_layer, inbound_layers, node_indices, tensor_indices)
    146 
    147         if len(input_tensors) == 1:
--> 148             output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
    149             output_masks = to_list(outbound_layer.compute_mask(input_tensors[0], input_masks[0]))
    150             # TODO: try to auto-infer shape if exception is raised by get_output_shape_for

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in call(self, input, mask)
   1920             return self._output_tensor_cache[cache_key]
   1921         else:
-> 1922             output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
   1923             return output_tensors
   1924 

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in run_internal_graph(self, inputs, masks)
   2062                     if len(computed_data) == 1:
   2063                         computed_tensor, computed_mask = computed_data[0]
-> 2064                         output_tensors = to_list(layer.call(computed_tensor, computed_mask))
   2065                         output_masks = to_list(layer.compute_mask(computed_tensor, computed_mask))
   2066                         computed_tensors = [computed_tensor]

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/layers/convolutional.pyc in call(self, x, mask)
   1062     def call(self, x, mask=None):
   1063         return K.resize_images(x, self.size[0], self.size[1],
-> 1064                                self.dim_ordering)
   1065 
   1066     def get_config(self):

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/backend/tensorflow_backend.pyc in resize_images(X, height_factor, width_factor, dim_ordering)
    506         X = tf.image.resize_nearest_neighbor(X, new_shape)
    507         X = permute_dimensions(X, [0, 3, 1, 2])
--> 508         X.set_shape((None, None, original_shape[2] * height_factor, original_shape[3] * width_factor))
    509         return X
    510     elif dim_ordering == 'tf':

TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
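For anyone hitting the same traceback: the failure happens inside resize_images, which multiplies the static shape of the tensor by the upsampling factor. A plain-Python sketch of the failure mode (assumption: after make_parallel slices the batch across GPUs, Keras can no longer infer the static shape, so original_shape contains None entries):

```python
# Minimal, dependency-free reproduction of the TypeError above.
# Assumption: the sliced tensor's static shape is partially unknown,
# i.e. original_shape holds None for the dims Keras could not infer.
original_shape = (None, 32, None, None)  # (batch, channels, rows, cols)
height_factor, width_factor = 2, 2

try:
    # This mirrors what resize_images attempts in tensorflow_backend.py:
    new_shape = (None, None,
                 original_shape[2] * height_factor,
                 original_shape[3] * width_factor)
except TypeError as err:
    print(err)  # unsupported operand type(s) for *: 'NoneType' and 'int'
```

So the error isn't in get_unet itself; it's that UpSampling2D's shape bookkeeping assumes the spatial dimensions are statically known, and the slicing done by make_parallel breaks that assumption.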
def get_unet():
    inputs = Input((1, img_rows, img_cols))
    conv1 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(inputs)
    conv1 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(pool1)
    conv2 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(pool2)
    conv3 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

    conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(pool3)
    conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(pool4)
    conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(conv5)

    up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)
    
    conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(up6)
    conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv6)

    up7 = merge([UpSampling2D(size=(2, 2))(conv6), conv3], mode='concat', concat_axis=1)
    conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(up7)
    conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(conv7)

    up8 = merge([UpSampling2D(size=(2, 2))(conv7), conv2], mode='concat', concat_axis=1)
    conv8 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(up8)
    conv8 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(conv8)

    up9 = merge([UpSampling2D(size=(2, 2))(conv8), conv1], mode='concat', concat_axis=1)
    conv9 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(up9)
    conv9 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(conv9)

    conv10 = Convolution2D(1, 1, 1, activation='sigmoid')(conv9)

    model = Model(input=inputs, output=conv10)
    model = make_parallel(model,3)
    
    model.compile(optimizer=Adam(lr=1e-5), loss=dice_coef_loss, metrics=[dice_coef])
    
    

    return model 
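One way around the crash, sketched below as a hypothetical guard rather than the actual upstream fix, is to propagate unknown dimensions instead of multiplying them, so a None dim stays None rather than raising:

```python
def scaled_dim(dim, factor):
    """Multiply a static dimension by an upsampling factor, propagating None.

    Hypothetical helper: if the backend's resize_images guarded its shape
    arithmetic like this, a dimension Keras could not infer (None) would
    simply remain unknown instead of raising a TypeError.
    """
    return None if dim is None else dim * factor
```

For example, `scaled_dim(None, 2)` returns None while `scaled_dim(240, 2)` returns 480. In practice this likely means patching the backend or ensuring the sliced tensors keep a fully defined static shape before they reach UpSampling2D.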
@jakubLangr

+1, same problem here. @gattia did you solve it in the end?


gattia commented Apr 2, 2017

@jakubLangr I actually didn't. My solution ended up being to use a p2.xlarge. They are a bit more expensive, but if you use spot instances you can get them for an average of about $0.14/hr, or at least that's been my experience for the last month or so.

jakubLangr commented Apr 2, 2017 via email

jakubLangr commented Apr 3, 2017 via email

gattia commented Apr 11, 2017

Sorry, I don't really have time to look into this and haven't been using multi-GPU at the moment. I think that once Keras is integrated into TensorFlow, distributed training will be relatively easy.

I will report back if I do get around to looking into this.

Anthony.

@zippeurfou

Any update on this? I am seeing a similar behavior.
