Cannot call load_model on network trained using multi_gpu. #3

Open
jrosebr1 opened this issue Nov 30, 2016 · 9 comments

@jrosebr1

I came across your post on Medium and was instantly hooked. Nice job!

I've been developing a series of deep learning experiments that use only a single GPU and decided to switch them over to a multi-GPU setting. After training the models are serialized to disk via model.save.

However, when I call load_model to load the pre-trained network from disk, I get an error:

[INFO] loading model...
Traceback (most recent call last):
  File "rank_accuracy.py", line 28, in <module>
    model = load_model(config.MODEL_PATH)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/models.py", line 140, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/models.py", line 189, in model_from_config
    return layer_from_config(config, custom_objects=custom_objects)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/utils/layer_utils.py", line 34, in layer_from_config
    return layer_class.from_config(config['config'])
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2395, in from_config
    process_layer(layer_data)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2390, in process_layer
    layer(input_tensors[0])
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 517, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 571, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 155, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/layers/core.py", line 587, in call
    return self.function(x, **arguments)
  File "/home/ubuntu/deep-learning-book/dataset_to_hdf5/multi_gpu.py", line 9, in get_slice
    shape = tf.shape(data)
NameError: global name 'tf' is not defined

Looking at multi_gpu.py, it's clear that TensorFlow is imported, so I'm not sure why the error is being generated.

@kuza55
Owner

kuza55 commented Nov 30, 2016

This looks like an issue with how Keras serializes/deserializes models; unless you really need to de/serialize the multi-GPU version, I would recommend keeping a copy of the original single-GPU model around and saving/loading that model rather than the parallelized one. The weights are shared between the original model and the new model.
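A minimal pure-Python sketch of the weight-sharing point (ToyModel and make_parallel_toy are illustrative stand-ins, not the real Keras or make_parallel API):

```python
class ToyModel:
    """Stand-in for a Keras model: just a holder for mutable weights."""
    def __init__(self, weights):
        self.weights = weights

def make_parallel_toy(model, gpu_count):
    # Like make_parallel, the wrapper reuses the original model's layers,
    # so both objects point at the same underlying weight storage.
    return ToyModel(model.weights)

original = ToyModel([0.0, 0.0])
parallel = make_parallel_toy(original, gpu_count=2)

parallel.weights[0] = 1.5   # "training" the parallel model...
print(original.weights)     # -> [1.5, 0.0]  ...updates the original too
```

Because the storage is shared, saving the original single-GPU model after training the parallel one still captures the trained weights.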

@jrosebr1
Author

jrosebr1 commented Dec 1, 2016

Thanks, I'll give this a try.

@tstandley

tstandley commented Dec 2, 2016

I have an ugly but functional workaround:

Change the return statement to the following code:

new_model = Model(input=model.inputs, output=merged)
funcType = type(model.save)

# monkey-patch save so that saving the parallel model saves the
# underlying single-GPU model instead
def new_save(self_, filepath, overwrite=True):
    model.save(filepath, overwrite)

new_model.save = funcType(new_save, new_model)
return new_model

This monkey-patches the original model's save onto the new model, so calling the parallel model's save will call the simple model's save.

When loading, you must load the simple model before creating the parallel model.
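The delegation pattern above can be sketched in plain Python (ToyModel is a hypothetical stand-in for a Keras model; types.MethodType plays the role of funcType(new_save, new_model)):

```python
import types

class ToyModel:
    """Hypothetical stand-in for a Keras model."""
    def __init__(self, name):
        self.name = name
        self.saved_as = None
    def save(self, filepath, overwrite=True):
        self.saved_as = (self.name, filepath)

single = ToyModel("single")      # the underlying single-GPU model
parallel = ToyModel("parallel")  # the parallelized wrapper

def new_save(self_, filepath, overwrite=True):
    # delegate serialization to the single-GPU model
    single.save(filepath, overwrite)

# bind new_save as `parallel.save`, mirroring funcType(new_save, new_model)
parallel.save = types.MethodType(new_save, parallel)
parallel.save("weights.h5")
print(single.saved_as)           # -> ('single', 'weights.h5')
```

Calling parallel.save now writes the simple model instead, which is exactly what the workaround achieves for the real models.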

@pswpswpsw

Just one simple question:

for i in xrange(gpu_count):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('tower_%d' % i) as scope:

Does the first line run on the GPU or the CPU? If on the CPU, then I would expect the GPUs to run in order, say the second GPU beginning only after the first finishes, so I don't see the parallelism here. Could you explain?

@tstandley

@pswpswpsw The actual Python code runs only once (serially on the CPU, taking only seconds); it just builds a graph and tells TensorFlow which parts of the graph should be computed on which GPUs. This is true of most of the TensorFlow/Keras code you write: it runs once, so optimizing it for speed isn't very important. The code that actually does the training runs when fit() is called. The code in this file simply tells TensorFlow to run fit() in parallel on all of the GPUs.
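A toy illustration of that record-then-run split, with plain Python closures standing in for TensorFlow ops (the device strings are just labels here, not real device placement):

```python
gpu_count = 4
ops = []

# This loop is the "setup" phase: it runs once, quickly, on the CPU,
# and only records which op should run on which device.
for i in range(gpu_count):
    device = '/gpu:%d' % i
    ops.append((device, lambda x, i=i: x + i))   # recorded, not executed

# The "session run" phase: TensorFlow's runtime would dispatch each op
# to its tagged device, potentially in parallel; here we just run them.
results = [fn(10) for _, fn in ops]
print(results)   # -> [10, 11, 12, 13]
```

Nothing inside the loop body computes anything per-GPU; the parallelism happens later, when the recorded graph is executed.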

@aman-tiwari

Is there any way to recover a saved multi-GPU model? I see that the above fix has to be applied before the model is saved, but is there a way to load an already-saved one?

@aschampion

aschampion commented Feb 7, 2017

@aman-tiwari and anyone else who may stumble across this: you can recover an already-saved multi-GPU model quite simply. Temporarily edit your virtualenv's keras/layers/core.py to add the necessary import: import tensorflow as tf. Then:

from keras.models import load_model

multi_model = load_model('your_multi_gpu_model.hdf5')
old_model = multi_model.layers[-2] # The original model sits just before the final merge layer added by make_parallel
old_model.save('single_gpu_model.hdf5')

@burgalon

Use custom_objects like this:

import tensorflow as tf
import keras

model = keras.models.load_model('model.hdf5', custom_objects={"tf": tf})

@texastony

texastony commented Sep 27, 2017

In reply to the original comment: I found I could get the original model back via model_multi_gpu.layers[-2]. That returns a keras.models.Sequential or keras.models.Model object that I could save to or load from.

It was also how I modified which layers I wanted to train. After editing this original model, though, I would rerun make_parallel on it, as I was uncertain whether I was working on a copy of what is in layers[-2] or on the original. It is not a perfect solution, since making a model parallel can take some time, but it works.

Note: here, model_multi_gpu = make_parallel(model_orig).
