
Error with make_parallel function #18

Open
AgrawalAmey opened this issue Jun 27, 2017 · 3 comments

Comments

@AgrawalAmey

I got the following error while trying to use the make_parallel function:

Traceback (most recent call last):
  File "model_language2motion.py", line 1335, in <module>
    main(parser.parse_args())
  File "model_language2motion.py", line 1202, in main
    args.func(args)
  File "model_language2motion.py", line 723, in train
    train_data, valid_data, model, optimizer = prepare_for_training(output_path, args)
  File "model_language2motion.py", line 677, in prepare_for_training
    model = make_parallel(model, 8)
  File "/workspace/deepAnim/make_parallel.py", line 31, in make_parallel
    outputs = model(inputs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 635, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 172, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2247, in call
    output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2390, in run_internal_graph
    computed_mask))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 235, in call
    constants = self.get_constants(x)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 884, in get_constants
    ones = K.tile(ones, (1, int(input_dim)))
TypeError: int() argument must be a string or a number, not 'NoneType'

PS: The code works if the call to make_parallel is removed.
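For context, the failure happens while Keras builds the recurrent layer's dropout constants: int(input_dim) is applied to the input's static shape, and that shape is apparently None once make_parallel has sliced the batch per GPU (the PS above points the same way, since the unsliced model works). Below is a minimal sketch of the slicing helper used by the widely shared make_parallel recipe, assuming make_parallel.py follows that pattern; the helper name and the set_shape workaround are illustrative, not taken from the repo:

import tensorflow as tf

def get_slice(data, idx, parts):
    # Take one GPU's share of the batch. Because the slice size is
    # computed from the dynamic shape, tf.slice returns a tensor whose
    # static shape is unknown in every dimension.
    shape = tf.shape(data)
    size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
    start = tf.concat([shape[:1] // parts * idx, shape[1:] * 0], axis=0)
    sliced = tf.slice(data, start, size)
    # Restore the known non-batch dimensions so downstream layers that
    # read K.int_shape(x)[-1] (e.g. LSTM dropout constants) still see an int.
    sliced.set_shape([None] + data.get_shape().as_list()[1:])
    return sliced

The set_shape call only helps if the model's inputs were defined with a known feature dimension in the first place.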

@ChristianLagares

My two cents on the problem: I’ve been working on a g2.8xlarge as well, and I came across a similar issue. I managed to work around it by making the total number of samples divisible by the batch size. If you are multiplying your batch size by the number of GPUs, then your sample count must be divisible by that effective batch size. For example, if you have 257,000 samples and a per-GPU batch of 16 (128 across 8 GPUs), then pass the model a slice of 256,000 samples. I’m not sure if this is your case. Let me know how it goes.
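Concretely, the trimming described above would look something like this (the array names and shapes are illustrative placeholders, not taken from the project):

import numpy as np

gpus = 8
batch_per_gpu = 16
effective_batch = gpus * batch_per_gpu        # 128 samples consumed per training step

# Placeholder arrays standing in for the real training data.
x_train = np.zeros((257000, 50))
y_train = np.zeros((257000, 10))

# Keep only the largest multiple of the effective batch size.
n = (len(x_train) // effective_batch) * effective_batch
x_train, y_train = x_train[:n], y_train[:n]   # 257000 -> 256896 samples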

@AgrawalAmey
Author

My initial conclusion was wrong. I had been running different configurations on the g2.8xlarge and the p2.8xlarge so that the model could fit on the smaller K520 cards. Strangely, it seems to be somehow related to batch normalisation: the model works only when I use batch normalisation. I haven't figured out exactly how they are related yet.

@AgrawalAmey
Author

AgrawalAmey commented Jun 30, 2017

The one with the batch norm works; the other doesn't. Digging further, it seems that only the first batch normalisation layer matters for the network to work. I tried replacing it with a linear activation layer, but that did not work.

[screenshot from 2017-06-30 18-04-32]
[screenshot from 2017-06-30 18-04-52]
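For reference, a rough sketch of the two variants being compared; the real model is only visible in the screenshots, so the input shape, layer sizes and dropout values below are made up:

from keras.layers import Input, Dense, Activation, BatchNormalization, LSTM
from keras.models import Model

inp = Input(shape=(30, 64))                    # hypothetical (timesteps, features)

# Variant that trains under make_parallel: batch normalisation first.
x = BatchNormalization()(inp)
x = LSTM(128, dropout_W=0.2, dropout_U=0.2)(x)
with_bn = Model(input=inp, output=Dense(10)(x))

# Variant that fails: the batch norm swapped for a linear activation.
y = Activation('linear')(inp)
y = LSTM(128, dropout_W=0.2, dropout_U=0.2)(y)
without_bn = Model(input=inp, output=Dense(10)(y))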
