
Error with make_parallel function #18

Open
AgrawalAmey opened this issue Jun 27, 2017 · 3 comments

Comments

@AgrawalAmey

I got the following error while trying to use the make_parallel function:

Traceback (most recent call last):
  File "model_language2motion.py", line 1335, in <module>
    main(parser.parse_args())
  File "model_language2motion.py", line 1202, in main
    args.func(args)
  File "model_language2motion.py", line 723, in train
    train_data, valid_data, model, optimizer = prepare_for_training(output_path, args)
  File "model_language2motion.py", line 677, in prepare_for_training
    model = make_parallel(model, 8)
  File "/workspace/deepAnim/make_parallel.py", line 31, in make_parallel
    outputs = model(inputs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 635, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 172, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2247, in call
    output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2390, in run_internal_graph
    computed_mask))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 235, in call
    constants = self.get_constants(x)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 884, in get_constants
    ones = K.tile(ones, (1, int(input_dim)))
TypeError: int() argument must be a string or a number, not 'NoneType'

PS: The code works if the call to make_parallel is removed.
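For context, the failure happens while Keras builds the recurrent layer's dropout constants: int(input_dim) is applied to the input's static shape, and that shape is apparently None once make_parallel has sliced the batch per GPU (the PS above points the same way, since the unsliced model works). Below is a minimal sketch of the slicing helper used by the widely shared make_parallel recipe, assuming make_parallel.py follows that pattern; the helper name and the set_shape workaround are illustrative, not taken from the repo:

import tensorflow as tf

def get_slice(data, idx, parts):
    # Take one GPU's share of the batch. Because the slice size is
    # computed from the dynamic shape, tf.slice returns a tensor whose
    # static shape is unknown in every dimension.
    shape = tf.shape(data)
    size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
    start = tf.concat([shape[:1] // parts * idx, shape[1:] * 0], axis=0)
    sliced = tf.slice(data, start, size)
    # Restore the known non-batch dimensions so downstream layers that
    # read K.int_shape(x)[-1] (e.g. LSTM dropout constants) still see an int.
    sliced.set_shape([None] + data.get_shape().as_list()[1:])
    return sliced

The set_shape call only helps if the model's inputs were defined with a known feature dimension in the first place.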

@ChristianLagares

My two cents on the problem: I’ve been working on a g2.8xlarge as well, and I came across a similar issue. I managed to work around it by making the total number of samples divisible by the batch size. If you are multiplying your batch size by the number of GPUs, then your sample count must be divisible by that effective batch size. For example, if you have 257,000 samples and a per-GPU batch of 16 (128 across 8 GPUs), then pass the model a slice of 256,000 samples. I’m not sure if this is your case. Let me know how it goes.
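Concretely, the trimming described above would look something like this (the array names and shapes are illustrative placeholders, not taken from the project):

import numpy as np

gpus = 8
batch_per_gpu = 16
effective_batch = gpus * batch_per_gpu        # 128 samples consumed per training step

# Placeholder arrays standing in for the real training data.
x_train = np.zeros((257000, 50))
y_train = np.zeros((257000, 10))

# Keep only the largest multiple of the effective batch size.
n = (len(x_train) // effective_batch) * effective_batch
x_train, y_train = x_train[:n], y_train[:n]   # 257000 -> 256896 samples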

@AgrawalAmey
Author

My initial conclusion was wrong. I had been running different configurations on the g2.8xlarge and the p2.8xlarge so that the model could fit on the smaller K520 cards. Strangely, it seems to be somehow related to batch normalisation: the model works only when I use batch normalisation. I haven't figured out exactly how they are related yet.

@AgrawalAmey
Author

AgrawalAmey commented Jun 30, 2017

The one with the batch norm works; the other doesn't. Digging further, it seems that only the first batch normalisation layer matters for the network to work. I tried replacing it with a linear activation layer, but that did not work.

[screenshot from 2017-06-30 18-04-32]
[screenshot from 2017-06-30 18-04-52]
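For reference, a rough sketch of the two variants being compared; the real model is only visible in the screenshots, so the input shape, layer sizes and dropout values below are made up:

from keras.layers import Input, Dense, Activation, BatchNormalization, LSTM
from keras.models import Model

inp = Input(shape=(30, 64))                    # hypothetical (timesteps, features)

# Variant that trains under make_parallel: batch normalisation first.
x = BatchNormalization()(inp)
x = LSTM(128, dropout_W=0.2, dropout_U=0.2)(x)
with_bn = Model(input=inp, output=Dense(10)(x))

# Variant that fails: the batch norm swapped for a linear activation.
y = Activation('linear')(inp)
y = LSTM(128, dropout_W=0.2, dropout_U=0.2)(y)
without_bn = Model(input=inp, output=Dense(10)(y))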
