
Incompatible shapes #7

Open

smhoang opened this issue Feb 8, 2017 · 40 comments

Comments
@smhoang

smhoang commented Feb 8, 2017

I am running make_parallel with 2 GPUs, the error occurred with gradients/sub_grad/BroadcastGradientArgs:
"InvalidArgumentError (see above for traceback): Incompatible shapes: [483,1] vs. [482,1]
[[Node: gradients/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_grad/Shape, gradients/sub_grad/Shape_1/_79)]]"

@asaluja

asaluja commented Feb 21, 2017

I get the exact same error. Would appreciate some help on this.

@xulabs

xulabs commented Feb 28, 2017

I get a similar error.
I guess it is because the last minibatch has an odd number of samples, while the parallelized model only produces an even number of predictions.

@Caduceus96

Did you hardcode the batch size in your first layer input (batch_input_shape), or give input_dim?

@asaluja

asaluja commented Mar 9, 2017

@Caduceus96 just gave input_dim. Batch size is hardcoded when I call fit

@ktamiola

The same error here! Running Keras 2.0.2 with TensorFlow 0.12.1.

InvalidArgumentError (see above for traceback): Incompatible shapes: [6376,256] vs. [6379,256]
         [[Node: gradients/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_grad/Shape/_459, gradients/sub_grad/Shape_1)]]
         [[Node: gradients/concatenate_1/concat_grad/Slice_7/_491 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:7", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_3913_gradients/concatenate_1/concat_grad/Slice_7", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:7"]()]]

@Eric2333

Eric2333 commented Apr 3, 2017

It might be related to the get_slice function. I found that if the number of input samples is a multiple of your batch size, then there is no such error.

@Eric2333

Eric2333 commented Apr 3, 2017

OK, I'm probably wrong. The error seems to come from my callback function. If I don't use callbacks, everything is fine no matter how many rows of input data I have.

@miguelroboso

I actually see this error when I try to run the example on the website.

@sumethy

sumethy commented Apr 21, 2017

Same as @Eric2333: don't use callbacks, or change them to lambda functions, and it works fine.

@jgustave

jgustave commented May 14, 2017

Also ran into this error with Keras 2.0.3 and TensorFlow 1.1.0.
It happens at the end of the first epoch of training, possibly while computing validation.
(I do use callbacks for checkpointing and early stopping; will try without.)

73997312/73997516 [============================>.] - ETA: 0s - loss: 12.1832
/home/ubuntu/devhome/tensorwords2/multi_gpu.py:45: UserWarning: The merge function is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
merged.append(merge(outputs, mode='concat', concat_axis=0))
/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/legacy/layers.py:460: UserWarning: The Merge layer is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
name=name)
/home/ubuntu/devhome/tensorwords2/multi_gpu.py:47: UserWarning: Update your Model call to the Keras 2 API: Model(inputs=[<tf.Tenso..., outputs=[<tf.Tenso...)
return Model(input=model.inputs, output=merged)
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/home/ubuntu/.pyenv/versions/3.6.1/lib/python3.6/contextlib.py", line 89, in exit
next(self.gen)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [204,34] vs. [200,34]
[[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_merge_1_target_0/_9, Log)]]
[[Node: gradients/merge_1/concat_grad/Slice_3/_529 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12702_gradients/merge_1/concat_grad/Slice_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./TextGenLearn3.py", line 293, in
main()
File "./TextGenLearn3.py", line 290, in main
prep.gofit(model,(inputTrain,responseTrain),(inputValid,responseValid), args.output, args.epoch, args.patience, batchSize)
File "./TextGenLearn3.py", line 174, in gofit
initial_epoch=nextEpoch)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/engine/training.py", line 1486, in fit
initial_epoch=initial_epoch)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/engine/training.py", line 1141, in _fit_loop
outs = f(ins_batch)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2103, in call
feed_dict=feed_dict)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [204,34] vs. [200,34]
[[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_merge_1_target_0/_9, Log)]]
[[Node: gradients/merge_1/concat_grad/Slice_3/_529 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12702_gradients/merge_1/concat_grad/Slice_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"]()]]

Caused by op 'mul', defined at:
File "./TextGenLearn3.py", line 293, in
main()
File "./TextGenLearn3.py", line 280, in main
model = prep.createModel(args.seqlen,numChars,args.lstmsize,args.numlayers,args.dropout,args.learnrate, args.parallel)
File "./TextGenLearn3.py", line 151, in createModel
optimizer=optimizer) # Categorical since we are 1-hot categorical.
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/engine/training.py", line 899, in compile
sample_weight, mask)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/engine/training.py", line 430, in weighted
score_array = fn(y_true, y_pred)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/losses.py", line 37, in categorical_crossentropy
return K.categorical_crossentropy(y_pred, y_true)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2582, in categorical_crossentropy
return - tf.reduce_sum(target * tf.log(output),
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 821, in binary_op_wrapper
return func(x, y, name=name)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1044, in _mul_dispatch
return gen_math_ops._mul(x, y, name=name)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1434, in _mul
result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [204,34] vs. [200,34]
[[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_merge_1_target_0/_9, Log)]]
[[Node: gradients/merge_1/concat_grad/Slice_3/_529 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12702_gradients/merge_1/concat_grad/Slice_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"]()]]

@jwilt1

jwilt1 commented May 30, 2017

The number of samples just needs to be a multiple of the total number of GPUs.
For example, I had 68531 samples in my input, and once I shaved that down to 68528 with 8 GPUs, it worked fine.
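A minimal sketch of that trimming (x_train, y_train, model, and the batch size are placeholders, not taken from this thread):

n_gpus = 8
usable = (len(x_train) // n_gpus) * n_gpus   # largest length divisible by the GPU count
x_train = x_train[:usable]
y_train = y_train[:usable]
model.fit(x_train, y_train, batch_size=n_gpus * 32)  # keep the batch size divisible by n_gpus too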

@vense

vense commented Jun 5, 2017

@jwilt1 Thanks! Your example works nicely.
I modified my code so that the input sample size is a multiple of n_gpu.

@szhitansky

If you have a large training set it's not an issue, and you can always cut it like:

train_cut = len(train_index) % GPUs
if train_cut:  # guard: slicing with [:-0] would empty the list
    train_index = train_index[:-train_cut]

And it works fine. But after training I have an issue with predictions: their number has to be a multiple of the GPU count as well.
Any ideas?

@Caduceus96

Caduceus96 commented Jun 25, 2017 via email

@jianglinghan

jianglinghan commented Jul 17, 2017

@Caduceus96 I sliced my training data into multiples of the GPU count. The first epoch runs fine, but when it comes to the second epoch, an error is raised:
3792/3800 [============================>.] - ETA: 0s - loss: 11.5726 - mean_squared_error: 1.9049
Traceback (most recent call last):

......

InvalidArgumentError (see above for traceback): Incompatible shapes: [12,3] vs. [14,3] [[Node: sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](concatenate_2/concat/_851, _recv_concatenate_2_target_0/_853)]] [[Node: add_3/_857 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_3571_add_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

train_shape = [(3800, None, 1)] * 10,
valid_shape = [(254, None, 1)] * 10, corresponding to train_shape,
num_gpu = 4,
train_batch = 16

@Caduceus96

Caduceus96 commented Jul 17, 2017 via email

@jianglinghan

@Caduceus96 I guess so, 3800 / 4 = 950.

@ktamiola

@JiangLing-han it is evident you are using small batch sizes during your training (the progress bar output from your Keras training routine stops at 3792/3800).

You need to make sure your batches are of equal size and divisible by 4.

@jianglinghan

@ktamiola @Caduceus96 I solved this problem by setting the size of the validation set to a multiple of 4.
The model is replicated per GPU, and the validation data gets sliced just like the training data. Many thanks to you. :)

@jwilt1

jwilt1 commented Aug 2, 2017

If you want to predict just one sample at a time, instead of a multiple of the number of GPUs used during training, you can create a second model that is identical and load the weights of your parallelized model into it (a short sketch follows below).

  1. Create a model named model1
  2. Create model2 by applying the make_parallel function to model1
  3. Train model2 with 8 GPUs
  4. Set model1's weights to the weights of model2: model1.set_weights(model2.get_weights())
  5. Predict however many samples you want at a time using model1

model1.predict(val[0:10,:,:]) -> success
model2.predict(val[0:10,:,:]) -> ValueError: could not broadcast input array from shape (8,2) into shape (10,2)
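A minimal sketch of that workflow, assuming make_parallel is imported from a local multi_gpu module; the layer sizes, x_train, y_train, and val are placeholders, not taken from this thread:

from keras.models import Sequential
from keras.layers import Dense
from multi_gpu import make_parallel  # assumed import path for this repo's make_parallel

# 1. single-GPU template model
model1 = Sequential([Dense(64, activation='relu', input_dim=100),
                     Dense(2, activation='softmax')])

# 2. + 3. parallelize across 8 GPUs and train (batch size divisible by 8)
model2 = make_parallel(model1, 8)
model2.compile(loss='categorical_crossentropy', optimizer='adam')
model2.fit(x_train, y_train, batch_size=256, epochs=5)

# 4. copy the trained weights back into the single-GPU model
model1.set_weights(model2.get_weights())

# 5. predict any number of samples with the single-GPU copy
preds = model1.predict(val[0:10])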

@DarkForte

Many thanks for your code!
I would suggest adding a note at the beginning of the make_parallel function saying that the size of the training/validation data should be divisible by the number of GPUs. It is opaque to a user why training runs fine but an incompatible-shapes exception is thrown after an epoch.

@CeadeS

CeadeS commented Aug 22, 2017

Has anyone else faced an error using regularizers? Using layers like this:

from keras import backend as K
from keras import regularizers, initializers
from keras.layers import Convolution2D, BatchNormalization, Activation

def conv2d_bn(x, nb_filter, nb_row, nb_col, padding='same', strides=(1, 1), bias=False):
    """
    Utility function to apply conv + BN.
    (Slightly modified from https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py)
    """
    if K.image_data_format() == "channels_first":
        channel_axis = 1
    else:
        channel_axis = -1
    x = Convolution2D(nb_filter, (nb_row, nb_col),
                      strides=strides,
                      padding=padding,
                      use_bias=bias,
                      kernel_regularizer=regularizers.l2(0.00004),  # <---- causes error because no _losses
                      kernel_initializer=initializers.VarianceScaling(scale=2.0, mode='fan_in', distribution='normal',
                                                                      seed=None))(x)
    x = BatchNormalization(axis=channel_axis, momentum=0.9997, scale=False)(x)
    x = Activation('relu')(x)
    return x

I get the error:
"AttributeError: 'Model' object has no attribute '_losses'"
caused by the outputs = model(inputs) call that merges the outputs of the different splits into one model.

@DNXie

DNXie commented Feb 25, 2018

batch size: 64
number of batches: 20
number of GPUs: 2
The error I got:
InvalidArgumentError: Incompatible shapes: [64,2] vs. [128,2]
How can I deal with this?

@zyxue

zyxue commented Apr 5, 2018

@DNXie, I am having the same error; shape[0] gets halved. Did you find a solution?

A related issue: keras-team/keras#9449

@ghost

ghost commented May 8, 2018

Same issue here with the latest Keras version.

@umashgh

umashgh commented Oct 14, 2018

Hi, was a fix issued for this error? I am facing the same issue. model.fit works with batch size 64 when not using multiple GPUs, but when I wrap the same model with multi_gpu_model and call fit on it, it raises an error that 16 and 64 are incompatible shapes.
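For context, keras.utils.multi_gpu_model splits each incoming batch across the replicas and concatenates the results back on the CPU. A minimal sketch of the documented usage (model, x_train, y_train, and the GPU count of 4 are placeholders/assumptions, not from this comment):

from keras.utils import multi_gpu_model

parallel_model = multi_gpu_model(model, gpus=4)  # model is the single-GPU template
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')
# keep batch_size divisible by the GPU count; each replica sees batch_size / 4 samples
parallel_model.fit(x_train, y_train, epochs=10, batch_size=64)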

@jayanti-prasad

I am getting the error
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [7600] vs. [400,19]
Some pointers:

  1. I get this error only when I run my code on a GPU node (Tesla K80).
  2. I do not get the error for batch_size = 1.
  3. I do not get the error when I do not use metrics=['accuracy'] in compile.
  4. I get the error only for some particular architectures.
  5. All the problems reported above involve arrays of the same dimensionality, [n1,n2] vs [m1,m2], but my case is [n] vs [n/r, r].

The full error is as follows:
MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Epoch 1/10
Traceback (most recent call last):
File "driver_training.py", line 66, in
history = ED.fit_model()
File "/home/ubuntu/2018-December/models/commom/v1/seq2seq_trainig.py", line 114, in fit_model
callbacks=callback(self.cfg))
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in call
return self._call(inputs)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in call
run_metadata_ptr)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [7600] vs. [400,19]
[[Node: metrics/acc/Equal = Equal[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](metrics/acc/Reshape, metrics/acc/Cast)]]
[[Node: loss/mul/_253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4325_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

@jayanti-prasad

Here is the full code:

import numpy as np
from keras.models import Model
from keras import optimizers
from keras.layers import Input, Dense, Embedding
import keras

num_decoder_tokens=40
len_label_vector=20
latent_dim=300

train_labels_vecs = np.random.randint(num_decoder_tokens, size=(100, len_label_vector))

decoder_input_data = train_labels_vecs[:, :-1]
decoder_target_data = train_labels_vecs[:, 1:]

decoder_inputs = Input(shape=(None,), name='Decoder-Input') # for teacher forcing
x = Embedding(num_decoder_tokens, latent_dim, name='Decoder-Word-Embedding', mask_zero=False)(decoder_inputs)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax', name='Final-Output-Dense') (x)

seq2seq_Model = Model([decoder_inputs], decoder_outputs)

print(seq2seq_Model.summary())

seq2seq_Model.compile(optimizer=optimizers.Nadam(lr=0.001),
                      loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history = seq2seq_Model.fit([decoder_input_data],
                            np.expand_dims(decoder_target_data, -1),
                            validation_split=0.12, epochs=10, batch_size=2)
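A workaround sometimes reported for this kind of [batch*steps] vs. [batch, steps] mismatch when combining sparse_categorical_crossentropy with metrics=['accuracy'] (this is an assumption, not a fix confirmed in this thread) is to name the sparse metric explicitly so Keras does not fall back to the dense categorical accuracy:

seq2seq_Model.compile(optimizer=optimizers.Nadam(lr=0.001),
                      loss='sparse_categorical_crossentropy',
                      metrics=['sparse_categorical_accuracy'])  # explicit sparse metric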

@davidkorea

@jayanti-prasad

Same error here, and the following are both true when I run a seq2seq architecture on my local PC:

  • I do not get the error for batch_size = 1
  • I do not get the error when I do not use metrics=['accuracy'] in compile.

But there is no error when I run the code in a Kaggle kernel with the same TF version (1.12.0) and Keras version (2.2.4).

@TianrenWang

I also have a very similar error, and changing the batch size and sample size to a multiple of the number of GPUs doesn't solve the problem. My error is as follows:

InvalidArgumentError: Incompatible shapes: [128,32,32,3] vs. [256,32,32,3]
	 [[{{node replica_1/sequential_1/conv_lst_m2d_1/while/mul_3}} = Mul[T=DT_FLOAT, _class=["loc:@train...rayWriteV3"], _device="/job:localhost/replica:0/task:0/device:GPU:1"](replica_1/sequential_1/conv_lst_m2d_1/while/TensorArrayReadV3, replica_1/sequential_1/conv_lst_m2d_1/while/mul_3/Enter)]]
	 [[{{node loss/mul/_305}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5049_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

This problem only happens when the model has a ConvLSTM2D layer; without it, the code runs just fine. Other properties:

  • I am using 2 GPUs
  • Sample size 2048
  • batch size 256
  • Each of my input samples has shape [21, 32, 32, 1], where 21 is the temporal size, 32 x 32 is the image size, and 1 is the channel

@andrenatal

andrenatal commented May 12, 2019

Same here:

  • Training LSTMs
  • 4 GPUs
  • Changing the batch and sample size to make them a multiple of the # of GPUs doesn't work
  • It worked when I removed metrics=['accuracy']
  • I do not get the error for batch_size = 1

Keras 2.2.4
TF 1.13.1

@dagseyithan

dagseyithan commented May 15, 2019

Getting the same error at the end of the first epoch with only 1 GPU. I am using a generator (Sequence), and when I set shuffle=True, the error gets thrown somewhere in the middle of the first epoch instead of at the end.

Keras 2.1.6
tf 1.13.1

Update:
I solved the problem. Apparently the generator has a problem with the last batch: if the number of samples in the last batch is smaller than in the others, this error gets thrown. So the only thing to do is to skip the last batch. To achieve this I edited the __len__ method of the generator and subtracted 1:

import numpy as np
from keras.utils import Sequence

class TrimmedSequence(Sequence):  # class name is a placeholder; the methods below are as posted
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # the -1 drops the final (possibly smaller) batch
        return int(np.ceil(len(self.x) / float(self.batch_size))) - 1

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        return batch_x, batch_y
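A minimal usage sketch (TrimmedSequence is the placeholder class name used above; x_train, y_train, and model are also placeholders):

gen = TrimmedSequence(x_train, y_train, batch_size=32)
model.fit_generator(gen, epochs=10)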

@jayanti-prasad

I get the same error again and can reproduce it with the code I pasted earlier. With batch_size=1 there is no problem.

I have:

tensorflow==1.14.0
keras==2.2.4-tf

Machine: Intel(R) Xeon(R) Platinum 8153 CPU @ 2.00GHz

Traceback (most recent call last):
File "test1.py", line 28, in
np.expand_dims(decoder_target_data, -1),validation_split=0.12,epochs=10,batch_size=2)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in call
return self._call(inputs)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in call
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [38] vs. [2,19]

@bhavyakariwal9

I am very new to deep learning and am slowly getting familiar with its theory. I am also getting a similar kind of error. Can anyone explain what this error means and why it could have occurred? Does it have something to do with the weights' size?

InvalidArgumentError: Incompatible shapes: [786432] vs. [131072]
[[{{node training/Adam/gradients/loss_1/conv2d_24_loss/mul_1_grad/BroadcastGradientArgs}}]]

It would be great if someone could help me out here.

@Atakey

Atakey commented Aug 11, 2019

I am very new to deep learning and am slowly getting familiar with its theory. I am also getting a similar kind of error. Can anyone explain what this error means and why it could have occurred? Does it have something to do with the weights' size?

InvalidArgumentError: Incompatible shapes: [786432] vs. [131072]
[[{{node training/Adam/gradients/loss_1/conv2d_24_loss/mul_1_grad/BroadcastGradientArgs}}]]

It would be great if someone could help me out here.

Have you solved it? I ran into a similar kind of error involving "BroadcastGradientArgs". It would be great if you could reply to me here. Thanks. @bhavyakariwal9

@zhaoyue3513247

don't use callbacks or change them to lambda functions and it works fine.

It does not work; the error still occurs.

@zhaoyue3513247

@jwilt1 Thanks! Your example works nicely.
I modified my code so that the input sample size is a multiple of n_gpu.

I think this answer is so simple that anybody could find it, but it still does not work for me.

@Mellak

Mellak commented Jan 8, 2021

The number of samples just needs to be a multiple of the total number of GPUs.
For example, I had 68531 samples in my input, and once I shaved that down to 68528 with 8 GPUs, it worked fine.

This worked fine for me, thanks a looooot

@srv-sh

srv-sh commented May 2, 2021

I got a similar problem, but I have no GPU in my system. How can I solve this error?
InvalidArgumentError: Incompatible shapes: [128,100,64] vs. [36,64]
[[node gradient_tape/model/patch_encoder_1/add/BroadcastGradientArgs (defined at :3) ]] [Op:__inference_train_function_49081]

@amritangshudey

I also faced the same error. It was resolved by making two changes:

  1. The input size should be a multiple of the batch size.
  2. The batch size should be equal to num_heads.

But I don't know how or why this works; can someone explain?
