
Incompatible shapes #7

Open

smhoang opened this issue Feb 8, 2017 · 40 comments

Comments
@smhoang

smhoang commented Feb 8, 2017

I am running make_parallel with 2 GPUs, the error occurred with gradients/sub_grad/BroadcastGradientArgs:
"InvalidArgumentError (see above for traceback): Incompatible shapes: [483,1] vs. [482,1]
[[Node: gradients/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_grad/Shape, gradients/sub_grad/Shape_1/_79)]]"

@asaluja

asaluja commented Feb 21, 2017

I get the exact same error. Would appreciate some help on this.

@xulabs

xulabs commented Feb 28, 2017

I get a similar error.
I guess it is because the last minibatch has an odd number of samples, while the parallelized model only produces an even number of predictions.

@Caduceus96

Did you hardcode the batch size in your first layer input (batch_input_shape), or give input_dim?

@asaluja

asaluja commented Mar 9, 2017

@Caduceus96 just gave input_dim. Batch size is hardcoded when I call fit

@ktamiola

The same error here! Running Keras 2.0.2 with TensorFlow 0.12.1.

InvalidArgumentError (see above for traceback): Incompatible shapes: [6376,256] vs. [6379,256]
         [[Node: gradients/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_grad/Shape/_459, gradients/sub_grad/Shape_1)]]
         [[Node: gradients/concatenate_1/concat_grad/Slice_7/_491 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:7", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_3913_gradients/concatenate_1/concat_grad/Slice_7", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:7"]()]]

@Eric2333

Eric2333 commented Apr 3, 2017

It might be related to the get_slice function. I found that if the number of input samples is a multiple of your batch size, then there is no such error.

@Eric2333

Eric2333 commented Apr 3, 2017

OK, I'm probably wrong. The error seems to come from my callback function. If I don't use callbacks, everything is fine no matter how many rows of input data I have.

@miguelroboso

I actually see this error when I try to run the example on the website.

@sumethy

sumethy commented Apr 21, 2017

Same as @Eric2333: don't use callbacks, or change them to lambda functions, and it works fine.

@jgustave

jgustave commented May 14, 2017

Also ran into this error with Keras 2.0.3 and TensorFlow 1.1.0.
It happens at the end of the first epoch of training, possibly while computing validation.
(I do use callbacks for checkpointing and early stopping; will try without.)

73997312/73997516 [============================>.] - ETA: 0s - loss: 12.1832
/home/ubuntu/devhome/tensorwords2/multi_gpu.py:45: UserWarning: The merge function is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
merged.append(merge(outputs, mode='concat', concat_axis=0))
/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/legacy/layers.py:460: UserWarning: The Merge layer is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
name=name)
/home/ubuntu/devhome/tensorwords2/multi_gpu.py:47: UserWarning: Update your Model call to the Keras 2 API: Model(inputs=[<tf.Tenso..., outputs=[<tf.Tenso...)
return Model(input=model.inputs, output=merged)
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/home/ubuntu/.pyenv/versions/3.6.1/lib/python3.6/contextlib.py", line 89, in exit
next(self.gen)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [204,34] vs. [200,34]
[[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_merge_1_target_0/_9, Log)]]
[[Node: gradients/merge_1/concat_grad/Slice_3/_529 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12702_gradients/merge_1/concat_grad/Slice_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./TextGenLearn3.py", line 293, in
main()
File "./TextGenLearn3.py", line 290, in main
prep.gofit(model,(inputTrain,responseTrain),(inputValid,responseValid), args.output, args.epoch, args.patience, batchSize)
File "./TextGenLearn3.py", line 174, in gofit
initial_epoch=nextEpoch)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/engine/training.py", line 1486, in fit
initial_epoch=initial_epoch)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/engine/training.py", line 1141, in _fit_loop
outs = f(ins_batch)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2103, in call
feed_dict=feed_dict)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [204,34] vs. [200,34]
[[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_merge_1_target_0/_9, Log)]]
[[Node: gradients/merge_1/concat_grad/Slice_3/_529 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12702_gradients/merge_1/concat_grad/Slice_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"]()]]

Caused by op 'mul', defined at:
File "./TextGenLearn3.py", line 293, in
main()
File "./TextGenLearn3.py", line 280, in main
model = prep.createModel(args.seqlen,numChars,args.lstmsize,args.numlayers,args.dropout,args.learnrate, args.parallel)
File "./TextGenLearn3.py", line 151, in createModel
optimizer=optimizer) # Categorical since we are 1-hot categorical.
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/engine/training.py", line 899, in compile
sample_weight, mask)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/engine/training.py", line 430, in weighted
score_array = fn(y_true, y_pred)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/losses.py", line 37, in categorical_crossentropy
return K.categorical_crossentropy(y_pred, y_true)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2582, in categorical_crossentropy
return - tf.reduce_sum(target * tf.log(output),
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 821, in binary_op_wrapper
return func(x, y, name=name)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1044, in _mul_dispatch
return gen_math_ops._mul(x, y, name=name)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1434, in _mul
result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/ubuntu/.pyenv/versions/tensor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [204,34] vs. [200,34]
[[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_merge_1_target_0/_9, Log)]]
[[Node: gradients/merge_1/concat_grad/Slice_3/_529 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:3", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_12702_gradients/merge_1/concat_grad/Slice_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:3"]()]]

@jwilt1

jwilt1 commented May 30, 2017

The number of samples just needs to be a multiple of the total number of GPUs.
For example, I had 68531 samples in my input, and once I shaved that down to 68528 with 8 GPUs, it worked fine.
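A minimal sketch of that trimming (x_train, y_train, model, and the batch size are placeholders, not taken from this thread):

n_gpus = 8
usable = (len(x_train) // n_gpus) * n_gpus   # largest length divisible by the GPU count
x_train = x_train[:usable]
y_train = y_train[:usable]
model.fit(x_train, y_train, batch_size=n_gpus * 32)  # keep the batch size divisible by n_gpus too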

@vense

vense commented Jun 5, 2017

@jwilt1 Thanks! Your example works nicely.
I modified my code so that the input sample size is a multiple of n_gpu.

@szhitansky

If you have a large training set it's not an issue, and you can always cut it like:

train_cut = len(train_index) % GPUs
if train_cut:  # guard: slicing with [:-0] would empty the list
    train_index = train_index[:-train_cut]

And it works fine. But after training I have an issue with predictions: their number has to be a multiple of the GPU count as well.
Any ideas?

@Caduceus96

Caduceus96 commented Jun 25, 2017 via email

@jianglinghan

jianglinghan commented Jul 17, 2017

@Caduceus96 I sliced my training data into multiples of the GPU count. The first epoch runs fine, but when it comes to the second epoch, an error is raised:
3792/3800 [============================>.] - ETA: 0s - loss: 11.5726 - mean_squared_error: 1.9049
Traceback (most recent call last):

......

InvalidArgumentError (see above for traceback): Incompatible shapes: [12,3] vs. [14,3] [[Node: sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](concatenate_2/concat/_851, _recv_concatenate_2_target_0/_853)]] [[Node: add_3/_857 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_3571_add_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

train_shape = [(3800, None, 1)] * 10,
valid_shape = [(254, None, 1)] * 10, corresponding to train_shape,
num_gpu = 4,
train_batch = 16

@Caduceus96

Caduceus96 commented Jul 17, 2017 via email

@jianglinghan

@Caduceus96 I guess so, 3800 / 4 = 950.

@ktamiola

@JiangLing-han it is evident you are using small batch sizes during your training (the progress bar output from your Keras training routine stops at 3792/3800).

You need to make sure your batches are of equal size and divisible by 4.

@jianglinghan

@ktamiola @Caduceus96 I solved this problem by setting the size of the validation set to a multiple of 4.
The model is replicated per GPU, and the validation data gets sliced just like the training data. Many thanks to you. :)

@jwilt1

jwilt1 commented Aug 2, 2017

If you want to predict just one sample at a time, instead of a multiple of the number of GPUs used during training, you can create a second model that is identical and load the weights of your parallelized model into it (a short sketch follows below).

  1. Create a model named model1
  2. Create model2 by applying the make_parallel function to model1
  3. Train model2 with 8 GPUs
  4. Set model1's weights to the weights of model2: model1.set_weights(model2.get_weights())
  5. Predict however many samples you want at a time using model1

model1.predict(val[0:10,:,:]) -> success
model2.predict(val[0:10,:,:]) -> ValueError: could not broadcast input array from shape (8,2) into shape (10,2)
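A minimal sketch of that workflow, assuming make_parallel is imported from a local multi_gpu module; the layer sizes, x_train, y_train, and val are placeholders, not taken from this thread:

from keras.models import Sequential
from keras.layers import Dense
from multi_gpu import make_parallel  # assumed import path for this repo's make_parallel

# 1. single-GPU template model
model1 = Sequential([Dense(64, activation='relu', input_dim=100),
                     Dense(2, activation='softmax')])

# 2. + 3. parallelize across 8 GPUs and train (batch size divisible by 8)
model2 = make_parallel(model1, 8)
model2.compile(loss='categorical_crossentropy', optimizer='adam')
model2.fit(x_train, y_train, batch_size=256, epochs=5)

# 4. copy the trained weights back into the single-GPU model
model1.set_weights(model2.get_weights())

# 5. predict any number of samples with the single-GPU copy
preds = model1.predict(val[0:10])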

@DarkForte

Many thanks for your code!
I would suggest adding a note at the beginning of the make_parallel function saying that the size of the training/validation data should be divisible by the number of GPUs. It is opaque to a user why training runs fine but an incompatible-shapes exception is thrown after an epoch.

@CeadeS

CeadeS commented Aug 22, 2017

Has anyone else faced an error using regularizers? Using layers like this:

from keras import backend as K
from keras import regularizers, initializers
from keras.layers import Convolution2D, BatchNormalization, Activation

def conv2d_bn(x, nb_filter, nb_row, nb_col, padding='same', strides=(1, 1), bias=False):
    """
    Utility function to apply conv + BN.
    (Slightly modified from https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py)
    """
    if K.image_data_format() == "channels_first":
        channel_axis = 1
    else:
        channel_axis = -1
    x = Convolution2D(nb_filter, (nb_row, nb_col),
                      strides=strides,
                      padding=padding,
                      use_bias=bias,
                      kernel_regularizer=regularizers.l2(0.00004),  # <---- causes error because no _losses
                      kernel_initializer=initializers.VarianceScaling(scale=2.0, mode='fan_in', distribution='normal',
                                                                      seed=None))(x)
    x = BatchNormalization(axis=channel_axis, momentum=0.9997, scale=False)(x)
    x = Activation('relu')(x)
    return x

I get the error:
"AttributeError: 'Model' object has no attribute '_losses'"
caused by the outputs = model(inputs) call that merges the outputs of the different splits into one model.

@DNXie

DNXie commented Feb 25, 2018

batch size: 64
number of batches: 20
number of GPUs: 2
The error I got:
InvalidArgumentError: Incompatible shapes: [64,2] vs. [128,2]
How can I deal with this?

@zyxue

zyxue commented Apr 5, 2018

@DNXie, I am having the same error; shape[0] gets halved. Did you find a solution?

A related issue: keras-team/keras#9449

@ghost

ghost commented May 8, 2018

Same issue here with the latest Keras version.

@umashgh

umashgh commented Oct 14, 2018

Hi, was a fix issued for this error? I am facing the same issue. model.fit works with batch size 64 when not using multiple GPUs, but when I wrap the same model with multi_gpu_model and call fit on it, it raises an error that 16 and 64 are incompatible shapes.
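For context, keras.utils.multi_gpu_model splits each incoming batch across the replicas and concatenates the results back on the CPU. A minimal sketch of the documented usage (model, x_train, y_train, and the GPU count of 4 are placeholders/assumptions, not from this comment):

from keras.utils import multi_gpu_model

parallel_model = multi_gpu_model(model, gpus=4)  # model is the single-GPU template
parallel_model.compile(loss='categorical_crossentropy', optimizer='adam')
# keep batch_size divisible by the GPU count; each replica sees batch_size / 4 samples
parallel_model.fit(x_train, y_train, epochs=10, batch_size=64)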

@jayanti-prasad

I am getting the error
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [7600] vs. [400,19]
Some pointers:

  1. I get this error only when I run my code on a GPU node (Tesla K80).
  2. I do not get the error for batch_size = 1.
  3. I do not get the error when I do not use metrics=['accuracy'] in compile.
  4. I get the error only for some particular architectures.
  5. All the problems reported above involve arrays of the same dimensionality, [n1,n2] vs [m1,m2], but my case is [n] vs [n/r, r].

The full error is as follows:
MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Epoch 1/10
Traceback (most recent call last):
File "driver_training.py", line 66, in
history = ED.fit_model()
File "/home/ubuntu/2018-December/models/commom/v1/seq2seq_trainig.py", line 114, in fit_model
callbacks=callback(self.cfg))
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in call
return self._call(inputs)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in call
run_metadata_ptr)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [7600] vs. [400,19]
[[Node: metrics/acc/Equal = Equal[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](metrics/acc/Reshape, metrics/acc/Cast)]]
[[Node: loss/mul/_253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4325_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

@jayanti-prasad

Here is the full code:

import numpy as np
from keras.models import Model
from keras import optimizers
from keras.layers import Input, Dense, Embedding
import keras

num_decoder_tokens=40
len_label_vector=20
latent_dim=300

train_labels_vecs = np.random.randint(num_decoder_tokens, size=(100, len_label_vector))

decoder_input_data = train_labels_vecs[:, :-1]
decoder_target_data = train_labels_vecs[:, 1:]

decoder_inputs = Input(shape=(None,), name='Decoder-Input') # for teacher forcing
x = Embedding(num_decoder_tokens, latent_dim, name='Decoder-Word-Embedding', mask_zero=False)(decoder_inputs)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax', name='Final-Output-Dense') (x)

seq2seq_Model = Model([decoder_inputs], decoder_outputs)

print(seq2seq_Model.summary())

seq2seq_Model.compile(optimizer=optimizers.Nadam(lr=0.001),
                      loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history = seq2seq_Model.fit([decoder_input_data],
                            np.expand_dims(decoder_target_data, -1),
                            validation_split=0.12, epochs=10, batch_size=2)
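A workaround sometimes reported for this kind of [batch*steps] vs. [batch, steps] mismatch when combining sparse_categorical_crossentropy with metrics=['accuracy'] (this is an assumption, not a fix confirmed in this thread) is to name the sparse metric explicitly so Keras does not fall back to the dense categorical accuracy:

seq2seq_Model.compile(optimizer=optimizers.Nadam(lr=0.001),
                      loss='sparse_categorical_crossentropy',
                      metrics=['sparse_categorical_accuracy'])  # explicit sparse metric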

@davidkorea

@jayanti-prasad

Same error here, and the following are both true when I run a seq2seq architecture on my local PC:

  • I do not get the error for batch_size = 1
  • I do not get the error when I do not use metrics=['accuracy'] in compile.

But there is no error when I run the code in a Kaggle kernel with the same TF version (1.12.0) and Keras version (2.2.4).

@TianrenWang

I also have a very similar error, and changing the batch size and sample size to a multiple of the number of GPUs doesn't solve the problem. My error is as follows:

InvalidArgumentError: Incompatible shapes: [128,32,32,3] vs. [256,32,32,3]
	 [[{{node replica_1/sequential_1/conv_lst_m2d_1/while/mul_3}} = Mul[T=DT_FLOAT, _class=["loc:@train...rayWriteV3"], _device="/job:localhost/replica:0/task:0/device:GPU:1"](replica_1/sequential_1/conv_lst_m2d_1/while/TensorArrayReadV3, replica_1/sequential_1/conv_lst_m2d_1/while/mul_3/Enter)]]
	 [[{{node loss/mul/_305}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5049_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

This problem only happens when the model has a ConvLSTM2D layer; without it, the code runs just fine. Other properties:

  • I am using 2 GPUs
  • Sample size 2048
  • batch size 256
  • Each of my input samples has shape [21, 32, 32, 1], where 21 is the temporal size, 32 x 32 is the image size, and 1 is the channel

@andrenatal

andrenatal commented May 12, 2019

Same here:

  • Training LSTMs
  • 4 GPUs
  • Changing the batch and sample size to make them a multiple of the # of GPUs doesn't work
  • It worked when I removed metrics=['accuracy']
  • I do not get the error for batch_size = 1

Keras 2.2.4
TF 1.13.1

@dagseyithan

dagseyithan commented May 15, 2019

Getting the same error at the end of the first epoch with only 1 GPU. I am using a generator (Sequence), and when I set shuffle=True, the error gets thrown somewhere in the middle of the first epoch instead of at the end.

Keras 2.1.6
tf 1.13.1

Update:
I solved the problem. Apparently the generator has a problem with the last batch: if the number of samples in the last batch is smaller than in the others, this error gets thrown. So the only thing to do is to skip the last batch. To achieve this I edited the __len__ method of the generator and subtracted 1:

import numpy as np
from keras.utils import Sequence

class TrimmedSequence(Sequence):  # class name is a placeholder; the methods below are as posted
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # the -1 drops the final (possibly smaller) batch
        return int(np.ceil(len(self.x) / float(self.batch_size))) - 1

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        return batch_x, batch_y
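A minimal usage sketch (TrimmedSequence is the placeholder class name used above; x_train, y_train, and model are also placeholders):

gen = TrimmedSequence(x_train, y_train, batch_size=32)
model.fit_generator(gen, epochs=10)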

@jayanti-prasad

I get the same error again and can reproduce it with the code I pasted earlier. With batch_size=1 there is no problem.

I have:

tensorflow==1.14.0
keras==2.2.4-tf

Machine: Intel(R) Xeon(R) Platinum 8153 CPU @ 2.00GHz

Traceback (most recent call last):
File "test1.py", line 28, in
np.expand_dims(decoder_target_data, -1),validation_split=0.12,epochs=10,batch_size=2)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in call
return self._call(inputs)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/u26958/Software/codx_env1/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in call
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [38] vs. [2,19]

@bhavyakariwal9

I am very new to deep learning and am slowly getting familiar with its theory. I am also getting a similar kind of error. Can anyone explain what this error means and why it could have occurred? Does it have something to do with the weights' size?

InvalidArgumentError: Incompatible shapes: [786432] vs. [131072]
[[{{node training/Adam/gradients/loss_1/conv2d_24_loss/mul_1_grad/BroadcastGradientArgs}}]]

It would be great if someone could help me out here.

@Atakey

Atakey commented Aug 11, 2019

I am very new to deep learning and am slowly getting familiar with its theory. I am also getting a similar kind of error. Can anyone explain what this error means and why it could have occurred? Does it have something to do with the weights' size?

InvalidArgumentError: Incompatible shapes: [786432] vs. [131072]
[[{{node training/Adam/gradients/loss_1/conv2d_24_loss/mul_1_grad/BroadcastGradientArgs}}]]

It would be great if someone could help me out here.

Have you solved it? I ran into a similar kind of error involving "BroadcastGradientArgs". It would be great if you could reply to me here. Thanks. @bhavyakariwal9

@zhaoyue3513247

don't use callbacks or change them to lambda functions and it works fine.

It does not work; the error still occurs.

@zhaoyue3513247

@jwilt1 Thanks! Your example works nicely.
I modified my code so that the input sample size is a multiple of n_gpu.

I think this answer is so simple that anybody could find it, but it still does not work for me.

@Mellak

Mellak commented Jan 8, 2021

The number of samples just needs to be a multiple of the total number of GPUs.
For example, I had 68531 samples in my input, and once I shaved that down to 68528 with 8 GPUs, it worked fine.

This worked fine for me, thanks a looooot

@srv-sh

srv-sh commented May 2, 2021

I got a similar problem, but I have no GPU in my system. How can I solve this error?
InvalidArgumentError: Incompatible shapes: [128,100,64] vs. [36,64]
[[node gradient_tape/model/patch_encoder_1/add/BroadcastGradientArgs (defined at :3) ]] [Op:__inference_train_function_49081]

@amritangshudey

I also faced the same error. It was resolved by making two changes:

  1. The input size should be a multiple of the batch size.
  2. The batch size should be equal to num_heads.

But I don't know how or why this works; can someone explain?
