Description
When I train a model built from a keras.applications model with the TensorFlow backend, training never finishes a single batch. With the PyTorch backend, the same model trains fine.
Here is a minimal example that reproduces the problem:
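import keras
import numpy

# Pretrained MobileNet; take the output of an intermediate layer.
mdl = keras.applications.MobileNet()
op = mdl.layers[85]
# Upsample with a single transposed convolution (kernel 32x32, stride 32x32).
op2 = keras.layers.Conv2DTranspose(1, (32, 32), (32, 32))(op.output)
model2 = keras.models.Model(inputs=mdl.inputs, outputs=op2)
# Freeze every layer except one named "train_me".
# Note: no layer is given that name here, so all layers end up frozen.
for layer in model2.layers:
    if layer.name == "train_me":
        print("training")
    else:
        layer.trainable = False
# Random inputs and targets, batch of 4.
x = numpy.random.random((4, 224, 224, 3))
y = numpy.random.random((4, 224, 224, 1))
model2.compile(optimizer=keras.optimizers.Adam(0.0001), loss="mean_squared_error")
model2.fit(x, y)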
With the TensorFlow backend, model2.fit never returns; I have not seen it finish yet. With the PyTorch backend, the same script finishes in less than 1 second.
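For anyone reproducing the comparison, here is a minimal sketch of switching backends. It assumes Keras 3, which reads the KERAS_BACKEND environment variable when keras is first imported. The helper name select_backend is hypothetical and only for illustration.

```python
import os

# Backends relevant to this report (Keras 3 also supports others).
_BACKENDS = {"tensorflow", "torch", "jax"}


def select_backend(name):
    """Hypothetical helper: set KERAS_BACKEND for a later `import keras`.

    It has no effect if keras was already imported in this process.
    """
    if name not in _BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    os.environ["KERAS_BACKEND"] = name
    return name


select_backend("torch")  # use "tensorflow" to reproduce the hang
# import keras  # import keras only after the variable is set
```

Running the example once with "tensorflow" and once with "torch", each in a fresh Python process, should reproduce the difference described above.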
This is the warning I get from TensorFlow:
2024-01-16 10:44:20.910445: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng27{k2=0,k12=-1,k13=2,k14=3,k15=0,k17=171,k18=1,k23=0} for conv (f32[1,4,224,224]{3,2,1,0}, u8[0]{0}) custom-call(f32[1,1024,32,32]{3,2,1,0}, f32[4,1024,193,193]{3,2,1,0}, f32[4]{0}), window={size=193x193 pad=192_192x192_192 rhs_reversal=1x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0} is taking a while...
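The shapes in the warning appear consistent with the Conv2DTranspose layer being lowered to an ordinary convolution with a 193x193 window. This is only a reading of the log, not a confirmed diagnosis. The sketch below checks the arithmetic. It assumes "valid" padding (the Keras default) and a 7x7 spatial input to the transposed convolution, which is inferred from the 224x224 output and stride 32. The function names are hypothetical.

```python
def conv_transpose_output_size(in_size, kernel, stride):
    """Output size of a transposed convolution with "valid" padding."""
    return (in_size - 1) * stride + kernel


def dilated_size(size, dilation):
    """Extent of `size` elements spaced `dilation` apart."""
    return (size - 1) * dilation + 1


def conv_output_size(in_size, window, pad_lo, pad_hi):
    """Output size of a stride-1 convolution with explicit padding."""
    return in_size + pad_lo + pad_hi - window + 1


in_size = 7  # assumed: 224 / 32

# Conv2DTranspose(1, (32, 32), (32, 32)) on a 7x7 input gives 224x224.
assert conv_transpose_output_size(in_size, kernel=32, stride=32) == 224

# A 7x7 map dilated by the stride of 32 spans 193, matching size=193x193.
assert dilated_size(in_size, dilation=32) == 193

# A 32x32 operand padded by 192 per side, 193 window, gives 224x224.
assert conv_output_size(32, window=193, pad_lo=192, pad_hi=192) == 224
```

If this reading is right, the slow step is the algorithm search for a convolution with a very large window, which may explain why the slow_operation_alarm message appears.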