Training a keras application doesn't work with tensorflow backend, but it does work with pytorch. #19061

Open
@odinsbane

When I train a model built from a keras.applications model using the tensorflow backend, it never finishes a batch. When I use pytorch as the backend, it trains fine.

Here is a minimal example that reproduces the problem:

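    import keras
    import numpy

    # Take an intermediate MobileNet layer and add a transposed convolution on top of it.
    mdl = keras.applications.MobileNet()
    op = mdl.layers[85]
    # Named "train_me" so that the loop below leaves this layer trainable.
    op2 = keras.layers.Conv2DTranspose(1, (32, 32), (32, 32), name="train_me")(op.output)
    model2 = keras.models.Model(inputs=mdl.inputs, outputs=op2)

    # Freeze every layer except the new one.
    for layer in model2.layers:
        if layer.name == "train_me":
            print("training")
        else:
            layer.trainable = False

    # Random input images and random single-channel targets.
    x = numpy.random.random((4, 224, 224, 3))
    y = numpy.random.random((4, 224, 224, 1))

    model2.compile(optimizer=keras.optimizers.Adam(0.0001), loss="mean_squared_error")

    model2.fit(x, y)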

If I run this with the tensorflow backend, the call to fit never finishes; I have not yet seen it complete. If I run it with the pytorch backend, it finishes very quickly, in less than 1 second.
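The report does not say how the backend was switched between runs. One common way in Keras 3 is the KERAS_BACKEND environment variable, which has to be set before keras is imported. The sketch below assumes that mechanism; the helper name select_keras_backend is hypothetical and only for illustration. Note that the value for PyTorch is "torch", not "pytorch".

```python
import os


def select_keras_backend(name):
    """Hypothetical helper: set KERAS_BACKEND before keras is imported."""
    # Backend names accepted by this sketch; "torch" is the Keras name for PyTorch.
    allowed = ("tensorflow", "torch", "jax")
    if name not in allowed:
        raise ValueError(f"unknown backend {name!r}, expected one of {allowed}")
    os.environ["KERAS_BACKEND"] = name
    return name


# Must run before the first `import keras`; changing it later has no effect
# on an already imported keras module.
select_keras_backend("torch")
# import keras  # the reproduction script above would follow from here
```

Running the same reproduction script once with "tensorflow" and once with "torch" would keep everything else identical between the two runs.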

This is a warning I get from tensorflow:

2024-01-16 10:44:20.910445: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng27{k2=0,k12=-1,k13=2,k14=3,k15=0,k17=171,k18=1,k23=0} for conv (f32[1,4,224,224]{3,2,1,0}, u8[0]{0}) custom-call(f32[1,1024,32,32]{3,2,1,0}, f32[4,1024,193,193]{3,2,1,0}, f32[4]{0}), window={size=193x193 pad=192_192x192_192 rhs_reversal=1x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0} is taking a while...
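The shapes in this warning can be checked against the example with a little arithmetic. This is only a sketch of the shape bookkeeping, not an explanation of what XLA does internally: it assumes the default "valid" padding of Conv2DTranspose and infers a 7x7 input feature map from the 224x224 target and the stride of 32. The function names are made up for illustration.

```python
def conv_transpose_output_size(input_size, kernel_size, stride):
    """Output length of a transposed convolution with 'valid' padding.

    Assumes kernel_size >= stride, which holds in the example (32 and 32).
    """
    return (input_size - 1) * stride + kernel_size


def dilated_size(input_size, stride):
    """Length after inserting (stride - 1) zeros between neighbouring elements."""
    return (input_size - 1) * stride + 1


# 224 output pixels with kernel 32 and stride 32 imply a 7-pixel-wide input.
feature_map = 224 // 32

print(conv_transpose_output_size(feature_map, 32, 32))  # 224, the target size
print(dilated_size(feature_map, 32))                    # 193, the window size in the log
```

Under these assumptions, the 193x193 window in the log is consistent with a 7x7 feature map dilated by the stride of 32, which suggests the slow algorithm is the convolution associated with the new Conv2DTranspose layer rather than with the frozen MobileNet layers.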
