Motivation and description

A common practice in machine learning is to take a pre-trained model and fine-tune it on a particular dataset. This typically involves freezing the weights in some layers while fitting the output layer(s) on the new data.
Unfortunately, this functionality appears to be incompatible with the current implementation of the ToDevice callback.
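Roughly, the epoch-begin handler for ToDevice behaves like the sketch below. This is a paraphrase rather than the verbatim FluxTraining.jl source, and the helper names (`movemodelfn`, `setupoptimstate`) are approximations:

```julia
using FluxTraining

# Paraphrased sketch of ToDevice's EpochBegin handler -- not the verbatim
# FluxTraining.jl source; field and helper names are approximate.
function FluxTraining.on(::FluxTraining.Events.EpochBegin, phase,
                         cb::FluxTraining.ToDevice, learner)
    # Move the full model to the target device (e.g. `gpu`) ...
    learner.model = cb.movemodelfn(learner.model)
    # ... and rebuild the learner's parameter/optimizer state from it,
    # discarding any freezing that was applied earlier.
    learner.params = setupoptimstate(learner.model, learner.optimizer)
end
```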
This essentially means that learner.params is set to the parameters of the full model at the start of each epoch. Thus, even if we try to freeze the layers manually with Flux.freeze!(learner.params.layers[1:end-1]), this will be undone by ToDevice.
Possible Implementation
One solution that would work with Flux's new explicit optimizers would be to create a callback that re-freezes layers after ToDevice is executed.
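A minimal sketch of such a callback is given below. It relies on the stateaccess and runafter parts of the FluxTraining.jl callback interface, assumes the model is a Chain whose last layer is the one being fine-tuned, and the callback name `RefreezeLayers` is purely illustrative:

```julia
using Flux, FluxTraining
import FluxTraining: Callback, Read, Write

# Illustrative callback that re-applies freezing after ToDevice has rebuilt
# `learner.params` at the start of an epoch.
struct RefreezeLayers <: Callback end

# Grant the callback access to the learner's model and params, and schedule it
# to run after ToDevice so the freeze is applied to the freshly rebuilt state.
FluxTraining.stateaccess(::RefreezeLayers) = (model = Read(), params = Write())
FluxTraining.runafter(::RefreezeLayers) = (FluxTraining.ToDevice,)

function FluxTraining.on(::FluxTraining.Events.EpochBegin, phase,
                         ::RefreezeLayers, learner)
    # With Flux's explicit (Optimisers.jl) optimizers, `learner.params` holds
    # the optimizer state tree, so freezing everything except the last layer
    # keeps the backbone fixed while the new head continues to train.
    Flux.freeze!(learner.params.layers[1:end-1])
end
```

Passing an instance of this callback to the Learner alongside ToGPU() should then re-freeze the backbone every epoch, even though ToDevice resets the state.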
However, perhaps we should consider whether it's necessary for ToDevice to move the model to the GPU at the start of every epoch. Maybe we could extend the Callback interface to allow for some one-time setup code to run before the first epoch is executed?
I think this issue may also be related to #148. In particular, the memory leak appears to be caused by ToDevice resetting the optimizer in each epoch. We could potentially kill two birds with one stone by changing this behaviour.
Any update on this? Also, I'd really appreciate it if the potential implementation above could be turned into a complete example to build upon (for users like me who know nothing about the internals of FluxTraining.jl). Thanks!