This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

autograd.optim wrapper does not update state value #121

Open
hbqjzj opened this issue May 10, 2016 · 5 comments

Comments

@hbqjzj

hbqjzj commented May 10, 2016

It seems that the autograd.optim wrapper doesn't update the "state" value. It returns "states", but it is impossible to use it iteratively.

local function wrap(optimfn)
   return function(fn, state, params)
      local states = { }
      local flatParams = util.sortedFlatten(params)
      for i = 1, #flatParams do
         states[i] = util.deepCopy(state)   --this is a deep copy of state, so it is not updated
      end
      return function(...)
         local out = {fn(params, ...)}
         local grads, loss = out[1], out[2]
         local flatGrads = util.sortedFlatten(grads)
         for i = 1, #flatGrads do
            local grad = flatGrads[i]
            optimfn(function()
               return loss, grad
            end, flatParams[i], states[i])
         end
         return table.unpack(out)
      end, states   --now, states is a table of states, which is impossible to pass back to the wrapper
   end
end
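For context, the deep copy means that after wrapping, the caller's `state` table and the per-tensor `states[i]` tables are independent. A minimal plain-Lua sketch of that behavior (with a toy `deepCopy` standing in for `util.deepCopy`):

```lua
-- Toy stand-in for util.deepCopy: recursively copies a table.
local function deepCopy(t)
   if type(t) ~= 'table' then return t end
   local copy = {}
   for k, v in pairs(t) do copy[k] = deepCopy(v) end
   return copy
end

local state = {learningRate = 1e-2}
local states = {}
for i = 1, 3 do
   states[i] = deepCopy(state)   -- one independent copy per weight tensor
end

-- Mutating the original state table does not reach the copies:
state.learningRate = 1e-3
print(states[1].learningRate)  -- still 0.01
```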
@ghostcow
Contributor

ghostcow commented Jul 7, 2016

That's fine: the wrapper returns the states table it uses so you can manually change things per weight tensor (if you wish), but you don't actually pass it back to any function.
It's already saved inside the wrapper closure.

Here's how to use the optim wrapper:

  • optim/init.lua calls wrap() exactly once on initialization (per optimization method):
for k, v in pairs(require 'optim') do
   opt[k] = wrap(v)
end

return opt
  • you call autograd.optim.sgd(df,state,params) ONCE to get the optimizing function:
local df = autograd(f, {optimize = true})
local state = {learningRate=1e-2}
local optimizer, states = autograd.optim.sgd(df, state, params)
  • you then use optimizer to update your weights at each iteration:
local grads, loss = optimizer(data, target)

*Example adapted from optim tests here and here
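Concretely, the returned closure keeps a reference to `states`, so whatever the inner optim function writes into `states[i]` (e.g. an `evalCounter` used for learning-rate decay) persists across calls, and you never need to pass the states back in. A stripped-down plain-Lua sketch of this pattern (toy names, no torch):

```lua
-- Toy stand-in for an optim function: bumps a counter in its state table,
-- the way optim.sgd bumps state.evalCounter.
local function toyOptim(state)
   state.evalCounter = (state.evalCounter or 0) + 1
end

-- Stripped-down version of what the wrapper returns.
local function makeOptimizer(state, nParams)
   local states = {}
   for i = 1, nParams do
      states[i] = {learningRate = state.learningRate}
   end
   return function()
      for i = 1, nParams do
         toyOptim(states[i])   -- the closure sees states by reference
      end
   end, states
end

local optimizer, states = makeOptimizer({learningRate = 1e-2}, 2)
optimizer()
optimizer()
print(states[1].evalCounter)  -- 2: the state persisted across calls
```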

@hbqjzj
Author

hbqjzj commented Jul 7, 2016

The first test case works because the learning rate does not decay. However, in most cases we want the learning rate to decay, and that requires the iteration count, which is stored in the states variable. The moment matrices are also stored in states. The call

local grads, loss = optimizer(data, target)

will always assume the iteration count is zero (or one) and that the moment matrices are empty on every iteration.

@szagoruyko
Contributor

@eugenium there doesn't seem to be an issue. state is created once and then kept as a local variable in optimizer.
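In other words, Lua closures capture upvalues by reference, so a table created once at wrap time keeps accumulating updates on every call. A tiny illustration:

```lua
local function makeCounter()
   local state = {n = 0}        -- created once, captured by the closure
   return function()
      state.n = state.n + 1     -- mutated in place on every call
      return state.n
   end, state
end

local step, state = makeCounter()
step(); step(); step()
print(state.n)  -- 3
```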

@eugenium

Ah yeah, I see now. Thanks.

@synchro--

@ghostcow So, is there a complete working example (like the mnist one) on how to use Optim?
