bayerj edited this page Feb 28, 2012 · 1 revision

This is a draft.

Optimizer classes are objects that adapt a parameter wrt in order to minimize an objective function. When writing optimizers, the following contract should hold.

Signature of __init__

  • The first parameter is always wrt,
  • All callables giving a functional form of the loss (that is, the loss itself, its derivatives, etc.; called functions from now on) follow,
  • meta parameters come next,
  • iterators of arguments to the functions are the first optional keyword arguments.
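A constructor following this ordering might look like the sketch below; every name besides wrt (the class name, f, fprime, steprate, args) is illustrative, not part of the contract:

```python
import itertools


class SomeOptimizer:
    """Sketch of a constructor following the contract."""

    def __init__(self, wrt, f, fprime, steprate=0.1, args=None):
        # 1. The parameters to optimize always come first.
        self.wrt = wrt
        # 2. Callables of the loss (the loss itself, then derivatives) follow.
        self.f = f
        self.fprime = fprime
        # 3. Meta parameters such as the step rate come next.
        self.steprate = steprate
        # 4. An iterator of arguments for the functions is the first
        #    optional keyword argument; default to empty arguments forever.
        self.args = args if args is not None else itertools.repeat(((), {}))
```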

Nature of wrt

There are two ways wrt can behave.

  1. It is a numpy array with one dimension. If the parameters to be optimized need another form (e.g. a two-dimensional matrix), the corresponding views have to be created from it.
  2. It is a pair (get_wrt, set_wrt) which represents two functions. The first retrieves the latest parameter vector as a numpy array, the second sets the current parameter vector to the given value (which is a numpy array again).

To elaborate on the second: In some cases, like GPUs, we cannot expect all the operations to be defined on the corresponding class. Since we want to use certain operations on them, we need to explicitly convert them to numpy arrays, which, sad but true, requires us to copy.

This might of course lead to performance issues in some cases.

Behaviour of iteration and the info dictionary

The optimizer is basically an iterator, because its main logic lives in __iter__. It can be used either via a for loop or via convenience functions which essentially run that for loop.

Some things to respect when writing an optimizer:

  • an optimizer never stops iteration except if an error occurs; in that case, an exception should be thrown,
  • an optimizer never calculates more than needed for the optimization process (e.g. GradientDescent never calculates the actual loss); anything else can be calculated from the outside.
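Putting these rules together, a minimal sketch of such an optimizer might look like this; GradientDescent here, its steprate parameter, and the n_iter and gradient keys of the yielded info dictionary are assumptions for illustration:

```python
import itertools

import numpy as np


class GradientDescent:
    """Sketch: an infinite iterator that only computes the gradient."""

    def __init__(self, wrt, fprime, steprate=0.1, args=None):
        self.wrt = wrt
        self.fprime = fprime
        self.steprate = steprate
        self.args = args if args is not None else itertools.repeat(((), {}))

    def __iter__(self):
        # Never stops on its own; the caller decides when to break.
        for n_iter, (args, kwargs) in enumerate(self.args):
            # Only the gradient is computed -- never the actual loss.
            gradient = self.fprime(self.wrt, *args, **kwargs)
            self.wrt -= self.steprate * gradient
            yield {'n_iter': n_iter, 'gradient': gradient}


# Drive the optimizer with a for loop; stopping is done from the outside.
wrt = np.array([5.0])
opt = GradientDescent(wrt, lambda w: 2 * w, steprate=0.1)
for info in opt:
    if info['n_iter'] >= 99:
        break
```

Minimizing f(w) = w**2 this way, wrt shrinks by a factor of 0.8 per step and approaches zero.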