Optimizer contract
This is a draft.
Optimizer classes are objects that adapt a parameter `wrt` in order to minimize an objective function. When writing optimizers, the following contract should hold.
- The first parameter is always `wrt`,
- all callables of a functional form of the loss (that is, the loss itself, its derivatives, etc., called functions from now on) follow,
- meta parameters come next,
- iterators of arguments to the functions are the first optional keyword arguments.
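As a sketch, a constructor following this ordering might look as follows; the class and argument names (`GradientDescent`, `fprime`, `step_rate`) are illustrative here, not a fixed API:

```python
import itertools

class GradientDescent:
    """Hypothetical constructor illustrating the argument ordering of the contract."""

    def __init__(self, wrt, fprime, step_rate=0.1, args=None):
        self.wrt = wrt              # (1) the parameters, always first
        self.fprime = fprime        # (2) functions of the loss follow
        self.step_rate = step_rate  # (3) meta parameters come next
        # (4) an iterator of (args, kwargs) tuples for the functions is the
        #     first optional keyword argument; default to an endless supply
        #     of empty argument tuples.
        self.args = args if args is not None else itertools.repeat(((), {}))
```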
There are two ways `wrt` can behave.
- It is a numpy array with one dimension. If the parameters to be optimized need another form (e.g. a two-dimensional matrix), the corresponding views have to be created from it.
- It is a pair `(get_wrt, set_wrt)` of two functions. The first retrieves the latest parameter vector as a numpy array; the second sets the current parameter vector to the given value (again a numpy array).
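For the first case, views into the flat array can be created with numpy slicing and reshaping; the shapes below are chosen purely for illustration:

```python
import numpy as np

# Flat one-dimensional parameter vector, as handed to the optimizer:
# here it packs a 2x3 weight matrix followed by a bias of size 3.
wrt = np.zeros(2 * 3 + 3)

# Views share memory with `wrt` -- no copies are made.
weights = wrt[:6].reshape(2, 3)
bias = wrt[6:]

# An in-place update on the flat array is immediately visible in the views.
wrt += 1.0
```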
To elaborate on the second: in some cases, such as arrays living on a GPU, we cannot expect all required operations to be defined on the corresponding class. Since we want to apply certain operations to the parameters, we need to explicitly convert them to numpy arrays, which, sad but true, requires us to copy. This might of course lead to performance issues in some cases.
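A minimal sketch of the second case, with a made-up `DeviceBuffer` standing in for, say, a GPU array:

```python
import numpy as np

class DeviceBuffer:
    """Stand-in for e.g. a GPU array that numpy cannot operate on directly."""

    def __init__(self, values):
        self._values = list(values)

    def to_numpy(self):
        return np.array(self._values)           # copy device -> host

    def from_numpy(self, arr):
        self._values = [float(x) for x in arr]  # copy host -> device

buf = DeviceBuffer([1.0, 2.0, 3.0])

# The (get_wrt, set_wrt) pair the optimizer receives instead of an array.
get_wrt = buf.to_numpy
set_wrt = buf.from_numpy

params = get_wrt()            # copy out as a numpy array
params -= 0.1 * 2 * params    # an update step computed on the host
set_wrt(params)               # copy the result back
```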
The optimizer basically is an iterator, because its main logic lives in `__iter__`. It can be driven either by a for loop or via convenience functions which essentially run that for loop.
Some things to respect when writing an optimizer:
- an optimizer never stops iterating except if an error occurs; in that case, an exception should be thrown,
- an optimizer never calculates more than is needed for the optimization process; anything else (e.g. the actual loss, which GradientDescent never calculates) can be calculated from the outside.
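Putting the pieces together, a minimal sketch of an optimizer honouring the iterator protocol and both rules above (again with illustrative names) could look like this:

```python
import itertools
import numpy as np

class GradientDescent:
    """Minimal sketch of an optimizer obeying the contract; names are illustrative."""

    def __init__(self, wrt, fprime, step_rate=0.1, args=None):
        self.wrt = wrt
        self.fprime = fprime
        self.step_rate = step_rate
        self.args = args if args is not None else itertools.repeat(((), {}))

    def __iter__(self):
        # Never stops on its own: the loop driving it decides when to break.
        for n, (args, kwargs) in enumerate(self.args):
            # Only the gradient is computed -- never the loss itself; the
            # caller can evaluate the loss from the outside if needed.
            gradient = self.fprime(self.wrt, *args, **kwargs)
            self.wrt -= self.step_rate * gradient
            yield {'n_iter': n, 'gradient': gradient}

# Minimize f(x) = x^2 by driving the optimizer with a for loop.
wrt = np.array([5.0])
opt = GradientDescent(wrt, lambda w: 2 * w, step_rate=0.1)
for info in opt:
    if info['n_iter'] >= 100:
        break
```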