Add pytensor.tensor.optimize
#1182
base: main
Conversation
We could swap the scipy method to powell if pt.grad raises?
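For reference, a minimal sketch of what such a fallback could look like in plain scipy (the objective here is invented for illustration and is not the PR's code): Powell is derivative-free, so it still works when no gradient is available.

```python
import numpy as np
from scipy.optimize import minimize

# Powell is a derivative-free method, so it can serve as a fallback when
# the objective has no usable gradient (e.g. when pt.grad would raise).
# The objective is made up: |x - 2| is non-smooth at its minimum.
def objective(x):
    return np.abs(x - 2.0).sum()

res = minimize(objective, x0=np.zeros(1), method="Powell")
# res.x lands near 2.0 without any gradient information
```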
pytensor/tensor/optimize.py (outdated)

```python
# TODO: Does clone replace do what we want? It might need a merge optimization pass afterwards
replace = dict(zip(self.fgraph.inputs, (x_star, *args), strict=True))
grad_f_wrt_x_star, *grad_f_wrt_args = clone_replace(
```
This should avoid my TODO concern above
Suggested change:

```diff
-grad_f_wrt_x_star, *grad_f_wrt_args = clone_replace(
+grad_f_wrt_x_star, *grad_f_wrt_args = graph_replace(
```
Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html
Yes, the MinimizeOp can be dispatched to JaxOpt on the jax backend. On the Python/C backend we can also replace the MinimizeOp by any equivalent optimizer. As @jessegrabowski mentioned, we may even analyze the graph to decide what to use, such as scalar_root when that's adequate.
The Lasagne optimizers are for SGD in minibatch settings, so it's slightly different from what I have in mind here. This functionality would be useful in cases where you want to solve a sub-problem and then use the result in a downstream computation. For example, I think @theorashid wanted to use this for INLA to integrate out nuisance parameters via optimization before running MCMC on the remaining parameters of interest.

Another use case would be an agent-based model where we assume agents behave optimally. For example, we could try to estimate investor risk aversion parameters, assuming some utility function. The market prices would be the result of portfolio optimization subject to risk aversion (estimated), expected return vector, and market covariance matrix. Or use it in an RL-type scheme where agents have to solve a Bellman equation to get (an approximation to) their value function.

I'm looking forward to cooking up some example models using this.
We can definitely change the method via rewrites. There are several gradient-free (or approximate gradient) options in that respect. Optimizers can be fussy though, so I'm a bit hesitant to take this type of configuration out of the hands of the user.
Let users choose but try to provide the best default?
Carlos is interested in this 👀
Description
Implement scipy optimization routines, with implicit gradients. This PR should add:
- `optimize.minimize`
- `optimize.root`
- `optimize.scalar_minimize`
- `optimize.scalar_root`
It would also be nice to have rewrites to transform e.g. `root` to `scalar_root` when we know that there is only one input.

The implementation @ricardoV94 and I cooked up (ok ok it was mostly him) uses the graph to implicitly define the inputs to the objective function. For example:
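The original example snippet did not survive this export. As a rough stand-in, here is the same pattern in plain scipy, where `x` is the control variable and `a`, `c` are extra parameters (the objective itself is invented for illustration; the proposed pytensor API would discover `a` and `c` from the graph instead of requiring explicit `args`):

```python
import numpy as np
from scipy.optimize import minimize

# Invented objective: minimized over x, with a and c held fixed as parameters.
def f(x, a, c):
    return np.sum((x - a) ** 2) + c

a, c = 3.0, 1.0
res = minimize(f, x0=np.zeros(1), args=(a, c))
x_star = res.x  # close to a, since the quadratic is minimized at x = a
```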
We optimize `out` with respect to `x`, so `x` becomes the control variable. By graph inspection we find that `out` also depends on `a` and `c`, so the generated graph includes them as parameters. In scipy lingo, we end up with:

We get the following graph. The inner graph includes the gradients of the cost function by default, which are automatically used by scipy.
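In plain scipy terms, "the inner graph includes the gradients" corresponds to an objective that returns both value and gradient, consumed via `jac=True` (a sketch with an invented objective, not the PR's generated graph):

```python
import numpy as np
from scipy.optimize import minimize

# Objective returning (value, gradient); scipy uses the gradient
# automatically when jac=True. The function itself is invented.
def f_and_grad(x, a, c):
    value = np.sum((x - a) ** 2) + c
    grad = 2.0 * (x - a)
    return value, grad

res = minimize(f_and_grad, x0=np.zeros(1), args=(3.0, 1.0), jac=True)
```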
We can also ask for the gradients of the maximum value with respect to parameters:
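The gradient of the optimal value with respect to the parameters follows from the envelope theorem: at the optimum, the derivative of `min_x f(x, θ)` with respect to `θ` equals the partial derivative of `f` evaluated at `x*`. A quick numerical sanity check with an invented objective (for `f = (x - a)^2 + c`, the optimal value is `c`, so its derivative is 1 w.r.t. `c` and 0 w.r.t. `a`):

```python
import numpy as np
from scipy.optimize import minimize

# Optimal value of an invented objective as a function of its parameters.
def optimal_value(a, c):
    res = minimize(lambda x: np.sum((x - a) ** 2) + c, x0=np.zeros(1))
    return res.fun

# Central finite differences of the optimal value w.r.t. each parameter.
eps = 1e-5
d_dc = (optimal_value(3.0, 1.0 + eps) - optimal_value(3.0, 1.0 - eps)) / (2 * eps)
d_da = (optimal_value(3.0 + eps, 1.0) - optimal_value(3.0 - eps, 1.0)) / (2 * eps)
# d_dc is close to 1, d_da close to 0, matching the envelope theorem
```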
Related Issue
Checklist
Type of change
📚 Documentation preview 📚: https://pytensor--1182.org.readthedocs.build/en/1182/