
Add pytensor.tensor.optimize #1182

Draft · wants to merge 5 commits into base: main

Conversation

@jessegrabowski (Member) commented Jan 31, 2025

Description

Implement scipy optimization routines, with implicit gradients. This PR should add:

  • optimize.minimize
  • optimize.root
  • optimize.scalar_minimize
  • optimize.scalar_root

It would also be nice to have rewrites to transform e.g. root to scalar_root when we know that there is only one input.
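
A rough sketch of how root might eventually be called, by analogy with the minimize example below (the call signature and return values here are assumptions; only the name comes from the list above):

    import pytensor.tensor as pt
    from pytensor.tensor.optimize import root  # name from the list above; API assumed

    x = pt.scalar("x")
    a = pt.scalar("a")

    # Solve x**3 - a = 0 for x; `a` would be picked up from the graph as a parameter.
    equation = x**3 - a
    x_star, success = root(equation, x)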

The implementation @ricardoV94 and I cooked up (ok ok it was mostly him) uses the graph to implicitly define the inputs to the objective function. For example:

    import pytensor.tensor as pt
    from pytensor.tensor.optimize import minimize
    x = pt.scalar("x")
    a = pt.scalar("a")
    c = pt.scalar("c")

    b = a * 2
    b.name = "b"
    out = (x - b * c) ** 2

    minimized_x, success = minimize(out, x, debug=False)

We optimize out with respect to x, so x becomes the control variable. By graph inspection we find that out also depends on a and c, so the generated graph includes them as parameters. In scipy lingo, we end up with:

minimize(fun=out, x0=x, args=(a, c))
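
Conceptually, at runtime the Op does something like the following standalone scipy call (a sketch with made-up numbers, not the actual implementation; the real Op compiles the objective and its gradient from the inner graph):

    import numpy as np
    from scipy.optimize import minimize as scipy_minimize

    # The objective is what the inner graph computes: x is the control
    # variable, (a, c) are passed through `args`.
    def objective(x_arr, a_val, c_val):
        return (x_arr[0] - (a_val * 2) * c_val) ** 2

    res = scipy_minimize(fun=objective, x0=np.array([0.0]), args=(1.5, 3.0))
    print(res.x, res.success)  # x* ~= 2*a*c = 9.0, True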

We get the following graph. The inner graph includes the gradient of the cost function by default, which scipy uses automatically.

MinimizeOp.0 [id A]
 ├─ x [id B]
 └─ Mul [id C]
    ├─ 2.0 [id D]
    ├─ a [id E]
    └─ c [id F]

Inner graphs:

MinimizeOp [id A]
 ← Pow [id G]
    ├─ Sub [id H]
    │  ├─ x [id I]
    │  └─ <Scalar(float64, shape=())> [id J]
    └─ 2 [id K]
 ← Mul [id L]
    ├─ Mul [id M]
    │  ├─ Second [id N]
    │  │  ├─ Pow [id G]
    │  │  │  └─ ···
    │  │  └─ 1.0 [id O]
    │  └─ 2 [id K]
    └─ Pow [id P]
       ├─ Sub [id H]
       │  └─ ···
       └─ Sub [id Q]
          ├─ 2 [id K]
          └─ DimShuffle{order=[]} [id R]
             └─ 1 [id S]

We can also ask for the gradients of the minimizing x with respect to the parameters:

x_grad, a_grad, c_grad = pt.grad(minimized_x, [x, a, c])

# x_grad.dprint()
0.0 [id A]

# a_grad.dprint()
Mul [id A]
 ├─ 2.0 [id B]
 └─ c [id C]

# c_grad.dprint()
Mul [id A]
 ├─ 2.0 [id B]
 └─ a [id C]
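
These match the analytic solution: the inner problem is minimized at x* = 2*a*c, so dx*/da = 2*c, dx*/dc = 2*a, and the result does not depend on the initial value x. A quick numeric spot-check of the gradient graphs above (this only compiles the symbolic gradients; it does not run the optimizer):

    import pytensor

    # Evaluate the gradient graphs printed above at a = 1.5, c = 3.0
    f_grads = pytensor.function([a, c], [a_grad, c_grad])
    print(f_grads(1.5, 3.0))  # -> [array(6.), array(3.)]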

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pytensor--1182.org.readthedocs.build/en/1182/

@ricardoV94 (Member) commented:

We could swap the scipy method to powell if pt.grad raises?


# TODO: Does clone replace do what we want? It might need a merge optimization pass afterwards
replace = dict(zip(self.fgraph.inputs, (x_star, *args), strict=True))
grad_f_wrt_x_star, *grad_f_wrt_args = clone_replace(
Review comment:

This should avoid my TODO concern above

Suggested change
- grad_f_wrt_x_star, *grad_f_wrt_args = clone_replace(
+ grad_f_wrt_x_star, *grad_f_wrt_args = graph_replace(
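
For reference, graph_replace takes the same kind of mapping as clone_replace; a minimal standalone illustration (not code from this PR):

    import pytensor.tensor as pt
    from pytensor.graph.replace import graph_replace

    x = pt.scalar("x")
    y = pt.scalar("y")
    out = x ** 2 + x

    # Substitute x -> y + 1 everywhere it appears in `out`
    new_out = graph_replace(out, {x: y + 1.0})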

@twiecki (Member) commented Jan 31, 2025

Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html

@ricardoV94 (Member) commented Jan 31, 2025

> Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html

Yes, the MinimizeOp can be dispatched to JaxOpt on the jax backend. On the Python/C backend we can also replace the MinimizeOp with any equivalent optimizer. As @jessegrabowski mentioned, we may even analyze the graph to decide what to use, such as scalar_root when that's adequate.
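
(As an aside, not part of this PR: jax.scipy.optimize.minimize can already solve the same toy problem, which gives an idea of the kind of target a JAX lowering could dispatch to; only BFGS is supported there.)

    import jax.numpy as jnp
    from jax.scipy.optimize import minimize as jax_minimize

    # Same toy objective as above, written as a JAX function
    def objective(x, a, c):
        return (x[0] - 2.0 * a * c) ** 2

    res = jax_minimize(objective, x0=jnp.zeros(1), args=(1.5, 3.0), method="BFGS")
    print(res.x)  # ~[9.0]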

@jessegrabowski (Member, Author) commented:

> Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html

The Lasagne optimizers are for SGD in minibatch settings, so it's slightly different from what I have in mind here. This functionality would be useful in cases where you want to solve a sub-problem and then use the result in a downstream computation. For example, I think @theorashid wanted to use this for INLA to integrate out nuisance parameters via optimization before running MCMC on the remaining parameters of interest.

Another use case would be an agent-based model where we assume agents behave optimally. For example, we could try to estimate investor risk aversion parameters, assuming some utility function. The market prices would be the result of portfolio optimization subject to risk aversion (estimated), expected return vector, and market covariance matrix. Or use it in an RL-type scheme where agents have to solve a Bellman equation to get (an approximation to) their value function. I'm looking forward to cooking up some example models using this.

@jessegrabowski (Member, Author) commented:

> We could swap the scipy method to powell if pt.grad raises?

We can definitely change the method via rewrites. There are several gradient-free (or approximate gradient) options in that respect. Optimizers can be fussy though, so I'm a bit hesitant to take this type of configuration out of the hands of the user.

@ricardoV94 (Member) commented:

> Optimizers can be fussy though, so I'm a bit hesitant to take this type of configuration out of the hands of the user.

Let users choose but try to provide the best default?

@cetagostini commented:

Carlos is interested in this 👀
