
Add pytensor.tensor.optimize #1182

Draft · wants to merge 5 commits into base: main

Conversation

@jessegrabowski (Member) commented Jan 31, 2025

Description

Implement scipy optimization routines, with implicit gradients. This PR should add:

  • optimize.minimize
  • optimize.root
  • optimize.scalar_minimize
  • optimize.scalar_root

It would also be nice to have rewrites to transform e.g. root to scalar_root when we know that there is only one input.
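
A rough sketch of how root might eventually be called, by analogy with the minimize example below (the call signature and return values here are assumptions; only the name comes from the list above):

    import pytensor.tensor as pt
    from pytensor.tensor.optimize import root  # name from the list above; API assumed

    x = pt.scalar("x")
    a = pt.scalar("a")

    # Solve x**3 - a = 0 for x; `a` would be picked up from the graph as a parameter.
    equation = x**3 - a
    x_star, success = root(equation, x)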

The implementation @ricardoV94 and I cooked up (ok ok it was mostly him) uses the graph to implicitly define the inputs to the objective function. For example:

    import pytensor.tensor as pt
    from pytensor.tensor.optimize import minimize
    x = pt.scalar("x")
    a = pt.scalar("a")
    c = pt.scalar("c")

    b = a * 2
    b.name = "b"
    out = (x - b * c) ** 2

    minimized_x, success = minimize(out, x, debug=False)

We optimize out with respect to x, so x becomes the control variable. By graph inspection we find that out also depends on a and c, so the generated graph includes them as parameters. In scipy lingo, we end up with:

minimize(fun=out, x0=x, args=(a, c))
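
Conceptually, at runtime the Op does something like the following standalone scipy call (a sketch with made-up numbers, not the actual implementation; the real Op compiles the objective and its gradient from the inner graph):

    import numpy as np
    from scipy.optimize import minimize as scipy_minimize

    # The objective is what the inner graph computes: x is the control
    # variable, (a, c) are passed through `args`.
    def objective(x_arr, a_val, c_val):
        return (x_arr[0] - (a_val * 2) * c_val) ** 2

    res = scipy_minimize(fun=objective, x0=np.array([0.0]), args=(1.5, 3.0))
    print(res.x, res.success)  # x* ~= 2*a*c = 9.0, True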

We get the following graph. The inner graph includes the gradient of the cost function by default, which scipy uses automatically.

MinimizeOp.0 [id A]
 ├─ x [id B]
 └─ Mul [id C]
    ├─ 2.0 [id D]
    ├─ a [id E]
    └─ c [id F]

Inner graphs:

MinimizeOp [id A]
 ← Pow [id G]
    ├─ Sub [id H]
    │  ├─ x [id I]
    │  └─ <Scalar(float64, shape=())> [id J]
    └─ 2 [id K]
 ← Mul [id L]
    ├─ Mul [id M]
    │  ├─ Second [id N]
    │  │  ├─ Pow [id G]
    │  │  │  └─ ···
    │  │  └─ 1.0 [id O]
    │  └─ 2 [id K]
    └─ Pow [id P]
       ├─ Sub [id H]
       │  └─ ···
       └─ Sub [id Q]
          ├─ 2 [id K]
          └─ DimShuffle{order=[]} [id R]
             └─ 1 [id S]

We can also ask for the gradients of the minimizing x with respect to the parameters:

x_grad, a_grad, c_grad = pt.grad(minimized_x, [x, a, c])

# x_grad.dprint()
0.0 [id A]

# a_grad.dprint()
Mul [id A]
 ├─ 2.0 [id B]
 └─ c [id C]

# c_grad.dprint()
Mul [id A]
 ├─ 2.0 [id B]
 └─ a [id C]
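
These match the analytic solution: the inner problem is minimized at x* = 2*a*c, so dx*/da = 2*c, dx*/dc = 2*a, and the result does not depend on the initial value x. A quick numeric spot-check of the gradient graphs above (this only compiles the symbolic gradients; it does not run the optimizer):

    import pytensor

    # Evaluate the gradient graphs printed above at a = 1.5, c = 3.0
    f_grads = pytensor.function([a, c], [a_grad, c_grad])
    print(f_grads(1.5, 3.0))  # -> [array(6.), array(3.)]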

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pytensor--1182.org.readthedocs.build/en/1182/

@ricardoV94 (Member) commented:

We could swap the scipy method to powell if pt.grad raises?


# TODO: Does clone replace do what we want? It might need a merge optimization pass afterwards
replace = dict(zip(self.fgraph.inputs, (x_star, *args), strict=True))
grad_f_wrt_x_star, *grad_f_wrt_args = clone_replace(
Review comment:

This should avoid my TODO concern above

Suggested change
- grad_f_wrt_x_star, *grad_f_wrt_args = clone_replace(
+ grad_f_wrt_x_star, *grad_f_wrt_args = graph_replace(
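
For reference, graph_replace takes the same kind of mapping as clone_replace; a minimal standalone illustration (not code from this PR):

    import pytensor.tensor as pt
    from pytensor.graph.replace import graph_replace

    x = pt.scalar("x")
    y = pt.scalar("y")
    out = x ** 2 + x

    # Substitute x -> y + 1 everywhere it appears in `out`
    new_out = graph_replace(out, {x: y + 1.0})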

@twiecki (Member) commented Jan 31, 2025

Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html

@ricardoV94 (Member) commented Jan 31, 2025

> Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html

Yes, the MinimizeOp can be dispatched to JaxOpt on the jax backend. On the Python/C backend we can also replace the MinimizeOp with any equivalent optimizer. As @jessegrabowski mentioned, we may even analyze the graph to decide what to use, such as scalar_root when that's adequate.
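
(As an aside, not part of this PR: jax.scipy.optimize.minimize can already solve the same toy problem, which gives an idea of the kind of target a JAX lowering could dispatch to; only BFGS is supported there.)

    import jax.numpy as jnp
    from jax.scipy.optimize import minimize as jax_minimize

    # Same toy objective as above, written as a JAX function
    def objective(x, a, c):
        return (x[0] - 2.0 * a * c) ** 2

    res = jax_minimize(objective, x0=jnp.zeros(1), args=(1.5, 3.0), method="BFGS")
    print(res.x)  # ~[9.0]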

@jessegrabowski (Member, Author) commented:

> Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html

The Lasagne optimizers are for SGD in minibatch settings, so it's slightly different from what I have in mind here. This functionality would be useful in cases where you want to solve a sub-problem and then use the result in a downstream computation. For example, I think @theorashid wanted to use this for INLA to integrate out nuisance parameters via optimization before running MCMC on the remaining parameters of interest.

Another use case would be an agent-based model where we assume agents behave optimally. For example, we could try to estimate investor risk aversion parameters, assuming some utility function. The market prices would be the result of portfolio optimization subject to risk aversion (estimated), expected return vector, and market covariance matrix. Or use it in an RL-type scheme where agents have to solve a Bellman equation to get (an approximation to) their value function. I'm looking forward to cooking up some example models using this.

@jessegrabowski (Member, Author) commented:

> We could swap the scipy method to powell if pt.grad raises?

We can definitely change the method via rewrites. There are several gradient-free (or approximate gradient) options in that respect. Optimizers can be fussy though, so I'm a bit hesitant to take this type of configuration out of the hands of the user.

@ricardoV94 (Member) commented:

> Optimizers can be fussy though, so I'm a bit hesitant to take this type of configuration out of the hands of the user.

Let users choose but try to provide the best default?

@cetagostini commented:

Carlos is interested in this 👀
