Reuse LU decomposition in Solve #1396
Conversation
Force-pushed 50146f6 to 903a86e
There's a test failing @jessegrabowski; it seems like our lu_solve gives different precision output compared to getrs (I assume worse, but I didn't check). Anyway, this makes me wonder: don't we want to wrap that LAPACK Op instead of using our double-triangular-solve-and-pivots Op combination? It would reduce the cost of splitting the Op, because all that logic would live inside LAPACK. It's also a cleaner graph?
Yes, we can directly wrap the Op. I was just having trouble with gradients when I did it that way. If you recall, we had a call where I compared the two approaches (core Ops vs a new Op).
You couldn't figure out the gradients of just the LAPACK Op, or was it something else?
Yes, I couldn't get correct gradients. What I thought would be the correct, straightforward answer ended up being wrong. I got frustrated quickly and didn't spend a super long time on it.
Okay, I'm fine with this; we just need to tweak the tolerance, and perhaps keep an issue open to revisit more carefully later if it turns out problematic. It's a float32 thing and the differences are not crazy.
My guess is that there's sequential loss of precision from doing two solves instead of one. We do have the fig leaf of "well, this is what JAX does", at least! I'll open a branch/PR with the Op version of lu_solve, and we can work on getting the gradient to work when we have some free time (never).
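A minimal NumPy/SciPy sketch (not PyTensor code) of the two approaches being compared here: one fused getrs-style call via lu_solve, versus the split pivot-then-two-triangular-solves path, whose results can drift slightly apart in float32:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, solve_triangular

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)).astype("float32")
b = rng.normal(size=5).astype("float32")

lu, piv = lu_factor(A)

# Fused path: a single LAPACK call (getrs under the hood)
x_fused = lu_solve((lu, piv), b)

# Split path: apply the row pivots, then two triangular solves
b_perm = b.copy()
for i, p in enumerate(piv):
    b_perm[[i, p]] = b_perm[[p, i]]
y = solve_triangular(lu, b_perm, lower=True, unit_diagonal=True)
x_split = solve_triangular(lu, y, lower=False)

print(np.max(np.abs(x_fused - x_split)))  # tiny, but can be nonzero in float32
```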
Pull Request Overview
This PR enables the reuse of LU decompositions across Solve calls for distinct, blockwise, and scan-based operations. Key changes include updated test tolerances and modes in blockwise and rewriting tests, new tests to verify the LU rewrite behavior, and implementation of several LU-decomposition rewrite functions along with adjustments in the scan rewriting module.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/tensor/test_blockwise.py | Adjusted tolerance settings and updated mode exclusion parameters. |
tests/tensor/linalg/test_rewriting.py | Added tests to validate the new LU decomposition rewrite logic. |
pytensor/tensor/rewriting/linalg.py | Minor update to is_matrix_transpose to correctly handle expanded dims. |
pytensor/tensor/_linalg/solve/rewriting.py | Implemented LU reuse rewrites and helper functions for Solve operations. |
pytensor/tensor/_linalg/solve/__init__.py | Registered the new solve rewrites. |
pytensor/tensor/_linalg/__init__.py | Registered LU decomposition rewrites. |
pytensor/tensor/__init__.py | Imported the updated linalg module. |
pytensor/scan/rewriting.py | Adjusted positions for scan rewrite registrations. |
Codecov Report

Attention: ❌ Your patch check has failed because the patch coverage (92.45%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1396      +/-   ##
==========================================
+ Coverage   82.08%   82.12%   +0.04%
==========================================
  Files         208      211       +3
  Lines       49565    49682     +117
  Branches     8792     8812      +20
==========================================
+ Hits        40685    40802     +117
+ Misses       6706     6702       -4
- Partials     2174     2178       +4
```
```diff
@@ -2605,7 +2603,7 @@ def scan_push_out_dot1(fgraph, node):
     "more_mem",
     "scan",
     "scan_pushout",
-    position=5,
+    position=6,
```
Is this ordering necessary?
Yeah, we want the rewrite that splits the LU to run before the pushout, which is the one that actually removes it from the inner graph.
I could have used decimal positions, but it makes sense to have a whole number between the previous rewrite and this one.
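A toy sketch of the ordering mechanics being discussed (this is not PyTensor's actual rewrite-database API; the names and the position assigned to the split rewrite are illustrative only): rewrites fire in ascending position, so giving the LU-split rewrite a position below the pushout's guarantees it runs first.

```python
# Toy model of position-ordered rewrite registration.
rewrites = []

def register(name, position):
    rewrites.append((position, name))

register("scan_push_out_non_seq", position=3)
register("split_lu_solve_steps", position=5)  # hypothetical new rewrite
register("scan_push_out_dot1", position=6)    # moved from 5 to 6 in this PR

for _, name in sorted(rewrites):
    print(name)  # fires in ascending-position order: non_seq, split, dot1
```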
```python
from pytensor.tensor.blockwise import Blockwise
from pytensor.tensor.elemwise import DimShuffle
from pytensor.tensor.rewriting.basic import register_specialize
from pytensor.tensor.rewriting.linalg import is_matrix_transpose
```
Why did you choose to put these rewrites in tensor._linalg.solve.rewriting instead of in tensor.rewriting._linalg.solve? It breaks the usual pattern.
I don't think it does. For instance, the random rewrites are in tensor/random/rewriting, not tensor/rewriting/random.
```python
assume_a = node.op.core_op.assume_a

if assume_a != "gen":
```
```diff
-if assume_a != "gen":
+if assume_a not in SUPPORTED_ASSUMPTIONS:
```
You hard-coded a check for "gen" in a few places; it might be easier to update in the future if we just have a single constant with all the assume_a arguments we're allowing?
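A hypothetical sketch of this suggestion (SUPPORTED_ASSUMPTIONS and can_reuse_decomposition are made-up names, not part of the PR): centralize the allowed assume_a values in one constant instead of repeating the "gen" literal.

```python
# Made-up names for illustration; only node.op.core_op.assume_a is from the PR.
SUPPORTED_ASSUMPTIONS = ("gen",)  # could later grow, e.g. ("gen", "pos")

def can_reuse_decomposition(node) -> bool:
    assume_a = node.op.core_op.assume_a
    return assume_a in SUPPORTED_ASSUMPTIONS
```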
It will actually be a bit trickier than this, because different backends have different support for these decompositions. You can see what it looks like in the last commit of #1382.
```python
replace_dict = eager_split_lu_solve_steps.transform(
    new_scan_fgraph, inner_node
)
assert isinstance(replace_dict, dict) and len(replace_dict) > 0
```
Please raise an actual error here. I understand it's a sanity check, but hitting these in the future is very frustrating (see pymc-devs/pymc#7780)
Added an error message to the assert
Why the resistance to raising an actual error? Is it because the user never sees this (it just causes the rewrite to abort)?
It's a sanity check but an assert raises an error as well: AssertionError ;)
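A sketch of the resolution reached here: keep the assert, but attach a message so a future failure is diagnosable. The helper below is hypothetical; only the transform call and the dict check mirror the snippet above.

```python
def apply_rewrite_with_check(rewrite, fgraph, node):
    """Hypothetical helper illustrating an assert with an attached message."""
    replace_dict = rewrite.transform(fgraph, node)
    # An assert still raises (AssertionError), but the message makes a
    # future failure much easier to diagnose than a bare `assert`:
    assert isinstance(replace_dict, dict) and len(replace_dict) > 0, (
        f"{rewrite} was expected to return a non-empty replacement dict, "
        f"got {replace_dict!r}"
    )
    return replace_dict
```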
Force-pushed 1d49175 to 249a69a
Pull Request Overview
This PR introduces optimizations to reuse LU decompositions across multiple Solve operations, reducing redundant factorizations in both blockwise and scanned contexts.

- Add rewrites (reuse_lu_decomposition_multiple_solves, scan_split_non_sequence_lu_decomposition_solve) to factor A once and reuse it.
- Add tests in test_rewriting.py covering forward/backward, blockwise, and scan scenarios.
- Register the new rewrites via imports and adjust scan-rewrite ordering.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
File | Description |
---|---|
tests/tensor/test_blockwise.py | Update RNG seed logic and exclude the new rewrite in tests |
tests/tensor/linalg/test_rewriting.py | New tests for LU-reuse rewrites |
pytensor/tensor/rewriting/linalg.py | Extend is_matrix_transpose to allow left expand_dims |
pytensor/tensor/_linalg/solve/rewriting.py | Implement LU-decomposition reuse rewrites |
pytensor/tensor/_linalg/solve/__init__.py | Register the new rewrite module |
pytensor/tensor/_linalg/__init__.py | Import solve package to trigger rewrite registration |
pytensor/tensor/__init__.py | Import _linalg submodule for rewrite registration |
pytensor/scan/rewriting.py | Adjust positions of existing scan-pushout rewrites |
Comments suppressed due to low confidence (3)

tests/tensor/test_blockwise.py:331
- [nitpick] Using the sum of character codes loses ordering information and can lead to seed collisions; consider using a hash (e.g., hashlib) or including order-sensitive data to generate a more robust RNG seed.
  seed = sum(map(ord, str(cls.core_op) + cls.signature))

pytensor/tensor/_linalg/solve/rewriting.py:97
- The zip(..., strict=True) keyword is only available in Python 3.10+ and may break compatibility; remove strict=True or guard its use for older Python versions.
  for a_bcast, b_bcast in zip(

pytensor/scan/rewriting.py:2571
- Changing the registration position of the scan_push_out_non_seq rewrite may affect the overall rewrite ordering; verify that the new ordering doesn't interfere with other scan optimizations.
  position=3,
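A sketch of the order-sensitive seeding the first suppressed comment suggests (hashlib is standard library; stable_seed is a made-up name): hashing the whole string avoids the collisions a plain sum of character codes allows, e.g. "ab" vs "ba".

```python
import hashlib

def stable_seed(core_op, signature) -> int:
    # Hash the full string so character order matters, then take 4 bytes
    # to get a small, stable integer seed.
    key = f"{core_op}{signature}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "little")

# seed = stable_seed(cls.core_op, cls.signature)  # instead of sum(map(ord, ...))
```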
Force-pushed 249a69a to 8c84fcf
Force-pushed 8c84fcf to 2b4132b
Force-pushed 2b4132b to 7278076
Requires #1394
This PR adds the rewrites to reuse an LU decomposition of the same A matrix across multiple solves. It does not propagate the check_finite flag, which I think we should rework out of Solve.

Closes #1374
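A minimal sketch (assuming PyTensor's public solve API; the rewrite itself runs automatically during compilation) of the pattern this PR targets: two solves against the same A should share one LU factorization after the rewrite.

```python
import pytensor
import pytensor.tensor as pt
from pytensor.tensor.slinalg import solve

A = pt.matrix("A")
b1 = pt.vector("b1")
b2 = pt.vector("b2")

# Two solves against the same general matrix A
x1 = solve(A, b1, assume_a="gen")
x2 = solve(A, b2, assume_a="gen")

fn = pytensor.function([A, b1, b2], [x1, x2])
pytensor.dprint(fn)  # A should be factorized once and reused by both solves
```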
📚 Documentation preview 📚: https://pytensor--1396.org.readthedocs.build/en/1396/