Quadratic forms are not scalars? Question about scalar-valued functions #637

doctor-phil · 2024-02-06T23:41:06Z

This may be a conceptual issue, or I may be missing something, so I was hoping someone could point me in the right direction.

Consider the following example:

import pytensor as pt
import pytensor.tensor as tn
A = tn.dmatrix('A')
x = tn.col('x')
f = -x.T @ A @ x

Clearly in this example there is only one possible shape for f wherever it is defined. However, returning f shows it is a Blockwise{dot, (m,k),(k,n)->(m,n)}.0, and trying to find the gradient with pt.grad(f,x) throws a TypeError: Cost must be a scalar.

What am I doing wrong here?

Edit: So, I got it working by wrapping the expression in tn.trace(). (So that f=tn.trace(-x.T @ A @ x)). This seems like a hacky workaround but whatever. Anyway, printing the graph for f gives:

Neg [id A]
 └─ Sum{axis=0} [id B]
    └─ ExtractDiag{offset=0, axis1=0, axis2=1, view=False} [id C]
       └─ Blockwise{dot, (m,k),(k,n)->(m,n)} [id D]
          ├─ Blockwise{dot, (m,k),(k,n)->(m,n)} [id E]
          │  ├─ Transpose{axes=[1, 0]} [id F] 'x.T'
          │  │  └─ x [id G]
          │  └─ A [id H]
          └─ x [id G]

Meanwhile, if I compile f into a function (as ff = pt.function([x,A], f)), I am left with:

Neg [id A] 11
 └─ Sum{axes=None} [id B] 10
    └─ ExtractDiag{offset=0, axis1=0, axis2=1, view=False} [id C] 9
       └─ ExpandDims{axis=0} [id D] 8
          └─ Gemv{inplace} [id E] 7
             ├─ AllocEmpty{dtype='float64'} [id F] 6
             │  └─ 1 [id G]
             ├─ 1.0 [id H]
             ├─ Transpose{axes=[1, 0]} [id I] 'x.T' 5
             │  └─ x [id J]
             ├─ Gemv{inplace} [id K] 4
             │  ├─ AllocEmpty{dtype='float64'} [id L] 3
             │  │  └─ Shape_i{1} [id M] 2
             │  │     └─ A [id N]
             │  ├─ 1.0 [id H]
             │  ├─ Transpose{axes=[1, 0]} [id O] 'A.T' 1
             │  │  └─ A [id N]
             │  ├─ DropDims{axis=1} [id P] 0
             │  │  └─ x [id J]
             │  └─ 0.0 [id Q]
             └─ 0.0 [id Q]

I was under the impression that compiling with pt.function should simplify the expression, not make it more verbose.

For my actual application, I am really just looking for a way to simplify a certain graph and retrieve it as a symbolic expression.

ricardoV94 (Maintainer) · 2024-02-08T10:21:14Z

Hi @doctor-phil thanks for reaching out.

When dprinting, you may want to pass the flag print_type=True:

import pytensor
import pytensor.tensor as pt

A = pt.dmatrix('A')
x = pt.col('x')
f = -x.T @ A @ x

pytensor.dprint(f, print_type=True)

This will show that f is a single-valued tensor, but a Matrix nonetheless:

Blockwise{dot, (m,k),(k,n)->(m,n)} [id A] <Matrix(float64, shape=(1, 1))>
 ├─ Blockwise{dot, (m,k),(k,n)->(m,n)} [id B] <Matrix(float64, shape=(1, ?))>
 │  ├─ Neg [id C] <Matrix(float64, shape=(1, ?))>
 │  │  └─ Transpose{axes=[1, 0]} [id D] <Matrix(float64, shape=(1, ?))> 'x.T'
 │  │     └─ x [id E] <Matrix(float64, shape=(?, 1))>
 │  └─ A [id F] <Matrix(float64, shape=(?, ?))>
 └─ x [id E] <Matrix(float64, shape=(?, 1))>

grad explicitly requires a scalar cost, so you can call .squeeze():

pytensor.dprint(f.squeeze(), print_type=True)

DropDims{axes=[0, 1]} [id A] <Scalar(float64, shape=())>
 └─ Blockwise{dot, (m,k),(k,n)->(m,n)} [id B] <Matrix(float64, shape=(1, 1))>
    ├─ Blockwise{dot, (m,k),(k,n)->(m,n)} [id C] <Matrix(float64, shape=(1, ?))>
    │  ├─ Neg [id D] <Matrix(float64, shape=(1, ?))>
    │  │  └─ Transpose{axes=[1, 0]} [id E] <Matrix(float64, shape=(1, ?))> 'x.T'
    │  │     └─ x [id F] <Matrix(float64, shape=(?, 1))>
    │  └─ A [id G] <Matrix(float64, shape=(?, ?))>
    └─ x [id F] <Matrix(float64, shape=(?, 1))>

And now you should be able to call grad just fine. We usually call expr.sum() to get a scalar cost for the gradient, which in this case does the very same after rewrites.

On your second question: the goal of compiling a function is not to simplify an expression, but to optimize it for speed and memory consumption. In this case we have replaced the Dot by an efficient Gemv BLAS operation. It looks complicated because the Gemv function has a very specific signature (5 inputs in this case).

If you are only interested in simplifying an expression, you don't even need to compile, which not only applies rewrites that increase "complexity" but also wastes time on C compilation. Instead you can use rewrite_graph and control which kinds of rewrites are actually applied:

from pytensor.graph import rewrite_graph

grad_f_wrt_x = pt.grad(f.squeeze(), x)
simpler_grad_f_wrt_x = rewrite_graph(grad_f_wrt_x, include=("canonicalize", "specialize"))

pytensor.dprint(simpler_grad_f_wrt_x)

Sub [id A]
 ├─ dot [id B]
 │  ├─ dot [id C]
 │  │  ├─ Transpose{axes=[1, 0]} [id D] 'A.T'
 │  │  │  └─ A [id E]
 │  │  └─ Neg [id F]
 │  │     └─ x [id G]
 │  └─ [[1.]] [id H]
 └─ dot [id I]
    ├─ A [id E]
    └─ dot [id J]
       ├─ x [id G]
       └─ [[1.]] [id H]

A bit better, although there are some stupid dot(x, 1) that should be rewritten away.

However, my hand-picked arguments to rewrite_graph won't always result in the "simplest graph". We should add a rewrite database with that specific goal!

Anyway let me know if this helps.

4 replies

ricardoV94 Feb 8, 2024
Maintainer

Opened an issue for the dot with 1: #638

ricardoV94 Feb 8, 2024
Maintainer

And one for a specialized database for symbolic simplification: #639

doctor-phil Feb 13, 2024
Author

Hi @ricardoV94, thanks for the info. This solved the problem for the simpler case of the quadratic form.

I think your insights here would help me solve the full problem (which is far more complex). However, I am running into a bizarre error message when trying to solve the full problem. It goes like this:

ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: local_subtensor_shape_constant
ERROR (pytensor.graph.rewriting.basic): node: Subtensor{i}(Shape.0, 0)
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
  File "c:\Users\Phil\miniconda3\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1922, in process_node
    replacements = node_rewriter.transform(fgraph, node)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Phil\miniconda3\Lib\site-packages\pytensor\graph\rewriting\basic.py", line 1082, in transform
    return self.fn(fgraph, node)
           ^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Phil\miniconda3\Lib\site-packages\pytensor\tensor\rewriting\subtensor.py", line 1646, in local_subtensor_shape_constant
    shape_parts = shape_arg.type.broadcastable[idx_val]
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
IndexError: tuple index out of range

Many of these will show up, but there is no more traceback to help me diagnose the issue.

This only appears when I am trying to simplify or compile the expression (in other words, the expression seems to evaluate correctly and the tree appears albeit not simplified). I will continue to look for a MWE that is possible to post for clarity, but in the meantime, could this be related to my use of col?

ricardoV94 Feb 13, 2024
Maintainer

That likely comes from a shape error in your graph (or a bug in PyTensor). An MWE should help figure out which one it is.
