gradient documentation subtleties #11019
Unanswered · inakleinbottle asked this question in General

I was just reading the autodiff cookbook in the documentation (https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html#starting-with-grad) and I have some comments and questions about the wording of the documentation. The claim made there is that if `func` is a Python function that represents some mathematical function $f$, then `grad(func)` represents the gradient $\nabla f$. This is certainly true for functions of a single variable, but for functions of two or more variables the equivalence does not seem to hold: if $f$ is a function from $\Bbb R^d$ into $\Bbb R$, say, then the gradient of $f$ should be a member of $\Bbb R^d$ (a $d$-dimensional vector) whose $i$th coordinate is the partial derivative $\partial_i f$ of $f$ with respect to the $i$th coordinate. However, if I have a Python function `func` that represents, say, $f(x, y) = x + 2y$, then `grad(func)` is, without any additional arguments, the 0th partial derivative of $f$; that is, $\partial_x f(x, y) = 1$. The corresponding partial derivative with respect to the 1st variable is `grad(func, 1)` (that is, $\partial_y f(x, y) = 2$). Of course, I can provide the additional argument `grad(func, (0, 1))` to get something more easily identifiable as the gradient.

I realise that in the applications for which JAX was designed this is something you might not need to worry about, and that there are some technical issues with constructing a vector of the correct size, given that some of the input variables are not variables in the mathematical sense (i.e. hyperparameters). I really love the design of the API: it is simple, clean, and hassle free. However, I think this could be made clearer in the documentation.
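To make the behaviour concrete, here is a minimal sketch of what is described above (the function names `f` and `g` and the evaluation points are my own choices, not taken from the cookbook): `grad` differentiates with respect to the 0th argument by default, `argnums` selects other arguments, and packing the variables into a single array argument is what yields the gradient as a $d$-vector.

```python
import jax
import jax.numpy as jnp

def f(x, y):
    # f(x, y) = x + 2y, a map from R^2 to R written with two scalar arguments
    return x + 2 * y

print(jax.grad(f)(1.0, 1.0))          # 1.0 -> partial derivative w.r.t. x (argnums defaults to 0)
print(jax.grad(f, 1)(1.0, 1.0))       # 2.0 -> partial derivative w.r.t. y
print(jax.grad(f, (0, 1))(1.0, 1.0))  # (1.0, 2.0) -> tuple of both partials

def g(v):
    # The same function with its variables packed into one array argument
    return v[0] + 2 * v[1]

print(jax.grad(g)(jnp.array([1.0, 1.0])))  # [1. 2.] -> the gradient as a vector in R^2
```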
I didn't think this was worthy of opening an issue, since it is a minor point, so I started a discussion topic instead. I do think, though, that the terms `gradient` and `derivative` are conflated somewhat (everywhere, not just in the data science community) when really they are quite different things: the derivative (in a particular direction) tells me how much my function changes, approximately linearly, for a small step in that direction (a scalar); the gradient is the vector direction in which I take the largest increasing step. The two are obviously very closely related (especially on $\Bbb R^d$), but that is no reason to disregard caution.
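As a concrete illustration of that distinction (my own example, not from the documentation; the function `quad`, the point `x`, and the direction `u` are arbitrary choices), `jax.jvp` gives the scalar directional derivative along a chosen direction, while `jax.grad` gives the gradient vector in $\Bbb R^d$:

```python
import jax
import jax.numpy as jnp

def quad(v):
    # Scalar-valued function on R^3: quad(v) = v . v
    return jnp.dot(v, v)

x = jnp.array([1.0, 2.0, 3.0])  # point at which we differentiate
u = jnp.array([1.0, 0.0, 0.0])  # direction for the directional derivative

# Directional derivative along u: a scalar (here 2 * x[0] = 2.0)
_, directional = jax.jvp(quad, (x,), (u,))
print(directional)

# Gradient: a vector in R^3 (here 2 * x = [2. 4. 6.])
print(jax.grad(quad)(x))
```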
-
Replies: 1 comment, 1 reply

Thanks for pointing this out! You're correct to note that …