gradient documentation subtleties #11019
Unanswered · inakleinbottle asked this question in General

I was just reading the autodiff cookbook in the documentation (https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html#starting-with-grad) and I have some comments and questions about the wording of the documentation. The claim made there is that if `func` is a Python function that represents some mathematical function $f$, then `grad(func)` represents the gradient $\nabla f$. This is certainly true for functions of a single variable, but for functions of two or more variables the equivalence does not seem to hold: if $f$ is a function from $\Bbb R^d$ into $\Bbb R$, say, then the gradient of $f$ should be a member of $\Bbb R^d$ (a $d$-dimensional vector) whose $i$th coordinate is the partial derivative $\partial_i f$ of $f$ with respect to the $i$th coordinate. However, if I have a Python function `func` that represents, say, $f(x, y) = x + 2y$, then `grad(func)` is, without any additional arguments, the 0th partial derivative of $f$; that is, $\partial_x f(x, y) = 1$. The corresponding partial derivative with respect to the 1st variable is `grad(func, 1)` (that is, $\partial_y f(x, y) = 2$). Of course, I can provide the additional argument `grad(func, (0, 1))` to get something more easily identifiable as the gradient.

I realise that in the applications for which JAX was designed this is something you might not need to worry about, and that there are some technical issues with constructing a vector of the correct size, given that some of the input variables are not variables in the mathematical sense (i.e. hyperparameters). I really love the design of the API: it is simple, clean, and hassle free. However, I think this could be made clearer in the documentation.
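To make the behaviour concrete, here is a minimal sketch of what is described above (the function names `f` and `g` and the evaluation points are my own choices, not taken from the cookbook): `grad` differentiates with respect to the 0th argument by default, `argnums` selects other arguments, and packing the variables into a single array argument is what yields the gradient as a $d$-vector.

```python
import jax
import jax.numpy as jnp

def f(x, y):
    # f(x, y) = x + 2y, a map from R^2 to R written with two scalar arguments
    return x + 2 * y

print(jax.grad(f)(1.0, 1.0))          # 1.0 -> partial derivative w.r.t. x (argnums defaults to 0)
print(jax.grad(f, 1)(1.0, 1.0))       # 2.0 -> partial derivative w.r.t. y
print(jax.grad(f, (0, 1))(1.0, 1.0))  # (1.0, 2.0) -> tuple of both partials

def g(v):
    # The same function with its variables packed into one array argument
    return v[0] + 2 * v[1]

print(jax.grad(g)(jnp.array([1.0, 1.0])))  # [1. 2.] -> the gradient as a vector in R^2
```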
I didn't think this was worthy of opening an issue, since it is a minor point, so I started a discussion topic instead. I do think, though, that the terms `gradient` and `derivative` are conflated somewhat (everywhere, not just in the data science community) when really they are quite different things: the derivative (in a particular direction) tells me how much my function changes, approximately linearly, for a small step in that direction (a scalar); the gradient is the vector direction in which I take the largest increasing step. The two are obviously very closely related (especially on $\Bbb R^d$), but that is no reason to disregard caution.
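As a concrete illustration of that distinction (my own example, not from the documentation; the function `quad`, the point `x`, and the direction `u` are arbitrary choices), `jax.jvp` gives the scalar directional derivative along a chosen direction, while `jax.grad` gives the gradient vector in $\Bbb R^d$:

```python
import jax
import jax.numpy as jnp

def quad(v):
    # Scalar-valued function on R^3: quad(v) = v . v
    return jnp.dot(v, v)

x = jnp.array([1.0, 2.0, 3.0])  # point at which we differentiate
u = jnp.array([1.0, 0.0, 0.0])  # direction for the directional derivative

# Directional derivative along u: a scalar (here 2 * x[0] = 2.0)
_, directional = jax.jvp(quad, (x,), (u,))
print(directional)

# Gradient: a vector in R^3 (here 2 * x = [2. 4. 6.])
print(jax.grad(quad)(x))
```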
-
Replies: 1 comment, 1 reply

Thanks for pointing this out! You're correct to note that …