-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some tweaks to the Getting Started docs #2195
base: main
Are you sure you want to change the base?
Conversation
so telling it the activity is still helpful. Enzyme does its best job to deduce the return activity, but if something is unstable (e.g. like in #2194) it may fail. In the case of the issue I presume marking the return activity as Active explicitly would allow the type unstable case to succeed. That said, that doesn't mean we need to introduce the more complex version at the start |
``` | ||
|
||
where we note for future reference that the value of this function at `x=1.0`, `y=2.0` is `100.0`, and its derivative |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider showing this as code instead of prose?
julia> rosenbrock(x, y) = (1.0 - x)^2 + 100.0 * (y - x^2)^2;
julia> rosenbrock(xy) = (1.0 - xy[1])^2 + 100.0 * (xy[2] - xy[1]^2)^2;
julia> rosenbrock(1.0, 2.0) == rosenbrock([1.0, 2.0]) == 100.0
true
I also think you should not call the input of rosenbrock_inp
the same thing, x == [x, y]
is weird. The name rosenbrock_inp
also seems a bit weird, maybe it can just be another method, or if that's too confusing, add a suffix more informative than "inp"? (I'm not sure what INP means, maybe input, but why?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was originally in place (@michel2323 were you the one to originally author this doc, just by virtue of it being rosenbrock?)
But either way sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. But this function isn't in-place, it's just going to be used somewhere below in a demonstration that Enzyme likes to handle functions which accept Vector by mutating something else. The reader doesn't know that yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very true, maybe rosenbrok_array or something? or even just rosenbrock2
The return value of reverse mode [`autodiff`](@ref) is a tuple that contains as a first value | ||
the derivative value of the active inputs and optionally the primal return value. | ||
the derivative value of the active inputs and optionally the _primal_ return value (i.e. the | ||
value of the undifferentiated function). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider not using "value" to mean so many things here?
is a tuple that contains as a first element the derivatives of ..., and optionally the primal value (i.e. what the function returns).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "optionally" also seems a bit odd to describe the output not the input. It's not that you may omit this. It's that ReverseWithPrimal tells it to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we definitely don't need to say "derivative value" and can just say "the derivative of"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and "the value of the undifferentiated function" -> "the result of the original function without differentiation"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also consider putting the ReverseWithPrimal case first, as without it, ((-400.0, 200.0),)
seems like a puzzle to count the brackets & guess why.
Perhaps also write it with destructuring syntax, like:
derivs, y = autodiff(ReverseWithPrimal, rosenbrock, Active(1.0), Active(2.0))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah totally fair, if you want to put that in this PR that would be fine with me!
@@ -71,8 +73,9 @@ julia> dx | |||
200.0 | |||
``` | |||
|
|||
Both the inplace and "normal" variant return the gradient. The difference is that with | |||
[`Active`](@ref) the gradient is returned and with [`Duplicated`](@ref) the gradient is accumulated in place. | |||
Both the inplace and "normal" variant return the gradient. The difference is that with [`Active`](@ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wording seems weird. The inplace version returns ((nothing,),)
, it's written right there. That's what return
means. And inplace / "normal" are new terms here, which aren't the terms you need to learn to understand Enzyme.
The version with
Active
arguments (for immutable inputs like x::Float64) returns the gradient. The version withDuplicated
(for mutable inputs likex::Vector{Float64}
) instead writes the gradient into theDuplicated
object, and returnsnothing
in the corresponding slot of the returnedderivs
. In fact it accumulates the gradient, i.e. if you run it again it will doubledx
. (Seemake_zero!
perhaps.) In general a function may accept any mix of Active, Duplicated, and Const arguments.
IDK how much of the end goes here, but the reader should not get the impression that all arguments must have the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should say both compute the gradient.
And we can use whatever function names are here for clarity
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #2195 +/- ##
==========================================
+ Coverage 67.50% 70.70% +3.20%
==========================================
Files 31 55 +24
Lines 12668 16302 +3634
==========================================
+ Hits 8552 11527 +2975
- Misses 4116 4775 +659 ☔ View full report in Codecov by Sentry. |
I'd like to get around to some more comprehensive documentation contributions, but thought I'd start with some very low hanging fruit / minor tweaks to the
getting started
page:autodiff(mode, f, activity, args...)
and replaced it withautodiff(mode, f, args...)
since telling it the activity mode is no longer neededrosenbrock(1.0, 2.0) == 100.0
and the derivative w.r.t.x
should be-400.0
and the derivative w.r.t.y
should be200.0
. This should hopefully help users orient themselves a bit quicker when they're trying to interpret the (perhaps confusing) outputs of the variousautodiff
calls.