Returning Broadcasted cotangents for Broadcasted arguments?
#698
Comments
Tangents need to be represented with types that support Potentially we could wrap it in a type and then do that. Maybe in ChainRules 2.0 we should relax that requirement and export our own
What seems tricky is to efficiently consume any variety of thunks in downstream rules. If you're going to do (I agree that ditching the use of
I think dedicated structural tangent types are still useful for cases like storing (co)tangents of mutable structs for rules which use
With the improvements in compiler analysis coming, it's likely we will be able to do away with Thunks before ChainRules 2.0.
But such analysis would not remove the above desire for BroadcastThunk, where the goal is to fuse the reverse pass and save memory. Compiler improvements to fuse broadcasting (in code which looks like a function returning an Array, used in another broadcast) are presumably further off.
It would be helpful to have some small examples where you might expect this to matter the most. Perhaps
Edit: I have a messy prototype on a branch. One example is this:
julia> let x = rand(1000)
           @btime gradient(x -> sum((1 .- x) ./ 2), $x)  # Diffractor
           @btime copy($x)  # to compare
       end;
  min 2.838 μs, mean 7.916 μs (27 allocations, 32.83 KiB)  # before, 4 copies
  min 294.147 ns, mean 1.515 μs (1 allocation, 7.94 KiB)
  min 3.062 μs, mean 5.657 μs (28 allocations, 17.09 KiB)  # after, 2 copies
Zygote gets
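For readers unfamiliar with what "fusing the reverse pass" would mean for the `sum((1 .- x) ./ 2)` example, here is a minimal sketch using only `Base.Broadcast`. It is not the prototype from the branch, and the variable names (`dy`, `d_inner`, `lazy`, `acc`) are made up for illustration; it assumes the pullback of `sum` hands back an array of ones.

```julia
using Base.Broadcast: broadcasted, materialize

x  = rand(1000)
dy = ones(length(x))        # assumed cotangent arriving from the pullback of `sum`

# Eager reverse pass: every step allocates its own array.
d_inner  = dy ./ 2          # cotangent of (1 .- x)
dx_eager = -d_inner         # cotangent of x, a second array

# Lazy reverse pass: build a nested Broadcasted and materialise it once.
lazy = broadcasted(-, broadcasted(/, dy, 2))
dx   = materialize(lazy)    # one fused loop, one allocation

# A lazy cotangent can also be accumulated into an existing buffer
# without creating any intermediate array.
acc = zeros(length(x))
acc .+= lazy
```

The point of the sketch is only that a `Broadcasted` object can stand in for an array until someone needs the values; whether rules should return such objects is the open question in this thread.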
Maybe
I pushed the branches now. In this version, when I had forgotten, but This makes no attempt to track how many places the same BroadcastThunk is going to be used. So the idea would be to only use it for cheap operations. That may include the gradient in
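To make the cost of not tracking reuse concrete, here is a small sketch with a plain `Broadcasted` object standing in for a BroadcastThunk. The counter `CALLS` and the function `counted_half` are hypothetical helpers, added only to observe how often the wrapped function runs.

```julia
using Base.Broadcast: broadcasted, materialize

# Hypothetical instrumentation: count how often the elementwise function runs.
const CALLS = Ref(0)
counted_half(d) = (CALLS[] += 1; d / 2)

dy   = ones(4)
lazy = broadcasted(counted_half, dy)   # nothing is computed yet

a = materialize(lazy)   # first consumer evaluates every element
b = materialize(lazy)   # second consumer evaluates every element again

calls_per_consumer = CALLS[] ÷ 2
```

Each consumer pays for the whole broadcast again, which is why a lazy cotangent of this kind seems sensible only when the elementwise work is cheap compared with allocating and storing the result.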
More of a question on whether this makes sense than a feature request. I could see it replacing or complementing
@thunk(<eager broadcast> of unbroadcast(...))
for rules such as https://github.com/JuliaDiff/ChainRules.jl/blob/v1.48.0/src/rulesets/Base/broadcast.jl#L174.
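For context, the eager pattern in question can be sketched roughly as follows. This is not the ChainRules source: `sketch_unbroadcast` is a hypothetical, simplified stand-in for the package's internal `unbroadcast`, and a plain closure stands in for `@thunk` so that the snippet needs nothing beyond Base.

```julia
# Hypothetical, simplified stand-in for ChainRules' internal `unbroadcast`:
# sum `dx` over the dimensions along which `x` was expanded by broadcasting.
function sketch_unbroadcast(x::AbstractArray, dx::AbstractArray)
    size(x) == size(dx) && return dx
    dims = Tuple(d for d in 1:ndims(dx) if size(x, d) == 1)
    return reshape(sum(dx; dims=dims), size(x))
end

x  = reshape([1.0, 2.0, 3.0], 3, 1)    # 3×1, expanded along dim 2 by broadcasting
y  = reshape(collect(1.0:12.0), 3, 4)  # 3×4
dz = ones(3, 4)                        # assumed cotangent of z = x .* y

# Closure standing in for `@thunk(...)`: the eager broadcast `dz .* y`
# allocates a full 3×4 array before it is reduced back to the shape of x.
dx_thunk = () -> sketch_unbroadcast(x, dz .* y)

dx = dx_thunk()   # forcing the thunk runs the broadcast and the reduction
```

The 3×4 intermediate exists only to be summed away, which is the allocation a lazy `Broadcasted` cotangent would hope to avoid.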