Documentation should emphasize calling LuxCore.apply #1160

Open
ExpandingMan opened this issue Jan 3, 2025 · 9 comments
Labels
documentation Improvements or additions to documentation

Comments

@ExpandingMan

The Parallel layer never calls Lux.apply; it only calls its own method Lux.applyparallel. The problem is that if you overload methods of apply to implement your own input types, as described here, Parallel will completely ignore the overload.

I would suggest keeping applyparallel as is, but adding an extra layer of indirection so that the call to Parallel first goes through apply.

@avik-pal
Member

avik-pal commented Jan 3, 2025

It calls it here:

[:(($(y_symbols[i]), $(st_symbols[i])) = @inline apply(

@ExpandingMan
Author

Right, but it's only calling it for the child layers, not for Parallel itself, right?

@avik-pal
Member

avik-pal commented Jan 3, 2025

The apply calls are redirected from LuxCore: apply(m, x, ps, st) -> m(x, ps, st) -> applyparallel(m, x, ps, st)
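As a toy illustration of that dispatch chain (hypothetical simplified types, not Lux's actual internals): an overloadable entry point apply forwards to the functor call, and the container's forward pass routes each child back through apply, so per-input overloads fire:

```julia
# Toy sketch of the LuxCore-style dispatch chain (illustrative names only):
# apply(m, x) -> m(x) -> container forward, which re-enters apply per child.
abstract type ToyLayer end

apply(l::ToyLayer, x) = l(x)          # overloadable entry point

struct Scale <: ToyLayer
    a::Float64
end
(l::Scale)(x) = l.a .* x

struct Par <: ToyLayer
    layers::Tuple
end
# the container routes its children through `apply`, not their call operators
(p::Par)(x) = map(l -> apply(l, x), p.layers)

# a custom input type, handled by overloading `apply` (not the call operator)
struct Wrapped
    x::Vector{Float64}
end
apply(l::ToyLayer, w::Wrapped) = apply(l, w.x)

p = Par((Scale(2.0), Scale(3.0)))
y = apply(p, Wrapped([1.0, 2.0]))     # entering via apply unpacks Wrapped once
@assert y == ([2.0, 4.0], [3.0, 6.0])
```

The key property is that apply is the only public entry point; everything downstream of it is an implementation detail.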

@ExpandingMan
Author

If apply calls m, shouldn't one overload m directly instead of apply? The main issue I mean to raise here is that if one follows the instructions in the docs for creating custom input types, Parallel passes the input straight through to its child layers without first executing the overloaded code on Parallel itself, which leads to unexpected results.

@avik-pal
Member

avik-pal commented Jan 3, 2025

Overloading m will lead to method ambiguities in most cases. Can you give an MWE for this case?
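The ambiguity hazard can be shown with a toy example (hypothetical types, not Lux's): adding a call-operator method for a custom input on an abstract layer type collides with each concrete layer's generic call method, whereas a separate apply entry point does not:

```julia
abstract type Layer end

struct MyInput
    x::Vector{Float64}
end

struct Doubler <: Layer end
(l::Doubler)(x) = 2 .* x              # each concrete layer defines a generic call

# overloading the call operator for *all* layers on the custom input type:
(l::Layer)(x::MyInput) = l(x.x)

# (Doubler, Any) vs (Layer, MyInput): neither signature is strictly more
# specific, so the call below raises a method-ambiguity MethodError.
amb = try
    Doubler()(MyInput([1.0, 2.0]))
    false
catch e
    e isa MethodError
end
@assert amb

# with a dedicated entry point the overload is strictly more specific:
apply(l::Layer, x) = l(x)
apply(l::Layer, x::MyInput) = apply(l, x.x)
@assert apply(Doubler(), MyInput([1.0, 2.0])) == [2.0, 4.0]
```

This is the structural reason an apply-style function is preferred over functor overloads for cross-cutting input handling.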

@ExpandingMan
Author

Ok, here is an MWE, albeit a little contrived:

using Random
using Lux


struct InputType{T<:AbstractMatrix}
    components::NTuple{2,T}
end

function Lux.apply(l::Lux.AbstractLuxLayer, x::InputType, θ, ψ::NamedTuple)
    l(x.components, θ, ψ)
end

function main()
    l1 = Parallel(vcat, Dense(2=>2), Dense(2=>2))

    (θ, ψ) = Lux.setup(Xoshiro(999), l1)

    X = InputType((randn(Float32, 2,4), randn(Float32, 2,4)))

    (y, ψ′) = l1(X, θ, ψ)
end

Calling main() gives

ERROR: MethodError: no method matching (::Dense{typeof(identity), Int64, Int64, Nothing, Nothing, Static.True})(::Tuple{Matrix{…}, Matrix{…}}, ::@NamedTuple{weight::Matrix{…}, bias::Vector{…}}, ::@NamedTuple{})
The object of type `Dense{typeof(identity), Int64, Int64, Nothing, Nothing, Static.True}` exists, but no method is defined for this combination of argument types when trying to treat it as a callable object.

Closest candidates are:
  (::Dense)(::AbstractArray, ::Any, ::NamedTuple)
   @ Lux ~/.julia/packages/Lux/DHtyL/src/layers/basic.jl:339

Stacktrace:
 [1] apply
   @ ~/src/lux_1160.jl:12 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/Lux/DHtyL/src/layers/containers.jl:0 [inlined]
 [3] applyparallel(layers::@NamedTuple{…}, connection::typeof(vcat), x::InputType{…}, ps::@NamedTuple{…}, st::@NamedTuple{…})
   @ Lux ~/.julia/packages/Lux/DHtyL/src/layers/containers.jl:175
 [4] (::Parallel{typeof(vcat), @NamedTuple{…}, Nothing})(x::InputType{Matrix{…}}, ps::@NamedTuple{layer_1::@NamedTuple{…}, layer_2::@NamedTuple{…}}, st::@NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}})
   @ Lux ~/.julia/packages/Lux/DHtyL/src/layers/containers.jl:173
 [5] main()
   @ Main ~/src/lux_1160.jl:23
 [6] top-level scope
   @ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.

I would expect Parallel to unpack InputType, but it never does, because Parallel itself doesn't call apply; only its component layers do. One might argue that Parallel is somehow special, but my counter-argument is that it is still an AbstractLuxLayer, so it is surprising for it not to call apply. Either the type hierarchy should change to accommodate layers with different semantics, or, more likely, Parallel should be rewritten to call apply.

@avik-pal
Member

Right, that's because you are calling l1(X, θ, ψ) instead of Lux.apply(l1, X, θ, ψ). The entry point for layers is always the apply call, which is redirected to the (::XYZ)(...) call.

I think I see the source of confusion.

xt = ArrayAndTime(x, 10.0f0)

model(xt, ps, st)[1]

should be rewritten as Lux.apply(model, xt, ps, st)[1]
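Putting that together, the MWE above works once the entry point is Lux.apply rather than a direct call; a self-contained sketch, assuming the Lux 1.x API:

```julia
using Random
using Lux

struct InputType{T<:AbstractMatrix}
    components::NTuple{2,T}
end

# the overload from the MWE above
function Lux.apply(l::Lux.AbstractLuxLayer, x::InputType, θ, ψ::NamedTuple)
    l(x.components, θ, ψ)
end

l1 = Parallel(vcat, Dense(2 => 2), Dense(2 => 2))
θ, ψ = Lux.setup(Xoshiro(999), l1)
X = InputType((randn(Float32, 2, 4), randn(Float32, 2, 4)))

# entering through `apply` runs the overload before Parallel dispatches,
# so each child layer receives its own 2×4 matrix from the unpacked tuple
y, ψ′ = Lux.apply(l1, X, θ, ψ)
@assert size(y) == (4, 4)   # vcat of two 2×4 Dense outputs
```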

@ExpandingMan
Author

Yeah, I'm definitely confused about where to use apply vs. where to use a direct call; it might be worth being extra pedantic about that in the docs. Now that I'm looking at the docstring for apply, I realize it's asking me to use it, but this is only dawning on me now, and I'm quite sure I won't be the only one to misuse the direct call.

@avik-pal
Member

Yeah, generally it doesn't matter too much (except in situations like this), but using apply is always the safest bet.

I will mark this as a documentation issue. We need to emphasize the downsides of not using apply, and probably update the calls in our docs to use apply.

@avik-pal avik-pal changed the title Parallel never calls apply Documentation should emphasize calling LuxCore.apply Jan 28, 2025
@avik-pal avik-pal added documentation Improvements or additions to documentation and removed needs more information labels Jan 28, 2025