
allow non-tuple data in the new train! #2119

Merged
merged 10 commits into master on Nov 24, 2022

Conversation

@CarloLucibello (Member) commented on Nov 21, 2022:

Follow up to #2082 with two changes to the new train!:

  1. allow non-tuple data, so that each element of data need not be a tuple of arguments to the loss;
  2. re-introduce a callback keyword.

I can factor out the second change if deemed controversial. Edit: removed the callback addition.
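To make the first change concrete, here is a minimal sketch of the maybe-splat rule (an illustration of the intended behaviour, not the actual src/train.jl code):

```julia
# Illustration only: tuple batches are splatted into the loss,
# anything else is passed through as a single argument.
call_loss(loss, model, d::Tuple) = loss(model, d...)
call_loss(loss, model, d)        = loss(model, d)

# data = [(x1, y1), (x2, y2)]  ->  loss(model, x, y) on each batch
# data = [x1, x2]              ->  loss(model, x)    on each batch
```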

@mcabbott (Member) left a comment:

I think this maybe-splat behaviour is a little odd, but I see the argument for keeping it in the name of making upgrades easier. Comments below are only on this.

Besides the headline change, this PR not only re-introduces callbacks, but introduces a larger more complicated callback scheme. Do we want this, or to push people towards writing a for loop?

src/train.jl — 4 resolved review comments (outdated)
@CarloLucibello changed the title from "allow non-tuple data in the new train!" to "updates to the new train!" on Nov 21, 2022
@CarloLucibello (Member, Author) commented:

> Besides the headline change, this PR not only re-introduces callbacks, but introduces a larger more complicated callback scheme. Do we want this, or to push people towards writing a for loop?

I simplified the scheme; compared to the old one it is more useful and has the same complexity.
I can factor out the callback change into another PR, or just ditch it; I'm not sure myself that it is something we want to do.

@mcabbott (Member) commented:

I agree that a callback which gets some details is more useful; the old-style ones where you wrap the global model seem more in keeping with the implicit style.

Passing this as one NamedTuple is probably a better choice than splatting it as keywords, as your function can then take what it needs.

One downside is that it does mean many more things need names, in code and not just in docs. The present list is (model, data, opt, step, loss, gradient); that's a lot. And is "opt" an optimiser, a state, or a tree? The documentation isn't completely consistent... but so far its name never matters much.

Maybe it would be clearer to call this train!(loss, m, data, opt; callback) with a different keyword. Then there's less confusion between the old and new things. The old cb is a pretty cryptic name; it could survive for now to accept a zero-arg function with a depwarn.
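For illustration, a hypothetical callback in the NamedTuple style discussed above, using the field names from the list (model, data, opt, step, loss, gradient); none of this is merged API:

```julia
# Hypothetical: the callback receives one NamedTuple of loop state
# and uses only the fields it needs.
function log_every_100(state)
    if state.step % 100 == 0
        @info "training" step = state.step loss = state.loss
    end
end

# Inside train!, the call might look like (sketch only):
#   callback((; model, data = d, opt, step, loss = l, gradient = g))
```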

@darsnack (Member) commented:

Regarding callbacks, I think it's more of a design/community issue than anything else. The proposed callback scheme does provide all the state in the loop to the user. But there is already a difficulty: as written, the callback happens before the optimizer update; wouldn't opt and model be more useful after the update? I see two options:

  1. We provide a callback as proposed here but called at the end of each loop iteration. We take a hard stance on declining to add more entry points into the loop. We do not provide built-in callbacks. This feature is described as a "quick and dirty" callback for when you do not want to refactor your code.
  2. We delete callbacks completely. I don't think the old style (close over globals) callbacks should remain as we transition to explicit. They are incompatible with immutable models, and even though most models are/will be mutable, I think it is confusing to have a schism in behavior.

This is why I describe it as a community issue: the real problem is preventing feature creep. Option (2) does this neatly, whereas (1) will always have users requesting more.

@mcabbott (Member) commented:

Re before/after, it's worth remembering the Flux.skip debacle, where apparently it was so confusing and so code-golfed that it got documented in a way which didn't actually skip anything, and the tests didn't notice.

Whereas test() && continue in a loop does exactly what it looks like it'll do.
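For comparison, a sketch of the explicit loop, where `model`, `data`, `loss` and the hypothetical predicate `should_skip` are assumed to be defined by the user:

```julia
using Flux

opt_state = Flux.setup(Adam(), model)
for (x, y) in data
    should_skip(x, y) && continue   # skipping a step is an ordinary `continue`
    grads = Flux.gradient(m -> loss(m, x, y), model)
    Flux.update!(opt_state, model, grads[1])
end
```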

@CarloLucibello (Member, Author) commented:

> But there is already a difficulty: as written, the callback happens before the optimizer update; wouldn't opt and model be more useful after the update?

My reasoning was that by executing the callback before the update, you are able to implement things such as gradient clipping.

@mcabbott (Member) commented:

For simple clipping we of course have ClipGrad.
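For reference, with Optimisers.jl the clipping composes into the rule itself, no callback needed (a sketch, assuming model, loss and data are defined elsewhere):

```julia
using Flux, Optimisers

# Clip each gradient entry to [-1, 1] before Adam applies its update.
rule = Optimisers.OptimiserChain(Optimisers.ClipGrad(1.0), Optimisers.Adam())
opt_state = Flux.setup(rule, model)
Flux.train!(loss, model, data, opt_state)
```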

Do we have good examples of callbacks being useful in the wild?

Every single use in the model zoo appears to be to print the loss, with throttle. So are all the ones I found on discourse in 5 minutes.

https://github.com/FluxML/model-zoo/search?q=cb
FluxML/Metalhead.jl#62 (comment)
https://discourse.julialang.org/search?q=flux%20cb
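The zoo pattern in question is essentially the following (old implicit-style API, shown only for context; `x, y` stand in for some fixed evaluation batch):

```julia
using Flux

# Print the loss at most once every 10 seconds, via the old `cb` keyword.
evalcb = Flux.throttle(() -> @show(loss(x, y)), 10)
Flux.train!(loss, Flux.params(model), data, opt; cb = evalcb)
```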

We could just build that into train! as an option. The goal of #2120 is to make the same thing easy in "procedural" code.

@CarloLucibello changed the title from "updates to the new train!" back to "allow non-tuple data in the new train!" on Nov 22, 2022
@CarloLucibello (Member, Author) commented:

I removed the callback addition from this PR as it is controversial.
We can merge this if we agree on supporting non-tuple data (I do).

src/train.jl — 1 resolved review comment (outdated)
Co-authored-by: Kyle Daruwalla <[email protected]>
@CarloLucibello merged commit a5e5546 into master on Nov 24, 2022
@CarloLucibello mentioned this pull request on Nov 24, 2022
@mcabbott deleted the cl/batchme branch on Nov 24, 2022