
Represent the absence of bias by bias=false #1407

Closed
mcabbott wants to merge 2 commits

Conversation

@mcabbott (Member) commented Nov 27, 2020:

This continues the good work of #1379 to its logical conclusion, by making the storage match the API: Dense(3,5; bias=false) literally stores randn(5,3), false, identity.

  • This neatly drops out of params, and is handled automatically by loadparams! & destructure.
  • Since booleans are categorical in Zygote, they can be dropped completely in the gradient. The code from Zygote.jl#838 (adjoint for broadcasting A .+-* Bool) is duplicated here for now, for testing purposes.
  • Maybe there's some tiny cost to doing σ.(Wx .+ false) during inference, but it's pretty hard to detect.

Broadcasting identity.(Wx) is an easier cost to detect, since it copies the whole array. (This, too, is only an inference cost; Zygote removes it.) We could avoid both of these at the cost of a little complication, say some function mapadd(σ, Wx, b) to handle this (sketched below), which I'm happy to write if there is interest. But nobody has thought this worth the complication so far, which I read as saying that Flux values simplicity.
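
A minimal sketch of what such a mapadd could look like (hypothetical, not part of this PR's diff; the name and methods are illustrative only):

```julia
# Hypothetical mapadd: dispatch picks a cheaper path when the bias is the
# Bool false (i.e. absent, as in this design) or the activation is identity.
mapadd(σ, Wx, b) = σ.(Wx .+ b)                # general case
mapadd(σ, Wx, b::Bool) = σ.(Wx)               # bias=false: skip the .+ false
mapadd(::typeof(identity), Wx, b) = Wx .+ b   # skip the identity broadcast
mapadd(::typeof(identity), Wx, b::Bool) = Wx  # neither add nor copy

W, x, b = randn(5, 3), randn(3), randn(5)
Wx = W * x
mapadd(tanh, Wx, b) == tanh.(Wx .+ b)  # true
mapadd(identity, Wx, false) === Wx     # true: the array is returned un-copied
```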

Smaller points:

@CarloLucibello (Member):

Could you remove the conversion stuff? It's not related to this PR, and I hope to merge #1393 soon.

@mcabbott (Member, Author):

Yes, can do. Probably easiest as a rebase once that's merged, rather than trying to roll back smoothly enough that git doesn't notice.

(It got added just because I landed in dispatch hell from the old implementation, somehow, and this was the quickest way out!)

@DhairyaLGandhi (Member) commented Jan 22, 2021:

Since we've moved on from representing bias with false, I would close this; feel free to reopen if anyone thinks this needs a revisit. I'd be inclined to get #1402 in instead.

@CarloLucibello (Member):

I still prefer this option to #1402 and to the solution we currently have. #1392 should be closed instead.

@mcabbott (Member, Author):

No, this is not done. It is awaiting a rebase now that #1393 is in, as mentioned. Will someone who has sufficient permissions please re-open?

@CarloLucibello (Member):

> Will someone who has sufficient permissions please re-open?

@DhairyaLGandhi please upgrade @mcabbott's and @darsnack's roles; don't force me to keep asking and asking.

@DhairyaLGandhi (Member):

Both of them are part of the same team as you and I.

@DhairyaLGandhi (Member):

Either way, I think this approach hasn't been in line with the API changes that we have wanted to make here.

@DhairyaLGandhi (Member):

I am not inclined to deprecate Zeros or introduce a literal false, and it's somewhat surprising that folks here are not willing to consider respecting existing code and its patterns. The current solution also has questionable behavior with other tools. Added to that, the solution in #1402 is generic enough to be used outside of this context, without forcing any more burden on users than what this PR proposes, and it comes with more predictable behavior.

Now, I can understand that mcabbott wants to say that the non-existence of a bias means it doesn't exist, end of story. That's fine. But given a set of parameters, how would you determine the model they came from? The skeleton model that we load the parameters into is typically expected to have this information prebaked. But if I tell you that these params come from a typical ResNet, with slight tweaks so that some biases are missing, then without knowing those tweaks a priori a singleton approach can't handle it, and would not be able to load the params correctly; never mind figuring out how to rewrite the model-building code to make these distinctions. This is in part because the partial model specification was not embedded in the params.

With a fake-array approach that is mitigated: the placeholder can be turned into trainable parameters at will, and used without any extra runtime penalty. The issues with param saving and GPU support were also fixed in #1402, so I don't understand why it's being pushed back on this hard. Maybe it's more flexibility than we absolutely need at this moment, but it doesn't exactly come at extra cost either.
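
For concreteness, a minimal sketch of the fake-array idea (the name FakeZeros is hypothetical; the actual Zeros in #1402 differs in detail):

```julia
# Hypothetical fake-array bias: an AbstractArray with no storage that
# broadcasts as zero, so the "bias slot" and its shape remain visible
# in the model structure and in saved parameters.
struct FakeZeros{N} <: AbstractArray{Bool,N}
    size::NTuple{N,Int}
end
Base.size(z::FakeZeros) = z.size
Base.getindex(::FakeZeros, ::Int...) = false

b = FakeZeros((5,))
Wx = randn(5)
Wx .+ b == Wx            # true: adding the placeholder is a no-op
zeros(Float32, size(b))  # promote to a real, trainable bias at will
```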

@mcabbott (Member, Author):

I think arguments in favour of Flux defining constant array types can be made in #1402. They would be strongest with references to deep learning models making use of them.

> questionable behavior with other tools

Do you have a MWE?

> given a set of parameters how would you determine the model they came from?

You can't. A vector of numbers has less information than a model. That's why destructure needs to return a function which re-creates the model.
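
For reference, the destructure pattern looks like this (a sketch of standard usage; the exact parameter count depends on the layer):

```julia
using Flux

m = Dense(3, 5)               # 5×3 weight plus length-5 bias
θ, re = Flux.destructure(m)   # θ is a flat Vector of 20 numbers;
                              # re closes over the structure of m
m2 = re(θ)                    # rebuilds a Dense of the same shape from θ
```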

@darsnack (Member):

Having been incidentally tagged on this PR, and now that I have spent over an hour catching up on the optional-bias threads scattered all over, I have to say that I am strongly in favor of the approach used in this PR over #1402. It is more maintainable and simpler (which, as the original PR notes, is something Flux should continue to strive towards). I'll comment a bit more on #1402, as my thoughts are about Zeros, not this PR.

@darsnack mentioned this pull request Jan 22, 2021
@mcabbott mentioned this pull request Feb 19, 2022
@mcabbott deleted the nobias branch March 5, 2022 17:10
Successfully merging this pull request may close these issues:

  • GPU error when using Zeros() as bias in Conv layer