Represent the absence of bias by bias=false
#1407
Conversation
could you remove the conversion stuff? It's not related to this PR and I hope to merge #1393 soon
Yes, can do. Probably easiest as a rebase once that's merged, rather than trying to roll back smoothly enough that git doesn't notice. (It got added just because I landed in dispatch hell from the old implementation, somehow, and this was the quickest way out!)
Since we've moved on from representing bias with …
No, this is not done. It is awaiting a rebase now that #1393 is in, as mentioned. Will someone who has sufficient permissions please re-open?
@DhairyaLGandhi please upgrade @mcabbott's and @darsnack's roles; don't force me to keep asking and asking.
Both of them are part of the same team as you and I.
Either way, I think this approach isn't in line with the API changes that we have wanted to make here.
Org members' permissions should be set to Write.
I am not inclined to deprecate `Zeros` or introduce a literal `false`, and it's somewhat surprising that folks here are not willing to consider respecting existing code and its patterns; the solution proposed here also has questionable behavior with other tools. Added to that, the solution in #1402 is generic enough to be used outside of this context, without forcing any more burden on users than what this PR proposes, and it comes with more predictable behavior.

Now, I can understand that mcabbott wants to say that the non-existence of a bias means it doesn't exist, end of story. That's fine. But given a set of parameters, how would you determine the model they came from? The skeleton model that we load the parameters into was typically expected to have this information prebaked. But suppose I tell you these params come from a typical ResNet, with slight tweaks so that some biases are missing: without knowing those tweaks a priori, a singleton approach can't handle it, and would not be able to load the params correctly. Not to mention figuring out how to rewrite the model-building code to make these distinctions. This is in part because this partial model specification was not embedded in the params.

With a fake-array approach, that is mitigated: the placeholder can be turned into trainable parameters at will, and used without any extra runtime penalty. The issues with param saving and GPU support were also fixed in #1402, so I don't understand why it is being pushed back this hard. Maybe it's more flexibility than we absolutely need at this moment, but it doesn't exactly come at extra cost either.
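For readers catching up, here is a rough sketch of the fake-array idea being defended above. This is not the actual #1402 implementation, and the name `FakeZeros` is made up here; the point is only that shape information travels with the params, and that the placeholder can be materialized into a real trainable array:

```julia
# A zero "array" that exists only as a shape record: it stores its size,
# indexes as zero, and allocates no storage.
struct FakeZeros{N} <: AbstractArray{Float32,N}
    dims::NTuple{N,Int}
end
Base.size(z::FakeZeros) = z.dims
Base.getindex(z::FakeZeros{N}, I::Vararg{Int,N}) where {N} = 0f0

b = FakeZeros((5,))
size(b)      # (5,): the shape survives in saved params
collect(b)   # materialize into a real zeros(Float32, 5), trainable at will
```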
I think arguments in favour of Flux defining constant array types can be made in #1402. They would be strongest with references to deep learning models making use of them.
Do you have an MWE?
You can't. A vector of numbers has less information than a model. That's why …
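A minimal sketch of that asymmetry, using this PR's `bias=false` API and `Flux.destructure` (which flattens a model into a parameter vector plus a rebuilder):

```julia
using Flux

m1 = Dense(2, 3)               # trainable weight and bias
m2 = Dense(2, 3; bias=false)   # weight only; the bias slot holds `false`

v1, _ = Flux.destructure(m1)
v2, _ = Flux.destructure(m2)

length(v1)   # 9: a 3×2 weight plus 3 biases
length(v2)   # 6: the weight alone

# Neither flat vector records which layers had a bias; that information
# lives only in the model skeleton you rebuild into.
```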
Having been incidentally tagged on this PR, and now that I have spent over an hour catching up on the optional-bias threads scattered all over, I have to say that I am strongly in favor of the approach used in this PR over #1402. It is more maintainable and simpler (which, as the original PR notes, is something Flux should continue to strive towards). I'll comment a bit more on #1402, as my thoughts are around …
This continues the good work of #1379 to its logical conclusion, by making the storage match the API: `Dense(3, 5; bias=false)` literally stores `randn(5,3), false, identity`. The `false` is invisible to `.params`, is handled well by `loadparams!` & `destructure`, automatically.
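A quick sketch of what that buys (only `Flux.params` and this PR's constructor are used; `false` is the additive identity, so broadcasting it away is free in value terms):

```julia
using Flux

d = Dense(3, 5; bias=false)
Flux.params(d)   # only the 5×3 weight; `false` is not a trainable array

# Broadcasting `.+ false` leaves the forward pass unchanged:
W, x = randn(5, 3), randn(3)
tanh.(W * x .+ false) == tanh.(W * x)   # true
```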
The code from Zygote.jl#838 (Adjoint for broadcasting `A .+-* Bool`) is for now duplicated here, for testing purposes. `σ.(Wx .+ false)` has some cost during inference, but it's pretty hard to detect. Broadcasting `identity.(Wx)` is an easier cost to detect, since it copies the whole array. (Also only at inference; removed by Zygote.) We could avoid both of these at the cost of a little complication, like some function `mapadd(σ, Wx, b)` to handle this, which I'm happy to write if there is interest; a rough sketch follows below. But nobody has thought this worth the complication so far, which I read as saying that Flux values simplicity.
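A minimal sketch of what such a `mapadd` could look like; the name and the dispatch follow the suggestion above, and this is not an existing Flux function:

```julia
# Fuse the bias add into the activation broadcast, skipping work when
# there is nothing to add (a Bool bias is always `false` in this design).
mapadd(σ, Wx::AbstractArray, b::AbstractArray) = σ.(Wx .+ b)
mapadd(σ, Wx::AbstractArray, ::Bool) = σ.(Wx)                 # no add
mapadd(::typeof(identity), Wx::AbstractArray, ::Bool) = Wx    # no copy either
```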
Smaller points:

- `Zeros()` and `Zeros(dims...)` are deprecated.
- `create_bias` from #1379 (Fix some issues with Zeros, option 2) is retained, and now also checks the size of the bias vector, if the user provides their own.
- `conv.jl` has fewer lines of code, by moving the `reshape(c.bias, map(_->1, c.stride)..., :, 1)` thing to a function, instead of repeating it; see the sketch after this list.
- There is some overlap with #1393 (remove implicit conversions), as this was written some time ago. This PR makes the type conversion noisy & explicit, although removing it entirely might be cleaner.
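A sketch of the kind of helper the `conv.jl` bullet describes; the name `reshape_bias` is hypothetical:

```julia
# Reshape a conv bias vector to size 1×…×1×k×1 so that it broadcasts
# against an N-dimensional conv output; a `false` bias needs no reshape.
reshape_bias(c, b::AbstractVector) = reshape(b, map(_ -> 1, c.stride)..., :, 1)
reshape_bias(c, b::Bool) = b
```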