
Added Dense and Conv BatchEnsemble layers along with unit tests and example on MNIST classification using LeNet5 #4

Merged: 3 commits merged into main on Sep 7, 2021

Conversation

DwaraknathT (Collaborator):

  • Added BatchEnsemble layers -- the idea is to factorize the weight matrix of each ensemble member into three pieces: one full ("slow") matrix with the same shape as the layer's weight matrix, and two fast weights (usually rank-1). A member's weights are generated by taking the outer product of the two fast weights and then the Hadamard product of the resulting matrix with the shared full matrix (see the sketch after this list).
  • Added unit tests for both BatchEnsemble layers.
  • Added an example of MNIST classification with the BatchEnsemble layers using LeNet5.
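
For concreteness, here is a minimal sketch of the weight generation for a single ensemble member; the names W, r, s, and W_member are illustrative and not the identifiers used in this PR.

# Minimal sketch (assumed names, not the PR's code): generate one ensemble
# member's weight from a shared full matrix and two rank-1 fast weights.
W = randn(Float32, 4, 3)   # shared ("slow") weight, same shape as the layer's weight
r = randn(Float32, 4)      # fast weight along the output dimension
s = randn(Float32, 3)      # fast weight along the input dimension

W_member = W .* (r * s')   # outer product r * s', then Hadamard product with W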

@DhairyaLGandhi (Member) left a comment:

Thoughts on starting to add GPU tests along with the regular ones? In theory it should be as straightforward as gpu(layer), gpu(input), @test ...
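
A hedged sketch of what such a test could look like; the DenseBatchEnsemble constructor and its argument order here are assumptions for illustration, not necessarily the exact API added in this PR.

using Flux, CUDA, Test

# Assumed layer name and argument order (in, out, rank, ensemble_size); adjust to the actual API.
layer = DenseBatchEnsemble(10, 5, 1, 4)
x = rand(Float32, 10, 8)

gpu_layer = gpu(layer)
gpu_x = gpu(x)

# Basic forward-pass check on GPU against the CPU result's shape
@test size(cpu(gpu_layer(gpu_x))) == size(layer(x))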

Comment on lines +77 to +92
function ConvBatchEnsemble(
    k::NTuple{N,Integer},
    ch::Pair{<:Integer,<:Integer},
    rank::Integer,
    ensemble_size::Integer,
    σ = identity;
    init = glorot_normal,
    alpha_init = glorot_normal,
    gamma_init = glorot_normal,
    stride = 1,
    pad = 0,
    dilation = 1,
    groups = 1,
    bias = true,
    ensemble_bias = true,
    ensemble_act = identity,
DhairyaLGandhi (Member):

Same comment as last time about keeping things simple and general.

Maybe it makes sense to have a constructor that takes in a Conv layer directly?

DwaraknathT (Collaborator, Author):

Yeah, it does. I guess we can have both as well.

DwaraknathT (Collaborator, Author):

We actually need the input/output dimensions to create the alpha/gamma matrices. Might as well keep them in the signature, or we would have to infer them from the Conv layer's struct, and that could change at any time in the Flux source?
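
For illustration, a hedged sketch of the convenience constructor under discussion, inferring the kernel size and channel pair from a Conv layer's weight array. It relies on Flux.Conv's current field layout, which is exactly the coupling mentioned above, and it ignores grouped convolutions for brevity.

# Hedged sketch, not the PR's code: build a ConvBatchEnsemble from an existing Conv layer.
function ConvBatchEnsemble(c::Flux.Conv, rank::Integer, ensemble_size::Integer; kwargs...)
    w = c.weight                                     # size: (k..., ch_in, ch_out) for groups == 1
    k = size(w)[1:end-2]                             # kernel size tuple
    ch = size(w, ndims(w) - 1) => size(w, ndims(w))  # input => output channels
    return ConvBatchEnsemble(k, ch, rank, ensemble_size, c.σ;
                             stride = c.stride, pad = c.pad, dilation = c.dilation, kwargs...)
end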

src/layers/BatchEnsemble/conv.jl (resolved)
    ensemble_act::F = identity,
    rank = 1,
) where {M,F,L}
    ensemble_bias = create_bias(gamma, ensemble_bias, size(gamma)[1], size(gamma)[2])
DhairyaLGandhi (Member):

Can you test it with FluxML/Flux.jl#1402?

alpha = repeat(alpha, samples_per_model)
gamma = repeat(gamma, samples_per_model)
# Reshape alpha, gamma to [units, batch_size, rank]
e_b = reshape(e_b, (1, 1, out_size, batch_size))
DhairyaLGandhi (Member):

Size of the bias seems relevant here.

DhairyaLGandhi (Member):

How do we know that the shape of the allocated bias fits into the container it's expected to be in?

src/layers/BatchEnsemble/dense.jl (outdated, resolved)
outputs = sum(outputs, dims = 3)
outputs = reshape(outputs, (out_size, samples_per_model, ensemble_size))
# Reshape ensemble bias
e_b = Flux.unsqueeze(e_b, ndims(e_b))
DhairyaLGandhi (Member):

Curious: Are the sizes of bias somewhat variable in these methods?

DwaraknathT (Collaborator, Author):

Oh right, you meant the physical size in memory? No, those sizes are not variable: there is a fixed number of elements in the bias, we just change the shape of the array. If you meant the logical size (the shape, in NumPy terms), then yes, they are variable.
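
A small standalone illustration of the distinction (not code from this PR):

b = zeros(Float32, 12)            # 12 elements
b2 = reshape(b, (1, 1, 3, 4))     # same 12 elements, different logical shape
length(b) == length(b2)           # true: reshape never changes the element count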

alpha = reshape(alpha, (in_size, ensemble_size * rank))
gamma = reshape(gamma, (out_size, ensemble_size * rank))
# Repeat breaks on GPU when input dims > 2
alpha = repeat(alpha, samples_per_model)
@DhairyaLGandhi (Member), Sep 1, 2021:

Do we need to materialise this array, or can we broadcast it to higher dimensions? Something like:

julia> x = ones(3,3)
3×3 Matrix{Float64}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0

julia> y = zeros(3,3,3)
3×3×3 Array{Float64, 3}:
[:, :, 1] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 2] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

[:, :, 3] =
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0

julia> x .+ y
3×3×3 Array{Float64, 3}:
[:, :, 1] =
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0

[:, :, 2] =
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0

[:, :, 3] =
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0

Notice that the lower-dimensional array was broadcast to the higher dimensions automatically.

@DwaraknathT (Collaborator, Author), Sep 1, 2021:

We are already broadcasting the input over the last dimension (the rank dimension). I think we have to materialize the array because, conceptually, the idea is to take a minibatch of B samples and repeat it N times, giving an effective minibatch of size B*N. We then want each of the N copies of the B samples to be given a different ensemble member's weights, so the fast weights (alpha, gamma) have to be expanded to the batch size before they can be broadcast over the final dimension.

Also, the starting shape of the fast weights is (in_size, ensemble_size, rank) while the input shape is (in_size, batch_size), so we need the repeat call to match the dimensions for the * op.
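
To make that concrete, here is a hedged, rank-1 sketch of the dense forward pass being described; the names and the exact tiling convention are illustrative, not the PR's implementation.

# alpha: (in_size, ensemble_size), gamma: (out_size, ensemble_size),
# W: (out_size, in_size), b: (out_size,), x: tiled minibatch of size (in_size, B * ensemble_size)
function batch_ensemble_dense(W, b, alpha, gamma, x, ensemble_size)
    batch_size = size(x, 2)
    samples_per_model = batch_size ÷ ensemble_size
    # Expand the fast weights so each block of samples_per_model columns
    # sees the alpha/gamma of the ensemble member it belongs to.
    A = repeat(alpha, inner = (1, samples_per_model))   # (in_size, batch_size)
    G = repeat(gamma, inner = (1, samples_per_model))   # (out_size, batch_size)
    return G .* (W * (A .* x)) .+ b                     # per member: y = (W ∘ γαᵀ) x + b
end

# Usage: tile a minibatch of 3 samples across 4 ensemble members
W, b = randn(Float32, 5, 10), zeros(Float32, 5)
alpha, gamma = randn(Float32, 10, 4), randn(Float32, 5, 4)
x = rand(Float32, 10, 3)
y = batch_ensemble_dense(W, b, alpha, gamma, repeat(x, outer = (1, 4)), 4)   # size (5, 12)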

DhairyaLGandhi (Member):

But the samples are always the same, so why would it matter whether it's materialised or not?

1. Reduced imports and moved them to the main file
2. Renamed the test files
3. Added GPU tests for the layers -- for now just a basic forward pass, etc.
@DwaraknathT DwaraknathT merged commit ed3d16c into main Sep 7, 2021
DwaraknathT added a commit that referenced this pull request Sep 7, 2021
…ts and example on MNIST classification using LeNet5 (#4)"

This reverts commit ed3d16c.
DwaraknathT added a commit that referenced this pull request Sep 7, 2021
…ts and example on MNIST classification using LeNet5 (#4)" (#14)

This reverts commit ed3d16c.