Stochastic depth example #67

Open
wants to merge 6 commits into master
Conversation

christopher-beckham

Hi,

https://arxiv.org/abs/1603.09382

This is in reference to an issue I opened recently, #66. I have sent this PR mainly to get feedback (there are some rough edges) and to check whether I've made any mistakes in my implementation, rather than as something that is ready to be accepted. Also, because I cannot access a GPU at the moment, I'm unable to run a proper experiment, e.g. with a very deep network, which is what this method is designed for.

I'm aware that currently I have written the code so that it is only applicable to convnets -- I will need to make this applicable to dense layers as well.

Thanks

@f0k
Member

f0k commented May 20, 2016

Thanks! I'm not so sure about the way the implementation uses the layer names... I'd start from the residual networks example and either modify the ElemwiseSumLayer to include a theano.ifelse.ifelse() (this would drop the same layers for all items in a minibatch, but that's how I understood the paper, and that's also what's in the code: https://github.com/yueatsprograms/Stochastic_Depth/blob/master/ResidualDrop.lua) or insert your modified dropout layer (in the long term, we should give Lasagne's DropoutLayer an argument to specify tied axes; there are a lot of use cases for that). This will require access to a GPU, though, so you can check whether you can reproduce the original paper at least for CIFAR-10.

I will need to make this applicable to dense layers as well.

Possible, but not required for reproducing their paper. I'd suggest concentrating on that, if possible.
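For reference, here is a minimal sketch of the ifelse variant described above (illustrative only; the class name StochasticDepthSumLayer and the survival_p argument are made up, not taken from this PR): a merge layer in the spirit of ElemwiseSumLayer that keeps or skips the residual branch for the whole minibatch with a single Bernoulli draw, and scales the residual by its survival probability at test time.

```python
# Sketch only: residual merge that drops the whole residual branch for the
# entire minibatch during training, as in the Stochastic Depth paper.
import theano.tensor as T
from theano.ifelse import ifelse
from theano.sandbox.rng_mrg import MRG_RandomStreams
from lasagne.layers import MergeLayer

class StochasticDepthSumLayer(MergeLayer):  # hypothetical name
    def __init__(self, incomings, survival_p=0.5, **kwargs):
        super(StochasticDepthSumLayer, self).__init__(incomings, **kwargs)
        self.survival_p = survival_p
        self._srng = MRG_RandomStreams()

    def get_output_shape_for(self, input_shapes):
        # the identity path and the residual path have the same shape
        return input_shapes[0]

    def get_output_for(self, inputs, deterministic=False, **kwargs):
        identity, residual = inputs
        if deterministic:
            # test time: scale the residual by its survival probability
            return identity + self.survival_p * residual
        # training time: one Bernoulli draw shared by the whole minibatch
        keep = self._srng.binomial((1,), p=self.survival_p)[0]
        return ifelse(T.gt(keep, 0), identity + residual, identity)
```

Used in place of the ElemwiseSumLayer at the end of each residual block, this reduces to a plain sum when survival_p is 1.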

@f0k
Member

f0k commented May 20, 2016

Actually, the Stochastic Depth code is very clean and easy to read, whether you know Torch or not. I'd suggest starting from this, rather than from the paper, and writing a Python script reproducing the results for CIFAR-10 (rather than a notebook). It would be cool to compare the performance of theano.ifelse with that of zeroing things out.

@christopher-beckham
Author

Thanks! I'll have a look and try to simplify the code a bit. I didn't realize the same layers were dropped for all minibatch examples.

@f0k
Member

f0k commented May 23, 2016

I'll have a look and try to simplify the code a bit.

Thanks! Don't have time to review this week... anybody else?

I didn't realize the same layers were dropped for all minibatch examples.

It's more complicated to get any speed savings when dropping different layers per example.

@christopher-beckham
Author

christopher-beckham commented May 23, 2016

Hi,

I have made some big changes (and more commits than necessary =/)! I decided to be a bit more faithful to the paper, as they use residual blocks to evaluate their algorithm, so I used some of the code you wrote here for the residual blocks:

Lasagne/Lasagne#531

I have also reproduced their architecture for CIFAR-10, which is described here:

https://github.com/yueatsprograms/Stochastic_Depth/blob/master/ResidualDrop.lua

I have implemented both kinds of dropout layers, one that uses ifelse and one that just multiplies with a mask. Both are written so that they drop the same layers for all examples in the minibatch.
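For comparison, a minimal sketch of the mask-multiplication variant (again with made-up names, not this PR's actual code): instead of branching, the residual branch's output is multiplied by a single scalar Bernoulli sample that is tied across all examples, channels and pixels.

```python
# Sketch only: zero out the whole input (all minibatch examples at once)
# with probability 1 - survival_p; scale by survival_p at test time.
import theano
from theano.sandbox.rng_mrg import MRG_RandomStreams
from lasagne.layers import Layer

class WholeBatchDropLayer(Layer):  # hypothetical name
    def __init__(self, incoming, survival_p=0.5, **kwargs):
        super(WholeBatchDropLayer, self).__init__(incoming, **kwargs)
        self.survival_p = survival_p
        self._srng = MRG_RandomStreams()

    def get_output_for(self, input, deterministic=False, **kwargs):
        if deterministic:
            return self.survival_p * input
        # a single scalar mask, broadcast over the whole minibatch
        mask = self._srng.binomial((1,), p=self.survival_p,
                                   dtype=theano.config.floatX)[0]
        return mask * input
```

Such a layer would wrap the residual branch just before the ElemwiseSumLayer that adds back the identity path; unlike the ifelse version, a dropped block's convolutions are still computed and then zeroed.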

Three epochs with the ifelse implementation took 124-125 sec, and with the other implementation 126-127 sec, so the ifelse variant seems marginally faster.

Things I need to do:

  • Add batch norm
  • Make He initialisation work (I tried it but it gave me NaNs... maybe this only works with batch norm? I don't know)
  • Implement the linear decay schedule for p (see the sketch after this list)
  • Most importantly, reproduce the results for CIFAR-10
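For the linear decay schedule in the to-do list above, a small sketch of the rule from the paper (the helper name is made up): the survival probability falls linearly from 1 at the input to p_L at the last residual block.

```python
# Linear decay of the survival probability, as in Huang et al. (2016):
# p_l = 1 - (l / L) * (1 - p_L), for residual block l = 1 .. L.
def survival_probability(block_idx, num_blocks, p_last=0.5):
    return 1.0 - (float(block_idx) / num_blocks) * (1.0 - p_last)

# e.g. with the paper's CIFAR-10 setting of 54 residual blocks and p_L = 0.5:
# survival_probability(1, 54)  -> ~0.991
# survival_probability(54, 54) -> 0.5
```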

@f0k
Member

f0k commented May 23, 2016

I have made some big changes (and more commits than necessary =/)!

Don't worry, this can be cleaned up in retrospect if needed.

Sounds like you've made some good progress, cool!

I used some of the code you wrote here for the residual blocks

Be sure to compare to https://github.com/yueatsprograms/Stochastic_Depth/blob/master/ResidualDrop.lua. I think it will be easy to just port their code to Lasagne.

@christopher-beckham
Author

I should have some experiments to post here in the near future. I'm about to run some experiments on CIFAR-10 with their linear decay schedule.
