Stochastic depth example #67

Open
wants to merge 6 commits into master
Conversation

christopher-beckham

Hi,

https://arxiv.org/abs/1603.09382

This is in reference to an issue I opened recently, #66. I have sent this PR mainly to get feedback (there are some rough edges) and to check whether I've made any mistakes in my implementation, rather than as something that is ready to be accepted. Also, because I cannot access a GPU at the moment, I'm unable to run a proper experiment, e.g. with a very deep network, which is what this method is designed for.

I'm aware that currently I have written the code so that it is only applicable to convnets -- I will need to make this applicable to dense layers as well.

Thanks

@f0k
Member

f0k commented May 20, 2016

Thanks! I'm not so sure about the way the implementation uses the layer names... I'd start from the residual networks example and either modify the ElemwiseSumLayer to include a theano.ifelse.ifelse() (this would drop the same layers for all items in a minibatch, but that's how I understood the paper, and that's also what's in the code: https://github.com/yueatsprograms/Stochastic_Depth/blob/master/ResidualDrop.lua) or insert your modified dropout layer (in the long term, we should give Lasagne's DropoutLayer an argument to specify tied axes; there are a lot of use cases for that). This will require access to a GPU, though, so you can check whether you can reproduce the original paper at least for CIFAR-10.

I will need to make this applicable to dense layers as well.

Possible, but not required for reproducing their paper. I'd suggest concentrating on that, if possible.
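For reference, here is a minimal sketch of the ifelse variant described above (illustrative only; the class name StochasticDepthSumLayer and the survival_p argument are made up, not taken from this PR): a merge layer in the spirit of ElemwiseSumLayer that keeps or skips the residual branch for the whole minibatch with a single Bernoulli draw, and scales the residual by its survival probability at test time.

```python
# Sketch only: residual merge that drops the whole residual branch for the
# entire minibatch during training, as in the Stochastic Depth paper.
import theano.tensor as T
from theano.ifelse import ifelse
from theano.sandbox.rng_mrg import MRG_RandomStreams
from lasagne.layers import MergeLayer

class StochasticDepthSumLayer(MergeLayer):  # hypothetical name
    def __init__(self, incomings, survival_p=0.5, **kwargs):
        super(StochasticDepthSumLayer, self).__init__(incomings, **kwargs)
        self.survival_p = survival_p
        self._srng = MRG_RandomStreams()

    def get_output_shape_for(self, input_shapes):
        # the identity path and the residual path have the same shape
        return input_shapes[0]

    def get_output_for(self, inputs, deterministic=False, **kwargs):
        identity, residual = inputs
        if deterministic:
            # test time: scale the residual by its survival probability
            return identity + self.survival_p * residual
        # training time: one Bernoulli draw shared by the whole minibatch
        keep = self._srng.binomial((1,), p=self.survival_p)[0]
        return ifelse(T.gt(keep, 0), identity + residual, identity)
```

Used in place of the ElemwiseSumLayer at the end of each residual block, this reduces to a plain sum when survival_p is 1.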

@f0k
Member

f0k commented May 20, 2016

Actually, the Stochastic Depth code is very clean and easy to read, whether you know Torch or not. I'd suggest starting from this, rather than from the paper, and writing a Python script reproducing the results for CIFAR-10 (rather than a notebook). It would be cool to compare the performance of theano.ifelse with that of zeroing things out.

@christopher-beckham
Author

Thanks! I'll have a look and try to simplify the code a bit. I didn't realize the same layers were dropped for all minibatch examples.

@f0k
Member

f0k commented May 23, 2016

I'll have a look and try to simplify the code a bit.

Thanks! Don't have time to review this week... anybody else?

I didn't realize the same layers were dropped for all minibatch examples.

It's more complicated to get any speed savings when dropping different layers per example.

@christopher-beckham
Author

christopher-beckham commented May 23, 2016

Hi,

I have made some big changes (and more commits than necessary =/)! I decided to be a bit more faithful to the paper, as they use residual blocks to evaluate their algorithm, so I used some of the code you wrote here for the residual blocks:

Lasagne/Lasagne#531

I have also reproduced their architecture for CIFAR-10, which is described here:

https://github.com/yueatsprograms/Stochastic_Depth/blob/master/ResidualDrop.lua

I have implemented both kinds of dropout layers, one that uses ifelse and one that just multiplies with a mask. Both are written so that they drop the same layers for all examples in the minibatch.
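For comparison, a minimal sketch of the mask-multiplication variant (again with made-up names, not this PR's actual code): instead of branching, the residual branch's output is multiplied by a single scalar Bernoulli sample that is tied across all examples, channels and pixels.

```python
# Sketch only: zero out the whole input (all minibatch examples at once)
# with probability 1 - survival_p; scale by survival_p at test time.
import theano
from theano.sandbox.rng_mrg import MRG_RandomStreams
from lasagne.layers import Layer

class WholeBatchDropLayer(Layer):  # hypothetical name
    def __init__(self, incoming, survival_p=0.5, **kwargs):
        super(WholeBatchDropLayer, self).__init__(incoming, **kwargs)
        self.survival_p = survival_p
        self._srng = MRG_RandomStreams()

    def get_output_for(self, input, deterministic=False, **kwargs):
        if deterministic:
            return self.survival_p * input
        # a single scalar mask, broadcast over the whole minibatch
        mask = self._srng.binomial((1,), p=self.survival_p,
                                   dtype=theano.config.floatX)[0]
        return mask * input
```

Such a layer would wrap the residual branch just before the ElemwiseSumLayer that adds back the identity path; unlike the ifelse version, a dropped block's convolutions are still computed and then zeroed.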

Three epochs with the ifelse implementation took 124-125 sec, and with the other implementation 126-127 sec, so the ifelse variant seems marginally faster.

Things I need to do:

  • Add batch norm
  • Make He initialisation work (I tried it but it gave me NaNs... maybe this only works with batch norm? I don't know)
  • Implement the linear decay schedule for p (see the sketch after this list)
  • Most importantly, reproduce the results for CIFAR-10
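For the linear decay schedule in the to-do list above, a small sketch of the rule from the paper (the helper name is made up): the survival probability falls linearly from 1 at the input to p_L at the last residual block.

```python
# Linear decay of the survival probability, as in Huang et al. (2016):
# p_l = 1 - (l / L) * (1 - p_L), for residual block l = 1 .. L.
def survival_probability(block_idx, num_blocks, p_last=0.5):
    return 1.0 - (float(block_idx) / num_blocks) * (1.0 - p_last)

# e.g. with the paper's CIFAR-10 setting of 54 residual blocks and p_L = 0.5:
# survival_probability(1, 54)  -> ~0.991
# survival_probability(54, 54) -> 0.5
```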

@f0k
Member

f0k commented May 23, 2016

I have made some big changes (and more commits than necessary =/)!

Don't worry, this can be cleaned up in retrospect if needed.

Sounds like you've made some good progress, cool!

I used some of the code you wrote here for the residual blocks

Be sure to compare to https://github.com/yueatsprograms/Stochastic_Depth/blob/master/ResidualDrop.lua. I think it will be easy to just port their code to Lasagne.

@christopher-beckham
Author

I should have some experiments to post here in the near future. I'm about to run some experiments on CIFAR-10 with their linear decay schedule.
