Activation Problem #26
Comments
You seem to be right. If you add it, I'll be happy to accept a pull request. Otherwise, I currently don't have time to work on this, but maybe later this year.
But it does not really differ from the results I got before without these activations. Btw, your example code is great!
Thanks, and thank you for trying to fix it. 👍 1.8% is much better than 8% error. As far as I know, the activation has to be performed after the convolution, not the pooling layers. In the Tensorflow tutorial, they also change the learning rate to 0.001 and add a Dropout layer after the first dense (fully-connected) layer. Even with that, they achieve 97.3% test accuracy: https://www.tensorflow.org/tutorials/estimators/cnn It should be possible to get 0.8% error with this network, but it may also require changing the optimizer from standard SGD to Momentum SGD or Adam.
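For concreteness, applying a ReLU right after the convolution + bias and before the pooling could look roughly like the sketch below with cuDNN. The descriptor and buffer names (`conv1OutDesc`, `d_conv1out`) are placeholders, not the actual names used in lenet.cu, and error checking is omitted:

```cpp
// Sketch: ReLU applied in-place to the conv1 output (after the bias add),
// before the pooling forward pass.
cudnnActivationDescriptor_t reluDesc;
cudnnCreateActivationDescriptor(&reluDesc);
cudnnSetActivationDescriptor(reluDesc, CUDNN_ACTIVATION_RELU,
                             CUDNN_NOT_PROPAGATE_NAN, /*coef=*/0.0);

const float alpha = 1.0f, beta = 0.0f;
// ... cudnnConvolutionForward(...) and cudnnAddTensor(...) for the bias ...
cudnnActivationForward(handle, reluDesc,
                       &alpha, conv1OutDesc, d_conv1out,
                       &beta,  conv1OutDesc, d_conv1out);  // in-place ReLU
// ... cudnnPoolingForward(...) then consumes d_conv1out ...
```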
Thanks for the information and the link to the tutorial. OK, so I will post some new code which handles the activation after the convolution bias layer. In the tutorial, no bias is used. On this page http://cs231n.github.io/neural-networks-1/ I found the information that the bias is added first and then the activation function is applied. So I will apply the activation directly before the pooling (instead of after the pooling). I also added a Dropout layer. It's great to have a simple example, but some optional improvements (separated by #ifdef .. #endif) would be great too, so that one knows what to change to move from SGD to Nesterov's Accelerated Gradient (NAG) or Momentum SGD (update rules found on http://cs231n.github.io/neural-networks-3/; see the sketch at the end of this comment).
I already implemented Momentum SGD with mu=0.9 and it gives similar results, but the learning_rate must now be lower; it has a high error at a learning rate of 0.01 in my implementation.
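For reference, the classic momentum update from the cs231n notes amounts to the following (plain host-side sketch; the buffer names `w`, `dw`, `v` and the per-parameter loop are illustrative):

```cpp
// Classic momentum SGD update. v is a per-parameter velocity buffer that
// persists across iterations; mu is the momentum coefficient (e.g. 0.9).
void momentum_update(float *w, const float *dw, float *v, int n,
                     float lr, float mu)
{
    for (int i = 0; i < n; ++i) {
        v[i] = mu * v[i] - lr * dw[i];  // integrate velocity
        w[i] += v[i];                   // integrate position
    }
}
```

Because the velocity accumulates gradients over roughly 1/(1 - mu) steps, mu = 0.9 makes the effective step size up to about ten times larger than plain SGD, which is consistent with the learning rate having to be lowered.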
EDIT: I added a pull request instead, so I removed the code here. Training dataset size: 60000, Test dataset size: 10000 (same for all three runs). A faster convergence with Nesterov's Accelerated Gradient would really help to reduce the high number of iterations!
And I applied the Dropout layer (directly after the FullyConnected1 layer) using this code. Output: Training dataset size: 60000, Test dataset size: 10000, Batch size: 32, DropOut Rate = 0.4. Changing the learning rate to 0.001 did not work for me; the error was even increasing.
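A cuDNN dropout layer inserted after the first fully connected layer could be set up roughly as in the sketch below. The handle, descriptor, and buffer names (`fc1OutDesc`, `d_fc1out`, `d_fc1drop`) are placeholders and the actual PR code may differ; error checking is omitted:

```cpp
// Sketch: cuDNN dropout applied to the FC1 activations (training pass only;
// dropout is skipped at inference time).
cudnnDropoutDescriptor_t dropDesc;
size_t stateSize, reserveSize;
void *dropStates, *dropReserve;

cudnnCreateDropoutDescriptor(&dropDesc);
cudnnDropoutGetStatesSize(handle, &stateSize);
cudaMalloc(&dropStates, stateSize);
cudnnSetDropoutDescriptor(dropDesc, handle, 0.4f /* dropout rate */,
                          dropStates, stateSize, /*seed=*/123ULL);

cudnnDropoutGetReserveSpaceSize(fc1OutDesc, &reserveSize);
cudaMalloc(&dropReserve, reserveSize);

cudnnDropoutForward(handle, dropDesc,
                    fc1OutDesc, d_fc1out,   // input: FC1 activations
                    fc1OutDesc, d_fc1drop,  // output: dropped activations
                    dropReserve, reserveSize);
```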
I finally got NAG + SGD Momentum working: NesterovMomentumWeightUpdate with Momentum = 0.9 and Learning Rate = 0.001. And I noticed this: any idea why this happens? CUDA kernel:
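A minimal sketch of a Nesterov momentum weight-update kernel along these lines is shown below (one thread per parameter; the kernel name, signature, and launch configuration here are illustrative and not necessarily the ones from the pull request):

```cpp
// Nesterov momentum (NAG) weight update, following the cs231n formulation.
// w: weights, dw: gradients, v: velocity buffer (device arrays of length n).
__global__ void NesterovMomentumWeightUpdateKernel(float *w, const float *dw,
                                                    float *v, int n,
                                                    float lr, float mu)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v_prev = v[i];
    v[i] = mu * v[i] - lr * dw[i];              // update velocity
    w[i] += -mu * v_prev + (1.0f + mu) * v[i];  // Nesterov lookahead step
}
```

A launch such as `NesterovMomentumWeightUpdateKernel<<<(n + 255) / 256, 256>>>(d_w, d_dw, d_v, n, 0.001f, 0.9f);` would then update n parameters with the settings mentioned above.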
I don't know why the gradients explode, and I can't go over your code in this form. It's hard to read without a proper diff and hard to test when I need to apply your changes. Please create a pull request. To create a pull request, first you have to fork the repository through GitHub, then commit your changes and push them to your fork. At that point GitHub will ask if you want to create a pull request when you browse to your version of the repo. If not, you can still go to my version, click "Pull Requests" and create a new one from there.
I opened a pull request. Only the commits to lenet.cu (ReLU, Nesterov, DropOut, Adam) are the ones I wanted to submit, but all other commits seem to be in the pull request too, so please ignore the others. I do not use the git command line, so I cannot change the pull request.
I know that this repository is about showing the features of the cuDNN library, but still, OOP style is needed...
@BilgehanSel
@BilgehanSel @3DdeepAI While these are both excellent examples of how to train with cuDNN, in my opinion they're missing the point of this sample. SelCNN actually starts looking like Caffe in its early days, and this is what I wanted to avoid with this repository. I wanted to create a concise, clear example of how cuDNN can be used for training, in one file. Supporting all the bells and whistles that come along is what frameworks are for. This is also why I have not yet merged the PR as is. I think the activation part should be added, but all the extra stuff is making the sample too heavy IMO. Unfortunately I'm too busy to do it right now, but when I have time, I'll take parts of that PR and integrate them, if that's OK.
@tbennun OK, you're right, a simple sample should not have all the other stuff. So I closed the PRs and created a new one with ReLU activations only. UPDATE:
@3DdeepAI thank you for understanding, and thank you for taking the time to create another PR. 👍
Sorry to bother if I'm wrong, but it seems like there is no activation between the convolution layers...