
Activation Problem #26

Open
BilgehanSel opened this issue Jul 31, 2018 · 14 comments

@BilgehanSel

Sorry to bother you if I'm wrong, but it seems like there is no activation between the convolution layers...

@tbennun
Owner

tbennun commented Aug 1, 2018

You seem to be right. If you add it, I'll be happy to accept a pull request. Otherwise, I currently don't have time to work on this, but maybe later this year.

@ghost

ghost commented Aug 22, 2018

But it does not really differ from the results I got before without these activations.
I put the activations directly after the pooling layers. Should they maybe be somewhere else?
What would be good alpha + beta values for them?

Btw, your example code is great!

@tbennun
Owner

tbennun commented Aug 22, 2018

Thanks, and thank you for trying to fix it. 👍 1.8% is much better than 8% error.

As far as I know, the activation has to be performed after the convolution, not the pooling layers. In the Tensorflow tutorial, they also change the learning rate to 0.001 and add a Dropout layer after the first dense (fully-connected) layer. Even with that, they achieve 97.3% test accuracy: https://www.tensorflow.org/tutorials/estimators/cnn
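
For reference, adding the activation there could look roughly like this with cuDNN (just a sketch; checkCUDNN and the descriptor/buffer names are placeholders, not necessarily the sample's actual variables):

cudnnActivationDescriptor_t activationDesc;
checkCUDNN(cudnnCreateActivationDescriptor(&activationDesc));
checkCUDNN(cudnnSetActivationDescriptor(activationDesc, CUDNN_ACTIVATION_RELU,
                                        CUDNN_PROPAGATE_NAN, 0.0));

// In-place ReLU on the convolution output (after the bias has been added, before pooling).
const float alpha = 1.0f, beta = 0.0f;
checkCUDNN(cudnnActivationForward(cudnnHandle, activationDesc,
                                  &alpha, conv1TensorDesc, d_conv1,
                                  &beta, conv1TensorDesc, d_conv1));

That also answers the alpha/beta question above: they are just cuDNN's blending factors for the call, so 1 and 0 are the usual choice.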

It should be possible to get 0.8% error with this network, but it may also require changing the optimizer from standard SGD to Momentum SGD or Adam.
Since I didn't want to make the example complicated, I left the optimizer as simple as possible. What do you think?

@ghost

ghost commented Aug 22, 2018

Thanks for the information and the link to the tutorial. Ok, so I will post some new code which handles the activation after the convolution bias layer. In the tutorial there is no bias used.

On this page http://cs231n.github.io/neural-networks-1/ I found the information that the bias is added first and then the activation function is applied. So I applied the activation directly before the pooling (instead of after the pooling), and also added a DropOut layer.
The 1.8% error came from running 10000 iterations instead of 1000 iterations, also without the activation function.

It's great to have a simple example, but some optional improvements (separated by #ifdef .. #endif) would be great too, so that one knows what to change to move from SGD to Nesterov's Accelerated Momentum (NAG).
I'm not sure whether I understood it right: the "UpdateWeights" function is the "optimizer", right? So for Nesterov I would have to replace all of the cublasSaxpy(..) calls with some math operation (using a CUDA kernel)?

NAG (found on http://cs231n.github.io/neural-networks-3/), with momentum mu = 0.9:

    v_prev = v                        # back this up
    v = mu * v - learning_rate * dx   # velocity update stays the same
    x += -mu * v_prev + (1 + mu) * v  # position update changes form

Momentum SGD:

    v[i] = mu*v[i] + learning_rate * gradient[i]
    weights[i] += v[i]

I already implemented Momentum SGD with mu = 0.9 and it gives similar results, but learning_rate must now be lower; in my implementation it gives a high error at a learning rate of 0.01.

@ghost

ghost commented Aug 22, 2018

EDIT: I added a pull request instead, so I removed the code here.

Training dataset size: 60000, Test dataset size: 10000
Batch size: 32, iterations: 20000
Classification result: 1.51% error (used 10000 images)

Training dataset size: 60000, Test dataset size: 10000
Batch size: 32, iterations: 100000
Classification result: 0.98% error (used 10000 images)

Training dataset size: 60000, Test dataset size: 10000
Batch size: 32, iterations: 200000
Classification result: 0.91% error (used 10000 images)

So faster convergence with Nesterov's Accelerated Momentum would really help to reduce the high number of iterations!

@ghost

ghost commented Aug 23, 2018

And I applied the DropOut Layer (directly after the FullyConnected1 Layer) using this code:
https://devtalk.nvidia.com/default/topic/1028240/cudnn/how-to-implement-a-dropout-layer-using-cudnn-/
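
Roughly, that setup boils down to something like the following (a sketch only; checkCUDNN/checkCudaErrors, cudnnHandle, fc1TensorDesc, d_fc1relu and d_fc1drop are placeholder names, and d_fc1drop is a separate preallocated output buffer of the same size as the input):

cudnnDropoutDescriptor_t dropoutDesc;
size_t dropoutStateSize, dropoutReserveSize;
void *d_dropoutStates, *d_dropoutReserve;

checkCUDNN(cudnnCreateDropoutDescriptor(&dropoutDesc));
checkCUDNN(cudnnDropoutGetStatesSize(cudnnHandle, &dropoutStateSize));
checkCUDNN(cudnnDropoutGetReserveSpaceSize(fc1TensorDesc, &dropoutReserveSize));
checkCudaErrors(cudaMalloc(&d_dropoutStates, dropoutStateSize));
checkCudaErrors(cudaMalloc(&d_dropoutReserve, dropoutReserveSize));
checkCUDNN(cudnnSetDropoutDescriptor(dropoutDesc, cudnnHandle, 0.4f /* drop rate */,
                                     d_dropoutStates, dropoutStateSize, /*seed=*/123456ULL));

// Training forward pass only (skip dropout, or use a rate of 0, at test time):
checkCUDNN(cudnnDropoutForward(cudnnHandle, dropoutDesc,
                               fc1TensorDesc, d_fc1relu,    // input
                               fc1TensorDesc, d_fc1drop,    // output
                               d_dropoutReserve, dropoutReserveSize));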

Training dataset size: 60000, Test dataset size: 10000 Batch size: 32 DropOut Rate = 0.4
iterations: 500000 Classification result: 0.84% error (used 10000 images)
iterations: 200000 Classification result: 0.86% error (used 10000 images)
iterations: 100000 Classification result: 0.93% error (used 10000 images)
iterations: 10000 Classification result: 1.72% error (used 10000 images)

Changing the learning rate to 0.001 did not work for me; the error was even increasing.
So next I will try to change the SGD to Nesterov's Accelerated Momentum.

@ghost

ghost commented Aug 23, 2018

I finally got NAG + SGD Momentum working.

NesterovMomentumWeightUpdate Momentum=0.9 Learning Rate: 0.001
Training dataset size: 60000, Test dataset size: 10000 Batch size: 32,
LEARNING_RATE_POLICY_GAMMA 0.0001
LEARNING_RATE_POLICY_POWER 0.75
iterations: 100000 Classification result: 0.80% error (used 10000 images)
iterations: 20000 Classification result: 1.32% error (used 10000 images)
iterations: 10000 Classification result: 1.86% error (used 10000 images)
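
(For reference, LEARNING_RATE_POLICY_GAMMA and LEARNING_RATE_POLICY_POWER suggest a Caffe-style "inv" learning-rate schedule; assuming that, the effective rate at iteration iter would be roughly:

    effective_lr = LEARNING_RATE * powf(1.0f + LEARNING_RATE_POLICY_GAMMA * iter,
                                        -LEARNING_RATE_POLICY_POWER);

so gamma and power only control how quickly the base rate decays over the run.)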

And I noticed this:
NesterovMomentumWeightUpdate Momentum=0.9
Training dataset size: 60000, Test dataset size: 10000 Batch size: 32,
LEARNING_RATE 0.005
LEARNING_RATE_POLICY_GAMMA 0.00001
LEARNING_RATE_POLICY_POWER 0.8
iterations: 1000 Classification result: 3.43% error (used 10000 images)
iterations: 10000 Classification result: 88.65% error (used 10000 images)
iterations: 20000 Classification result: 88.20% error (used 10000 images)

Any idea why this happens?

CUDA kernel:

__global__ void NesterovMomentumWeightUpdate(float *weights, float *gradients, float *v,
                                             float learning_rate, int size)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= size)
        return;

    const float MomentumUpdate = 0.9f;

    // Nesterov's Accelerated Momentum.
    // Note: + learning_rate here (instead of -) because the gradients are already negated.
    float v_prev = v[idx];
    v[idx] = MomentumUpdate * v[idx] + learning_rate * gradients[idx];
    weights[idx] += -MomentumUpdate * v_prev + (1.0f + MomentumUpdate) * v[idx];

#if 0  // TEST ONLY: SGD with momentum (disable the Nesterov update above when enabling this)
    v[idx] = MomentumUpdate * v[idx] + learning_rate * gradients[idx];
    weights[idx] += v[idx];
#endif

#if 0  // TEST ONLY: pure SGD (same as calling cublasSaxpy(...))
    weights[idx] += learning_rate * gradients[idx];
#endif
}
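
For completeness, a kernel like this could replace the corresponding cublasSaxpy(...) call in UpdateWeights roughly as follows (a sketch only; the block size and the d_* pointer names are placeholders, and d_velocity is an extra zero-initialized device buffer of the same size as the weights):

const int BLOCK_SIZE = 256;
int gridSize = (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
NesterovMomentumWeightUpdate<<<gridSize, BLOCK_SIZE>>>(d_weights, d_gradient, d_velocity,
                                                       learning_rate, size);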

@tbennun
Owner

tbennun commented Aug 23, 2018

I don't know why the gradients explode, and I can't go over your code in this form. It's hard to read without a proper diff and hard to test when I need to apply your changes. Please create a pull request.

To create a pull request, first fork the repository through GitHub, then commit your changes and push them to your fork. At that point, if you browse to your version of the repo, GitHub will ask whether you want to create a pull request. If not, you can still go to my version, click "Pull Requests", and create a new one from there.
Please refer to the official guide for more information: https://help.github.com/articles/creating-a-pull-request/

@ghost

ghost commented Aug 23, 2018

I opened a pull request. Only the lenet.cu commits (ReLU, Nesterov, DropOut, Adam) are the ones I wanted to submit, but all other commits seem to be in the pull request too, so please ignore the others. I do not use the git command line, so I cannot change the pull request.

@BilgehanSel
Author

I know that this repository is about showing the features of the CUDNN library, but still, an OOP style is needed...
Here is my version, which is easier to understand since the layers are divided into their own classes.
https://github.com/BilgehanSel/SelCNN

@ghost

ghost commented Sep 9, 2018

@BilgehanSel
However, in your SoftmaxLossBackprop() function you handle this differently than the original code. Why? Do you get better results than 0.80% error with that code on the MNIST dataset in fewer than 10000 iterations?
You apply softmax twice: once on the last FullyConnectedLayer and once on the output. The original code does it only on the last fully-connected layer.
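
For reference, if softmax is applied only once (on the last fully-connected layer's output), the backward pass of softmax + cross-entropy reduces to "probabilities minus one-hot labels". A minimal sketch of that gradient kernel (not either repository's exact code):

// Gradient of softmax + cross-entropy: dL/dz = p - y, where p is the softmax
// output and y is the one-hot label. `diff` already holds the probabilities p.
__global__ void SoftmaxCrossEntropyBackprop(const float *labels, int num_classes,
                                            int batch_size, float *diff)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= batch_size)
        return;

    // Subtract 1 from the probability of the true class; the other entries
    // of p - y are just the probabilities themselves.
    int label = static_cast<int>(labels[idx]);
    diff[idx * num_classes + label] -= 1.0f;
}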

@tbennun
Owner

tbennun commented Sep 9, 2018

@BilgehanSel @3DdeepAI While these are both excellent examples of how to train with CUDNN, in my opinion they're missing the point of this sample. SelCNN actually starts looking like Caffe in its early days, and this is what I wanted to avoid with this repository. I wanted to create a concise, clear example of how CUDNN can be used for training, in one file. Supporting all the bells and whistles that come along is what frameworks are for.

This is also why I have not yet merged the PR as is. I think the activation part should be added, but all the extra stuff is making the sample too heavy IMO. Unfortunately I'm too busy to do it right now, but when I have time, I'll take parts of that PR and integrate them, if that's OK.

@ghost

ghost commented Sep 9, 2018

@tbennun OK, you're right, a simple sample should not have all the other stuff. So I closed the PRs and created a new one with ReLU activations only.
The latest commit there contains all the necessary changes:
https://github.com/3DdeepAI/3DdeepAI/commit/d7764241ba357ca0ec581fa726fa75291e97017c#diff-8526a070794ac85f9da83e9dbf728cbf
Please simply ignore all the other commits.

UPDATE:
new PR: #30 with one commit

@tbennun
Owner

tbennun commented Sep 9, 2018

@3DdeepAI thank you for understanding, and thank you for taking the time to create another PR. 👍
I'll take a look at it soon and we can keep discussing it there.
