could you share your script for producing these protos and results of each model? #1
Comments
@jiangxuehan Hi! In my implementation the model can specify an entire DenseBlock (tens of transitions) as a single layer, so the whole network was written by hand in prototxt and there is no script for generating the protos. But each prototxt only contains about 10 layers, so writing them manually should be doable.
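A minimal, hypothetical sketch of what such a single-layer DenseBlock definition might look like in prototxt is below; the block-parameter name and its fields (numTransition, growthRate, dropoutAmount) are assumptions for illustration, so check caffe.proto in the modified Caffe fork for the real definitions.

```
# Hypothetical sketch only: field names are assumptions, not copied from
# the repo's prototxt; consult caffe.proto in the modified Caffe fork.
layer {
  name: "DenseBlock1"
  type: "DenseBlock"        # one layer stands in for tens of BN-ReLU-Conv transitions
  bottom: "conv1"
  top: "DenseBlock1"
  denseblock_param {
    numTransition: 12       # transitions inside this block
    growthRate: 12          # k in the DenseNet paper
    dropoutAmount: 0.2
  }
}
```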
@Tongcheng I have run the models in this repo; for k=12 and L=100 the accuracy on CIFAR-10+ is 94.8%, but it should be 95.5% according to the paper. Looking forward to your results.
@jiangxuehan Thanks for pointing that out! I currently get the same result, which is about 0.8% lower than the Torch counterpart.
@jiangxuehan It turns out Caffe's DataLayer was feeding data without permutation. I have now added a flag to permute the data, which brings the accuracy to 95.2%.
@Tongcheng Thanks for your reply. Using ImageDataLayer with the shuffle option gives the same result (95.2%) as your modified DataLayer. Do you think there are other differences between Torch and Caffe that could affect model performance?
@jiangxuehan I currently have no definitive explanation for the remaining 0.3% divergence, but I have several hypotheses.
@jiangxuehan Also, I think my DataLayer with the random option should be superior to the default ImageDataLayer implementation: ImageDataLayer shuffles a vector of Datum, which are quite large objects, whereas I shuffle only the indices of the Datum, which are much smaller. So my implementation should also be somewhat more time-efficient.
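To make the efficiency argument concrete, here is a small self-contained C++ sketch (an illustration, not the actual Caffe code; the Datum struct is a simplified stand-in) contrasting shuffling the records themselves with shuffling only an index vector:

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Simplified stand-in for caffe::Datum: a potentially large record.
struct Datum {
  std::string data;  // encoded image bytes
  int label = 0;
};

int main() {
  std::vector<Datum> records(50000);  // e.g. a CIFAR-10 training set
  std::mt19937 rng(1234);

  // Heavier approach: permute the records themselves, moving large objects.
  // std::shuffle(records.begin(), records.end(), rng);

  // Lighter approach: permute a vector of indices and read the records
  // through it, so the large Datum objects never move in memory.
  std::vector<std::size_t> order(records.size());
  std::iota(order.begin(), order.end(), 0);
  std::shuffle(order.begin(), order.end(), rng);

  for (std::size_t i : order) {
    const Datum& d = records[i];  // feed d to the network here
    (void)d;
  }
  return 0;
}
```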
Hi, I found your explanation about the difference between the Caffe and Torch implementations of BN. I guess you modified the BatchNorm layer in Caffe to make it similar to the Torch one. How much does the modification change performance? In addition, I would like to use the modification in my own Caffe version, so I would just copy/paste your modified files. Finally, BN is usually followed by a Scale layer, but in your prototxt, such as https://github.com/Tongcheng/DN_CaffeScript/blob/master/train_test_BCBN_C10plus.prototxt, I did not see a Scale layer after BN. Did you already integrate it into the BN layer?
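For context, the pairing being asked about is the usual pattern in stock Caffe, where BatchNorm only normalizes and a separate Scale layer with bias_term: true supplies the learned gamma and beta; the snippet below is a generic illustration, not taken from the repo's prototxt.

```
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }  # learned gamma (scale) and beta (bias)
}
```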
Hi @John1231983, the Torch version uses the cuDNN implementation of BatchNormalization, which already includes the scale operation, so in my modified Caffe BatchNorm there is no need to put an additional Scale layer after it. The remaining difference is mainly in the smoothing factor of the EMA over the running statistics, which leads to different training-curve shapes for the two BatchNorm implementations.
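As a sketch of that point: both frameworks maintain running statistics with an exponential moving average of the form below, and only the smoothing factor λ differs (for example, stock Caffe's moving_average_fraction defaults to 0.999, while a Torch momentum of 0.1 corresponds to λ = 0.9; the exact values used in these experiments are an assumption here). A larger λ adapts the inference-time statistics more slowly, which changes the shape of the test-accuracy curve during training.

```latex
% Generic running-statistics update; \lambda is the EMA smoothing factor.
\[
\mu_{\text{run}} \leftarrow \lambda\,\mu_{\text{run}} + (1-\lambda)\,\mu_{\text{batch}},
\qquad
\sigma^{2}_{\text{run}} \leftarrow \lambda\,\sigma^{2}_{\text{run}} + (1-\lambda)\,\sigma^{2}_{\text{batch}}
\]
```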
Thanks for pointing this out. Could you tell me which files you changed for the BatchNorm layer? I would like to test it in my Caffe version by changing those files. I checked the …