General hyperparameters? #16
@zizhaozhang that's probably related to #17; DenseNet doesn't use whitened data.
OMFG! Thanks for that paper, it's essentially the same concept I recently figured out and have been testing. NNs grow up in days, not years, right now.
Yes, as far as I can see from their figures, it's nearly identical to my hoardNet, so you might as well just use my code, maybe modified a little bit: https://github.com/ibmua/Breaking-Cifar Check out the "hoard" models. The major parameters there are "sequences" and "depth", though I think the "2-x"+ models were designed to be run with depth=2. Earlier models are more generic. The thing should be easily tweakable and the code is clean. Mind that it uses 4-space tabs.

BTW, hoard-2-x is the model that I referred to as possibly being comparable to WRN in terms of performance/parameters.

IMHO HoardNet sounds like a more meaningful name for this thing =D The info is being hoarded without discarding it like in usual architectures. Accumulation would be a less reasonable name, because it sounds more like something ResNets do. DenseNet doesn't seem too reasonable of a name to me.

https://github.com/ibmua/Breaking-Cifar/blob/master/logs/load_59251794/log.txt is a log from near the end of the training where I got 19.5% on my [0..1]-scaled CIFAR-100+.
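For illustration, here is a rough Torch sketch of the distinction being drawn here (not the actual hoardNet or DenseNet code): a "hoard"/dense-style layer concatenates its new features onto its input, while a residual-style layer adds them back in. Layer widths and function names are made up, and batched NxCxHxW input is assumed.

```lua
require 'nn'

-- Dense/"hoard"-style: output = join(input, transform(input)); channels grow,
-- so earlier information is kept rather than discarded.
local function denseLayer(nIn, growth)
  return nn.Sequential()
    :add(nn.ConcatTable()
      :add(nn.Identity())
      :add(nn.SpatialConvolution(nIn, growth, 3, 3, 1, 1, 1, 1)))
    :add(nn.JoinTable(2))   -- concatenate along the channel dimension
end

-- Residual-style: output = input + transform(input); channels stay fixed,
-- information is accumulated by summation.
local function residualLayer(n)
  return nn.Sequential()
    :add(nn.ConcatTable()
      :add(nn.Identity())
      :add(nn.SpatialConvolution(n, n, 3, 3, 1, 1, 1, 1)))
    :add(nn.CAddTable())
end
```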
Their code is also available, at https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua . They are using preactivation, which is different from what I've used. I thought that might be beneficial, but it needed more testing and I don't have too many resources. =) That's a lot more expensive, though, but the difference may well be worth it. Some other things I've designed differently may actually work better, I think. I think I took a bit from Inception & Inception-ResNet where they only took from ResNet.
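A minimal sketch of the pre-activation vs. post-activation ordering mentioned above; the layer width of 16 is purely illustrative.

```lua
require 'nn'

-- Pre-activation ordering: BN -> ReLU -> Conv (what densenet.lua uses)
local preact = nn.Sequential()
  :add(nn.SpatialBatchNormalization(16))
  :add(nn.ReLU(true))
  :add(nn.SpatialConvolution(16, 16, 3, 3, 1, 1, 1, 1))

-- Post-activation ordering: Conv -> BN -> ReLU (the classic ordering)
local postact = nn.Sequential()
  :add(nn.SpatialConvolution(16, 16, 3, 3, 1, 1, 1, 1))
  :add(nn.SpatialBatchNormalization(16))
  :add(nn.ReLU(true))
```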
@szagoruyko I see, I will give that a try. It is really tricky. Thanks @ibmua for noticing that. I have run so many different tests using the WRN code to train DenseNet while ignoring this part. One other difference is that @ibmua uses [0,1]-scaled data while DenseNet uses mean & std normalization as fb.resnet.torch does. Which do you think is better?
Mean+std is likely a little bit better.
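For reference, a minimal Torch sketch of the two preprocessing options discussed above; `trainData` is a hypothetical NxCxHxW ByteTensor of raw CIFAR pixels, not a variable from either repo.

```lua
-- Option A: simple [0,1] scaling
local scaled = trainData:float():div(255)

-- Option B: per-channel mean/std normalization (fb.resnet.torch style)
local normed = trainData:float()
for c = 1, 3 do
  local channel = normed[{ {}, c, {}, {} }]   -- view of one color channel
  channel:add(-channel:mean()):div(channel:std())
end
```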
Cool. I will check your code. |
Hi,
My question seems a bit unrelated, but I am really curious, so sorry for interrupting.
WRN uses a quite different weight decay and learning rate schedule from the one used in fb.resnet.torch. As the WRN paper mentions, pre-activation ResNet trained with the WRN learning scheme gets better results, so I think this hyperparameter setting is quite good and generalizes well.
Recently, I have been using the WRN code to train a new method, Densely Connected Convolutional Networks (DenseNet), http://arxiv.org/pdf/1608.06993v1.pdf, but the error is larger than the one obtained with the fb.resnet.torch code (5.2 vs. the 4.1 reported in the original paper).
I understand the hyperparameters may vary from model to model, but given how many tests WRN has gone through, I don't think this setting should cause an error rate increase of more than 1.0. The WRN paper does not discuss much about how the hyperparameters were selected (the two schedules are sketched below this comment).
I am not sure if you are familiar with this new method (DenseNet), but could you comment on this situation?
In addition, could you provide more details about how you selected the hyperparameters instead of using the fb.resnet.torch settings? It would be very helpful for us when training modified architectures based on WRN and searching for better hyperparameter settings.
Thanks a lot!
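For comparison, here is a rough sketch of the two CIFAR training setups being discussed, written as optim.sgd-style configs with hypothetical schedule helpers. The numbers are the commonly cited settings from the WRN paper/code and from fb.resnet.torch, so treat this as an approximation rather than either repo's actual training script.

```lua
require 'optim'

-- WRN-style CIFAR setup: Nesterov SGD, weight decay 5e-4,
-- lr 0.1 dropped by a factor of 0.2 around epochs 60/120/160.
local wrnConfig = {
  learningRate = 0.1,
  momentum     = 0.9,
  dampening    = 0,
  nesterov     = true,
  weightDecay  = 5e-4,
}

-- fb.resnet.torch-style CIFAR setup: weight decay 1e-4,
-- lr 0.1 divided by 10 around epochs 81 and 122.
local fbConfig = {
  learningRate = 0.1,
  momentum     = 0.9,
  dampening    = 0,
  nesterov     = true,
  weightDecay  = 1e-4,
}

-- Hypothetical helpers; the drop points and decay factors are the key difference.
local function wrnLearningRate(epoch)
  return 0.1 * math.pow(0.2, (epoch >= 60  and 1 or 0)
                           + (epoch >= 120 and 1 or 0)
                           + (epoch >= 160 and 1 or 0))
end

local function fbLearningRate(epoch)
  return 0.1 * math.pow(0.1, (epoch >= 81  and 1 or 0)
                           + (epoch >= 122 and 1 or 0))
end
```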