This project is based on TensorFlow 2 and implements representative convolutional neural networks of recent years,
trained on the CIFAR-10 dataset for image classification. The network architectures follow the original arXiv papers
as closely as possible, with some modifications for CIFAR-10. The best accuracy achieved is 97.05%.
- Python 3.7
- TensorFlow-gpu 2.1
- Jupyter Notebook
- GPU: NVIDIA TESLA P100
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
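The dataset ships with tf.keras, so it can be loaded and standardized in a few lines. A minimal sketch (the per-channel Z-score statistics shown here are one common choice; the individual notebooks may preprocess differently):

```python
import tensorflow as tf

# Load CIFAR-10: 50000 training and 10000 test images, 32x32x3, 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Z-score normalization using per-channel statistics of the training set.
mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
std = x_train.std(axis=(0, 1, 2), keepdims=True)
x_train = (x_train - mean) / (std + 1e-7)
x_test = (x_test - mean) / (std + 1e-7)

# One-hot labels for categorical cross-entropy.
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 10)
```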
- AlexNet (2012) : ImageNet Classification with Deep Convolutional Neural Networks
- NetworkInNetwork (2014) : Network In Network
- VGG (2014) : Very Deep Convolutional Networks for Large-Scale Image Recognition
- InceptionV1 (2014) : Going Deeper with Convolutions
- InceptionV2, InceptionV3 (2015) : Rethinking the Inception Architecture for Computer Vision
- InceptionV4, InceptionResNet (2016) : Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- ResNet (2015,2016) : Deep Residual Learning for Image Recognition; Identity Mappings in Deep Residual Networks
- DilatedConvolution (2016) : Multi-Scale Context Aggregation by Dilated Convolutions
- SqueezeNet (2016) : SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Stochastic Depth (2016) : Deep Networks with Stochastic Depth
- FractalNet (2017) : Ultra-Deep Neural Networks without Residuals
- Xception (2017) : Xception: Deep Learning with Depthwise Separable Convolutions
- PyramidNet (2017) : Deep Pyramidal Residual Networks
- ResNeXt (2017) : Aggregated Residual Transformations for Deep Neural Networks
- WideResNet (2017) : Wide Residual Networks
- DenseNet (2017) : Densely Connected Convolutional Networks
- DualPathNet (2017) : Dual Path Networks
- ShuffleNet (2018) : ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- MobileNet (2019) : Searching for MobileNetV3
- SENet (2019) : Squeeze-and-Excitation Networks
- CBAM (2018) : CBAM: Convolutional Block Attention Module
- SKNet (2019) : Selective Kernel Networks
- EfficientNet (2019) : EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- ResNeSt (2020) : ResNeSt: Split-Attention Networks
- Other :
    - tricks : Bag of Tricks for Image Classification with Convolutional Neural Networks
    - Hyperbolic-Tangent decay : Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification
    - NasNet : Learning Transferable Architectures for Scalable Image Recognition
    - AmoebaNet : Regularized Evolution for Image Classifier Architecture Search
Dataset: CIFAR-10
No pre-training (all models are trained from scratch)
Network | Params | Batch Size | Epochs | Time Per Epoch | Total Time | Accuracy | Remarks |
---|---|---|---|---|---|---|---|
AlexNet | 9.63M | 128 | 100 | 36s | 1h | 78.44% | |
NIN | 0.97M | 128 | 100 | 36s | 1h | 90.38% | |
VGG16 | 33.69M | 128 | 100 | 41s | 1h 8min | 92.34% | |
InceptionV1 | 0.37M | 128 | 100 | 42s | 1h 10min | 93.02% | simplified |
InceptionV2 | 0.65M | 128 | 100 | 51s | 1h 25min | 93.40% | simplified |
InceptionV3 | 1.17M | 128 | 100 | 55s | 1h 30min | 94.20% | simplified |
InceptionV4 | 2.57M | 128 | 100 | 104s | 2h 53min | 94.55% | simplified |
ResNet18 | 11.18M | 128 | 150 | 39s | 1h 38min | 95.11% | pre-act |
ResNet50 | 23.59M | 128 | 100 | 88s | 2h 27min | 94.55% | pre-act |
DilatedConv | 2.02M | 128 | 100 | 92s | 2h 33min | 93.22% | |
SqueezeNet | 0.73M | 32 | 100 | 35s | 58min | 88.41% | light-weight |
StochasticDepth | 23.59M | 128 | 100 | 92s | 2h 33min | 95.07% | ResNet50 |
FractalNet | 33.76M | 128 | 100 | 48s | 1h 20min | 94.32% | |
Xception | 1.36M | 128 | 100 | 54s | 1h 30min | 94.56% | simplified |
PyramidNet110 | 9.90M | 128 | 100 | 185s | 5h 8min | 95.65% | |
ResNeXt50 | 23.11M | 128 | 100 | 210s | 5h 50min | 95.43% | 32×4d |
WideResNet | 36.51M | 128 | 150 | 138s | 5h 45min | 95.94% | 28-10 |
DenseNet100 | 3.31M | 128 | 150 | 159s | 6h 38min | 95.57% | 100-24 |
DenseNet121 | 7.94M | 128 | 100 | 110s | 3h 3min | 94.91% | 121-32 |
DualPathNet50 | 21.05M | 128 | 100 | 220s | 6h 7min | 95.44% | |
DualPathNet92 | 34.38M | 128 | 100 | 370s | 10h 17min | 95.78% | |
ShuffleNetV2 | 1.28M | 128 | 100 | 39s | 1h 5min | 92.41% | light-weight |
MobileNetV3 | 4.21M | 128 | 100 | 66s | 1h 50min | 94.85% | light-weight |
SE-ResNet50 | 26.10M | 128 | 100 | 110s | 3h 3min | 95.37% | |
SE-ResNeXt50 | 25.59M | 128 | 120 | 270s | 9h | 96.12% | 32×4d |
SE-WideResNet | 36.86M | 128 | 150 | 175s | 7h 18min | 96.60% | 28-10 |
SE-WideResNet_2 | 36.86M | 128 | 220 | 143s | 8h 45min | 97.05% | more tricks |
SENet154 | 567.9M | 128 | 100 | ---- | ----- | ----- | |
CBAM-ResNet50 | 26.12M | 128 | 100 | 154s | 4h 17min | 95.01% | |
SKNet | 6.73M | 256 | 100 | 205s | ----- | ----- | |
EfficientNetB0 | 3.45M | 64 | 100 | 390s | ----- | ----- | |
SOTA : SE-WideResNet (more tricks) (Acc. : 97.05%)
Remarks :
- simplified : the stem structure is replaced with a single convolutional layer and the channel counts are divided by 4
- pre-act : ResNet V2 (full pre-activation); a block sketch follows this list
- light-weight : a smaller, more efficient CNN architecture suited to mobile and embedded vision applications
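For reference, the "pre-act" variant reorders each residual branch as BN -> ReLU -> Conv instead of Conv -> BN -> ReLU. A minimal Keras sketch of a full pre-activation basic block (the filter counts, stride, and weight-decay value are illustrative, not the exact values used in the notebooks):

```python
import tensorflow as tf
from tensorflow.keras import layers

def preact_basic_block(x, filters, stride=1, weight_decay=1e-4):
    """ResNet V2 (full pre-activation) basic block: BN -> ReLU -> Conv, twice."""
    reg = tf.keras.regularizers.l2(weight_decay)

    out = layers.BatchNormalization()(x)
    out = layers.Activation('relu')(out)

    # Projection shortcut when the shape changes, identity otherwise.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, use_bias=False,
                                 kernel_initializer='he_normal',
                                 kernel_regularizer=reg)(out)
    else:
        shortcut = x

    out = layers.Conv2D(filters, 3, strides=stride, padding='same', use_bias=False,
                        kernel_initializer='he_normal', kernel_regularizer=reg)(out)
    out = layers.BatchNormalization()(out)
    out = layers.Activation('relu')(out)
    out = layers.Conv2D(filters, 3, strides=1, padding='same', use_bias=False,
                        kernel_initializer='he_normal', kernel_regularizer=reg)(out)

    # Residual addition comes last, with no trailing BN/ReLU.
    return layers.Add()([out, shortcut])
```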
Details of the SOTA network :
- Architecture :
    - WideResNet (depth=28, k=10) (improved)
    - Squeeze-and-Excitation Block
- Policy : mixed_precision (FP16)
- Pre-process : Z-score normalization
- Data augmentation : Rotation, Shift, Shear, Zoom, HorizontalFlip, Mixup (alpha=0.2) (a Mixup sketch follows this list)
- Learning rate :
    - Initial learning rate : 0.1
    - Learning rate decay : Hyperbolic-Tangent Decay (L=-6, U=3) (a schedule sketch follows this list)
    - Warm-up
- Weight decay : 0.0001
- Weight initialization : he_normal
- Activation : swish (in place of ReLU)
- Dropout : 0.1
- Optimizer : SGD with momentum (Nesterov)
- Label smoothing : 0.1
- Gradient clipping
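The learning-rate schedule and optimizer settings above can be expressed compactly in tf.keras. Hyperbolic-Tangent Decay sets the learning rate at epoch t of T to lr_0 / 2 * (1 - tanh(L + (U - L) * t / T)) with (L, U) = (-6, 3). The sketch below is an illustration under stated assumptions, not the repository's exact code: the warm-up length, momentum value, and clipnorm value are not given in this README and are placeholders.

```python
import math
import tensorflow as tf

# Mixed-precision policy (FP16 compute, FP32 variables). TF 2.1 uses the
# experimental API; newer releases use tf.keras.mixed_precision.set_global_policy.
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)

def htd_lr(epoch, total_epochs, base_lr=0.1, lower=-6.0, upper=3.0, warmup_epochs=5):
    """Hyperbolic-Tangent Decay with a linear warm-up (warm-up length assumed)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return base_lr / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * t))

total_epochs = 220
lr_callback = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: htd_lr(epoch, total_epochs))

# SGD with Nesterov momentum, gradient clipping, and label smoothing.
# momentum=0.9 and clipnorm=1.0 are assumed values.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9,
                                    nesterov=True, clipnorm=1.0)
loss = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

# model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=128, epochs=total_epochs,
#           validation_data=(x_test, y_test), callbacks=[lr_callback])
```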
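Mixup blends pairs of examples and their one-hot labels with a coefficient drawn from Beta(alpha, alpha); the listed alpha is 0.2. A minimal NumPy sketch that can be applied to each batch before it is fed to the model (illustrative only; the repository's exact augmentation pipeline may differ):

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2):
    """Mixup: blend a batch with a shuffled copy of itself.

    x: (batch, 32, 32, 3) float images, y: (batch, 10) one-hot labels.
    """
    lam = np.random.beta(alpha, alpha)
    index = np.random.permutation(len(x))
    x_mixed = lam * x + (1.0 - lam) * x[index]
    y_mixed = lam * y + (1.0 - lam) * y[index]
    return x_mixed, y_mixed
```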
Copyright (c) 2020 ZZH