Rethinking Normalization Technique.
Representative works are listed below.
Group Normalization
- ECCV 2018
- Wu, Yuxin and He, Kaiming
- PyTorch already provides an implementation (`torch.nn.GroupNorm`).
Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems — BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation.
This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.
@article{wu2018group, title={Group normalization}, author={Wu, Yuxin and He, Kaiming}, journal={arXiv preprint arXiv:1803.08494}, year={2018} }
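The grouping step described in the abstract can be illustrated with a short NumPy sketch. This is not the authors' code: the function name `group_norm`, the `(N, C, H, W)` layout, and the `eps` value are assumptions made for illustration, and the learnable per-channel scale and shift are omitted.

```python
import numpy as np


def group_norm(x, num_groups, eps=1e-5):
    """Normalize x of shape (N, C, H, W) within each group of channels.

    Hypothetical helper for illustration; no learnable scale/shift.
    """
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    # Fold each group's channels and spatial positions into a single axis,
    # so statistics are computed per sample and per group.
    grouped = x.reshape(n, num_groups, -1)
    mean = grouped.mean(axis=2, keepdims=True)
    var = grouped.var(axis=2, keepdims=True)
    normalized = (grouped - mean) / np.sqrt(var + eps)
    return normalized.reshape(n, c, h, w)


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 8, 4, 4))
y = group_norm(x, num_groups=4)
```

Because no statistic is taken across the batch axis, normalizing a single sample on its own gives the same result as normalizing it inside a larger batch, which is the batch-size independence the abstract refers to.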
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
- CVPR 2018
- The source code and network models will be available at
Global covariance pooling in convolutional neural networks has achieved impressive improvement over the classical first-order pooling. Recent works have shown matrix square root normalization plays a central role in achieving state-of-the-art performance. However, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD), suffering from inefficient training due to limited support of EIG and SVD on GPU. Towards addressing this problem, we propose an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks.
At the core of our method is a meta-layer designed with loop-embedded directed graph structure. The meta-layer consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration and post-compensation, respectively. Our method is much faster than EIG or SVD based ones, since it involves only matrix multiplications, suitable for parallel implementation on GPU.
Moreover, the proposed network with ResNet architecture can converge in much less epochs, further accelerating network training. On large-scale ImageNet, we achieve competitive performance superior to existing counterparts. By finetuning our models pre-trained on ImageNet, we establish state-of-the-art results on three challenging fine-grained benchmarks.
@inproceedings{li2018towards, title={Towards faster training of global covariance pooling networks by iterative matrix square root normalization}, author={Li, Peihua and Xie, Jiangtao and Wang, Qilong and Gao, Zilin}, booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, pages={947--955}, year={2018} }
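The three stages named in the abstract (pre-normalization, coupled matrix iteration, post-compensation) can be sketched in NumPy as follows. This is a forward-pass illustration only, assuming trace normalization and a Newton-Schulz style coupled iteration; the function name `matrix_sqrt_iterative` and the default iteration count are hypothetical, not taken from the authors' release.

```python
import numpy as np


def matrix_sqrt_iterative(cov, num_iters=5):
    """Approximate the square root of a symmetric positive definite matrix
    using only matrix multiplications (hypothetical helper)."""
    dim = cov.shape[0]
    identity = np.eye(dim)
    trace = np.trace(cov)
    # Pre-normalization: divide by the trace so the iteration converges.
    y = cov / trace
    z = identity.copy()
    # Coupled iteration: y approaches sqrt(cov / trace),
    # z approaches its inverse.
    for _ in range(num_iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y = y @ t
        z = t @ z
    # Post-compensation: undo the effect of the trace normalization.
    return np.sqrt(trace) * y


rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
cov = features.T @ features / features.shape[0]
root = matrix_sqrt_iterative(cov, num_iters=10)
```

Multiplying `root` by itself should recover `cov` up to numerical error. Fewer iterations trade accuracy for speed, and poorly conditioned matrices need more of them.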
Riemannian Adaptive Optimization Methods
- ICLR 2019 (open review)
- unofficial
Several first order stochastic optimization methods commonly used in the Euclidean domain such as stochastic gradient descent (SGD), accelerated gradient descent or variance reduced methods have already been adapted to certain Riemannian settings. However, some of the most popular of these optimization tools − namely ADAM, ADAGRAD and the more recent AMSGRAD − remain to be generalized to Riemannian manifolds. We discuss the difficulty of generalizing such adaptive schemes to the most agnostic Riemannian setting, and then provide algorithms and convergence proofs for geodesically convex objectives in the particular case of a product of Riemannian manifolds, in which adaptivity is implemented across manifolds in the cartesian product. Our generalization is tight in the sense that choosing the Euclidean space as Riemannian manifold yields the same algorithms and regret bounds as those that were already known for the standard algorithms.
Experimentally, we show faster convergence and to a lower train loss value for Riemannian adaptive methods over their corresponding baselines on the realistic task of embedding the WordNet taxonomy in the Poincare ball.
Decorrelated Batch Normalization
- CVPR 2018
- Lei Huang, Dawei Yang, Bo Lang, Jia Deng
Batch Normalization (BN) is capable of accelerating the training of deep models by centering and scaling activations within mini-batches. In this work, we propose Decorrelated Batch Normalization (DBN), which not just centers and scales activations but whitens them.
We explore multiple whitening techniques, and find that PCA whitening causes a problem we call stochastic axis swapping, which is detrimental to learning. We show that ZCA whitening does not suffer from this problem, permitting successful learning. DBN retains the desirable qualities of BN and further improves BN’s optimization efficiency and generalization ability.
We design comprehensive experiments to show that DBN can improve the performance of BN on multilayer perceptrons and convolutional neural networks. Furthermore, we consistently improve the accuracy of residual networks on CIFAR-10, CIFAR-100, and ImageNet.
@misc{1804.08450, Author = {Lei Huang and Dawei Yang and Bo Lang and Jia Deng}, Title = {Decorrelated Batch Normalization}, Year = {2018}, Eprint = {arXiv:1804.08450}, }
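To make the difference between standardizing and whitening concrete, here is a minimal NumPy sketch of ZCA whitening applied to one mini-batch. It covers the forward computation only; the function name `zca_whiten`, the `(examples, features)` layout, and the `eps` regularizer are assumptions for illustration, and the paper's backward pass is not shown.

```python
import numpy as np


def zca_whiten(batch, eps=1e-5):
    """Center a (examples, features) mini-batch and decorrelate its features.

    Hypothetical helper for illustration.
    """
    centered = batch - batch.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / batch.shape[0]
    # Small ridge term keeps the inverse square root well defined.
    cov = cov + eps * np.eye(cov.shape[0])
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA: rotate into the eigenbasis, rescale, then rotate back.
    # Rotating back is what distinguishes ZCA from PCA whitening.
    whitening = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return centered @ whitening


rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 1.5, 0.2],
                   [0.0, 0.0, 0.0, 1.0]])
batch = rng.normal(size=(256, 4)) @ mixing  # correlated features
whitened = zca_whiten(batch)
whitened_cov = whitened.T @ whitened / whitened.shape[0]
```

After whitening, `whitened_cov` is close to the identity matrix, whereas plain batch normalization would only bring its diagonal to one and leave the off-diagonal correlations in place.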
Orthogonal weight normalization: Solution to optimization over multiple dependent stiefel manifolds in deep neural networks
- AAAI 2018
- Huang, Lei and Liu, Xianglong and Lang, Bo and Yu, Adams Wei and Wang, Yongliang and Li, Bo
Orthogonal matrix has shown advantages in training Recurrent Neural Networks (RNNs), but such matrix is limited to be square for the hidden-to-hidden transformation in RNNs.
In this paper, we generalize such square orthogonal matrix to orthogonal rectangular matrix and formulate this problem in feed-forward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM).
We show that the rectangular orthogonal matrix can stabilize the distribution of network activations and regularize FNNs. We also propose a novel orthogonal weight normalization method to solve OMDSM.
Particularly, it constructs orthogonal transformation over proxy parameters to ensure the weight matrix is orthogonal and back-propagates gradient information through the transformation during training.
To guarantee stability, we minimize the distortions between proxy parameters and canonical weights over all tractable orthogonal transformations. In addition, we design an orthogonal linear module (OLM) to learn orthogonal filter banks in practice, which can be used as an alternative to standard linear module. Extensive experiments demonstrate that by simply substituting OLM for standard linear module without revising any experimental protocols, our method largely improves the performance of the state-of-the-art networks, including Inception and residual networks on CIFAR and ImageNet datasets.
@inproceedings{huang2018orthogonal, title={Orthogonal weight normalization: Solution to optimization over multiple dependent stiefel manifolds in deep neural networks}, author={Huang, Lei and Liu, Xianglong and Lang, Bo and Yu, Adams Wei and Wang, Yongliang and Li, Bo}, booktitle={Thirty-Second AAAI Conference on Artificial Intelligence}, year={2018} }
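The idea of deriving an orthogonal weight matrix from unconstrained proxy parameters can be sketched with symmetric orthogonalization in NumPy. Whether this matches the authors' exact transformation (for example, any centering of the proxy) is an assumption; the function name `orthogonalize_rows` and the `eps` value are hypothetical, and back-propagation through the transformation is not shown.

```python
import numpy as np


def orthogonalize_rows(proxy, eps=1e-8):
    """Map a proxy matrix of shape (n, d), n <= d, to a weight matrix
    whose rows are orthonormal (hypothetical helper)."""
    gram = proxy @ proxy.T
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Inverse square root of the Gram matrix via its eigendecomposition.
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return inv_sqrt @ proxy


rng = np.random.default_rng(0)
proxy = rng.normal(size=(3, 6))       # rectangular: 3 filters, 6 inputs
weight = orthogonalize_rows(proxy)
gram_after = weight @ weight.T
```

`gram_after` is close to the 3 x 3 identity, so the weight is a rectangular orthogonal matrix in the sense used by the abstract, while the optimizer would update only `proxy`.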
Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net
- ECCV 2018
- Pan, Xingang and Luo, Ping and Shi, Jianping and Tang, Xiaoou
Convolutional neural networks (CNNs) have achieved great successes in many computer vision problems. Unlike existing works that designed CNN architectures to improve performance on a single task of a single domain and not generalizable, we present IBN-Net, a novel convolutional architecture, which remarkably enhances a CNN’s modeling ability on one domain (e.g. Cityscapes) as well as its generalization capacity on another domain (e.g. GTA5) without finetuning. IBN-Net carefully integrates Instance Normalization (IN) and Batch Normalization (BN) as building blocks, and can be wrapped into many advanced deep networks to improve their performances.
This work has three key contributions. (1) By delving into IN and BN, we disclose that IN learns features that are invariant to appearance changes, such as colors, styles, and virtuality/reality, while BN is essential for preserving content related information. (2) IBN-Net can be applied to many advanced deep architectures, such as DenseNet, ResNet, ResNeXt, and SENet, and consistently improve their performance without increasing computational cost. (3) When applying the trained networks to new domains, e.g. from GTA5 to Cityscapes, IBN-Net achieves comparable improvements as domain adaptation methods, even without using data from the target domain. With IBN-Net, we won the 1st place on the WAD 2018 Challenge Drivable Area track, with an mIoU of 86.18%.
@inproceedings{pan2018two, title={Two at once: Enhancing learning and generalization capacities via ibn-net}, author={Pan, Xingang and Luo, Ping and Shi, Jianping and Tang, Xiaoou}, booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, pages={464--479}, year={2018} }
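One way to combine IN and BN in a single layer is to normalize part of the channels per sample and the rest across the batch. The NumPy sketch below assumes that channel-split arrangement; the abstract does not specify where or how the two are integrated, so the function name `ibn_layer`, the split point, and the use of batch statistics without learnable parameters are all illustrative assumptions.

```python
import numpy as np


def ibn_layer(x, instance_norm_channels, eps=1e-5):
    """Apply IN to the first channels of x (N, C, H, W) and BN to the rest.

    Hypothetical helper for illustration; training-mode statistics only.
    """
    x_in = x[:, :instance_norm_channels]
    x_bn = x[:, instance_norm_channels:]
    # IN: statistics per sample and per channel, over spatial positions.
    in_mean = x_in.mean(axis=(2, 3), keepdims=True)
    in_var = x_in.var(axis=(2, 3), keepdims=True)
    # BN: statistics per channel, over the batch and spatial positions.
    bn_mean = x_bn.mean(axis=(0, 2, 3), keepdims=True)
    bn_var = x_bn.var(axis=(0, 2, 3), keepdims=True)
    out_in = (x_in - in_mean) / np.sqrt(in_var + eps)
    out_bn = (x_bn - bn_mean) / np.sqrt(bn_var + eps)
    return np.concatenate([out_in, out_bn], axis=1)


rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=3.0, size=(4, 8, 5, 5))
y = ibn_layer(x, instance_norm_channels=4)
```

The IN half removes per-image appearance statistics, while the BN half keeps differences between images within each channel, which mirrors the roles the abstract assigns to the two normalizers.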
On this page, we list the related work used and cited in the ECCV 2018 oral talk.
