Neural Network Quantization & Compact Network Design Awesome Papers!

1. Overall Presentation

0. Binarized Neural Network & Its Training

  • Courbariaux, Matthieu, et al. “Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1.” arXiv preprint arXiv:1602.02830 (2016).
  • Rastegari, Mohammad, et al. "Xnor-net: Imagenet classification using binary convolutional neural networks." European Conference on Computer Vision. Springer, Cham, 2016.
  • Darabi, Sajad, et al. “BNN+: Improved binary network training.” arXiv preprint arXiv:1812.11800 (2018).
  • Galloway, Angus, Graham W. Taylor, and Medhat Moussa. “Attacking binarized neural networks.” arXiv preprint arXiv:1711.00449 (2017).
  • Zhou, Shuchang, et al. “Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients.” arXiv preprint arXiv:1606.06160 (2016).
  • Wang, Ziwei, et al. “Learning Channel-Wise Interactions for Binary Convolutional Neural Networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
  • Hou, Lu, Quanming Yao, and James T. Kwok. "Loss-aware binarization of deep networks." Proceedings of the Fifth International Conference on Learning Representations (ICLR), Toulon, France, Apr 2017.
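
As a quick orientation to the papers above, here is a minimal sketch of weight binarization with a straight-through estimator (STE), the training trick introduced by Courbariaux et al.; it assumes PyTorch, and the class name `BinarizeSTE` is illustrative rather than taken from any released code.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # Map real-valued weights to {-1, +1}; treat 0 as +1 to avoid zeros.
        return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # STE: pass the gradient through, zeroed where |w| > 1 (hard-tanh clipping).
        return grad_output * (w.abs() <= 1).float()

# Usage: binarize a weight tensor inside a layer's forward pass.
w = torch.randn(8, 8, requires_grad=True)
w_bin = BinarizeSTE.apply(w)
loss = w_bin.sum() ** 2
loss.backward()   # gradients flow back to the real-valued weights w
```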

1. Hardware for BNN (Edge device)

  • Yonekawa, Haruyoshi, and Hiroki Nakahara. “On-chip memory based binarized convolutional deep neural network applying batch normalization free technique on an fpga.” 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2017.
  • Umuroglu, Yaman, et al. “Finn: A framework for fast, scalable binarized neural network inference.” Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2017.
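
The FPGA designs above rely on the fact that a dot product of two {-1, +1} vectors reduces to an XNOR followed by a popcount. A minimal pure-Python sketch of that identity (the bit-packing helper is only for illustration):

```python
def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors packed into n-bit integers
    (bit = 1 for +1, bit = 0 for -1): dot = 2 * popcount(XNOR) - n."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)   # bitwise XNOR, masked to n bits
    return 2 * bin(xnor).count("1") - n

# Check against the plain +-1 dot product.
a = [1, -1, 1, 1, -1, 1, -1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
pack = lambda v: sum(1 << i for i, x in enumerate(v) if x > 0)
assert xnor_popcount_dot(pack(a), pack(b), len(a)) == sum(x * y for x, y in zip(a, b))
```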

2. Multi-bit Quantization

  • Jung, Sangil, et al. “Learning to quantize deep networks by optimizing quantization intervals with task loss.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
  • Chen Xu, Jianqiang Yao, et al. “Alternating Multi-bit Quantization for Recurrent Neural Networks” International Conference on Learning Representations (ICLR), 2018
  • Zhang, Dongqing, et al. “Lq-nets: Learned quantization for highly accurate and compact deep neural networks.” Proceedings of the European Conference on Computer Vision (ECCV). 2018.
  • Lee, Dongsoo, Parichay Kapoor, and Byeongwook Kim. “Deeptwist: Learning model compression via occasional weight distortion.” arXiv preprint arXiv:1810.12823 (2018).
  • Hou, Lu, and James T. Kwok. "Loss-aware weight quantization of deep networks." Proceedings of the Sixth International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, Apr 2018.
  • Lu Hou, Ruiliang Zhang, James T. Kwok. "Analysis of Quantized Models" Proceedings of the Seventh International Conference on Learning Representations (ICLR), New Orleans, USA, May 2019.
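
Most of the multi-bit papers above build on a uniform k-bit quantizer: clip to an interval, scale, round, and use a straight-through estimator for the round. A minimal PyTorch sketch; making the interval `alpha` trainable only loosely mirrors the interval-learning idea in Jung et al.

```python
import torch

def quantize_uniform(x, alpha, k):
    """Quantize x to k bits on the symmetric interval [-alpha, alpha];
    rounding uses a straight-through estimator (the detach trick)."""
    levels = 2 ** (k - 1) - 1                                  # e.g. 7 positive levels for k = 4
    scale = alpha / levels
    x_clipped = torch.minimum(torch.maximum(x, -alpha), alpha)
    q = torch.round(x_clipped / scale)
    q = (q - x_clipped / scale).detach() + x_clipped / scale   # STE for round()
    return q * scale

w = torch.randn(16, requires_grad=True)
alpha = torch.tensor(1.0, requires_grad=True)     # quantization interval, trainable with the task loss
quantize_uniform(w, alpha, k=4).sum().backward()  # gradients reach both w and alpha
```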

3. Neural Network Model Compression

Quantization

  • Kuan Wang, Song Han, et al. "HAQ: Hardware-Aware Automated Quantization with Mixed Precision" IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2019)
  • Jacob, Benoit, et al. "Quantization and training of neural networks for efficient integer-arithmetic-only inference." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
  • Zhu, Chenzhuo, et al. “Trained ternary quantization.” arXiv preprint arXiv:1612.01064 (2016).
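
A minimal numpy sketch of the affine (scale plus zero-point) scheme described in Jacob et al., which lets inference run in 8-bit integer arithmetic; the helper names and the toy check are illustrative only.

```python
import numpy as np

def affine_quantize(x, num_bits=8):
    """Asymmetric affine quantization: x is approximated by scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = min(x.min(), 0.0), max(x.max(), 0.0)   # the range must contain zero
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(10).astype(np.float32)
q, scale, zp = affine_quantize(x)
print(np.abs(x - affine_dequantize(q, scale, zp)).max())   # error is on the order of scale / 2
```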

Distillation

  • Elliot J. Crowley, et al. "Moonshine: Distilling with Cheap Convolutions" arXiv preprint arXiv:1711.02613 (2017)
  • Polino, Antonio, Razvan Pascanu, and Dan Alistarh. “Model compression via distillation and quantization.” arXiv preprint arXiv:1802.05668 (2018).
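
A minimal sketch of the standard distillation loss that the papers above extend: match the student's temperature-softened logits to the teacher's, plus the usual cross-entropy on the labels. Assumes PyTorch; the temperature and weighting are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Weighted sum of soft-target KL divergence (scaled by T^2) and hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a batch of 8 examples, 10 classes, random logits.
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
distillation_loss(s, t, y).backward()
```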

Pruning

  • Song Han, Huizi Mao, William J. Dally. "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"
  • Zhuang, Zhuangwei, et al. “Discrimination-aware channel pruning for deep neural networks.” Advances in Neural Information Processing Systems. 2018.
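
A minimal sketch of magnitude pruning, the first stage of the Deep Compression pipeline cited above: zero out the smallest-magnitude weights and keep a mask so they stay zero during fine-tuning. Assumes PyTorch; the sparsity level is illustrative.

```python
import torch

def magnitude_prune(weight, sparsity=0.8):
    """Return a 0/1 mask that keeps the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

w = torch.randn(64, 64)
mask = magnitude_prune(w, sparsity=0.8)
w_pruned = w * mask                  # re-apply the mask after every weight update
print(1.0 - mask.mean().item())      # achieved sparsity, roughly 0.8
```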

Hashing

  • Han Zhu, Mingsheng Long, et al. "Deep Hashing Network for Efficient Similarity Retrieval"
  • Dayan Wu, Qi Dai, et al. "Deep Incremental Hashing Network for Efficient Image Retrieval"
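
The hashing papers above learn an embedding whose sign gives a compact binary code; retrieval then ranks database items by Hamming distance. A minimal numpy sketch of that final step, with random embeddings standing in for the learned network.

```python
import numpy as np

def to_binary_code(embeddings):
    """Binarize real-valued embeddings to {0, 1} codes by taking the sign."""
    return (embeddings > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query code."""
    distances = (query_code[None, :] != db_codes).sum(axis=1)
    return np.argsort(distances)

rng = np.random.default_rng(0)
db = to_binary_code(rng.standard_normal((1000, 48)))   # 48-bit codes for 1000 items
query = to_binary_code(rng.standard_normal(48))
print(hamming_rank(query, db)[:5])                     # the five nearest items
```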

Automl

  • He, Yihui, et al. “Amc: Automl for model compression and acceleration on mobile devices.” Proceedings of the European Conference on Computer Vision (ECCV). 2018.

4. Efficient Neural Network Design

Residual Connection

  • Mark Sandler, Andrew Howard, et al. "MobileNetV2: Inverted Residuals and Linear Bottlenecks"
  • Andreas Veit, Michael Wilber, et al. "Residual Networks Behave Like Ensembles of Relatively Shallow Networks"
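
A minimal PyTorch sketch of the inverted residual block from MobileNetV2 above: expand with a 1x1 convolution, filter with a depth-wise 3x3 convolution, project back with a linear 1x1 convolution, and add the shortcut when shapes match. Channel counts are illustrative.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style block: 1x1 expand, 3x3 depthwise, 1x1 linear project,
    with an identity shortcut when stride is 1 and channel counts match."""

    def __init__(self, channels, expansion=6):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),          # linear bottleneck: no activation here
        )

    def forward(self, x):
        return x + self.block(x)               # the residual (shortcut) connection

x = torch.randn(1, 32, 56, 56)
print(InvertedResidual(32)(x).shape)           # torch.Size([1, 32, 56, 56])
```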

Dilated Convolution

  • Junho Yim, Donggyu Joo, Jihoon Bae, Junmo Kim. "A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning"
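
For reference, a dilated convolution enlarges the receptive field without adding parameters by spacing out the kernel taps; in PyTorch this is just the `dilation` argument (shapes below are illustrative).

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)
# A 3x3 kernel with dilation=2 covers a 5x5 window while still using only 9 weights.
conv = nn.Conv2d(16, 16, kernel_size=3, dilation=2, padding=2)
print(conv(x).shape)   # torch.Size([1, 16, 32, 32]); padding=2 preserves the spatial size
```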

Point-wise Convolution

  • Iandola, Forrest N., et al. “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size.” arXiv preprint arXiv:1602.07360 (2016).
  • Gholami, Amir, et al. “Squeezenext: Hardware-aware neural network design.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018.
  • Howard, Andrew G., et al. “Mobilenets: Efficient convolutional neural networks for mobile vision applications.” arXiv preprint arXiv:1704.04861 (2017).
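
A minimal sketch of SqueezeNet's Fire module from the list above, which leans heavily on cheap 1x1 (point-wise) convolutions; the channel sizes roughly follow the paper's early layers but are otherwise illustrative.

```python
import torch
import torch.nn as nn

class FireModule(nn.Module):
    """SqueezeNet-style module: a 1x1 "squeeze" layer followed by parallel
    1x1 and 3x3 "expand" layers whose outputs are concatenated."""

    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch, 1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

x = torch.randn(1, 96, 55, 55)
print(FireModule(96, 16, 64)(x).shape)   # torch.Size([1, 128, 55, 55])
```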

Grouped Convolution

  • Xiangyu Zhang, Xinyu Zhou, et al. "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices"
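
A minimal sketch of the two ShuffleNet ingredients named in the paper above: grouped 1x1 convolutions (the `groups` argument) and the channel shuffle that lets information cross group boundaries. Assumes PyTorch.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    """ShuffleNet's channel shuffle: interleave channels so the next grouped
    convolution sees features from every group."""
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

x = torch.randn(1, 64, 28, 28)
gconv = nn.Conv2d(64, 64, kernel_size=1, groups=4, bias=False)   # 1x1 conv split into 4 groups
y = channel_shuffle(gconv(x), groups=4)
print(y.shape)   # torch.Size([1, 64, 28, 28])
```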

Dense Convolution

  • Gao Huang, Zhuang Liu, et al. "Densely Connected Convolutional Networks", CVPR (2017)
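
A minimal sketch of DenseNet's connectivity pattern from the paper above: each layer produces a small number of new feature maps and concatenates them with everything produced before it. Assumes PyTorch; the growth rate is illustrative.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """DenseNet-style layer: emit `growth_rate` new channels and concatenate
    them with all previously produced feature maps."""

    def __init__(self, in_ch, growth_rate=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, growth_rate, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return torch.cat([x, self.conv(x)], dim=1)   # dense (concatenative) connection

x = torch.randn(1, 64, 32, 32)
layer1, layer2 = DenseLayer(64), DenseLayer(64 + 32)
print(layer2(layer1(x)).shape)   # torch.Size([1, 128, 32, 32])
```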

Depth-wise (Separable) Convolution

  • François Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions" arXiv preprint arXiv:1610.02357
  • MobileNets (Howard et al., listed above under Point-wise Convolution)
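
A minimal sketch of a depth-wise separable convolution as used in Xception and MobileNets above: a per-channel 3x3 depth-wise convolution followed by a 1x1 point-wise convolution that mixes channels, with far fewer parameters than a standard convolution.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depth-wise 3x3 conv (groups = in_channels) followed by a 1x1 point-wise conv."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

separable = DepthwiseSeparableConv(32, 64)
standard = nn.Conv2d(32, 64, 3, padding=1, bias=False)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(separable), count(standard))   # 2336 vs 18432 parameters
```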

(Shift)

  • Weijie Chen, Di Xie, et al. "All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification" CVPR (2019)
  • Bichen Wu, Alvin Wan, et al. "Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions" arXiv preprint arXiv:1711.08141 (2017)
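
A minimal sketch of the shift operation from the two papers above: move each group of channels one pixel in a different direction (zero FLOPs, zero parameters) and let the surrounding 1x1 convolutions do the mixing. Assumes PyTorch; `torch.roll` wraps around at the border, whereas the papers zero-fill, so this is only an approximation.

```python
import torch
import torch.nn as nn

def shift(x, groups=9):
    """Split channels into groups and shift each group one pixel in one of the
    nine 3x3-neighbourhood directions (including no shift)."""
    directions = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    chunks = torch.chunk(x, groups, dim=1)
    shifted = [torch.roll(c, shifts=d, dims=(2, 3)) for c, d in zip(chunks, directions)]
    return torch.cat(shifted, dim=1)

x = torch.randn(1, 27, 16, 16)
y = nn.Conv2d(27, 27, kernel_size=1)(shift(x))   # shift mixes space, the 1x1 conv mixes channels
print(y.shape)                                   # torch.Size([1, 27, 16, 16])
```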

Compound Scaling

  • Mingxing Tan, Quoc V. Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" International Conference on Machine Learning (ICML), 2019

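Compound scaling grows depth, width, and input resolution together with a single coefficient phi, choosing the base multipliers so that total FLOPs grow by roughly 2^phi. A minimal sketch using the constants published in the EfficientNet paper; the released models additionally hand-round these values per variant.

```python
# EfficientNet's compound scaling: alpha * beta^2 * gamma^2 is approximately 2,
# so each increment of phi roughly doubles the FLOPs.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution multipliers from the paper

def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth, width, resolution

for phi in range(4):                  # B0- to B3-style scaling steps
    print(phi, compound_scale(phi))
```
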
5. Hardware for Quantized Neural Network

  • Lee, Jinmook, et al. “Unpu: An energy-efficient deep neural network accelerator with fully variable weight bit precision.” IEEE Journal of Solid-State Circuits 54.1 (2018): 173-185.
  • Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding." arXiv preprint arXiv:1510.00149 (2015).
  • Yu, Jiecao, et al. "Scalpel: Customizing dnn pruning to the underlying hardware parallelism." ACM SIGARCH Computer Architecture News 45.2 (2017): 548-560.
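
The UNPU accelerator above supports a fully variable weight bit width with bit-serial processing: each weight is split into bit planes, and the 1-bit partial products are accumulated with the appropriate shifts. A minimal pure-Python sketch of that decomposition for unsigned weights.

```python
def bitserial_dot(activations, weights, bits):
    """Bit-serial dot product: accumulate shifted 1-bit dot products, one per
    weight bit plane, so the same datapath serves any weight bit width."""
    acc = 0
    for b in range(bits):
        plane = [(w >> b) & 1 for w in weights]        # one bit plane of the weights
        acc += (1 << b) * sum(a * p for a, p in zip(activations, plane))
    return acc

a = [3, 1, 4, 1, 5]
w = [2, 7, 1, 8, 2]                                    # 4-bit unsigned weights
assert bitserial_dot(a, w, bits=4) == sum(x * y for x, y in zip(a, w))
```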

6. Hardware Awareness for Acceleration

Computation Specialization

  • Clement Farabet, Berin Martini, et al. "NeuFlow: A Runtime Reconfigurable Dataflow Processor for Vision"
  • Norman P. Jouppi, Cliff Young, et al. (Google). "In-Datacenter Performance Analysis of a Tensor Processing Unit"
  • A Neural Network Accelerator Exploiting Both Inter- and Intra-Neuron Parallelism

Memory-centric Specialization

  • Diannao Family [Chen, 2014-2016]
  • Eyeriss [Chen, 2016]

Special CNN

  • Kiseok Kwon. "Co-Design of Deep Neural Nets and Neural Net Accelerators for Embedded Vision Applications" (2018)
  • DeePhi’s DPU-v2
  • Baohua Sun, Lin Yang, "Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications"
  • ShiftNet Accelerator [Wu, 2018]

7. Compiler Optimization

  • DLVM: A modern compiler framework for neural network DSLs
  • DLVM: A modern compiler infrastructure for deep learning systems
  • A modern compiler infrastructure for deep learning systems with adjoint code generation in a domain-specific IR
  • Compiling machine learning programs via high-level tracing
  • TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
  • Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines
  • Glow: Graph Lowering Compiler Techniques for Neural Networks
  • TensorFlow XLA compiler and the NNVM compiler (not papers)
  • Cambricon: An Instruction Set Architecture for Neural Networks
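
A taste of the kind of graph-level rewrite these compilers perform: folding a batch-normalization layer into the preceding convolution's weights and bias at compile time, so inference runs a single fused op. A minimal numpy sketch, not tied to any of the frameworks listed above.

```python
import numpy as np

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the
    convolution weights W (out_ch, in_ch, kH, kW) and bias b."""
    scale = gamma / np.sqrt(var + eps)                 # one factor per output channel
    return W * scale[:, None, None, None], (b - mean) * scale + beta

# Toy check on a 1x1 "convolution" applied to a single pixel.
rng = np.random.default_rng(0)
W, b = rng.standard_normal((4, 3, 1, 1)), rng.standard_normal(4)
gamma, beta = rng.standard_normal(4), rng.standard_normal(4)
mean, var = rng.standard_normal(4), rng.random(4) + 0.5
x = rng.standard_normal(3)

conv = lambda Wc, bc: Wc[:, :, 0, 0] @ x + bc
before = gamma * (conv(W, b) - mean) / np.sqrt(var + 1e-5) + beta
Wf, bf = fold_batchnorm(W, b, gamma, beta, mean, var)
print(np.allclose(before, conv(Wf, bf)))               # True: same output, one fused op
```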