Deep Learning Papers (scientific literature)

Thomas Lidy edited this page Dec 17, 2018 · 2 revisions

Basic papers on deep learning by G. Hinton (http://www.cs.toronto.edu/~hinton/)

(a paragraph from a paper citing several fundamental deep learning papers:)
Deep Learning Networks (e.g., J. Weng et al. IJCNN 1992, Y. LeCun et al. Proceedings of IEEE 1998, G. Hinton et al. NIPS 2012) are not only biologically implausible but also functionally weak. The brain uses a rich network of processing areas (e.g., Felleman & Van Essen, Cerebral Cortex 1991) where connections are almost always two-way (J. Weng, Natural and Artificial Intelligence, 2012), not a cascade of modules as in the Deep Learning Networks. Such a Deep Learning Network is not able to conduct top-down attention in a cluttered scene (e.g., attention to location or type in J. Weng, Natural and Artificial Intelligence, 2012 or attention to more complex object shape as reported in L. B. Smith et al. Developmental Science 2005).

Semi-Supervised Learning with Deep Generative Models (http://arxiv.org/abs/1406.5298) Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. In: Proceedings of Neural Information Processing Systems (NIPS) 2014

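ImageNet Classification with Deep Convolutional Neural Networks (http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. In: Proceedings of Neural Information Processing Systems (NIPS) 2012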

Neural Turing Machines (http://arxiv.org/pdf/1410.5401v2.pdf) Alex Graves, Greg Wayne, Ivo Danihelka. Google DeepMind, Dec 2014

Visualizing and Understanding Convolutional Networks (http://arxiv.org/abs/1311.2901) Matthew D. Zeiler, Rob Fergus

Optimizations

Neural Networks with Few Multiplications (http://arxiv.org/pdf/1510.03009v1.pdf) Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio (replaces most floating-point multiplications with sign changes and binary shifts)

"First we stochastically binarize weights to convert multiplications involved in computing hidden states to sign changes. Second, while back-propagating error derivatives, in addition to binarizing the weights, we quantize the representations at each layer to convert the remaining multiplications into binary shifts. Experimental results across 3 popular datasets (MNIST, CIFAR10, SVHN) show that this approach not only does not hurt classification performance but can result in even better performance than standard stochastic gradient descent training, paving the way to fast, hardware-friendly training of neural networks."
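
The two ideas in the quoted abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names are hypothetical, the sampling rule P(+1) = (w + 1) / 2 on weights clipped to [-1, 1] is the usual stochastic binarization scheme and is assumed here, and the round-to-nearest power-of-two quantizer is a simplifying assumption standing in for the paper's exact quantization rule.

```python
import math
import random


def stochastic_binarize(w, rng):
    """Map a real weight to -1.0 or +1.0 with P(+1) = (clip(w) + 1) / 2."""
    w = max(-1.0, min(1.0, w))  # clip to [-1, 1] (assumed range)
    p_plus = (w + 1.0) / 2.0
    return 1.0 if rng.random() < p_plus else -1.0


def quantize_pow2(x):
    """Round x to the nearest signed power of two in the log domain.

    Multiplying by 2**k is a binary shift in fixed-point hardware.
    Zero is passed through unchanged.
    """
    if x == 0.0:
        return 0.0
    exponent = round(math.log2(abs(x)))
    return math.copysign(2.0 ** exponent, x)


def binary_dot(weights, inputs, rng):
    """Dot product with binarized weights: only sign flips and additions."""
    total = 0.0
    for w, x in zip(weights, inputs):
        total += x if stochastic_binarize(w, rng) > 0 else -x
    return total


rng = random.Random(0)

# Averaged over many draws, the binarized weight approximates the real weight.
samples = [stochastic_binarize(0.8, rng) for _ in range(10000)]
mean_sample = sum(samples) / len(samples)

q_small = quantize_pow2(0.3)   # 0.25, i.e. 2**-2
q_large = quantize_pow2(-6.0)  # -8.0, i.e. -(2**3)
```

Because each sampled weight is either -1 or +1, the forward pass needs no multiplications, and the sample mean shows that the binarized weight is an unbiased estimate of the clipped real weight.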

Triplet Loss

Triplet Loss and Online Triplet Mining in TensorFlow https://omoindrot.github.io/triplet-loss