(Deep) Machine Learning

A collection of (deep) machine learning papers, tutorials, datasets, and projects.

Other paper/link collections

NeurIPS

The incredible Pytorch

Over 200 of the Best Machine Learning, NLP, and Python Tutorials

Projects

Unsupervised mapping of bird sounds: T-SNE applied to a large set of bird sounds to visualize the similarity/dissimilarity of bird songs. Beautiful visualization that is great fun to play with.

Quickdraw: Recognition of quickly drawn sketches of things.

Datasets

Quickdraw: A collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw! Also includes some fascinating analytics of the dataset.

Talks

Overview / Critical reviews

Video tutorials

Tutorials

Machine Learning: Basic Principles

Deep Learning - Berkeley

Tensorflow

More Tensorflow

Reinforcement Learning 1, Reinforcement Learning 2, Reinforcement Learning 3

Blogs

Rules of Machine Learning: Best Practices for ML Engineering

Learning improvements

Revisiting ResNets: Improved Training and Scaling Strategies: Training and scaling strategies may matter more than architectural changes.

The Marginal Value of Adaptive Gradient Methods in Machine Learning: Adaptive gradient methods such as Adam converge faster and may even achieve lower training error, but generalize worse (higher test error) than Stochastic Gradient Descent.

Faster Convergence & Generalization in DNNs: An SVM-based training on mini-batches that reduces the number of epochs by several orders of magnitude. The effect on training time is unclear since the SVM training is performed on the CPU, though GPU-based implementations exist. The learned networks are shown to be more robust to adversarial noise and overfitting.

All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation: The authors propose a regularizer variant that makes it possible to train very deep networks without residual connections (shortcuts/identity mappings).
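
A minimal sketch of a soft orthonormality penalty of the kind this line of work builds on; the helper name, the squared-Frobenius form, and the 1e-4 coefficient are illustrative assumptions, and the paper's exact regularizer and modulation scheme differ in detail.

```python
import torch

def orthonormality_penalty(weight):
    """Soft penalty pushing the (flattened) weight matrix toward orthonormal rows."""
    w = weight.view(weight.size(0), -1)      # treat each conv filter as one row
    gram = w @ w.t()
    eye = torch.eye(gram.size(0), device=w.device)
    return ((gram - eye) ** 2).sum()

# usage: add the penalty (scaled by a small coefficient) to the task loss
conv = torch.nn.Conv2d(16, 32, kernel_size=3)
loss = 1e-4 * orthonormality_penalty(conv.weight)
```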

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks: The authors demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more by using an appropriate initialization scheme. Implementations are available for TensorFlow and PyTorch.
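
The snippet below uses PyTorch's plain orthogonal initialization as a rough stand-in; the paper's delta-orthogonal scheme for convolutions is more specific (an orthogonal matrix placed at the spatial center of the kernel, zeros elsewhere), so this only shows where such an initializer plugs in.

```python
import torch.nn as nn

# crude stand-in for the paper's delta-orthogonal initialization
layer = nn.Conv2d(64, 64, kernel_size=3, padding=1)
nn.init.orthogonal_(layer.weight)
```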

Understanding Batch Normalization: An analysis revealing that batch normalization has a regularizing effect that improves the generalization of normalized networks. In unnormalized networks, activations become large and the convolutional channels become increasingly ill-behaved in the deeper layers.

Recent Advances in Convolutional Neural Network Acceleration: A nice review of methods to accelerate CNN training and inference (including hardware approaches).

Reducing Parameter Space for Neural Network Training: The authors show that constraining the network weights to lie on a hypersphere leads to better results on some small regression problems and is less sensitive to the weight initialization.
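
A sketch of one straightforward way to enforce such a constraint, projecting the weights back onto a hypersphere after each update; the helper and the projection-after-update strategy are assumptions, not necessarily the paper's exact reparameterization.

```python
import numpy as np

def project_to_hypersphere(weights, radius=1.0):
    """Renormalize a flat weight vector so it lies on a hypersphere of the given radius."""
    norm = np.linalg.norm(weights)
    return weights if norm == 0 else radius * weights / norm

# usage: after each gradient step, project the updated weights back onto the sphere
w = np.random.randn(100)
w -= 0.01 * np.random.randn(100)   # stand-in for a gradient update
w = project_to_hypersphere(w)
```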

Do Deep Nets Really Need to be Deep?: A very interesting paper demonstrating that shallow networks can perform as well as deep networks - rasing the question, whether DL architectures are actually necessary.

Augmentation

mixup: Beyond Empirical Risk Minimization: Linear interpolation, with a random factor, between samples in a batch. The interpolation is applied to inputs AND targets, which requires that the data is numerical (e.g. images and one-hot-encoded labels) and that the loss function can handle non-binary labels. Easy to implement, and the paper shows good results. The reviewer comments are interesting as well. Supported in nuts-ml
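
A minimal NumPy sketch of the batch-level interpolation described above; the function name, the Beta(0.2, 0.2) mixing distribution, and the single mixing factor per batch are illustrative choices rather than the paper's reference implementation.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=np.random.default_rng()):
    """Mix a batch with a shuffled copy of itself.

    x : float array of inputs, shape (batch, ...)
    y : float array of one-hot labels, shape (batch, num_classes)
    """
    lam = rng.beta(alpha, alpha)       # random mixing factor for this batch
    perm = rng.permutation(len(x))     # pair each sample with a random partner
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed

# usage: four 8x8 "images" and one-hot labels for 3 classes
x = np.random.rand(4, 8, 8).astype(np.float32)
y = np.eye(3, dtype=np.float32)[[0, 1, 2, 1]]
xm, ym = mixup_batch(x, y)
```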

Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer: An improved version of mixup where the interpolation is performed not only in input space but also on the outputs of internal network layers.

Regularization

Regularization and Optimization strategies in Deep Convolutional Neural Network: A nice summary of regularization techniques and deep learning in general.

Concrete Dropout: Automatic tuning of the dropout probability using gradient methods.
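
A rough sketch of the relaxed (differentiable) dropout mask that makes the drop probability learnable by gradient descent; the temperature value is an assumption, and the paper's additional regularization term on the dropout probability is omitted.

```python
import torch

def concrete_dropout(x, p_logit, temperature=0.1):
    """Differentiable (relaxed) dropout so the drop probability can be learned.

    p_logit : learnable scalar tensor; sigmoid(p_logit) is the drop probability.
    """
    p = torch.sigmoid(p_logit)
    u = torch.rand_like(x)
    # Concrete / Gumbel-softmax style relaxation of a Bernoulli drop decision
    drop_prob = torch.sigmoid(
        (torch.log(p + 1e-7) - torch.log(1 - p + 1e-7)
         + torch.log(u + 1e-7) - torch.log(1 - u + 1e-7)) / temperature
    )
    keep = 1.0 - drop_prob
    return x * keep / (1.0 - p)

# usage: p_logit is optimized jointly with the network weights
p_logit = torch.nn.Parameter(torch.tensor(-2.0))
out = concrete_dropout(torch.randn(4, 10), p_logit)
```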

Segmentation

Dense Transformer Networks: Automatic learning of patch sizes and shapes in contrast to fixed, rectangular pixel centered patches for segmentation. Achieves better segmentation.

Unsupervised

Look, Listen and Learn: Learning from unlabelled video and audio data.

Semi-Supervised

Improved Techniques for Training GANs: Very good results for semi-supervised training on MNIST, CIFAR-10 and SVHN datasets.

Variational Autoencoders

Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks: New training procedure for Variational Autoencoders based on adversarial training.

GANs

Improved Techniques for Training GANs: Very good results for semi-supervised training on MNIST, CIFAR-10 and SVHN datasets.

Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks: New training procedure for Variational Autoencoders based on adversarial training.

Reinforcement Learning

Thinking Fast and Slow with Deep Learning and Tree Search: Decomposes the problem into separate planning and generalisation tasks and shows better performance than Policy Gradients.

Architecture search

Neural Architecture Search: A Survey: As the title says: a review of methods to automatically determine the structure of a neural network.

DARTS: Differentiable Architecture Search: Search for network architectures using gradient descent. Considerably faster and simpler than other methods.
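
A minimal sketch of the continuous relaxation behind this approach: each edge computes a softmax-weighted mixture over candidate operations, and the mixture weights (alpha) are the architecture parameters. The tiny candidate set here is an assumption for illustration.

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """Softmax-weighted mixture over candidate operations."""
    def __init__(self, channels):
        super().__init__()
        # a small, illustrative candidate set; the paper's search space is larger
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

# usage: alpha and the network weights are optimized alternately
mixed = MixedOp(channels=8)
y = mixed(torch.randn(2, 8, 16, 16))
```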

Evolving simple programs for playing Atari games: Search for network architectures/image processing functions using Cartesian Genetic Programming.

Understanding Networks / Visualization

Real Time Image Saliency for Black Box Classifiers: Fast saliency detection method that can be applied to any differentiable image classifier.

Learning Deep Features for Discriminative Localization: Shows how to utilize a Global Average Pooling layer to compute so-called Class Activation Maps (CAM) that identify the regions of the input image that are important for the classification result.
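
A small sketch of how a Class Activation Map can be formed from the last convolutional feature maps and the weights of the final linear layer; the ReLU and normalization at the end are common display conventions, not prescribed by the paper.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Weight the last conv feature maps by the classifier weights of one class.

    feature_maps : (channels, h, w) activations before global average pooling
    fc_weights   : (num_classes, channels) weights of the final linear layer
    """
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (h, w)
    cam = np.maximum(cam, 0)                  # keep positive evidence only
    return cam / (cam.max() + 1e-8)           # normalize to [0, 1] for display

# usage with made-up shapes: 512 channels of 7x7 features, 10 classes
fmap = np.random.rand(512, 7, 7)
w_fc = np.random.rand(10, 512)
cam = class_activation_map(fmap, w_fc, class_idx=3)
```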

Grad-CAM: Why did you say that?: An extension of the above paper that enables the computation of Class Activation Maps (CAM) for arbitrary network architectures.

RISE: Randomized Input Sampling for Explanation of Black-box Models: A black-box approach that uses randomly occluded images to create saliency maps. More accurate than Grad-CAM, does not require a specific network architecture and creates high-resolution maps.
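
A simplified sketch of the idea: average many random masks, each weighted by the model's score on the correspondingly occluded image. Full-resolution binary masks are used here for brevity; the paper generates smooth masks by upsampling low-resolution random grids.

```python
import numpy as np

def rise_saliency(image, predict_fn, class_idx, num_masks=500, keep_prob=0.5,
                  rng=np.random.default_rng()):
    """Average random binary masks weighted by the model's score on the masked image.

    predict_fn : callable mapping an image (h, w, c) to class probabilities
    """
    h, w = image.shape[:2]
    saliency = np.zeros((h, w))
    for _ in range(num_masks):
        mask = (rng.random((h, w)) < keep_prob).astype(float)
        score = predict_fn(image * mask[..., None])[class_idx]
        saliency += score * mask
    return saliency / num_masks

# usage with a dummy model: brighter images raise the "class 0" score
dummy = lambda img: np.array([img.mean(), 1.0 - img.mean()])
sal = rise_saliency(np.random.rand(16, 16, 3), dummy, class_idx=0, num_masks=50)
```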

t-SNE-CUDA: GPU-Accelerated t-SNE and its Applications to Modern Data: A CUDA implementation of t-SNE that is substantially faster than other implementations and, for instance, makes it possible to visualize the entire ImageNet data set.

Identifying Weights and Architectures of Unknown ReLU Networks: An extremely cool paper showing that it is possible to reconstruct the architecture, weights, and biases of a deep ReLU network given the ability to query the network.

Tensor Factorization

A general model for robust tensor factorization with unknown noise: Impressive de-noising of images using tensor factorization.

Canonical Tensor Decomposition for Knowledge Base Completion: A nice comparison of tensor-based methods for Knowledge Base Completion demonstrating that CP performs as well as others provided parameters are chosen carefully.

Neuro-symbolic computing

Combination/integration of symbolic reasoning/representation with sub-symbolic/distributed representations.

Neural-Symbolic Learning and Reasoning: A Survey and Interpretation: How to integrate low-level, sub-symbolic neural network learning and high-level, symbolic reasoning.

Visual Question Answering

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding: A nice introduction to and reference for VQA approaches. Introduces a novel method (easy to understand but less "organic") with excellent results (99.8%) on CLEVR.

FiLM: Visual Reasoning with a General Conditioning Layer: Highly accurate (on CLEVR) and simple model (FiLM).
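
A minimal sketch of a FiLM layer: a conditioning vector (e.g. a question embedding) predicts a per-channel scale and shift that modulate the image feature maps. Layer and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """Feature-wise linear modulation: per-channel scale and shift predicted
    from a conditioning vector."""
    def __init__(self, cond_dim, num_channels):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, features, condition):
        gamma, beta = self.to_gamma_beta(condition).chunk(2, dim=-1)
        # broadcast (batch, channels) over the spatial dimensions
        return gamma[..., None, None] * features + beta[..., None, None]

# usage: modulate 64-channel image features with a 128-d question embedding
film = FiLMLayer(cond_dim=128, num_channels=64)
out = film(torch.randn(2, 64, 14, 14), torch.randn(2, 128))
```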

How clever is the FiLM model, and how clever can it be?: An analysis of the FiLM model for VQA.

A simple neural network module for relational reasoning: A beautifully simple network architecture for relational learning and visual question answering, though less accurate than FiLM.
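
A minimal sketch of the relational module (see also under Relational Learning below): a small MLP g is applied to every ordered pair of objects, the results are summed, and a second MLP f produces the output. Sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Apply a small MLP g to every pair of objects, sum the results, then apply f."""
    def __init__(self, obj_dim, hidden=64, out_dim=10):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, out_dim))

    def forward(self, objects):                       # objects: (batch, n, obj_dim)
        b, n, d = objects.shape
        oi = objects.unsqueeze(2).expand(b, n, n, d)  # object i
        oj = objects.unsqueeze(1).expand(b, n, n, d)  # object j
        pairs = torch.cat([oi, oj], dim=-1)           # all ordered pairs
        relations = self.g(pairs).sum(dim=(1, 2))     # aggregate over pairs
        return self.f(relations)

# usage: 6 objects of dimension 16 per example
rn = RelationNetwork(obj_dim=16)
logits = rn(torch.randn(2, 6, 16))
```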

Inferring and executing programs for visual reasoning: Uses LSTMs to generate programs from questions, which then perform symbolic reasoning on the scene (CLEVR benchmark).

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning: Introducing the CLEVR benchmark. A piece of art!

An Analysis of Visual Question Answering Algorithms: A useful overview of VQA datasets (missing CLEVR, however).

Explainable Neural Computation via Stack Neural Module Networks: Neural module networks for visual question answering.

Relational Learning

A Review of Relational Machine Learning for Knowledge Graphs: A very nice review of Relational Learning using tensor factorization and neural network approaches (e.g. RESCAL).

A Three-Way Model for Collective Learning on Multi-Relational Data: Tensor-based Relational Learning introducing the simple but effective RESCAL algorithm.
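
A sketch of the RESCAL scoring function only: each relation is a d x d matrix and a (subject, relation, object) triple is scored bilinearly. Learning the embeddings, by minimizing reconstruction error over the relation tensor, is not shown.

```python
import numpy as np

def rescal_score(e_s, e_o, W_r):
    """Bilinear RESCAL score for a (subject, relation, object) triple."""
    return e_s @ W_r @ e_o

# usage: 8-dimensional entity embeddings, one d x d matrix per relation
d = 8
e_subject, e_object = np.random.randn(d), np.random.randn(d)
W_relation = np.random.randn(d, d)
print(rescal_score(e_subject, e_object, W_relation))
```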

Holographic Embeddings of Knowledge Graphs: An improvement of RESCAL that leads to a model with considerably fewer parameters and better accuracy.

Embedding Entities and Relations for Learning and Inference in Knowledge Bases: A very nice comparison of embedding based relational learning algorithms.

A simple neural network module for relational reasoning: A beautifully simple network architecture for relational learning and visual question answering.

FiLM: Visual Reasoning with a General Conditioning Layer: A slightly older, more complex model (see also under Visual Question Answering).

A Semantic Matching Energy Function for Learning with Multi-relational Data: Learns relationship embeddings and first combines entity embeddings with relationship embeddings separately, before merging the results.

A latent factor model for highly multi-relational data: An improved method for relational learning with performance better than RESCAL and SME, which scales well for large numbers of relationships.