Keeping track of research papers I've read.
... Guilty of not pushing the recently read ones to the top!
Keys -> || ✅ : Done reading || 📖 : In progress || 🚫 : Dropped ||
# | Status | Paper Name | Notes | Link | Year
---|---|---|---|---|---
1 | ✅ | WaveNet: A Generative Model for Raw Audio | Causal conv. layers with dilation. Autoregressive model. Sequential inference. (Sketch below the table.) | arxiv | 2016
2 | ✅ | Fast Wavenet Generation Algorithm | WaveNet improvement: O(2^L) -> O(L). Still sequential, though. Uses queues to push & pop already-computed states at each layer. | arxiv | 2016
3 | ✅ | Parallel WaveNet: Fast High-Fidelity Speech Synthesis | Probability Density Distillation - teacher + student architecture. Marries efficient training of WaveNet with efficient sampling from an IAF; sampling is parallel here for real-time synthesis. ✔️ medium - An Explanation of Discretized Logistic Mixture Likelihood ✔️ vimeo - Parallel WaveNet | arxiv | 2017
4 | 📖 | Improved Variational Inference with Inverse Autoregressive Flow | ⭐ ✔️ Introduction to Normalizing Flows (ECCV2020 Tutorial) - video | arxiv | 2016
5 | ✅ | Deep Unsupervised Learning UC Berkeley lectures | ✔️ L1 - Introduction (01:10:00): types of unsupervised learning: 1) generative models, 2) self-supervised models. ✔️ L2 - Autoregressive Models (02:27:23): histograms, parameterized distributions; 1) RNN-based, 2) masking-based (2.1 MADE, 2.2 masked ConvNets). ✔️ L3 - Flow Models (01:56:53): the model outputs z = f_theta(x) rather than p_theta(x) directly; z follows a simple prior, and sampling is x = f_theta^-1(z). Autoregressive flows: fast training, slow sampling. Inverse autoregressive flows: slow training, fast sampling. (Toy flow sketch below the table.) | course | |
6 | ✅ | ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech | | arxiv | 2018
7 | ✅ | Deep Photo Enhancer: Unpaired Learning for Image Enhancement Using GANs | CycleGAN extension; individual BN for x->y' & x'->y''; adaptive weighting for WGAN. | arxiv | CVPR, 2018
8 | ✅ | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Google Brain. Vision Transformer (ViT): a sequence of image patches fed to a standard Transformer. Less computation than ResNets; training on large data trumps the inductive biases of CNNs and outperforms them. (Patch-embedding sketch below the table.) | arxiv | ICLR, 2021
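
A few toy sketches of the ideas above, for my own reference. These are illustrative PyTorch snippets I wrote, not the papers' implementations; class names, layer counts, and dimensions are all assumptions made for the example.

Entries 1-2: a causal dilated convolution stack in the spirit of WaveNet, with the Fast WaveNet caching idea noted in a comment.

```python
# Minimal sketch, not the papers' code: causal dilated 1-D convolutions in the
# spirit of WaveNet. Channel and layer counts are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedConv1d(nn.Module):
    """Conv1d that only sees the past: left-pad by (kernel_size - 1) * dilation."""
    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                           # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))   # pad only on the left (the past)

class TinyWaveNet(nn.Module):
    """Dilations double per layer, so the receptive field grows exponentially."""
    def __init__(self, channels=32, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [CausalDilatedConv1d(channels, dilation=2 ** i) for i in range(n_layers)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x)) + x             # residual connection
        return x

# Generation is still autoregressive (one sample at a time). Fast WaveNet avoids
# recomputing the whole receptive field per step by keeping a queue of
# already-computed activations at each layer, pushing/popping one entry per sample.
x = torch.randn(1, 32, 1024)
print(TinyWaveNet()(x).shape)                        # torch.Size([1, 32, 1024])
```

Entries 3-5: a toy affine autoregressive flow showing the training/sampling trade-off from the flow-models lecture; `scale_shift` is a stand-in for a learned autoregressive network, not anything from the papers.

```python
# Toy illustration of the flow trade-off: z_i depends on x_<i, so density
# evaluation is parallelizable but inversion (sampling) is sequential.
import torch

def scale_shift(x_prev):
    """Stand-in for a learned autoregressive net; here just a fixed toy function."""
    s = 0.1 * torch.tanh(x_prev.sum())
    t = 0.5 * x_prev.mean() if x_prev.numel() else torch.tensor(0.0)
    return s, t

def forward_flow(x):
    """z_i = (x_i - t(x_<i)) * exp(-s(x_<i)); written as a loop for clarity only."""
    z = torch.empty_like(x)
    for i in range(len(x)):
        s, t = scale_shift(x[:i])
        z[i] = (x[i] - t) * torch.exp(-s)
    return z

def inverse_flow(z):
    """x_i = z_i * exp(s(x_<i)) + t(x_<i); inherently sequential (needs x_<i first)."""
    x = torch.empty_like(z)
    for i in range(len(z)):
        s, t = scale_shift(x[:i].clone())
        x[i] = z[i] * torch.exp(s) + t
    return x

x = torch.randn(5)
print(torch.allclose(inverse_flow(forward_flow(x)), x, atol=1e-5))  # True
# An IAF conditions on z_<i instead, flipping the trade-off: sampling becomes
# parallel while density evaluation of arbitrary data becomes sequential.
```

Entry 8: the ViT patch-embedding step, i.e., 16x16 patches flattened and linearly projected into tokens for a standard Transformer encoder.

```python
# Minimal sketch of the ViT patch-embedding idea: split the image into 16x16
# patches and linearly project each one to a token. Dimensions are assumptions.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=192):
        super().__init__()
        self.n_patches = (img_size // patch_size) ** 2
        # A strided conv is equivalent to "flatten each patch, then apply a linear layer".
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (batch, 3, 224, 224)
        x = self.proj(x)                        # (batch, embed_dim, 14, 14)
        return x.flatten(2).transpose(1, 2)     # (batch, 196, embed_dim) patch tokens

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
print(tokens.shape)                             # torch.Size([2, 196, 192])
# In the paper these tokens (plus a class token and position embeddings) go into a
# standard Transformer encoder.
```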