RL Papers at ICML 2019

In the following, I collect (probably) all ICML 2019 papers (directly) related to RL and group them into topics. Comments are welcome, e.g., about the categorization or any (important) papers I may have missed. My email: [email protected]. Thanks!

Papers: Value Function

Diagnosing Bottlenecks in Deep Q-learning Algorithms Justin Fu (University of California, Berkeley) · Aviral Kumar (University of California, Berkeley) · Matthew Soh (UC Berkeley) · Sergey Levine (Berkeley)

The Value Function Polytope in Reinforcement Learning Robert Dadashi (Google AI Residency Program) · Marc Bellemare (Google Brain) · Adrien Ali Taiga (Université de Montréal) · Nicolas Le Roux (Google) · Dale Schuurmans (Google / University of Alberta)

Statistics and Samples in Distributional Reinforcement Learning Mark Rowland (DeepMind) · Robert Dadashi (Google AI Residency Program) · Saurabh Kumar (Google) · Remi Munos (DeepMind) · Marc Bellemare (Google Brain) · Will Dabney (DeepMind)

Nonlinear Distributional Gradient Temporal-Difference Learning Chao Qu (Ant Financial Services Group) · Shie Mannor (Technion) · Huan Xu (Georgia Tech)

Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models Michael Oberst (MIT) · David Sontag (Massachusetts Institute of Technology)

Composing Value Functions in Reinforcement Learning Benjamin van Niekerk (University of the Witwatersrand) · Steven James (University of the Witwatersrand) · Adam Earle (University of the Witwatersrand) · Benjamin Rosman (Council for Scientific and Industrial Research)

Making Deep Q-learning methods robust to time discretization Corentin Tallec (Univ. Paris-Sud) · Leonard Blier (Université Paris Sud and Facebook) · Yann Ollivier (Facebook Artificial Intelligence Research)

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features Lin Yang (Princeton) · Mengdi Wang (Princeton University)

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds Andrea Zanette (Stanford University) · Emma Brunskill (Stanford University)

Revisiting the Softmax Bellman Operator: New Benefits and New Perspective Zhao Song (Baidu Research) · Ron Parr (Duke University) · Lawrence Carin (Duke University)

Information-Theoretic Considerations in Batch Reinforcement Learning Jinglin Chen (University of Illinois at Urbana-Champaign) · Nan Jiang (University of Illinois at Urbana-Champaign)

Dynamic Weights in Multi-Objective Deep Reinforcement Learning Axel Abels (Université Libre de Bruxelles) · Diederik Roijers (VUB) · Tom Lenaerts (Vrije Universiteit Brussel) · Ann Nowé (Vrije Universiteit Brussel) · Denis Steckelmacher (Vrije Universiteit Brussel)

Papers: Policy

Understanding the Impact of Entropy on Policy Optimization Zafarali Ahmed (Mila — McGill University) · Nicolas Le Roux (Google) · Mohammad Norouzi (Google Brain) · Dale Schuurmans (Google / University of Alberta)

Policy Certificates: Towards Accountable Reinforcement Learning Christoph Dann (Carnegie Mellon University) · Lihong Li (Google Inc.) · Wei Wei (Google) · Emma Brunskill (Stanford University)

Quantifying Generalization in Reinforcement Learning Karl Cobbe (OpenAI) · Oleg Klimov (OpenAI) · Chris Hesse (OpenAI) · Taehoon Kim (OpenAI) · John Schulman (OpenAI)

Off-Policy Deep Reinforcement Learning without Exploration Scott Fujimoto (McGill University) · David Meger (McGill University) · Doina Precup (McGill University / DeepMind)

Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning Casey Chu (Stanford University) · Jose Blanchet (Stanford University) · Peter Glynn (Stanford University)

POLITEX: Regret Bounds for Policy Iteration using Expert Prediction Nevena Lazic (Google) · Yasin Abbasi-Yadkori (Adobe Research) · Kush Bhatia (UC Berkeley) · Gellért Weisz (DeepMind) · Peter Bartlett (University of California, Berkeley) · Csaba Szepesvari (DeepMind/University of Alberta)

Collaborative Evolutionary Reinforcement Learning Shauharda Khadka (Intel AI) · Somdeb Majumdar (Intel AI Lab) · Tarek Nassar (Intel AI Lab) · Zach Dwiel (Intel AI Lab) · Evren Tumer (Intel Corporation) · Santiago Miret (Intel AI Products Group) · Yinyin Liu (Intel AI Lab) · Kagan Tumer (Oregon State University)

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules Daniel Ho (UC Berkeley) · Eric Liang (UC Berkeley) · Xi Chen (UC Berkeley) · Ion Stoica (UC Berkeley) · Pieter Abbeel (UC Berkeley)

Safe Policy Improvement with Baseline Bootstrapping Romain Laroche (Microsoft Research) · Paul Trichelair (Mila — Quebec AI Institute/McGill University) · Remi Tachet des Combes (Microsoft Research Montreal)

Fingerprint Policy Optimisation for Robust Reinforcement Learning Supratik Paul (University of Oxford) · Michael A Osborne (U Oxford) · Shimon Whiteson (University of Oxford)

Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN Dror Freirich (Technion) · Tzahi Shimkin (Technion Israel Institute of Technology) · Ron Meir (Technion Israel Institute of Technology) · Aviv Tamar (Technion Israel Institute of Technology)

Predictor-Corrector Policy Optimization Ching-An Cheng (Georgia Tech) · Xinyan Yan (Georgia Tech) · Nathan Ratliff (NVIDIA) · Byron Boots (Georgia Tech)

Optimistic Policy Optimization via Multiple Importance Sampling Matteo Papini (Politecnico di Milano) · Alberto Maria Metelli (Politecnico di Milano) · Lorenzo Lupo (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)

Projections for Approximate Policy Iteration Algorithms Riad Akrour (TU Darmstadt) · Joni Pajarinen (TU Darmstadt) · Jan Peters (TU Darmstadt + Max Planck Institute for Intelligent Systems) · Gerhard Neumann (University of Lincoln)

Transfer of Samples in Policy Search via Multiple Importance Sampling Andrea Tirinzoni (Politecnico di Milano) · Mattia Salvini (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)

Hessian Aided Policy Gradient Zebang Shen (Zhejiang University) · Alejandro Ribeiro (University of Pennsylvania) · Hamed Hassani (University of Pennsylvania) · Hui Qian (Zhejiang University) · Chao Mi (Zhejiang University)

Policy Consolidation for Continual Reinforcement Learning Christos Kaplanis (Imperial College London) · Murray Shanahan (DeepMind / Imperial College London) · Claudia Clopath (Imperial College London)

Importance Sampling Policy Evaluation with an Estimated Behavior Policy Josiah Hanna (UT Austin) · Scott Niekum (University of Texas at Austin) · Peter Stone (University of Texas at Austin)

Trajectory-Based Off-Policy Deep Reinforcement Learning Andreas Doerr (Bosch Center for Artificial Intelligence, Max Planck Institute for Intelligent Systems) · Michael Volpp (Bosch Center for AI) · Marc Toussaint (University of Stuttgart) · Sebastian Trimpe (Max Planck Institute for Intelligent Systems) · Christian Daniel (Bosch Center for Artificial Intelligence)

CAB: Continuous Adaptive Blending for Policy Evaluation and Learning Yi Su (Cornell University) · Lequn Wang (Cornell University) · Michele Santacatterina (TRIPODS Center of Data Science — Cornell University) · Thorsten Joachims (Cornell)

More Efficient Policy Value Evaluation through Regularized Targeted Learning Aurelien Bibaut (UC Berkeley) · Ivana Malenica (UC Berkeley) · Nikos Vlassis (Netflix) · Mark van der Laan (UC Berkeley)

Learning Novel Policies For Tasks Yunbo Zhang (Georgia Institute of Technology) · Wenhao Yu (Georgia Institute of Technology) · Greg Turk (Georgia Institute of Technology)

Remember and Forget for Experience Replay Guido Novati (ETH Zurich) · Petros Koumoutsakos (ETH Zurich)

Online Control with Adversarial Disturbances Naman Agarwal (Google AI Princeton) · Brian Bullins (Princeton University) · Elad Hazan (Google Brain and Princeton University) · Sham Kakade (University of Washington) · Karan Singh (Princeton University)

Action Robust Reinforcement Learning and Applications in Continuous Control Chen Tessler (Technion) · Yonathan Efroni (Technion) · Shie Mannor (Technion)

Control Regularization for Reduced Variance Reinforcement Learning Richard Cheng (California Institute of Technology) · Abhinav Verma (Rice University) · Gabor Orosz (University of Michigan) · Swarat Chaudhuri (Rice University) · Yisong Yue (Caltech) · Joel Burdick (Caltech)

Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning Seungyul Han (KAIST) · Youngchul Sung (KAIST)

Kernel-Based Reinforcement Learning in Robust Markov Decision Processes Shiau Hong Lim (IBM Research) · Arnaud Autef (Ecole Polytechnique)

A Theory of Regularized Markov Decision Processes Matthieu Geist (Google) · Bruno Scherrer (INRIA) · Olivier Pietquin (Google Brain)

Online Convex Optimization in Adversarial Markov Decision Processes Aviv Rosenberg (Tel Aviv University) · Yishay Mansour (Google and Tel Aviv University)

Batch Policy Learning under Constraints Hoang Le (Caltech) · Cameron Voloshin (Caltech) · Yisong Yue (Caltech)

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning Rui Zhao (Siemens & Ludwig Maximilian University of Munich) · Xudong Sun (Ludwig Maximilian University of Munich) · Volker Tresp (Siemens AG and University of Munich)

Reinforcement Learning in Configurable Continuous Environments Alberto Maria Metelli (Politecnico di Milano) · Emanuele Ghelfi (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)

On the Generalization Gap in Reparameterizable Reinforcement Learning Huan Wang (Salesforce Research) · Stephan Zheng (Salesforce Research) · Caiming Xiong (Salesforce) · Richard Socher (Salesforce)

Papers: Reward

Provably Efficient Imitation Learning from Observation Alone Wen Sun (Carnegie Mellon University) · Anirudh Vemula (CMU) · Byron Boots (Georgia Tech) · Drew Bagnell (Carnegie Mellon University)

Imitating Latent Policies from Observation Ashley Edwards (Georgia Institute of Technology) · Himanshu Sahni (Georgia Institute of Technology) · Yannick Schroecker (Georgia Institute of Technology) · Charles Isbell (Georgia Institute of Technology)

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations Daniel Brown (University of Texas at Austin) · Wonjoon Goo (University of Texas at Austin) · Prabhat Nagarajan (Preferred Networks) · Scott Niekum (University of Texas at Austin)

Imitation Learning from Imperfect Demonstration Yueh-Hua Wu (National Taiwan University) · Nontawat Charoenphakdee (The University of Tokyo / RIKEN) · Han Bao (The University of Tokyo / RIKEN) · Voot Tangkaratt (RIKEN AIP) · Masashi Sugiyama (RIKEN / The University of Tokyo)

Papers: Model

An investigation of model-free planning Arthur Guez (Google DeepMind) · Mehdi Mirza (DeepMind) · Karol Gregor (DeepMind) · Rishabh Kabra (DeepMind) · Sebastien Racaniere (DeepMind) · Theophane Weber (DeepMind) · David Raposo (DeepMind) · Adam Santoro (DeepMind) · Laurent Orseau (DeepMind) · Tom Eccles (DeepMind) · Greg Wayne (DeepMind) · David Silver (Google DeepMind) · Timothy Lillicrap (Google DeepMind)

Calibrated Model-Based Deep Reinforcement Learning Ali Malik (Stanford University) · Volodymyr Kuleshov (Stanford University) · Jiaming Song (Stanford) · Danny Nemer (Afresh Technologies) · Harlan Seymour (Afresh Technologies) · Stefano Ermon (Stanford University)

Learning Latent Dynamics for Planning from Pixels Danijar Hafner (Google Brain & University of Toronto) · Timothy Lillicrap (Google DeepMind) · Ian Fischer (Google) · Ruben Villegas (University of Michigan) · David Ha (Google) · Honglak Lee (Google / U. Michigan) · James Davidson (Google Brain)

Papers: Exploration

Distributional Reinforcement Learning for Efficient Exploration Borislav Mavrin (University of Alberta) · Hengshuai Yao (Huawei Technologies) · Linglong Kong (University of Alberta) · Kaiwen Wu (University of Waterloo) · Yaoliang Yu (University of Waterloo)

Exploration Conscious Reinforcement Learning Revisited Lior Shani (Technion) · Yonathan Efroni (Technion) · Shie Mannor (Technion)

Dead-ends and Secure Exploration in Reinforcement Learning Mehdi Fatemi (Microsoft Research) · Shikhar Sharma (Microsoft Research) · Harm van Seijen (Microsoft Research) · Samira Ebrahimi Kahou (Microsoft Research)

Learning to Explore via Disagreement Deepak Pathak (UC Berkeley) · Dhiraj Gandhi (Carnegie Mellon University Robotics Institute) · Abhinav Gupta (Carnegie Mellon University)

Model-Based Active Exploration Pranav Shyam (NNAISENSE) · Wojciech Jaskowski (NNAISENSE) · Faustino Gomez (NNAISENSE SA)

Papers: Exploration: Bandits

Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback Chicheng Zhang (Microsoft Research) · Alekh Agarwal (Microsoft Research) · Hal Daume (Microsoft Research) · John Langford (Microsoft Research) · Sahand Negahban (Yale)

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits Branislav Kveton (Google Research) · Csaba Szepesvari (DeepMind/University of Alberta) · Sharan Vaswani (Mila, University of Montreal) · Zheng Wen (Adobe Research) · Tor Lattimore (DeepMind) · Mohammad Ghavamzadeh (Facebook AI Research)

Decentralized Exploration in Multi-Armed Bandits Raphael Feraud (Orange Labs) · Reda Alami (Orange Labs — Paris-Saclay University — INRIA) · Romain Laroche (Microsoft Research)

Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits Martin Zhang (Stanford University) · James Zou (Stanford) · David Tse (Stanford University)

Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model Gi-Soo Kim (Seoul National University) · Myunghee Cho Paik (Seoul National University)

Bilinear Bandits with Low-rank Structure Kwang-Sung Jun (Boston University) · Rebecca Willett (U Chicago) · Stephen Wright (University of Wisconsin-Madison) · Robert Nowak (University of Wisconsin-Madison)

Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards Shiyin Lu (Nanjing University) · Guanghui Wang (Nanjing University) · Yao Hu (Alibaba Youku Cognitive and Intelligent Lab) · Lijun Zhang (Nanjing University)

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously Julian Zimmert (University of Copenhagen) · Haipeng Luo (University of Southern California) · Chen-Yu Wei (University of Southern California)

Exploiting structure of uncertainty for efficient combinatorial semi-bandits Pierre Perrault (Inria Lille — Nord Europe) · Vianney Perchet (ENS Paris Saclay & Criteo AI Lab) · Michal Valko (DeepMind)

Correlated bandits or: How to minimize mean-squared error online Vinay Praneeth Boda (LinkedIn Corp.) · Prashanth L.A. (IIT Madras)

PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits Arghya Roy Chaudhuri (Indian Institute of Technology Bombay) · Shivaram Kalyanakrishnan (IIT Bombay)

Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging Ping-Chun Hsieh (Texas A&M University) · Xi Liu (Texas A&M University) · Anirban Bhattacharya (Texas A&M University) · P. R. Kumar (Texas A&M University)

Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem Junyu Cao (University of California Berkeley) · Wei Sun (IBM Research)

Data Poisoning Attacks on Stochastic Bandits Fang Liu (The Ohio State University) · Ness Shroff (The Ohio State University)

On the design of estimators for bandit off-policy evaluation Nikos Vlassis (Netflix) · Aurelien Bibaut (UC Berkeley) · Maria Dimakopoulou (Stanford) · Tony Jebara (Netflix)

An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule Touqir Sajed (University of Alberta) · Or Sheffet (University of Alberta)

Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case Alina Beygelzimer (Yahoo Research) · David Pal (Expedia) · Balazs Szorenyi (Yahoo Research) · Devanathan Thiruvenkatachari (New York University) · Chen-Yu Wei (University of Southern California) · Chicheng Zhang (Microsoft Research)

Papers: Representation Learning

Action Representations for Reinforcement Learning Yash Chandak (University of Massachusetts Amherst) · Georgios Theocharous (Adobe Research) · James Kostas (UMass Amherst) · Scott Jordan (University of Massachusetts Amherst) · Philip Thomas (University of Massachusetts Amherst)

Provably efficient RL with Rich Observations via Latent State Decoding Simon Du (Carnegie Mellon University) · Akshay Krishnamurthy (Microsoft Research) · Nan Jiang (University of Illinois at Urbana-Champaign) · Alekh Agarwal (Microsoft Research) · Miroslav Dudik (Microsoft Research) · John Langford (Microsoft Research)

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning Yilun Du (MIT) · Karthik Narasimhan (Princeton)

The Natural Language of Actions Guy Tennenholtz (Technion) · Shie Mannor (Technion)

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning Marvin Zhang (UC Berkeley) · Sharad Vikram (UCSD) · Laura Smith (UC Berkeley) · Pieter Abbeel (OpenAI / UC Berkeley) · Matthew Johnson (Google Brain) · Sergey Levine (Berkeley)

DeepMDP: Learning Continuous Latent Space Models with Theoretical Guarantees Carles Gelada (Google Brain) · Saurabh Kumar (Google Brain) · Jacob Buckman (Johns Hopkins University) · Ofir Nachum (Google Brain) · Marc Bellemare (Google Brain)

Papers: Hierarchical RL

Finding Options that Minimize Planning Time Yuu Jinnai (Brown University) · David Abel (Brown University) · David Hershkowitz (Carnegie Mellon University) · Michael L. Littman (Brown University) · George Konidaris (Brown)

Option Discovery for Solving Sparse Reward Reinforcement Learning Problems Yuu Jinnai (Brown University) · Jee Won Park (Brown University) · David Abel (Brown University) · George Konidaris (Brown)

Per-Decision Option Discounting Anna Harutyunyan (DeepMind) · Peter Vrancx (PROWLER.io) · Philippe Hamel (DeepMind) · Ann Nowé (VU Brussel) · Doina Precup (DeepMind)

Papers: Multi-agent RL

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning Jakob Foerster (Facebook AI Research) · Francis Song (DeepMind) · Edward Hughes (DeepMind) · Neil Burch (DeepMind) · Iain Dunning (DeepMind) · Shimon Whiteson (University of Oxford) · Matthew Botvinick (DeepMind) · Michael Bowling (DeepMind)

Multi-Agent Adversarial Inverse Reinforcement Learning Lantao Yu (Stanford University) · Jiaming Song (Stanford) · Stefano Ermon (Stanford University)

Actor-Attention-Critic for Multi-Agent Reinforcement Learning Shariq Iqbal (University of Southern California) · Fei Sha (University of Southern California)

Learning to Collaborate in Markov Decision Processes Goran Radanovic (Harvard University) · Rati Devidze (Max Planck Institute for Software Systems) · David Parkes (Harvard University) · Adish Singla (Max Planck Institute (MPI-SWS))

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning Natasha Jaques (MIT) · Angeliki Lazaridou (DeepMind) · Edward Hughes (DeepMind) · Caglar Gulcehre (DeepMind) · Pedro Ortega (DeepMind) · DJ Strouse (Princeton University) · Joel Z Leibo (DeepMind) · Nando de Freitas (DeepMind)

TarMAC: Targeted Multi-Agent Communication Abhishek Das (Georgia Tech) · Theophile Gervet (Carnegie Mellon University) · Joshua Romoff (McGill University) · Dhruv Batra (Georgia Institute of Technology / Facebook AI Research) · Devi Parikh (Georgia Tech & Facebook AI Research) · Michael Rabbat (Facebook) · Joelle Pineau (Facebook)

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning Thinh Doan (Georgia Institute of Technology) · Siva Maguluri (Georgia Tech) · Justin Romberg (Georgia Tech)

Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI Lei Han (Tencent AI Lab) · Peng Sun (Tencent AI Lab) · Yali Du (University of Technology Sydney) · Jiechao Xiong (Tencent AI Lab) · Qing Wang () · Xinghai Sun (Tencent AI Lab) · Han Liu (Northwestern) · Tong Zhang (Tencent AI Lab)

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning Kyunghwan Son (KAIST) · Daewoo Kim (KAIST) · Wan Ju Kang (KAIST) · David Earl Hostallero (KAIST) · Yung Yi (KAIST)

A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs Jingkai Mao (Man AHL) · Jakob Foerster (Facebook AI Research) · Tim Rocktäschel (University of Oxford) · Maruan Al-Shedivat (Carnegie Mellon University) · Gregory Farquhar (University of Oxford) · Shimon Whiteson (University of Oxford)

Open-ended learning in zero-sum games David Balduzzi (DeepMind) · Marta Garnelo (DeepMind) · Yoram Bachrach () · Wojciech Czarnecki (DeepMind) · Julien Perolat (DeepMind) · Max Jaderberg (DeepMind) · Thore Graepel (DeepMind)

Papers: Relational RL

Neural Logic Reinforcement Learning Zhengyao Jiang (University of Liverpool) · Shan Luo (University of Liverpool)

Papers: Learning to Learn

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables Kate Rakelly (UC Berkeley) · Aurick Zhou (UC Berkeley) · Chelsea Finn (Stanford, Google, UC Berkeley) · Sergey Levine (Berkeley) · Deirdre Quillen (UC Berkeley)

CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning Cédric Colas (Inria) · Pierre-Yves Oudeyer (Inria) · Olivier Sigaud (Sorbonne University) · Pierre Fournier (UPMC) · Mohamed Chetouani (UPMC)

Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation Shani Gamrian (Bar-Ilan University) · Yoav Goldberg ()

Few-Shot Intent Inference via Meta-Inverse Reinforcement Learning Kelvin Xu (University of California, Berkeley) · Ellis Ratner (University of California, Berkeley) · Anca Dragan (EECS Department, University of California, Berkeley) · Sergey Levine (Berkeley) · Chelsea Finn (Stanford, Google, UC Berkeley)

Taming MAML: Control variates for unbiased meta-reinforcement learning gradient estimation Hao Liu (Salesforce) · Richard Socher (Salesforce) · Caiming Xiong (Salesforce)

TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning Tameem Adel (University of Cambridge) · Adrian Weller (University of Cambridge, Alan Turing Institute)

Papers: Applications

ELF OpenGo: an analysis and open reimplementation of AlphaZero Yuandong Tian (Facebook AI Research) · Jerry Ma (Facebook AI Research) · Qucheng Gong (Facebook AI Research) · Shubho Sengupta (Facebook AI Research) · Zhuoyuan Chen (Facebook) · James Pinkerton (Facebook AI Research) · Larry Zitnick (Facebook AI Research)

Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems Timothy Mann (DeepMind) · Sven Gowal (DeepMind) · Huiyi Hu (DeepMind) · Ray Jiang (Google DeepMind) · Balaji Lakshminarayanan (Google DeepMind) · Andras Gyorgy (DeepMind) · Prav Srinivasan (DeepMind)

Dynamic Measurement Scheduling for Event Forecasting using Deep RL Chun-Hao Chang (University of Toronto) · Mingjie Mai (University of Toronto) · Anna Goldenberg (University of Toronto)

Generative Adversarial User Model for Reinforcement Learning Based Recommendation System Xinshi Chen (Georgia Institute of Technology) · Shuang Li (Georgia Tech) · Hui Li (Ant Financial) · Shaohua Jiang (Ant Financial) · Yuan Qi (Ant Financial Services Group) · Le Song (Georgia Institute of Technology)

A Deep Reinforcement Learning Perspective on Internet Congestion Control Nathan Jay (University of Illinois Urbana-Champaign) · Noga H. Rotman (Hebrew University of Jerusalem) · Brighten Godfrey (University of Illinois Urbana-Champaign) · Michael Schapira (Hebrew University of Jerusalem) · Aviv Tamar (Technion Israel Institute of Technology)

Target Tracking for Contextual Bandits: Application to Demand Side Management Margaux Brégère (CNRS, Université Paris-Sud, Inria Paris, EDF R&D) · Pierre Gaillard (INRIA Paris) · Yannig Goude (EDF Lab Paris-Saclay) · Gilles Stoltz (Université Paris-Sud)

Greedy Sequential Subset Selection via Sequential Facility Location Ehsan Elhamifar (Northeastern University)

Hiring Under Uncertainty Manish Purohit (Google) · Sreenivas Gollapudi (Google Research) · Manish Raghavan (Cornell)

Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments Kirthevasan Kandasamy (Carnegie Mellon University) · Willie Neiswanger (CMU) · Reed Zhang (Carnegie Mellon University) · Akshay Krishnamurthy (Microsoft Research) · Jeff Schneider (Uber/CMU) · Barnabás Póczos (CMU)

A Control-Theoretic Perspective on Nesterov’s Accelerated Gradient Method Michael Muehlebach (UC Berkeley) · Michael Jordan (UC Berkeley)
