Open In Colab

Slides - here

## Exploration and exploitation

- [main] David Silver's lecture on exploration and exploitation - video
- Alternative lecture by J. Schulman - video
- Alternative lecture by N. de Freitas (with Bayesian optimization) - video
- Our lectures (in Russian)
  - "Mathematical" lecture (by Alexander Vorobev) '17 - slides, video
  - "Practical" lecture '18 - video
  - Seminar - video

## More materials

- Gittins index - a less heuristic approach to bandit exploration - article
- "Deep" version: variational information maximizing exploration - video
  - The same topics in Russian - video
- Lecture covering intrinsically motivated reinforcement learning - video
  - Slides
  - The same topics in Russian - video
  - Note: UCB-1 is not limited to Bernoulli rewards; it applies to arbitrary rewards in [0, 1], so you can rescale any bounded reward to [0, 1] for peace of mind. The bound is derived directly from Hoeffding's inequality (see the sketch after this list).
- A very interesting blog post by Lilian Weng that summarises this week's materials: The Multi-Armed Bandit Problem and Its Solutions
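
To make the UCB-1 note above concrete, here is a minimal NumPy sketch (the function name `ucb1_pick` and the toy bandit are illustrative, not from the course notebook): each arm gets the Hoeffding-derived bonus `sqrt(2 * ln(t) / n_k)` added to its empirical mean, and the agent plays the argmax.

```python
import numpy as np

def ucb1_pick(counts, sums, t):
    """Pick an arm given per-arm pull counts and reward sums at step t."""
    counts = np.asarray(counts, dtype=float)
    sums = np.asarray(sums, dtype=float)
    # Play every arm once before the confidence bound is defined.
    untried = np.where(counts == 0)[0]
    if len(untried) > 0:
        return int(untried[0])
    means = sums / counts
    bonus = np.sqrt(2.0 * np.log(t) / counts)  # Hoeffding exploration bonus
    return int(np.argmax(means + bonus))

# Illustrative usage on a 3-armed Bernoulli bandit (made-up probabilities).
rng = np.random.default_rng(0)
true_probs = np.array([0.3, 0.5, 0.7])
counts, sums = np.zeros(3), np.zeros(3)
for t in range(1, 2001):
    arm = ucb1_pick(counts, sums, t)
    reward = float(rng.random() < true_probs[arm])  # already in [0, 1]
    counts[arm] += 1
    sums[arm] += reward
print("pulls per arm:", counts)  # the 0.7 arm should dominate
```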

## Seminar

In this seminar, you'll solve basic and contextual bandits with uncertainty-based exploration strategies such as Bayesian UCB and Thompson sampling. You will also get acquainted with Bayesian neural networks.
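
For orientation, here is a minimal Thompson sampling sketch for a Bernoulli bandit, assuming a Beta(1, 1) prior per arm and made-up arm probabilities (the actual seminar tasks live in the notebook):

```python
import numpy as np

# Thompson sampling for a Bernoulli bandit: keep a Beta posterior per arm,
# sample a plausible mean from each posterior, and play the argmax.
rng = np.random.default_rng(0)
true_probs = np.array([0.3, 0.5, 0.7])  # assumed arm reward probabilities
alpha = np.ones(len(true_probs))        # Beta posterior: successes + 1
beta = np.ones(len(true_probs))         # Beta posterior: failures + 1

for t in range(1000):
    samples = rng.beta(alpha, beta)     # one posterior sample per arm
    arm = int(np.argmax(samples))
    reward = rng.random() < true_probs[arm]
    alpha[arm] += reward                # update the played arm's posterior
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```

Exploration here comes from posterior sampling itself: uncertain arms produce a wide spread of samples and so keep getting tried until their posteriors sharpen.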

Everything else is in the notebook :)