Open In Colab

Slides - here

## Exploration and exploitation

- [main] David Silver's lecture on exploration and exploitation - video
- Alternative lecture by J. Schulman - video
- Alternative lecture by N. de Freitas (with Bayesian optimization) - video
- Our lectures (in Russian)
  - "Mathematical" lecture (by Alexander Vorobev) '17 - slides, video
  - "Practical" lecture '18 - video
  - Seminar - video

## More materials

- Gittins index - a less heuristic approach to bandit exploration - article
- "Deep" version: variational information maximizing exploration - video
  - The same topics in Russian - video
- Lecture covering intrinsically motivated reinforcement learning - video
  - Slides
  - The same topics in Russian - video
  - Note: UCB-1 is not limited to Bernoulli rewards; it applies to arbitrary rewards in [0, 1], so you can rescale any bounded reward to [0, 1] for peace of mind. The bound is derived directly from Hoeffding's inequality (see the sketch after this list).
- A very interesting blog post by Lilian Weng that summarises this week's materials: The Multi-Armed Bandit Problem and Its Solutions
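
To make the UCB-1 note above concrete, here is a minimal NumPy sketch (the function name `ucb1_pick` and the toy bandit are illustrative, not from the course notebook): each arm gets the Hoeffding-derived bonus `sqrt(2 * ln(t) / n_k)` added to its empirical mean, and the agent plays the argmax.

```python
import numpy as np

def ucb1_pick(counts, sums, t):
    """Pick an arm given per-arm pull counts and reward sums at step t."""
    counts = np.asarray(counts, dtype=float)
    sums = np.asarray(sums, dtype=float)
    # Play every arm once before the confidence bound is defined.
    untried = np.where(counts == 0)[0]
    if len(untried) > 0:
        return int(untried[0])
    means = sums / counts
    bonus = np.sqrt(2.0 * np.log(t) / counts)  # Hoeffding exploration bonus
    return int(np.argmax(means + bonus))

# Illustrative usage on a 3-armed Bernoulli bandit (made-up probabilities).
rng = np.random.default_rng(0)
true_probs = np.array([0.3, 0.5, 0.7])
counts, sums = np.zeros(3), np.zeros(3)
for t in range(1, 2001):
    arm = ucb1_pick(counts, sums, t)
    reward = float(rng.random() < true_probs[arm])  # already in [0, 1]
    counts[arm] += 1
    sums[arm] += reward
print("pulls per arm:", counts)  # the 0.7 arm should dominate
```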

## Seminar

In this seminar, you'll solve basic and contextual bandits with uncertainty-based exploration strategies such as Bayesian UCB and Thompson sampling. You will also get acquainted with Bayesian neural networks.
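
For orientation, here is a minimal Thompson sampling sketch for a Bernoulli bandit, assuming a Beta(1, 1) prior per arm and made-up arm probabilities (the actual seminar tasks live in the notebook):

```python
import numpy as np

# Thompson sampling for a Bernoulli bandit: keep a Beta posterior per arm,
# sample a plausible mean from each posterior, and play the argmax.
rng = np.random.default_rng(0)
true_probs = np.array([0.3, 0.5, 0.7])  # assumed arm reward probabilities
alpha = np.ones(len(true_probs))        # Beta posterior: successes + 1
beta = np.ones(len(true_probs))         # Beta posterior: failures + 1

for t in range(1000):
    samples = rng.beta(alpha, beta)     # one posterior sample per arm
    arm = int(np.argmax(samples))
    reward = rng.random() < true_probs[arm]
    alpha[arm] += reward                # update the played arm's posterior
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```

Exploration here comes from posterior sampling itself: uncertain arms produce a wide spread of samples and so keep getting tried until their posteriors sharpen.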

Everything else is in the notebook :)