Based on the CS-234 course
- High level introduction: SB chapter 1
- Linear algebra review
- Probability review
- Python tutorial
- SB chapter 3, 4.1-4.4
- SB chapter 5.1, 5.5, 6.1-6.3
- David Silver's Lecture 4
- SB chapter 5.2, 5.4, 6.4-6.5, 6.7
- SB chapter 9.3, 9.6, 9.7
- SB chapter 13
- David Silver lecture: Policy Gradient Methods
- John Schulman: Policy Gradients and Q-learning
- Pieter Abbeel: lecture 1
- Pieter Abbeel: lecture 2
- Pieter Abbeel: lecture 3
- Evolution strategies as a scalable alternative to reinforcement learning
- RL specialization
- detailed practical materials
- previous-year assignments & solutions
- Gymnasium/Gym deep-dive repo
- gym setup article
Submission link
First deadline: 6/03/22 (30 points max)
Second deadline: 13/03/22 (25 points max)
Final deadline: 24/04/22 (20 points max)
Submission link
First deadline:
Second deadline:
Final deadline: 24/04/22
Submission link
First deadline:
Second deadline:
Final deadline: 24/04/22
- SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
- Training language models to follow instructions with human feedback
- AdaFrame: Adaptive Frame Selection for Fast Video Recognition; Adaptive Focus for Efficient Video Recognition; AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
- Benchmarking Reinforcement Learning Algorithms on Real-World Robots
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor