This repository is for the Reinforcement Learning course CS885 taught by Prof. Pascal Poupart at the University of Waterloo. It covers planning by dynamic programming (value iteration, policy iteration, and modified policy iteration), Q-learning, three bandit algorithms (epsilon-greedy, Thompson sampling, and UCB), REINFORCE, and model-based reinforcement learning.
-
Notifications
You must be signed in to change notification settings - Fork 3
h-shahidi/cs885-rl
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published