% This repository has been archived by the owner on Sep 1, 2023. It is now read-only.
\documentclass{exam}
\usepackage{amsmath, amsfonts}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage[super]{nth}
\DeclareMathOperator*{\argmin}{argmin}
\usepackage[hyperfootnotes=false]{hyperref}
\usepackage[usenames,dvipsnames]{color}
\newcommand{\note}[1]{
\noindent~\\
\vspace{0.25cm}
\fcolorbox{Red}{Orange}{\parbox{0.99\textwidth}{#1\\}}
%{\parbox{0.99\textwidth}{#1\\}}
\vspace{0.25cm}
}
%\input{../macros}
%\renewcommand{\hide}[1]{#1}
\qformat{\thequestion. \textbf{\thequestiontitle}\hfill}
\bonusqformat{\thequestion. \textbf{\thequestiontitle}\hfill}
\pagestyle{headandfoot}
%%%%%% MODIFY FOR EACH SHEET!!!! %%%%%%
\newcommand{\duedate}{10.11.2021 (15:00)}
\newcommand{\due}{{\bf This assignment is due on \duedate.} }
\firstpageheader
{Due: \duedate}
{{\bf\lecture}\\ \assignment{1}}
{\lectors\\ \semester}
\runningheader
{Due: \duedate}
{\assignment{2}}
{\semester}
%%%%%% MODIFY FOR EACH SHEET!!!! %%%%%%
\firstpagefooter
{}
{\thepage}
{}
\runningfooter
{}
{\thepage}
{}
\headrule
\pointsinrightmargin
\bracketedpoints
\marginpointname{pt.}
\begin{document}
\noindent
This week you will implement three policy evaluation algorithms: dynamic programming, Monte Carlo, and temporal-difference learning (TD(0)). They are closely related, so pay close attention to the differences between their update equations. You will also run experiments to compare the strengths of each one.
\begin{questions}
\titledquestion{Dynamic Programming for Policy Evaluation}
First, complete the methods in \emph{pe\_dynamic\_programming.py} to evaluate a policy given a model of the MDP. Please use the given method stubs; you may add further methods if needed.
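For reference, here is a sketch of the standard iterative policy evaluation update, assuming a finite MDP with dynamics $p(s', r \mid s, a)$, policy $\pi(a \mid s)$ and discount factor $\gamma$; the notation in the lecture and in the stubs may differ. Starting from an arbitrary $V_0$ (with the value of terminal states fixed to $0$), every state is updated with the Bellman expectation equation
\begin{equation*}
V_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V_k(s')\bigr],
\end{equation*}
and the sweeps are repeated until $\max_{s} \lvert V_{k+1}(s) - V_k(s) \rvert < \theta$ for some small threshold $\theta > 0$. This stopping rule is an assumption; the stubs may instead prescribe a fixed number of iterations.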
\titledquestion{Model-free Policy Evaluation}
Next, implement two policy evaluation algorithms that do not require a model of the MDP. The lecture should cover everything you need to complete the code in \emph{pe\_monte\_carlo.py} and \emph{pe\_td\_zero.py}. Again, pay attention to the differences between the algorithms, and use the method stubs we provide so that your code passes the tests. As above, you may also add your own helper methods.
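As a hedged sketch in standard notation (step size $\alpha$, episodes terminating at time $T$, value of terminal states fixed to $0$), the two model-free methods differ mainly in their update target. Monte Carlo waits until the end of an episode and moves $V(S_t)$ towards the observed return $G_t$, whereas TD(0) updates after every step, bootstrapping from the current estimate $V(S_{t+1})$:
\begin{align*}
G_t &= \sum_{k=0}^{T-t-1} \gamma^{k} R_{t+k+1}, \\
\text{Monte Carlo:}\quad V(S_t) &\leftarrow V(S_t) + \alpha\,\bigl[G_t - V(S_t)\bigr], \\
\text{TD(0):}\quad V(S_t) &\leftarrow V(S_t) + \alpha\,\bigl[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\bigr].
\end{align*}
Choosing $\alpha = 1/N(S_t)$, where $N(S_t)$ is the number of returns observed for $S_t$ including the current one, turns the Monte Carlo update into a running average of those returns. Whether first-visit or every-visit Monte Carlo, and a constant or decaying $\alpha$, is expected depends on the lecture and the provided stubs.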
\end{questions}
\end{document}