The following points are unclear for Chapter 5:

1- Page 93-94: It is not clear what the difference is between the policy in Figure 5.6(a) ("a given $\epsilon$-greedy policy for $\epsilon = 0$") and the policy in Figure 5.7(a) ("the optimal $\epsilon$-greedy policy for $\epsilon = 0$"). How are these policies obtained?
My guess is that to obtain the policy corresponding to $\epsilon = 0$ in Figure 5.6(a) and the other $\epsilon$-greedy policies in Figure 5.6, the Bellman optimality equation is first solved to obtain the optimal action values, and then $\epsilon$-greedy policies are derived from these action values for different values of $\epsilon$. Otherwise, if Algorithm 5.3 is run with these $\epsilon$ values, there is a risk of local optima and we may not be able to reproduce the results in Figure 5.6, especially the optimal result in Figure 5.6(a).
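For concreteness, here is the construction I have in mind (my own reading and notation, not a quote from the book): first solve the Bellman optimality equation for the optimal action values,

$$
q^*(s,a) = \sum_{r} p(r \mid s,a)\,r + \gamma \sum_{s'} p(s' \mid s,a)\,\max_{a'} q^*(s',a'),
$$

and then spread probability $\epsilon$ over the actions according to

$$
\pi_\epsilon(a \mid s) =
\begin{cases}
1 - \dfrac{|\mathcal{A}(s)|-1}{|\mathcal{A}(s)|}\,\epsilon & \text{if } a = \arg\max_{a'} q^*(s,a'), \\
\dfrac{\epsilon}{|\mathcal{A}(s)|} & \text{otherwise}.
\end{cases}
$$

With $\epsilon = 0$ this reduces to the greedy policy, and for any $\epsilon < 1$ the greedy action keeps the largest probability, which may be what the "consistent" remark in point 2 refers to.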
2- In Figure 5.6, it is said that "These $\epsilon$-greedy policies are consistent with each other in the sense that the actions with the greatest probabilities are the same". No explanation is given as to why this condition is needed.
3- We obtain the results in Figure 5.7 using Algorithm 5.3, right? However, based on the results of Figure 5.7, the book gives the impression that the optimal policy will be obtained if we set $\epsilon = 0$ in the $\epsilon$-greedy scheme of Algorithm 5.3, which may not be correct (especially with a bad initial policy), because we may get stuck at a local optimum.
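For reference, here is a minimal sketch of how I understand Algorithm 5.3 (MC $\epsilon$-greedy) to proceed; the `sample_episode` interface and all names here are my own placeholders, not the book's code:

```python
import numpy as np

def epsilon_greedy_probs(q_row, epsilon):
    """Epsilon-greedy distribution over the actions of a single state."""
    n = len(q_row)
    probs = np.full(n, epsilon / n)        # each non-greedy action: eps / |A|
    probs[np.argmax(q_row)] += 1.0 - epsilon  # greedy action: 1 - eps*(|A|-1)/|A|
    return probs

def mc_epsilon_greedy(sample_episode, num_states, num_actions,
                      epsilon=0.1, gamma=0.9, num_iters=500):
    """Every-visit MC with epsilon-greedy improvement (my reading of Alg. 5.3)."""
    q = np.zeros((num_states, num_actions))
    ret_sum = np.zeros_like(q)             # running sum of returns per (s, a)
    ret_cnt = np.zeros_like(q)             # visit counts per (s, a)
    policy = np.full((num_states, num_actions), 1.0 / num_actions)
    for _ in range(num_iters):
        # sample_episode(policy) -> list of (state, action, reward); hypothetical
        episode = sample_episode(policy)
        g = 0.0
        for s, a, r in reversed(episode):  # accumulate returns backwards
            g = gamma * g + r
            ret_sum[s, a] += g             # every-visit update
            ret_cnt[s, a] += 1
            q[s, a] = ret_sum[s, a] / ret_cnt[s, a]
            policy[s] = epsilon_greedy_probs(q[s], epsilon)  # improvement step
    return q, policy
```

With $\epsilon = 0$, `epsilon_greedy_probs` returns a deterministic greedy policy, so state-action pairs never visited under the initial policy keep their initial `q` values, which is exactly why I would expect the iteration to be able to get stuck.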
In my opinion, these discussions are ambiguous and need clarification. In particular, it is not clear how the results in Figures 5.6 and 5.7 are obtained.
4- What does the leftward blue arrow in cell (1,1) of Figure 5.7 denote? Why is it there?