Pure random when estimating V(s), but epsilon_greedy when estimating Q(s,a)? #26

Hi @QuantHao, good to hear from you again!
There's no reason not to use the epsilon_greedy policy to estimate V(s).
A state-value function is always defined with respect to a policy, so you can estimate it for the epsilon_greedy policy and even compare it with the state-value function of the random policy. The same applies to the action-value function Q(s,a).
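To make the comparison concrete, here is a minimal sketch rather than code from the repository. It assumes a hypothetical 5-state corridor environment, a hand-written Q table that prefers moving right, and an every-visit Monte Carlo estimator; the names `step`, `make_epsilon_greedy`, and `mc_state_values` are made up for illustration.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy corridor)
ACTIONS = (-1, +1)    # move left / move right
GAMMA = 0.9           # discount factor (assumed value)


def step(state, action):
    """Deterministic transition; reward 1.0 only on reaching the terminal state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def random_policy(state, rng):
    """Pick an action uniformly at random, ignoring the state."""
    return rng.choice(ACTIONS)


def make_epsilon_greedy(q, epsilon):
    """Build a policy that explores with probability epsilon, else acts greedily on q."""
    def policy(state, rng):
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q[(state, a)])
    return policy


def mc_state_values(policy, episodes=5000, max_steps=200, seed=0):
    """Every-visit Monte Carlo estimate of V(s) for the given policy.

    Episodes always start in state 0 and are cut off after max_steps,
    which slightly biases the estimate for very long episodes.
    """
    rng = random.Random(seed)
    returns_sum = [0.0] * N_STATES
    visits = [0] * N_STATES
    for _ in range(episodes):
        state, trajectory = 0, []
        for _ in range(max_steps):
            action = policy(state, rng)
            next_state, reward, done = step(state, action)
            trajectory.append((state, reward))
            state = next_state
            if done:
                break
        # Walk the episode backwards, accumulating the discounted return.
        g = 0.0
        for s, r in reversed(trajectory):
            g = r + GAMMA * g
            returns_sum[s] += g
            visits[s] += 1
    return [returns_sum[s] / visits[s] if visits[s] else 0.0
            for s in range(N_STATES)]


# Hypothetical Q table that simply prefers "right" in every state.
q = {(s, a): (1.0 if a == +1 else 0.0) for s in range(N_STATES) for a in ACTIONS}

v_random = mc_state_values(random_policy)
v_eps = mc_state_values(make_epsilon_greedy(q, epsilon=0.1))

for s in range(N_STATES):
    print(f"s={s}  V_random={v_random[s]:.3f}  V_eps_greedy={v_eps[s]:.3f}")
```

The estimator is the same in both runs and only the policy passed in changes, which is the point: V(s) is a property of whichever policy generated the episodes. The same loop could record returns per `(state, action)` pair to estimate Q(s,a) under either policy.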

Answer selected by QuantHao

This discussion was converted from issue #25 on May 24, 2021 03:02.