
PPO has no entropy factor #30

Open

CesMak opened this issue Apr 4, 2020 · 0 comments

CesMak commented Apr 4, 2020

Hey there,

Would it be wise to include an entropy factor in your PPO implementation?

How would one do that?

My second question: why do you use F.smooth_l1_loss instead of 0.5 * MSE loss?

Here are some snippets as a suggestion, but I am not absolutely sure about them:

            # clipped PPO surrogate objective
            surr1 = ratio * advantage
            surr2 = torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * advantage
            actor_loss  = -torch.min(surr1, surr2)
            critic_loss = F.smooth_l1_loss(self.v(s), td_target.detach())  # alternative: 0.5 * self.MseLoss(state_values, torch.tensor(rewards))
            # beta       = 0.01  # entropy coefficient; encourages exploring different policies
            total_loss = critic_loss + actor_loss  # - beta * dist_entropy

To include entropy, we would need a function like this:

    def evaluate(self, state, action):
        # returns the log-probability of the taken action, the state value, and the policy entropy
        action_probs = self.action_layer(state)
        dist = Categorical(action_probs)  # from torch.distributions import Categorical

        action_logprobs = dist.log_prob(action)
        dist_entropy = dist.entropy()

        state_value = self.value_layer(state)

        return action_logprobs, torch.squeeze(state_value), dist_entropy

However, I am not sure about the best way to include the entropy term in your implementation.
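
Maybe something like the following would work? This is only a rough sketch of the update step, assuming the evaluate() function above is added to the model, and that old_logprobs, advantage and td_target are already computed from the rollout; beta is just my guess for an entropy coefficient:

    import torch
    import torch.nn.functional as F

    beta = 0.01  # entropy coefficient (my assumption, tune as needed)

    # evaluate() from above: log-probs of the taken actions, state values, entropy
    logprobs, state_values, dist_entropy = model.evaluate(states, actions)

    # probability ratio between the new and the old policy
    ratio = torch.exp(logprobs - old_logprobs.detach())

    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * advantage

    actor_loss  = -torch.min(surr1, surr2)
    critic_loss = F.smooth_l1_loss(state_values, td_target.detach())

    # the entropy bonus is subtracted: minimizing the loss then favors higher entropy
    total_loss = actor_loss + critic_loss - beta * dist_entropy

    optimizer.zero_grad()
    total_loss.mean().backward()
    optimizer.step()

The sign is the part I mainly wanted to check: since we minimize total_loss, the beta * dist_entropy term has to be subtracted so that more exploratory policies get a lower loss.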

I'd be glad for some help.
