critic_loss:

```python
def critic_loss_fn(self, values, old_values, returns, mask):
    # Value loss: clamp the new value predictions to within
    # ±cliprange_value of the rollout-time (old) value predictions.
    values_clipped = torch.clamp(
        values,
        old_values - self.cliprange_value,
        old_values + self.cliprange_value,
    )
    vf_loss1 = (values - returns) ** 2          # unclipped squared error
    vf_loss2 = (values_clipped - returns) ** 2  # clipped squared error
    # Element-wise max of the two errors, averaged over unmasked tokens.
    vf_loss = 0.5 * torch.sum(
        torch.max(vf_loss1, vf_loss2) * mask) / mask.sum()
    return vf_loss
```
Why does vf_loss take the element-wise maximum of the two errors? If the larger one is always chosen, doesn't that make the clamp meaningless?
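For reference, a minimal self-contained sketch of the same computation (the tensor values and the cliprange_value of 0.2 are made up for illustration). It shows the max is not a no-op: when a new prediction moves more than cliprange_value away from the old one, the clipped error can be the larger term.

```python
import torch

cliprange_value = 0.2  # hypothetical clip range, for illustration only
old_values = torch.tensor([1.0, 1.0, 1.0])  # rollout-time predictions
values     = torch.tensor([1.5, 0.9, 0.4])  # new critic predictions
returns    = torch.tensor([1.4, 1.2, 0.8])
mask       = torch.ones(3)

values_clipped = torch.clamp(
    values, old_values - cliprange_value, old_values + cliprange_value)
# values_clipped == tensor([1.2000, 0.9000, 0.8000])

vf_loss1 = (values - returns) ** 2          # tensor([0.0100, 0.0900, 0.1600])
vf_loss2 = (values_clipped - returns) ** 2  # tensor([0.0400, 0.0900, 0.0000])

vf_loss = 0.5 * torch.sum(torch.max(vf_loss1, vf_loss2) * mask) / mask.sum()
print(vf_loss)  # tensor(0.0483)
```

On the first element the new value jumped 0.5 away from the old one and happens to land near the return (unclipped error 0.01), but the max keeps the clipped error 0.04, so the large update is still penalized; on the last element the unclipped error dominates instead.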