I'm also wondering about the signs of the terms in SMM's intrinsic reward.
Regarding pred_log_ratios: I noticed that the VAE in the original SMM implementation returns the negated log_prob (= h_s_z), and within the intrinsic reward that value is negated again.
Hence URLB's intrinsic reward is likely correct w.r.t. the sign of h_s_z, because URLB's VAE does not negate the log_prob in the first place.
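To make the sign convention concrete, here is a minimal numerical sketch. The values are made up (they are not taken from the URLB code), and latent_ent_coef is assumed to be 1.0; the point is only that when h_z and the VAE output are already the *negated* log-probs, adding them in the code is the same as subtracting the log-probs in equation 3:

```python
import math

# Hypothetical values for one state s and one skill z (illustration only).
log_p_z = math.log(1 / 4)     # uniform prior over 4 skills: log p(z)
log_q_s_z = -2.3              # VAE log-density log q(s|z), standing in for log rho_pi(s|z)

# URLB-style quantities: the "entropy" terms are the negated log-probs.
h_z = -log_p_z                # h_z = -log p(z)
pred_log_ratios = -log_q_s_z  # the VAE already returns the negated log_prob

# Adding the negated quantities, as the code does ...
reward_code = pred_log_ratios + 1.0 * h_z   # latent_ent_coef assumed 1.0 here

# ... equals subtracting the log-probs, as equation 3 does
# (the log p*(s) and log p(z|s) terms are omitted for brevity):
reward_eq3 = -log_q_s_z - log_p_z

assert math.isclose(reward_code, reward_eq3)
```

So the "+" in the code and the "−" in equation 3 are consistent once you account for where the negation happens.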
Hey,
I'm not sure if anyone can clarify this, but I wanted to check the signs in SMM's intrinsic reward:
intr_reward = pred_log_ratios + self.latent_ent_coef * h_z + self.latent_cond_ent_coef * h_z_s.detach()
The original paper in equation 3 has:
r_z(s) = log(p*(s)) - log(rho_pi(s|z)) + log(p(z|s)) - log(p(z))
Why do we add
log(rho_pi(s|z)) (== pred_log_ratios)
and log(p(z)) (whose negation is h_z, weighted by self.latent_ent_coef),
rather than subtract them as in equation 3? Sorry if this is obvious 😄