nan in MultivariateNormalDiag log prob #216
Comments
Also ran into |
Did some digging into this because it was really bothering me, and it turns out the behaviour is somewhat expected; it's not really distrax's fault. I think the conclusion I came to is pretty much what this comment in #7 describes as well, but perhaps it'd be worth documenting here in greater detail since this issue is still open.

If you print the sampled actions in this code snippet rather than their sum, you will notice that specifically at index [0,0] the value is Obviously such a value is outside the range of As a sidenote, the reason

Therefore, this isn't something that can be fixed on distrax's end. The reason #7's workaround works is that it computes the log prob of the sampled actions using the pre-tanh value (which is readily available since the operation includes a forward sampling pass), so the numerical precision never becomes a problem. Calling

To conclude, the ways around this I can think of are to either:
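For anyone who wants to see the effect in isolation, here is a minimal sketch in plain NumPy. It is not distrax code and not the reproduction script from this issue: it assumes float32 precision and a scalar tanh-squashed Gaussian, and the helper names (`normal_log_prob`, `tanh_log_det_jacobian`, `log_prob_from_action`, `log_prob_from_pre_tanh`) are made up for illustration. It compares recovering the pre-tanh value by inverting a saturated action against reusing the pre-tanh value directly.

```python
import numpy as np


def normal_log_prob(x, loc, scale):
    # Log density of a univariate Gaussian.
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)


def tanh_log_det_jacobian(x):
    # log(1 - tanh(x)^2), written in the numerically stable form
    # 2 * (log 2 - x - softplus(-2x)).
    return 2.0 * (np.log(2.0) - x - np.logaddexp(0.0, -2.0 * x))


def log_prob_from_action(action, loc, scale):
    # Hypothetical "invert the action first" route: atanh, then log prob.
    with np.errstate(all="ignore"):
        pre_tanh = np.arctanh(action)  # atanh(1.0) is inf
        return normal_log_prob(pre_tanh, loc, scale) - tanh_log_det_jacobian(pre_tanh)


def log_prob_from_pre_tanh(pre_tanh, loc, scale):
    # Hypothetical "keep the pre-tanh sample" route: no inversion needed.
    return normal_log_prob(pre_tanh, loc, scale) - tanh_log_det_jacobian(pre_tanh)


# Assumed values, chosen only so that tanh saturates in float32.
loc = np.float32(0.0)
scale = np.float32(5.0)
pre_tanh = np.float32(12.0)

action = np.tanh(pre_tanh)  # rounds to exactly 1.0 in float32
print("action:", action)
print("via inverse:", log_prob_from_action(action, loc, scale))
print("via pre-tanh:", log_prob_from_pre_tanh(pre_tanh, loc, scale))
```

In this sketch the first route ends up subtracting `-inf` from `-inf`, which is where the `nan` comes from, while the second route stays finite because the saturated action is never inverted.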
Hello, thanks for this awesome repo! We have had a slight issue with distrax, which produces nan at vwxyzjn/cleanrl#300. See the following reproduction script: