EKF (diagonal) #9
Conversation
We shouldn't include files like these in this PR; they seem unrelated. We should open a second PR for this, like "update sghmc with new library".
Yeah, this was lazy of me; the flexi_tree_map update should have been its own PR.
LGTM!
I like what you did with the flexi_tree_map. Cleans up the code nicely.
I'll trust @dmitrisaberi's review on the EKF stuff. I still need to read through the paper more closely to understand the methods.
Adds extended Kalman filter (EKF) implementations that approximate the Hessian of the log-likelihood with either its diagonal or an empirical Fisher approximation.
It's perhaps a non-standard version of the EKF, which uses the gradient and Hessian approximation of the log-likelihood at time t rather than the Jacobian of the non-linear conditional mean of the likelihood. This version is easier to implement since it only requires the log_likelihood function itself (and its gradients), rather than the conditional mean of the likelihood. For more details see https://arxiv.org/abs/1703.00209, which also gives a gradient descent interpretation and an equivalence between likelihood inverse temperature and learning rate (which we adopt).
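To make the update above concrete, here is a minimal illustrative sketch (my own, not the PR's code) of a single diagonal update step, assuming the gradient and Hessian diagonal of the log-likelihood are supplied as functions and using the learning rate as the likelihood inverse temperature:

```python
import numpy as np

def diag_ekf_update(mean, diag_cov, grad_ll, hess_diag_ll, lr=1.0):
    """Hypothetical diagonal EKF step: forms a Gaussian posterior from a
    quadratic expansion of the log-likelihood at the current mean.

    grad_ll, hess_diag_ll: functions returning the gradient and the diagonal
        of the Hessian of log p(y_t | params) at a parameter vector.
    lr: learning rate, interpreted as the likelihood inverse temperature.
    """
    g = grad_ll(mean)
    h = hess_diag_ll(mean)
    # Posterior precision = prior precision - lr * Hessian diagonal of loglik
    # (the Hessian of a log-likelihood is negative definite near a mode,
    # so subtracting it increases the precision)
    new_prec = 1.0 / diag_cov - lr * h
    new_cov = 1.0 / new_prec
    # Mean moves along the likelihood gradient, preconditioned by the
    # updated covariance, giving the gradient-descent interpretation
    new_mean = mean + lr * new_cov * g
    return new_mean, new_cov
```

With a unit-variance Gaussian log-likelihood the Hessian diagonal is exactly -1, so the step halves the variance and moves the mean halfway to the observation.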
Additionally, this PR adds a flexi_tree_map function to simplify the inplace code.
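For readers outside the repo, here is a rough sketch of what such a helper might look like (a hypothetical toy version over nested dicts, not the PR's actual implementation): a tree map whose inplace flag chooses between returning a fresh tree and overwriting leaves in the existing containers.

```python
def flexi_tree_map(func, tree, inplace=False):
    """Hypothetical sketch: apply func to every leaf of a nested-dict tree.

    inplace=False returns a new tree; inplace=True overwrites the leaves
    of the existing containers and returns the same (mutated) tree.
    """
    if isinstance(tree, dict):
        if inplace:
            for key, value in tree.items():
                tree[key] = flexi_tree_map(func, value, inplace=True)
            return tree
        return {key: flexi_tree_map(func, value) for key, value in tree.items()}
    # Non-dict values are treated as leaves
    return func(tree)
```

The appeal of a single flag is that update rules can be written once and callers choose between functional and in-place behaviour, rather than maintaining two code paths.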