Pseudo-Observation Parametrisations #121
base: master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #121 +/- ##
==========================================
+ Coverage 93.61% 94.41% +0.79%
==========================================
Files 5 5
Lines 329 376 +47
==========================================
+ Hits 308 355 +47
Misses 21 21
Continue to review full report at Codecov.
# pseudo-observation covariance matrix `Ŝ` such that ``\hat{C} = C - C (C + \hat{S})^{-1} C``.
#
# However, this is not necessarily a problem: if the likelihood used in the model is
# log-concave then the optimal choice for `Ĉ` can always be represented using this
do we have a citation for this? 🤔
I was wondering about that. I've definitely seen the result floating around (and it's easy enough to prove) -- will have a hunt.
if it's that easy to prove, just do it here in the docs 😂
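For reference, the algebraic half of the claim is easy to check: by the matrix inversion lemma, ``\hat{C} = C - C (C + \hat{S})^{-1} C`` is the same as ``\hat{C}^{-1} = C^{-1} + \hat{S}^{-1}``, i.e. the parametrisation adds a positive-definite term to the prior precision. Below is a minimal numerical sketch (plain `LinearAlgebra`, not part of the PR); the step that would still want a citation in the docs is that a log-concave likelihood makes the optimal precision exceed the prior precision by exactly such a PSD term.

```julia
# Numerical sanity check (sketch, not PR code): the pseudo-observation form of
# the covariance equals "prior precision plus an SPD term".
using LinearAlgebra, Random

Random.seed!(0)
n = 4
A = randn(n, n); C = A * A' + I    # prior covariance at the pseudo-points (SPD)
B = randn(n, n); Ŝ = B * B' + I    # pseudo-observation covariance (SPD)

Ĉ_pseudo_obs = C - C * ((C + Ŝ) \ C)    # the form used in the docs
Ĉ_precision  = inv(inv(C) + inv(Ŝ))    # matrix-inversion-lemma equivalent

@assert isapprox(Ĉ_pseudo_obs, Ĉ_precision; rtol=1e-8)
```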
# The reason to think that this parametrisation will do something sensible is this property.
# Obviously when ``\mathbf{v} \neq \mathbf{x}`` the optimal approximate posterior cannot be
# recovered; however, the hope is that there exists a small pseudo-dataset which gets
# close to the optimum.
what's the advantage of this parametrisation?
)
    return AbstractGPs.elbo(sva, l_fx, ys; kwargs...)
end

_get_prior(approx::SparseVariationalApproximation) = approx.fz.f
How about just calling it `prior`?

Suggested change:
- _get_prior(approx::SparseVariationalApproximation) = approx.fz.f
+ prior(approx::SparseVariationalApproximation) = approx.fz.f

I think it shows up in a bunch more places (within AbstractGPs' VFE/DTC code too)...
PseudoObsSparseVariationalApproximation(
    f::AbstractGP,
    z::AbstractVector,
    S::AbstractMatrix{<:Real},
Hm, it's got to be a symmetric, pos-def matrix, right?
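For reference, one common way to enforce that constraint in an implementation (a sketch with hypothetical names, not what the PR does) is to optimise an unconstrained lower-triangular factor and construct `S` from it:

```julia
# Sketch (hypothetical, not the PR's code): parametrise S via an unconstrained
# lower-triangular factor so it is symmetric positive-definite by construction.
using LinearAlgebra

n = 3
L_free = LowerTriangular(randn(n, n))   # unconstrained parameters
# make the diagonal strictly positive, then form S = L Lᵀ (+ jitter)
L = L_free - Diagonal(diag(L_free)) + Diagonal(exp.(diag(L_free)))
S = L * L' + 1e-12I

@assert isposdef(Symmetric(S))
```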
    S::AbstractMatrix{<:Real}, v::AbstractVector, y::AbstractVector{<:Real}
)

Chooses `likelihood(u) = N(y; f(v), S)` where `length(y)` need not be equal to the number
I don't understand how this makes any sense. On the left-hand side you have `u`, on the right-hand side you don't. How are the two coupled?
Ah, so it's implicit via `f`. `u := f(z)`, so by making noisy observations of `f(v)`, you learn something about `f(z)`. Not clear from what I wrote though.
....so then the inducing points are actually `v`? Why are `u`/`z` needed then?
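For reference, a toy sketch of the coupling being described (plain AbstractGPs with made-up inputs, not the PR's types): noisy observations of `f(v)` move the posterior over `f(z)` through the prior correlations, which is how a likelihood stated in terms of `f(v)` ends up determining `q(u)` with `u := f(z)`.

```julia
# Toy sketch (not PR code): conditioning on N(y; f(v), S) changes the marginal
# over u := f(z) via the prior covariance between f(v) and f(z).
using AbstractGPs, LinearAlgebra

f = GP(SEKernel())
v = [0.0, 0.5, 1.0]                 # pseudo-input locations
z = [0.25, 0.75]                    # inducing-point locations, u := f(z)
y = [0.3, -0.1, 0.2]                # pseudo-observations
S = Diagonal(fill(0.1, length(v)))  # pseudo-observation covariance

post = posterior(f(v, S), y)        # exact conditioning on the pseudo-observations

@show mean(post(z)) mean(f(z))      # the marginal over f(z) has moved off the prior
```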
    y = approx.likelihood.y
    S = approx.likelihood.S
    v = approx.likelihood.v
    return posterior(AbstractGPs.VFE(f(z, 1e-9)), f(v, S), y)
Could we name all these magic constants floating around? Should they be consistent? Should they be configurable? Here it's 1e-9, above it's zero, below it's 1e-18....
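One way to address this (a sketch with hypothetical names; `default_jitter` and the `jitter` keyword are not in the PR) would be to give the constant a single named, overridable default:

```julia
# Sketch (hypothetical names, not the PR's code): a single named jitter with an
# overridable keyword, instead of scattered literals like 1e-9 / 1e-12.
using AbstractGPs

const default_jitter = 1e-9

function pseudo_obs_posterior(f, z, v, S, y; jitter=default_jitter)
    return posterior(AbstractGPs.VFE(f(z, jitter)), f(v, S), y)
end
```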
approx_centered = SparseVariationalApproximation(
    Centered(), f(z, 1e-12), qu
)
approx_post_centered = posterior(approx_centered)
approx_centered = SparseVariationalApproximation(
    Centered(), f(z, 1e-12), qu
)
Suggested change:
- approx_centered = SparseVariationalApproximation(
-     Centered(), f(z, 1e-12), qu
- )
- approx_post_centered = posterior(approx_centered)
- approx_centered = SparseVariationalApproximation(
-     Centered(), f(z, 1e-12), qu
- )
+ approx_centered = SparseVariationalApproximation(
+     Centered(), f(z, 1e-12), qu
+ )
+ approx_post_centered = posterior(approx_centered)
approx_centered = SparseVariationalApproximation(
    Centered(), f(z, 1e-12), qu
)
Suggested change:
- approx_centered = SparseVariationalApproximation(
-     Centered(), f(z, 1e-12), qu
- )
I'm rather more hesitant about it, because any added code incurs a maintenance burden throughout its future lifetime... so if you'd like to add these, I'd like to be more convinced that they're genuinely useful. Are they easier/faster to optimise? What do you gain from using them?
Whether or not we end up adding these specific parametrisations, I think it would still be good to add the
As discussed on the call the other day, I've implemented a couple of pseudo-observation parametrisations.
The first has been used in a few places now, so I really think that we should support it. The thing I refer to as the "decoupled pseudo-observation" parametrisation is quite non-standard (I've not actually seen it used anywhere before, just been using it in my own work), but it's a really obvious extension, so I figure why not?
I've had to add an additional abstract type to make it possible to have different fields from those in `SparseVariationalApproximation`, which is why there are quite a lot of LOC changes.
Also, I've expanded on the parametrisation explanations, and provided some code examples that actually run and are now CI'd, so they won't go stale.
Keen to know what people make of this.