Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define Holder continuity for covariance functions #47

Open
chrisyeh96 opened this issue Feb 14, 2024 · 2 comments
Open

Define Holder continuity for covariance functions #47

chrisyeh96 opened this issue Feb 14, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@chrisyeh96
Copy link

chrisyeh96 commented Feb 14, 2024

The theorem on page 30 about sample continuity of a Gaussian process uses Holder continuity, but Holder continuity is never defined in the book. It would be very helpful to see a definition, especially since the book states in the subsequent paragraph that

the squared exponential covariance function (2.4) is Hölder continuous

without providing additional justification.

Furthermore, Hölder continuity is usually defined for functions that take a single input. According to Wikipedia, a function $f: \mathcal{X} \to \mathbb{R}$ is $\alpha$-Hölder continuous on the metric space $(\mathcal{X},d)$ if there exists a constant $0 < C < \infty$ such that

$$\forall x,y \in \mathcal{X}:\quad |f(x)-f(y)| \leq C \, d(x,y)^\alpha.$$

However, a covariance/kernel function $k(x,y)$ takes two inputs. How is Hölder continuity defined for covariance functions, especially non-stationary covariance functions?

@chrisyeh96 chrisyeh96 added the enhancement New feature or request label Feb 14, 2024
@bayesoptbook
Copy link
Owner

bayesoptbook commented Feb 15, 2024

This is a great question, and I apologize for the confusion. I agree with you that I should probably have expanded footnote 26 in chapter 2 to include an explicit definition of Hölder continuity rather than the hand-wavy description I gave. I think I was trying to avoid as much theory in chapter 2 as possible.

Regarding checking the Hölder continuity of a covariance function K(x, x'), the short answer is that you can simply check the Hölder continuity of the functions K(x, ·) (and x can be any arbitrary point in the domain). Note that:

  • this comes down to checking that K(x, x) - K(x, x') is well behaved as |x - x'| → 0, and
  • this implies K(·, x') is also Hölder continuous by symmetry.

However, this answer isn't terribly insightful.

A nicer way to connect the smoothness of the covariance function to the smoothness of sample paths (which is what we're after here after all) is to assume Hölder continuity of f and work backwards from there. I sketch this reasoning below without actually proving anything, but hopefully it's enough to build intuition.

Let f ~ GP(μ, K), and here let us assume that the GP is centered so that μ ≡ 0. (We can deal with nonzero μ later [*].) Let x and x' be points in the domain of the GP, and let ϕ = f(x) and ϕ' = f(x'). Hölder continuity of f at x (we can use your quoted definition here) is a condition on |ϕ - ϕ'|, but this is actually a random variable.

Instead, we can work with a suitable statistical replacement. It turns out the right notion here is the quantity

√E|ϕ - ϕ'|².   (1)

(Note that this equals |ϕ - ϕ'| in the deterministic case, as expected.) We want to control the rate this term vanishes as |x - x'| → 0. If you can accept this definition, we can now define a probabilistic version of the Hölder condition:

√E|ϕ - ϕ'|² ≤ C|x - x'|ᵃ.

equivalent to (squaring both sides)

E|ϕ - ϕ'|² ≤ C|x - x'|²ᵃ.

Note that we can rewrite the lhs in terms of the covariance function:

E|ϕ - ϕ'|² = E[ϕ² - 2ϕϕ' + ϕ'²]
           = K(x, x) + K(x', x') - 2K(x, x').

(Here we relied on the GP being centered.) So what we really want for f to be α-Hölder continuous is for K(x, ·) to be 2α-Hölder continuous. Note that this still comes down to checking that K(x, x) - K(x, x') is well behaved as |x - x'| → 0, like before.

Note also that everything above is very similar in spirit to the continuity in mean square discussion in the book, just with a bit more care put into the continuity check.

As an aside, the term (1) actually defines a pseudometric over the domain and appears often in the GP literature as the so-called canonical metric:

d(x, x') = √E|ϕ - ϕ'|².

Hope this helps!

[*] To deal with nonzero μ just assume it's Hölder continuous and add it back in.

@bayesoptbook
Copy link
Owner

Did this help @chrisyeh96?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants