\chapter{Rosetta Stone}
Bayesian statistics has a lot of terminology floating around, and different
communities sometimes use different terms for the same concept. This appendix
lists various common terms that are essentially synonymous.
I may even switch between the different terms from time to time, intentionally or
unintentionally!
\subsection*{Event, Hypothesis, Proposition, Statement}
These are all basically the same thing.
A proposition is something that can be either true or false, such as ``my age is
greater than 35'' or ``the number of Aardvarks in my room is either zero or
one''. These are the things that go in our
probability statements: if we write $P(A|B)$, $A$ and $B$ are both propositions,
that is, statements that may be true or false. ``Event'' is the preferred term in
classical statistics.
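As a small illustrative example (not taken from elsewhere in this book): if $A$ is the
proposition ``the die landed on a 6'' and $B$ is ``the die landed on an even number'',
then for a fair six-sided die
\[
P(A|B) = \frac{P(A, B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3},
\]
where $P(A, B)$ denotes the probability that both propositions are true.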
\subsection*{Sampling Distribution, Probability Model for the Data,
Generative Model, Likelihood Function, Likelihood}
This is the thing that we
write as $p(x|\theta)$. It is sometimes called the sampling distribution or
a generative model because
you can think of the data as having been ``drawn from''
$p(x|\theta)$, but using the true value of $\theta$, which you don't actually
know. There is a subtlety here: the word likelihood can be
used to mean $p(x|\theta)$ either before $x$ is known (in which case it is the
thing you would use to predict possible data) or after $x$ is known. In the latter
case $p(x|\theta)$ is a function of $\theta$ only, because $x$ is fixed at the
observed value. The term likelihood function is often used at this point.
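As a concrete illustration (the particular numbers here are made up for this paragraph):
suppose $x$ is the number of successes in 10 trials, each succeeding with probability
$\theta$. Before the data arrive, the sampling distribution
\[
p(x|\theta) = \frac{10!}{x!\,(10-x)!}\,\theta^x(1-\theta)^{10-x}, \qquad x = 0, 1, \ldots, 10,
\]
describes which datasets are plausible for any given value of $\theta$. After observing,
say, $x=3$, the same expression viewed as a function of $\theta$ alone,
\[
L(\theta) = p(x=3|\theta) = \frac{10!}{3!\,7!}\,\theta^3(1-\theta)^{7},
\]
is what would usually be called the likelihood function.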
\subsection*{Probability Distribution}
The term probability distribution is used to refer to either a probability density
function (in the continuous case) or a probability mass function (in the discrete
case). A probability mass function gives the probability of each particular value,
whereas a probability density function only gives a probability if you integrate it
over some region.
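For a quick illustration (these particular distributions are just examples): if $x$ has
a Poisson distribution with mean $\lambda$, the probability mass function
\[
p(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots,
\]
gives probabilities directly, so $p(2)$ is the probability that $x$ equals 2. If instead
$x$ has the exponential density $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, then $f(2)$
is not a probability (densities can exceed 1 in general); to obtain a probability you
integrate, for example
\[
P(1 \leq x \leq 2) = \int_1^2 \lambda e^{-\lambda x}\,dx = e^{-\lambda} - e^{-2\lambda}.
\]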
\subsection*{Marginal Likelihood, Evidence, Prior Predictive Probability, Normalising
Constant}
This is the $p(x)$ or $P(x)$ term in the denominator of Bayes' rule,
and it is also the total of the {\tt prior} $\times$ {\tt likelihood} column
of a Bayes' Box. This is the probability of getting the data that you actually got,
before you observed it: hence the terminology ``prior predictive probability''.
It is also the thing you use to normalise the posterior distribution (make it sum
or integrate to 1), hence the term normalising constant. The term marginal likelihood
makes sense because it is a probability of the data (like the regular likelihood), but
``marginalised'' over (i.e. not depending on) the value of the parameter(s). It is
also called ``evidence'' because it can be used to compare different models, or
to ``patch together'' two Bayes' Boxes after the fact (see the hypothesis testing
chapter).
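To make the marginalisation explicit, here is the standard formula (written generically,
not tied to a particular example from this book). For a discrete parameter,
\[
p(x) = \sum_{\theta} p(\theta)\,p(x|\theta),
\]
which is exactly the total of the {\tt prior} $\times$ {\tt likelihood} column of a
Bayes' Box, and for a continuous parameter,
\[
p(x) = \int p(\theta)\,p(x|\theta)\,d\theta.
\]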