probability.qmd

# Probability

{{< include shared-config.qmd >}}

---

Most of the content in this chapter should be review from UC Davis Epi 202.

## Statistical events

---

:::{#thm-prob-subset}
If $A$ and $B$ are statistical events and $A\subseteq B$, then $p(A, B) = p(A)$.
:::

---

::: proof
Left to the reader.
:::

## Random variables

### Binary variables {#sec-binary-vars}

{{< include binary-vars.qmd >}}

---

### Count variables {#sec-count-vars}

{{< include count-vars.qmd >}}

---

## Key probability distributions

### The Bernoulli distribution {#sec-bern-dist}

{{< include bernoulli.qmd >}}

---

### The Poisson distribution {#sec-poisson-dist}

{{< include poisson.qmd >}}

---

### The Negative-Binomial distribution {#sec-nb-dist}

{{< include negbinom.qmd >}}

### Weibull Distribution {#sec-weibull}

$$
\begin{aligned}
p(t)&= \alpha\lambda x^{\alpha-1}\text{e}^{-\lambda x^\alpha}\\
h(t)&=\alpha\lambda x^{\alpha-1}\\
S(t)&=\text{e}^{-\lambda x^\alpha}\\
E(T)&= \Gamma(1+1/\alpha)\cdot \lambda^{-1/\alpha}
\end{aligned}
$$

When $\alpha=1$ this is the exponential. When $\alpha>1$ the hazard is
increasing and when $\alpha < 1$ the hazard is decreasing. This provides
more flexibility than the exponential.

We will see more of this distribution later.

## Characteristics of probability distributions

### Probability density function {#sec-prob-dens}

{{< include _def-pdf.qmd >}}

---

:::{#thm-density-vs-CDF}
### Density function is derivative of CDF

The density function $f(t)$ or $\p(T=t)$ for a random variable $T$ at value $t$ is equal to the derivative of the cumulative probability function $F(t) \eqdef P(T\le t)$; that is:

$$f(t) \eqdef \deriv{t} F(t)$$

:::

---

:::{#thm-density-sums-to-one}
#### Density functions integrate to 1

For any density function $f(x)$,

$$\int_{x \in \rangef{X}} f(x) dx = 1$$
:::

---

### Hazard function {#sec-prob-haz}

{{< include _def-hazard.qmd >}}


---

### Expectation {#sec-expectation}

:::{#def-expectation}
### Expectation, expected value, population mean \index{expectation} \index{expected value}

The **expectation**, **expected value**, or **population mean** of a *continuous* random variable $X$, denoted $\E{X}$, $\mu(X)$, or $\mu_X$, is the weighted mean of $X$'s possible values, weighted by the probability density function of those values:

$$\E{X} = \int_{x\in \rangef{X}} x \cdot \p(X=x)dx$$

The **expectation**, **expected value**, or **population mean** of a *discrete* random variable $X$, denoted $\E{X}$, $\mu(X)$, or $\mu_X$, is the mean of $X$'s possible values, weighted by the probability mass function of those values:

$$\E{X} = \sum_{x \in \rangef{X}} x \cdot \P(X=x)$$

(c.f. <https://en.wikipedia.org/wiki/Expected_value>)

:::

---

:::{#thm-bernoulli-mean}
#### Expectation of the Bernoulli distribution

The expectation of a Bernoulli random variable with parameter $\pi$ is:

$$\E{X} = \pi$$
:::

---

:::{.proof}

$$
\ba
\E{X}
&= \sum_{x\in \rangef{X}} x \cd \P(X=x)
\\&= \sum_{x\in \set{0,1}} x \cd \P(X=x)
\\&= \paren{0 \cd \P(X=0)} + \paren{1 \cd \P(X=1)}
\\&= \paren{0 \cd (1-\pi)} + \paren{1 \cd \pi}
\\&= 0 + \pi
\\&= \pi
\ea
$$

:::

---

### Variance and related characteristics

:::{#def-variance}
#### Variance

The variance of a random variable $X$ is the expectation of the squared difference between $X$ and $\E{X}$; that is:

$$
\Var{X} \eqdef \E{(X-\E{X})^2}
$$

:::

---

:::{#thm-variance}
#### Simplified expression for variance

$$\Var{X}=\E{X^2} - \sqf{\E{X}}$$

---

::::{.proof}
By linearity of expectation, we have:

$$
\begin{aligned}
\Var{X} 
&\eqdef \E{(X-\E{X})^2}\\
&=\E{X^2 - 2X\E{X} + \sqf{\E{X}}}\\
&=\E{X^2} - \E{2X\E{X}} + \E{\sqf{\E{X}}}\\
&=\E{X^2} - 2\E{X}\E{X} + \sqf{\E{X}}\\
&=\E{X^2} - \sqf{\E{X}}\\
\end{aligned}
$$
::::

:::

---

::: {#def-precision}
#### Precision

The **precision** of a random variable $X$, often denoted $\tau(X)$, $\tau_X$, or shorthanded as $\tau$, is
the inverse of that random variable's variance; that is:

$$\tau(X) \eqdef \inv{\Var{X}}$$
:::

::: {#def-sd}

#### Standard deviation

The standard deviation of a random variable $X$ is the square-root of the variance of $X$:

$$\SD{X} \eqdef \sqrt{\Var{X}}$$

:::

---

:::{#def-cov}
#### Covariance

For any two one-dimensional random variables, $X,Y$:

$$\Cov{X,Y} \eqdef \Expf{(X - \E X)(Y - \E Y)}$$

:::

---

:::{#thm-alt-cov}
$$\Cov{X,Y}= \E{XY} - \E{X} \E{Y}$$
:::

---

:::{.proof}
Left to the reader.
:::

---

:::{#lem-cov-xx}

##### The covariance of a variable with itself is its variance

For any random variable $X$:

$$\Cov{X,X} = \Var{X}$$

:::

:::{.proof}
$$
\ba
\Cov{X,X} &= E[XX] - E[X]E[X] 
\\ &= E[X^2]-(E[X])^2
\\ &= \Var{X}
\ea
$$
:::

---

:::{#def-cov-vec-x}

#### Variance/covariance of a $p \times 1$ random vector

For a $p \times 1$ dimensional random vector $X$,

$$
\begin{aligned}
\text{Var}(X) 
&\eqdef \text{Cov}(X)\\
&\eqdef E[ \left( X - E\lbrack X\rbrack \right)^{\top}\left( X - E\lbrack X\rbrack \right) ]\\
\ea
$$

:::

---

:::{#thm-vcov-vec}

#### Alternate expression for variance of a random vector

$$
\ba
\Var{X} 
&= E[ X^{\top}X ] - {E\lbrack X\rbrack}^{\top}E\lbrack X\rbrack
\end{aligned}
$$
:::

---

:::{.proof}
$$
\ba
\Var{X} 
&= E[ \left( X^{\top} - E\lbrack X\rbrack^{\top} \right)\left( X - E\lbrack X\rbrack \right) ]\\
&= E[ X^{\top}X - E\lbrack X\rbrack^{\top}X - X^{\top}E\lbrack X\rbrack + E\lbrack X\rbrack^{\top}E\lbrack X\rbrack ]\\
&= E[ X^{\top}X ] - E\lbrack X\rbrack^{\top}E\lbrack X\rbrack - {E\lbrack X\rbrack}^{\top}E\lbrack X\rbrack + E\lbrack X\rbrack^{\top}E\lbrack X\rbrack\\
&= E[ X^{\top}X ] - 2{E\lbrack X\rbrack}^{\top}E\lbrack X\rbrack + E\lbrack X\rbrack^{\top}E\lbrack X\rbrack\\
&= E[ X^{\top}X ] - {E\lbrack X\rbrack}^{\top}E\lbrack X\rbrack
\end{aligned}
$$
:::

---

:::{#thm-var-lincom}

#### Variance of a linear combination

For any set of random variables $\Xin$ and corresponding constants $a_1, ... ,a_n$:

$$\Var{\sumin a_i X_i} = \sumin \sumn{j} a_i a_j \Cov{X_i,X_j}$$
:::

---

:::{.proof}

Left to the reader...

:::

---

:::{#lem-var-lincom2}

For any two random variables $X$ and $Y$ and scalars $a$ and $b$:

$$\Var{aX + bY} = a^2 \Var{X} + b^2 \Var{Y} + 2(a \cd b) \Cov{X,Y}$$

:::

---

:::{.proof}

Apply @thm-var-lincom with $n=2$, $X_1 = X$, and $X_2 = Y$.

Or, see <https://statproofbook.github.io/P/var-lincomb.html>

:::

---

:::{#def-homosked}
### homoskedastic, heteroskedastic

A random variable $Y$ is **homoskedastic** (with respect to covariates $X$) if the variance of $Y$ does not vary with $X$: 

$$\Varr(Y|X=x) = \ss, \forall x$$

Otherwise it is **heteroskedastic**.

:::

---

:::{#def-indpt}

### Statistical independence

A set of random variables $\X1n$ are **statistically independent** 
if their joint probability is equal to the product of their marginal probabilities:

$$\Pr(\Xx1n) = \prodi1n{\Pr(X_i=x_i)}$$

:::

::: notes

::::{.callout-tip}
The symbol for independence, $\ind$, is essentially just $\prod$ upside-down.
So the symbol can remind you of its definition (@def-indpt).
::::

:::

---

:::{#def-cind}

### Conditional independence

A set of random variables $\dsn{Y}$ are **conditionally statistically independent** 
given a set of covariates $\X1n$
if the joint probability of the $Y_i$s given the $X_i$s is equal to 
the product of their marginal probabilities:

$$\Pr(\dsvn{Y}{y}|\dsvn{X}{x}) = \prodi1n{\Pr(Y_i=y_i|X_i=x_i)}$$

:::

---

:::{#def-ident}

#### Identically distributed

A set of random variables $\X1n$ are **identically distributed**
if they have the same range $\rangef{X}$ and if
their marginal distributions $\P(X_1=x_1), ..., \P(X_n=x_n)$ are all 
equal to some shared distribution $\P(X=x)$:

$$
\forall i\in \set{1:n}, \forall x \in \rangef{X}: \P(X_i=x) = \P(X=x)
$$

:::

---

:::{#def-cident}

#### Conditionally identically distributed

A set of random variables $\dsn{Y}$ are **conditionally identically distributed** 
given a set of covariates $\X1n$ 
if $\dsn{Y}$ have the same range $\rangef{X}$ and if
the distributions $\P(Y_i=y_i|X_i =x_i)$ are all 
equal to the same distribution $\P(Y=y|X=x)$:

$$
\P(Y_i=y|X_i=x) = \P(Y=y|X=x)
$$
 
:::

---

:::{#def-iid}
#### Independent and identically distributed

A set of random variables $\dsn{X}$ are **independent and identically distributed**
(shorthand: "$X_i\ \iid$") if they are statistically independent and identically distributed.

:::

---

:::{#def-iid}
#### Conditionally independent and identically distributed

A set of random variables $\dsn{Y}$ are **conditionally independent and identically distributed** (shorthand: "$Y_i | X_i\ \ciid$" or just "$Y_i |X_i\ \iid$") given a set of covariates $\dsn{X}$
if $\dsn{Y}$ are conditionally independent given $\dsn{X}$ and $\dsn{Y}$ are identically distributed given 
$\dsn{X}$.

:::

{{< include sec-CLT.qmd >}}