# Continuous Probability Distributions
## Learning Objectives {-#objectives-continuous-distributions}
1. Define and explain the key characteristics of the continuous distributions:
    - normal,
    - lognormal,
    - exponential,
    - gamma,
    - chi-square,
    - $t$,
    - $F$,
    - beta, and
    - uniform on an interval.
2. Evaluate probabilities and quantiles associated with such distributions.
3. Generate discrete and continuous random variables using the inverse transform method.
## Theory {-#theory-continuous-distributions}
**A reminder:** `R` was designed for statistical computing, so it handles **randomness** well! We can guarantee reproducibility (and make results easier to share) by calling `set.seed(seed)`{.R}, where `seed` is a single integer. Calling `set.seed()`{.R} with the same seed before generating random numbers produces the same sequence every time. Use `?set.seed`{.R} to learn more about this function.
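As a quick illustration of the point above, here is a minimal sketch. The seed value `42` and the object names `first_draw` and `second_draw` are arbitrary choices made for this example.

```{r continuous-distributions-seed-sketch}
# Draw five standard normal deviates after setting the seed
set.seed(42)
first_draw <- rnorm(5)

# Resetting the same seed restarts the generator at the same point
set.seed(42)
second_draw <- rnorm(5)

# Check whether the two draws are identical
identical(first_draw, second_draw)
```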
## In-built probability distributions
**A recap:** `R` has in-built functions for probability distributions:
- **d***\<distribution-name\>* $:=$ **d**ensity ("_PDF_"), *i.e.* $f_X(x)$
- **p***\<distribution-name\>* $:=$ cumulative **p**robability, *i.e.* the cumulative distribution function ("_CDF_"), $F_X(x) =\boldsymbol{P}(X \leq x)$
- **q***\<distribution-name\>* $:=$ **q**uantile function, *i.e.* return $x$ such that $\boldsymbol{P}(X \leq x) = p$
- **r***\<distribution-name\>* $:=$ **r**andom deviates, *i.e.* (*pseudo*) random number generator for a given distribution
- *Where \<distribution-name\>* is `R`'s abbreviation for the distribution, e.g. `norm` (normal), `unif` (uniform), `lnorm` (lognormal), `t` (Student's $t$), `pois` (Poisson), `binom` (binomial), `weibull` (Weibull) ... see `?Distributions`{.R} for more information
To give some quick examples (we will further explore these in more detail later in this chapter):
| R Code | Definition |
|-|-|
| `rnorm(1)`{.R} | Generates $x_1$ where $X \sim \mathcal{N}(0,\,1)$ |
| `rnorm(n, mean=10, sd=2)`{.R} | Generates $\{y_1,\,y_2,\,\dots,\,y_n\}$ where $Y \sim \mathcal{N}(10,\,2^2)$ |
| `runif(3, min=5, max=10)`{.R} | Generates $\{z_1,\,z_2,\,z_3\}$ where $Z \sim \mathcal{U}(5,\,10)$ |
| `dbinom(4, size=5, prob=0.5)`{.R} | Computes $\boldsymbol{P}(X = 4)$ where $X \sim Bin(5,\,0.5)$ |
| `pgamma(0.2, shape=2, rate=2)`{.R} | Computes $F_Y(0.2)$ where $Y \sim \Gamma(2,\,2)$ (shape 2, rate 2), i.e. $\boldsymbol{P}(Y\leq 0.2)$ |
| `qexp(0.5, rate = 2)`{.R} | Determines the value $z$ such that $\boldsymbol{P}(Z \leq z) = 0.5$ where $Z \sim Exp(2)$ |
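The four prefixes are linked: **q** inverts **p**, and integrating **d** recovers **p**. The sketch below illustrates this for $X \sim \mathcal{N}(10,\,2^2)$. The evaluation point `12` and the object names are assumptions made for this example.

```{r continuous-distributions-dpqr-sketch}
# P(X <= 12) for X ~ N(10, 2^2)
p <- pnorm(12, mean = 10, sd = 2)

# The quantile function inverts the CDF, so this recovers 12
x <- qnorm(p, mean = 10, sd = 2)
all.equal(x, 12)

# Numerically integrating the density up to 12 approximates the same probability
area <- integrate(dnorm, lower = -Inf, upper = 12, mean = 10, sd = 2)$value
all.equal(area, p, tolerance = 1e-4)
```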
## Continuous probability distributions covered {-}
We will consider how to interact with the following continuous probability distributions in `R`:
- Normal
- Lognormal
- Exponential
- Gamma
- $\chi^2$
- Student's $t$
- $F$
- Beta
- Uniform (on an interval)
For each distribution above we will determine how to calculate:
- A random deviate following the continuous distribution of $X$,
- The probability density function ("_PDF_"), $f_X(x)$, which integrates to $P(k_1 \leq X \leq k_2)$ over the range $[k_1,\,k_2]$,
- The cumulative distribution function ("_CDF_"), $P(X \leq k)$, and
- The quantile function to find $k$ representing the value such that $P(X \leq k) = p$, i.e. the $100p$<sup>th</sup> percentile.
We will finish off with a plot of the distribution.
## Normal distribution
We start with generating random deviates from the Normal distribution.
If we are interested in the standard normal, `R` helpfully has default argument values of $\mu = 0$ and $\sigma = 1$, so the function call is very concise:
```{r continuous-distributions-normal-1}
# Generate random deviates
set.seed(42)
rnorm(5)
```
We can also specify our own values of $\mu$ and $\sigma$. With `R` we need to remember that the `sd` argument is the standard deviation $\sigma$, **not** the variance $\sigma^2$.
```{r continuous-distributions-normal-2}
# Generate random deviates
set.seed(42)
rnorm(5, 10, 2)
```
We next look at the cumulative distribution function for a Normal distribution. In `R` we can calculate this using `pnorm(q, mean, sd)`{.R} where:
- _q_ is the quantile of interest,
- _mean_ is the mean, and
- _sd_ is standard deviation.
```{r continuous-distributions-normal-3}
# Calculate P(X <= 2) for X~N(0,1)
pnorm(2, mean = 0, sd = 1)
```
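The same function also gives interval and upper-tail probabilities. This is a minimal sketch for $X \sim \mathcal{N}(0,\,1)$. The interval $[-1,\,2]$ and the object names are assumptions chosen for illustration.

```{r continuous-distributions-normal-interval-sketch}
# P(-1 <= X <= 2) as a difference of two CDF values
prob_interval <- pnorm(2) - pnorm(-1)

# P(X > 2), requesting the upper tail directly instead of 1 - pnorm(2)
prob_upper <- pnorm(2, lower.tail = FALSE)

c(interval = prob_interval, upper_tail = prob_upper)
```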
Next we want to find the $100p$<sup>th</sup> percentile of $X \sim \mathcal{N}(\mu,\,\sigma^2)$. We use the quantile function, `qnorm(p, mean, sd)`{.R}, where _p_ is the probability of interest.
```{r continuous-distributions-normal-4}
# Find the 99th percentile for X~N(10, 2^2), i.e. mean 10 and sd 2
percentile_99 <- qnorm(0.99, mean = 10, sd = 2)
paste0(
  "The 99th percentile of X~N(10, 2^2) is ",
  format(percentile_99, digits = 4),
  "."
)
```
As is customary we finish with a plot of the normal distribution.
```{r continuous-distributions-normal-5}
# Plot the distribution function of X~Normal(mu =, sigma = )
```
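One possible way to draw the density is sketched below. It assumes $\mu = 10$ and $\sigma = 2$ to match the earlier examples, and uses base graphics to keep the chunk free of extra packages; `x_grid` and `density_values` are names introduced here.

```{r continuous-distributions-normal-plot-sketch}
# Assumed parameters, matching the earlier examples
mu <- 10
sigma <- 2

# Evaluate the density on a grid spanning mu +/- 4 standard deviations
x_grid <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
density_values <- dnorm(x_grid, mean = mu, sd = sigma)

# Line plot of the PDF
plot(x_grid, density_values, type = "l",
     xlab = "x", ylab = "f(x)", main = "PDF of X ~ N(10, 2^2)")
```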
## Lognormal distribution
```{r continuous-distributions-lognormal-1}
# Plot the distribution function of Y~Lognormal()
```
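A minimal sketch of the lognormal functions follows. It assumes illustrative log-scale parameters `meanlog = 0` and `sdlog = 0.5`, which are not taken from the text. It relies on the defining property that $Y$ is lognormal when $\log(Y)$ is normal.

```{r continuous-distributions-lognormal-sketch}
# Random deviates (always positive)
set.seed(42)
y <- rlnorm(5, meanlog = 0, sdlog = 0.5)

# P(Y <= 1.5) computed directly and via the normal CDF of log(1.5)
p_direct <- plnorm(1.5, meanlog = 0, sdlog = 0.5)
p_via_normal <- pnorm(log(1.5), mean = 0, sd = 0.5)
all.equal(p_direct, p_via_normal)

# The median of Y equals exp(meanlog)
median_y <- qlnorm(0.5, meanlog = 0, sdlog = 0.5)
median_y
```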
## Exponential distribution
```{r continuous-distributions-exponential-1}
# Plot the distribution function of Z~Exp()
```
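A minimal sketch for the exponential distribution follows, assuming `rate = 2` as in the $Exp(2)$ examples elsewhere in this chapter. The evaluation points `s` and `t_val` are arbitrary. It checks the closed-form CDF and the memoryless property.

```{r continuous-distributions-exponential-sketch}
rate <- 2  # assumed rate parameter

# The CDF agrees with the closed form 1 - exp(-rate * z)
all.equal(pexp(0.7, rate = rate), 1 - exp(-rate * 0.7))

# Memoryless property: P(Z > s + t | Z > s) = P(Z > t)
s <- 0.3
t_val <- 0.4
conditional_tail <- pexp(s + t_val, rate = rate, lower.tail = FALSE) /
  pexp(s, rate = rate, lower.tail = FALSE)
all.equal(conditional_tail, pexp(t_val, rate = rate, lower.tail = FALSE))

# The median is log(2) / rate
median_z <- qexp(0.5, rate = rate)
median_z
```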
## Gamma distribution
```{r continuous-distributions-gamma-1}
# Plot the distribution of X~Gamma()
```
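A minimal sketch for the gamma distribution follows, assuming `shape = 2` and `rate = 2` as in the `pgamma()`{.R} example earlier in this chapter. Note that `R` also accepts a `scale` argument, where `scale = 1 / rate`.

```{r continuous-distributions-gamma-sketch}
shape <- 2  # assumed shape parameter
rate <- 2   # assumed rate parameter

# For shape = 2 the CDF has the closed form 1 - exp(-rate * x) * (1 + rate * x)
p_gamma <- pgamma(0.2, shape = shape, rate = rate)
all.equal(p_gamma, 1 - exp(-rate * 0.2) * (1 + rate * 0.2))

# A gamma distribution with shape = 1 is an exponential distribution
all.equal(dgamma(0.7, shape = 1, rate = rate), dexp(0.7, rate = rate))

# The sample mean of many deviates should be close to shape / rate
set.seed(42)
sample_mean <- mean(rgamma(10000, shape = shape, rate = rate))
c(sample_mean = sample_mean, theoretical_mean = shape / rate)
```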
## $\chi^2$ distribution
```{r continuous-distributions-chi-squared-1}
# Plot the distribution of Y~Chi-Square()
```
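A minimal sketch for the $\chi^2$ distribution follows, assuming `k = 3` degrees of freedom (an arbitrary choice). It uses two standard facts: a $\chi^2_k$ variable is gamma-distributed with shape $k/2$ and rate $1/2$, and a $\chi^2_1$ variable is the square of a standard normal.

```{r continuous-distributions-chi-squared-sketch}
k <- 3  # assumed degrees of freedom

# Chi-square with k degrees of freedom is Gamma(shape = k / 2, rate = 1 / 2)
p_chisq <- pchisq(2.5, df = k)
p_gamma_equiv <- pgamma(2.5, shape = k / 2, rate = 1 / 2)
all.equal(p_chisq, p_gamma_equiv)

# With 1 degree of freedom, the 95th percentile is the squared
# 97.5th percentile of the standard normal
q_chisq_1 <- qchisq(0.95, df = 1)
all.equal(q_chisq_1, qnorm(0.975)^2)
```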
## Student's $t$ distribution
```{r continuous-distributions-t-1}
# Plot the distribution of Z~t()
```
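A minimal sketch for Student's $t$ distribution follows. The degrees of freedom used (5, 30 and 1000) are arbitrary choices. It shows that the $t$ distribution has heavier tails than the standard normal and approaches it as the degrees of freedom grow.

```{r continuous-distributions-t-sketch}
# Upper-tail probability P(Z > 2) under t with 5 df and under N(0, 1)
tail_t5 <- pt(2, df = 5, lower.tail = FALSE)
tail_norm <- pnorm(2, lower.tail = FALSE)
c(t_5 = tail_t5, normal = tail_norm)

# 97.5th percentiles shrink towards the normal value as df increases
q_t <- qt(0.975, df = c(5, 30, 1000))
q_norm <- qnorm(0.975)
c(q_t, normal = q_norm)
```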
## $F$ distribution
```{r continuous-distributions-F-1}
# Plot the distribution of X~F()
```
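A minimal sketch for the $F$ distribution follows, assuming illustrative degrees of freedom `df1 = 3` and `df2 = 10`. It also checks the standard link to Student's $t$: the square of a $t_{\nu}$ variable follows $F_{1,\,\nu}$.

```{r continuous-distributions-F-sketch}
df1 <- 3   # assumed numerator degrees of freedom
df2 <- 10  # assumed denominator degrees of freedom

# CDF and 95th percentile
p_f <- pf(2.5, df1 = df1, df2 = df2)
q_f <- qf(0.95, df1 = df1, df2 = df2)
c(cdf_at_2.5 = p_f, percentile_95 = q_f)

# Squared two-sided t critical value equals the F(1, df2) critical value
q_f_1 <- qf(0.95, df1 = 1, df2 = df2)
all.equal(q_f_1, qt(0.975, df = df2)^2)
```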
## Beta distribution
```{r continuous-distributions-Beta-1}
# Plot the distribution of Y~Beta()
```
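A minimal sketch for the beta distribution follows, assuming illustrative shape parameters `a = 2` and `b = 5`. It checks that $Beta(1,\,1)$ coincides with the uniform distribution on $[0,\,1]$ and that the sample mean is close to $a / (a + b)$.

```{r continuous-distributions-Beta-sketch}
a <- 2  # assumed first shape parameter
b <- 5  # assumed second shape parameter

# CDF and median
p_beta <- pbeta(0.3, shape1 = a, shape2 = b)
median_beta <- qbeta(0.5, shape1 = a, shape2 = b)
c(cdf_at_0.3 = p_beta, median = median_beta)

# Beta(1, 1) is the uniform distribution on [0, 1]
all.equal(pbeta(0.3, shape1 = 1, shape2 = 1), punif(0.3))

# The sample mean of many deviates should be close to a / (a + b)
set.seed(42)
sample_mean <- mean(rbeta(10000, shape1 = a, shape2 = b))
c(sample_mean = sample_mean, theoretical_mean = a / (a + b))
```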
## Uniform distribution {#continuous-uniform}
```{r continuous-distributions-uniform-1}
# Plot the distribution of Z~Uniform()
```
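A minimal sketch for the uniform distribution follows, assuming the interval $[5,\,10]$ used in the `runif()`{.R} example earlier in this chapter. Every value has a simple closed form, noted in the comments.

```{r continuous-distributions-uniform-sketch}
lo <- 5   # assumed lower bound
hi <- 10  # assumed upper bound

# Density is constant on the interval: 1 / (hi - lo)
d_unif <- dunif(7, min = lo, max = hi)

# CDF is linear: (7 - lo) / (hi - lo)
p_unif <- punif(7, min = lo, max = hi)

# Quantile function: lo + p * (hi - lo)
q_unif <- qunif(0.9, min = lo, max = hi)

c(density = d_unif, cdf = p_unif, percentile_90 = q_unif)

# Random deviates all fall inside the interval
set.seed(42)
z <- runif(3, min = lo, max = hi)
```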
## Inverse transform method
The inverse transform method is a way to generate pseudo-random numbers from any probability distribution, given the inverse of its CDF (its quantile function).
One possible algorithm is as follows:
1. Generate a random number $u$ from $U \sim \mathcal{U}(0, 1)$
2. Find the inverse of the desired cumulative distribution function, $F^{-1}_X(x)$
3. Compute $X = F^{-1}_X(u)$
Suppose we wanted to draw 10,000 random numbers from $X \sim Exp(\lambda = 2)$. In order to use the inverse transform method we first need to find the inverse of the CDF. For any $X \sim Exp(\lambda)$ the CDF is $F_X(x) = 1 - e^{-\lambda x}$ for $x \geq 0$; setting $u = 1 - e^{-\lambda x}$ and solving for $x$ gives the inverse $F^{-1}_X(u) = \frac{-\log(1-u)}{\lambda}$. We can thus use the inverse transform algorithm to generate random deviates following $X \sim Exp(2)$:
```{r continuous-distributions-inverse-transform-1}
# Step 0 - to guarantee reproducibility
set.seed(42)
# Step 1 - generate 10,000 random deviates from U[0,1]
u <- runif(10000)
# Step 2 - find the inverse of the CDF, F(x) = 1 - exp(-lambda * x)
# Inverse of CDF: F^{-1}(u) = -log(1 - u) / lambda
# Step 3 - compute X using the inverse of the CDF [from step 2] and the random deviates u [from step 1]
x <- -log(1 - u) / 2
# Plot the resulting x deviates
library(ggplot2)
df <- data.frame(x = x)
ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 0.5, colour = "black", fill = "white")
```
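One way to sanity-check the generated deviates is to compare them with the theoretical $Exp(2)$ distribution. The sketch below regenerates the sample so that it is self-contained. The name `lambda`, the choice of quartiles and the comparison against `qexp()`{.R} are choices made for this example.

```{r continuous-distributions-inverse-transform-check-sketch}
# Regenerate the deviates exactly as in the inverse transform steps
set.seed(42)
lambda <- 2
u <- runif(10000)
x <- -log(1 - u) / lambda

# The sample mean should be close to the theoretical mean 1 / lambda
c(sample_mean = mean(x), theoretical_mean = 1 / lambda)

# Empirical quartiles should be close to the theoretical quartiles
probs <- c(0.25, 0.5, 0.75)
rbind(
  empirical = quantile(x, probs),
  theoretical = qexp(probs, rate = lambda)
)

# The transform is the exponential quantile function applied to u
all.equal(x, qexp(u, rate = lambda))
```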
## `R` Practice {-#practice-continuous-distributions}
We finish with a comprehensive example of a univariate continuous distribution question in `R`.