# Section 13. Notes on 'Ch 14. Introduction to regression models'
2021-12-06
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, dpi = 300, comment = "#>")
```
> These are just notes on a single chapter of *BDA3* that were not part of the course.
## Chapter 14. Introduction to regression models
### 14.1 Conditional modeling
- question: how does one quantity $y$ vary as a function of another quantity or vector of quantities $x$?
- conditional distribution of $y$ given $x$ parameterized as $p(y|\theta,x)$
- key statistical modeling issues:
  1. defining $y$ and $x$ so that $y$ is reasonably linear as a function of the columns of the design matrix $X$
     - may need to transform $x$
  2. setting priors on the model parameters
### 14.2 Bayesian analysis of classical regression
- simplest case: *ordinary linear regression*
  - observation errors are independent and have equal variance
$$
y | \beta, \sigma, X \sim \text{N}(X \beta, \sigma^2 I)
$$
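Under the standard noninformative prior $p(\beta, \log \sigma) \propto 1$, the posterior has a convenient closed form: $\sigma^2 | y \sim \text{Inv-}\chi^2(n - k, s^2)$ and $\beta | \sigma^2, y \sim \text{N}(\hat{\beta}, \sigma^2 V_\beta)$, where $\hat{\beta} = (X^T X)^{-1} X^T y$, $V_\beta = (X^T X)^{-1}$, and $s^2$ is the classical residual variance. A minimal R sketch of drawing from this posterior, with data simulated purely for illustration:

```{r classical-regression-draws}
# Simulate illustrative data (true beta = (2, 0.5), sigma = 1.5).
set.seed(14)
n <- 100
x <- runif(n, 0, 10)
X <- cbind(1, x)                       # design matrix with an intercept
y <- 2 + 0.5 * x + rnorm(n, sd = 1.5)

# Sufficient statistics under the prior p(beta, log sigma) proportional to 1.
k <- ncol(X)
V_beta <- solve(t(X) %*% X)            # (X'X)^{-1}
beta_hat <- V_beta %*% t(X) %*% y      # least-squares estimate
s2 <- sum((y - X %*% beta_hat)^2) / (n - k)

# sigma^2 | y ~ Inv-chi^2(n - k, s^2); beta | sigma^2, y ~ N(beta_hat, sigma^2 V_beta).
n_draws <- 4000
sigma2 <- (n - k) * s2 / rchisq(n_draws, df = n - k)
U <- chol(V_beta)                      # t(U) %*% U = V_beta
beta <- t(sapply(sigma2, function(s2_i) {
  as.vector(beta_hat + sqrt(s2_i) * t(U) %*% rnorm(k))
}))
colMeans(beta)                         # posterior means, near (2, 0.5)
```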
#### Posterior predictive distribution for new data {-}
- posterior predictive distribution has two sources of uncertainty:
  1. the inherent variability in the model, represented by $\sigma$
  2. posterior uncertainty in $\beta$ and $\sigma$
- to draw a random sample $\tilde{y}$ from the posterior predictive distribution (see the sketch after this list):
  - draw $(\beta, \sigma)$ from their joint posterior
  - draw $\tilde{y} \sim \text{N}(\tilde{X} \beta, \sigma^2 I)$
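Continuing the chunk above, a short sketch of these two steps at a few new predictor values (the $\tilde{x}$ values here are made up for illustration):

```{r posterior-predictive}
# New design matrix at a few hypothetical x values.
x_new <- c(2, 5, 8)
X_new <- cbind(1, x_new)

# For each posterior draw (beta, sigma), draw y~ ~ N(X~ beta, sigma^2 I).
y_tilde <- sapply(seq_len(n_draws), function(i) {
  X_new %*% beta[i, ] + sqrt(sigma2[i]) * rnorm(nrow(X_new))
})
rowMeans(y_tilde)  # posterior predictive means at x~ = 2, 5, 8
```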
### 14.4 Goals of regression analysis
- at least three goals:
  1. understand the behavior of $y$ given $x$
  2. predict $y$ given $x$
  3. causal inference: predict how $y$ would change if $x$ were changed
### 14.5 Assembling the matrix of explanatory variables
#### Identifiability and collinearity
- "the parameters in a classical regression cannot be uniquely estimated if there are more parameters than data points or, more generally, if the columns of the matrix $X$ of explanatory variables are not linearly independent" (pg 365)
#### Nonlinear relations
- may need to transform variables
- can include more than one transformation in the model as separate covariates (see the sketch below)
- GLMs and non-linear models are discussed in later chapters
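For example, a sketch with two transformations of the same predictor entering as separate columns of $X$ (illustrative data):

```{r nonlinear-terms}
set.seed(2)
x_nl <- runif(50, 0.5, 4)
y_nl <- 1 + 2 * log(x_nl) + 0.3 * x_nl^2 + rnorm(50, sd = 0.5)
# log(x) and x^2 enter as separate covariates; I() protects the
# arithmetic inside the formula.
coef(lm(y_nl ~ log(x_nl) + I(x_nl^2)))
```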
#### Indicator variables
- include a categorical variable in a regression using an indicator variable (see the sketch below)
  - separate effect for each category
  - or model the category effects as related with a hierarchical model
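A sketch of the separate-effect coding: `model.matrix()` expands an R factor into one 0/1 indicator column per non-reference level.

```{r indicator-variables}
group <- factor(c("a", "b", "c", "b", "a", "c"))
model.matrix(~ group)  # columns: intercept, groupb, groupc ("a" is the reference)
```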
#### Interactions
- "If the response to a unit change in $x_i$ depends on what value another predictor $x_j$ has been fixed at, then it is necessary to include *interaction* terms in the model" (pg 367)
  - e.g. include the centered product $(x_i - \bar{x}_i)(x_j - \bar{x}_j)$ as an additional column of $X$ (sketched below)
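A sketch with a centered interaction term (illustrative data); in R's formula syntax the product column is written `xi_c:xj_c`.

```{r interaction-term}
set.seed(3)
xi <- rnorm(80)
xj <- rnorm(80)
y_int <- 1 + xi + xj + 0.5 * xi * xj + rnorm(80, sd = 0.5)

# Center each predictor before forming the product, as in the expression above.
xi_c <- xi - mean(xi)
xj_c <- xj - mean(xj)
coef(lm(y_int ~ xi_c + xj_c + xi_c:xj_c))
```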
### 14.6 Regularization and dimension reduction
- see the lecture notes on regularization for more up-to-date recommendations
- "Bayesian regularization" can be adjusted through:
  - the location and scale of the prior
  - the analytic form of the prior (e.g. normal vs. Laplace vs. Cauchy; see the sketch below)
  - how the posterior inference is summarized
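As a sketch of varying the analytic form of the prior, the same model can be fit under normal, Laplace, and Cauchy coefficient priors. This assumes the **rstanarm** package (not otherwise used in these notes) and is left unevaluated since it runs MCMC.

```{r regularization-priors, eval=FALSE}
library(rstanarm)

# Illustrative data: x2 is pure noise, so regularization should shrink it.
set.seed(4)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 0.5 * d$x1 + rnorm(100)

# Same likelihood, three analytic forms for the coefficient prior.
fit_normal  <- stan_glm(y ~ x1 + x2, data = d, prior = normal(0, 1))
fit_laplace <- stan_glm(y ~ x1 + x2, data = d, prior = laplace(0, 1))
fit_cauchy  <- stan_glm(y ~ x1 + x2, data = d, prior = cauchy(0, 1))
```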