-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy paths2_Lab9_RM_ANOVA.Rmd
302 lines (223 loc) · 11.5 KB
/
s2_Lab9_RM_ANOVA.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
```{r, include = FALSE}
source("global_stuff.R")
```
# Repeated Measures ANOVA
## Reading
Chapters 18 and 19 from @abdiExperimentalDesignAnalysis2009. Also see, additional examples and discussion of simple repeated measures in @crumpAnsweringQuestionsData2018, here <https://crumplab.github.io/statistics/repeated-measures-anova.html>.
## Practical 1: How to run a repeated measures ANOVA in R
This practical section contains three textbook examples of conducting repeated measures ANOVAs in R. The first two examples are single factor designs, each involving a single repeated measure. The third example is involves a 2x2 design, where both factors are repeated measures.
All examples use the `aov()` function that we have used for between-subjects designs. The major difference involves specifying the error term appropriately in the formula.
### Single factor Example 1
From example 18.13 in @abdiExperimentalDesignAnalysis2009.
```{r}
library(tibble)
library(tidyr)
library(dplyr)
e1 <- tribble(~Subject, ~Drug_A, ~Placebo, ~Drug_B,
"s1", 124, 108, 104,
"s2", 105, 107, 100,
"s3",107,90,100,
"s4",109,89,93,
"s5",94,105,89,
"s6",121,71,84)
# convert to long
e1 <- pivot_longer(e1,
cols = !Subject,
names_to = "IV",
values_to = "Latency")
# Convert IVs to factors
e1 <- e1 %>%
mutate(Subject = as.factor(Subject),
IV = as.factor(IV))
# conduct ANOVA, note the addition of the error term in the formula
aov_out <- aov(Latency ~ IV + Error(Subject), data=e1)
summary(aov_out)
```
### Formula syntax for repeated measures ANOVA
Now that we have seen an example of conducting a repeated measures ANOVA with `aov()`, let's closely inspect the formula syntax.
Here is the general syntax for a repeated measures ANOVA with one factor. You would replace `DV` with column name for the dependent variable in your `dataframe`; and replace `IV` with the column name for the independent variable in your `dataframe`; and, you would replacee `Subject` with the column name for your subject variable in your `dataframe`.
```{r, eval=FALSE}
aov(DV ~ IV + Error(Subject), data = dataframe)
```
Compare it to the syntax for between-subjects ANOVA with one factor, notice the between subjects version does not include the ` + Error(Subject)` component of the formula.
```{r, eval=FALSE}
aov(DV ~ IV, data = dataframe)
```
## Single Factor Example 2
From example 18.14 in @abdiExperimentalDesignAnalysis2009.
```{r}
e2 <- tribble(~Subject, ~r1, ~r2, ~r3, ~r4, ~r5, ~r6,
"s1",30,18,21,15,18,12,
"s2",21,23,16,17,13,12,
"s3",19,21,13,15,13,9,
"s4",19,19,16,9,11,10,
"s5",21,16,12,15,9,11,
"s6",22,17,14,12,10,9,
"s7",19,20,17,10,13,5,
"s8",17,18,11,11,9,12
)
# convert to long
e2 <- pivot_longer(e2,cols = !Subject,
names_to = "Rank",
values_to = "Recall")
# Convert IVs to factors
e2 <- e2 %>%
mutate(Subject = as.factor(Subject),
Rank = as.factor(Rank))
# conduct ANOVA, note the addition of the error term in the formula
aov_out <- aov(Recall ~ Rank + Error(Subject), data= e2)
summary(aov_out)
```
## Two factor repeated measures Example 3
When there are multiple within-subject factors, we need to specify the error terms appropriately. Here is the general syntax for a two factor ANOVA with two repeated measures.
```{r, eval=FALSE}
aov(DV ~ IV1*IV2 + Error(Subjects/IV1*IV2), data= dataframe)
```
From example 19.5 in @abdiExperimentalDesignAnalysis2009.
```{r}
godden_baddeley <- tribble(~Subjects,~LearningPlace,~TestingPlace,~Recall,
"s1","On Land","On Land",34,
"s2","On Land","On Land",37,
"s3","On Land","On Land",27,
"s4","On Land","On Land",43,
"s5","On Land","On Land",44,
"s1","On Land","Under Sea",18,
"s2","On Land","Under Sea",21,
"s3","On Land","Under Sea",25,
"s4","On Land","Under Sea",37,
"s5","On Land","Under Sea",34,
"s1","Under Sea","On Land",14,
"s2","Under Sea","On Land",21,
"s3","Under Sea","On Land",31,
"s4","Under Sea","On Land",27,
"s5","Under Sea","On Land",32,
"s1","Under Sea","Under Sea",22,
"s2","Under Sea","Under Sea",25,
"s3","Under Sea","Under Sea",33,
"s4","Under Sea","Under Sea",33,
"s5","Under Sea","Under Sea",42
)
# convert IVs to factors
godden_baddeley <- godden_baddeley %>%
mutate(Subjects = as.factor(Subjects),
LearningPlace = as.factor(LearningPlace),
TestingPlace = as.factor(TestingPlace))
# run ANOVA
aov_out <- aov(Recall ~ LearningPlace*TestingPlace + Error(Subjects/(LearningPlace*TestingPlace)), godden_baddeley)
# print out ANOVA summary table
summary(aov_out)
# generate plot of means
library(ggplot2)
ggplot(godden_baddeley, aes(x=TestingPlace,
y=Recall,
shape=LearningPlace,
group=LearningPlace))+
geom_point(stat="summary",fun="mean")+
geom_line(stat="summary",fun="mean")+
theme_classic(base_size=12)
```
## Concept 1: Sphericity
```{r}
## example data set to illustrate sphericity
textbook <- tribble(~S, ~a1, ~a2, ~a3, ~a4,
"s1",76,64,34,26,
"s2",60,48,46,30,
"s3",58,34,32,28,
"s4",46,46,32,28,
"s5",30,18,36,28
)
## selecting the table of values in each column
textbook[,2:5]
## computing the covariation matrix
cov(textbook[,2:5])
## ta
colMeans(cov(textbook[,2:5]))
## sa-s
colMeans(cov(textbook[,2:5])) - mean(cov(textbook[,2:5]))
## double-centering
cov_matrix <- cov(textbook[,2:5])
col_mean_matrix <- cov_matrix*0 + colMeans(cov_matrix)
row_mean_matrix <- t(cov_matrix*0 + rowMeans(cov_matrix))
dc_matrix <- cov_matrix - col_mean_matrix -row_mean_matrix + mean(cov_matrix)
## greenhouse-geisser
sum(diag(dc_matrix))^2 / ((dim(dc_matrix)[1]-1)*sum(dc_matrix^2))
```
### Sphericity corrections in R
The `aov()` function does not automatically apply corrections for sphericity as is sometimes common in statistics programs. However, there are other ANOVA functions that do, such as the ANOVA function from the `car` package. One wrinkle here is that this function requires data to be in wide format. Here is an example
```{r}
textbook <- tribble(~S, ~a1, ~a2, ~a3, ~a4,
"s1",76,64,34,26,
"s2",60,48,46,30,
"s3",58,34,32,28,
"s4",46,46,32,28,
"s5",30,18,36,28
)
## ANOVA using car
wide_data <- as.matrix(textbook[1:5,2:5])
aov_model <- lm(wide_data ~1)
rm_factor <- factor(c('a1','a2','a3','a4'))
library(car)
new_anova <- Anova(aov_model,
idata=data.frame(rm_factor),
idesign = ~rm_factor,
type="III")
summary(new_anova,multivariate=FALSE)
```
Note, the adjusted degrees of freedom are not displayed in the above printout. This repeated measures ANOVA yielded F(3,12) = 5.3571, p = .0142330.
```{r}
pf(q= 5.3571, df1= 3, df2= 12, lower.tail = FALSE)
```
The estimate of epsilon based on Greenhouse Geisser was .44596. Notice, the p-value is different in the printout above (p = .059999). The Greenhouse-Geisser estimate of epsilon is used to modify the degrees of freedom.
```{r}
GG_df1 = (4-1) * .44596
GG_df2 = (4-1) * (5-1) *.44596
GG_df1
GG_df2
pf(q= 5.3571, df1= GG_df1, df2= GG_df2, lower.tail = FALSE)
```
For comparison purposes, here is an ANOVA summary using `aov()` that does not include corrections.
```{r}
# compare to ANOVA using aov
# convert to long
long_data <- pivot_longer(textbook,cols = !S,
names_to = "IV",
values_to = "DV")
# Convert IVs to factors
long_data <- long_data %>%
mutate(S = as.factor(S),
IV = as.factor(IV))
# conduct ANOVA, note the addition of the error term in the formula
aov_out <- aov(DV ~ IV + Error(S), data= long_data)
summary(aov_out)
```
## Lab 9 Generalization Assignment
<iframe width="560" height="315" src="https://www.youtube.com/embed/C_FB5dgoSB4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
### Instructions
Your assignment instructions are the following:
1. Work inside the new R project for stats II that you created
2. Create a new R Markdown document called "Lab9.Rmd"
3. Use Lab9.Rmd to show your work attempting to solve the following generalization problems. Commit your work regularly so that it appears on your Github repository.
4. **For each problem, make a note about how much of the problem you believe you can solve independently without help**. For example, if you needed to watch the help video and are unable to solve the problem on your own without copying the answers, then your note would be 0. If you are confident you can complete the problem from scratch completely on your own, your note would be 100. It is OK to have all 0s or 100s anything in between.
5. Submit your github repository link for Lab 9 on blackboard.
### Problems
1. Create an R script that can generate simulated data for the following repeated measures design. (2 points)
A. The dependent variable is assumed to come from a normal distribution with mean = 0 and standard deviation = 1.
B. There is one repeated measures factor with 5 levels (Down1, Down2, Control, Up1, Up2). The control group is assumed to have no effect. The Down1 and Down2 levels shift the mean down by 1 and 2 standard deviations, respectively. The Up1 and Up2 levels shift the mean up by 1 and 2 standard deviations, respectively.
C. There are 6 subjects in the experiment, and they are each measured once in each condition. The 6 subjects are assumed to be different from one another (e.g., they will have different baseline means in the control condition), but they will all be influenced by the IV in the exact same way (e.g., no interaction).
2. Run a simulation to determine the proportion of experiments that would return a significant result for the above design. Assume that the effect of the levels of the IV are increments of .1 of a standard deviation, rather than increments of 1 as in the above design.
3. Demonstrate that the Godden and Baddeley example data from the textbook (19.5), which used a 2x2 repeated measures design, can be be analyzed with one-sample t-tests to return the same results. Specifically, show the one-sample t-tests for each main effect and the interaction. (2 points)
Bonus Points
These bonus questions are about the sphericity concept, and they will involve modifying the example data from the textbook (same as used in concept sectdion 1). The data is reprinted here for your convenience:
```{r}
sphericity <- tribble(~S, ~a1, ~a2, ~a3, ~a4,
"s1",76,64,34,26,
"s2",60,48,46,30,
"s3",58,34,32,28,
"s4",46,46,32,28,
"s5",30,18,36,28
)
```
4. Create a line plot showing how each of the 5 subjects perform across the levels of the IV. Discuss how the line plot visually shows the sphericity problem in the data (1 point)
5. Modify the above data to remove the sphericity problem. Specifically, ensure that all subjects are different from another (there overall means are different), and that the IV has the same effect for each level and each subject (no interaction). Then, plot the new data and discuss how the graph shows the sphericity problem has been removed. (1 point)
6. Calculate the Greenhouse-Geisser estimate of epsilon for your modified data to demonstrate you have removed the sphericity problem. (1 point)
## References