```{r, include = FALSE}
source("global_stuff.R")
```
# Factorial II
## Readings
Chapter 17 [@abdiExperimentalDesignAnalysis2009].
## Overview
This lab supplements the chapter on factorial designs and contrast analyses, which discusses statistical techniques for interrogating specific patterns among means in factorial designs.
Fundamental to these ideas is a strong understanding of main effects and interactions. In the current lab we build on the score model concept and use R to examine properties of main effects and interactions in greater detail.
## Concept I: Generating main effects and interactions
In this section we use R to generate predicted patterns of main effects and interactions for a design with two independent variables.
The following code chunk shows how the score model for a fixed design (model I) can be coded in R to produce any pattern of main effects and interactions, along with a graphical display of the patterns.
```{r}
#load libraries
library(tibble)
library(ggplot2)
library(patchwork)
# define 2-factor score model
grand_mean <- 50
A <- c(0,5,10,15,20,25,50)
B <- c(0,5,-15)
AB <- rep(0,length(A)*length(B))
# create design table
model_data <- tibble()
for(i in 1:length(A)){
  for(j in 1:length(B)){
    IVA <- i
    IVB <- j
    DV <- grand_mean+A[i]+B[j]+AB[(i-1)*length(B)+j]
    sc_GM <- grand_mean
    sc_A <- A[i]
    sc_B <- B[j]
    sc_AB <- AB[(i-1)*length(B)+j]
    row_entry <- tibble(IVA,IVB,DV,
                        sc_GM,sc_A,sc_B,sc_AB)
    model_data <- rbind(model_data,row_entry)
  }
}
knitr::kable(model_data)
# generate plots
bar_graph <- ggplot(model_data,
                    aes(y=DV,
                        x=as.factor(IVA),
                        fill=as.factor(IVB)))+
  geom_bar(stat='identity', position='dodge')
line_graph <- ggplot(model_data,
                     aes(y=DV,
                         x=IVA,
                         linetype=as.factor(IVB)))+
  geom_line()+
  geom_point()
(bar_graph/line_graph)
```
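The nested loop above builds the design one cell at a time from the score model, in which each cell mean is the grand mean plus the effect of A, the effect of B, and the interaction effect for that cell. As a compact cross-check, the same cell means can be computed in one step with `outer()`. This is a minimal sketch rather than part of the original lab; the names `AB_matrix` and `cell_means` are introduced here for illustration only.

```r
# Score model parameters (same values as the chunk above)
grand_mean <- 50
A <- c(0, 5, 10, 15, 20, 25, 50)
B <- c(0, 5, -15)

# Interaction effects as a length(A) x length(B) matrix (all zero here).
# Row i, column j holds the effect for level i of A and level j of B,
# which matches the AB[(i-1)*length(B)+j] indexing used in the loop.
AB_matrix <- matrix(0, nrow = length(A), ncol = length(B), byrow = TRUE)

# outer(A, B, "+") returns every A[i] + B[j] combination as a matrix
cell_means <- grand_mean + outer(A, B, "+") + AB_matrix
dimnames(cell_means) <- list(paste0("A", seq_along(A)),
                             paste0("B", seq_along(B)))
cell_means
```

Each entry of `cell_means` should agree with the `DV` column of `model_data` for the same combination of `IVA` and `IVB`.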
### One or two main effects and no interaction
It is important to recognize that just because a design uses multiple independent variables, it is not necessarily the case that interactions will occur.
Consider the following toy example. We will conduct an experiment in which the first IV has three levels (0,5,10), indicating how much money participants are paid when they complete the experiment. The second IV also has three levels (0,5,10), indicating how much bonus money participants are paid when they complete the experiment. The DV is the total amount of money the experimenter gives the participant at the end of the experiment.
Consider how we would represent this experiment using the code from above. First, we program in the expected means for each level of the first and second IV. Second, we do not program an interaction term.
```{r}
# define 2-factor score model
grand_mean <- 10
A <- c(-5,0,5)
B <- c(-5,0,5)
AB <- rep(0,length(A)*length(B))
# create design table
model_data <- tibble()
for(i in 1:length(A)){
  for(j in 1:length(B)){
    IVA <- i
    IVB <- j
    DV <- grand_mean+A[i]+B[j]+AB[(i-1)*length(B)+j]
    sc_GM <- grand_mean
    sc_A <- A[i]
    sc_B <- B[j]
    sc_AB <- AB[(i-1)*length(B)+j]
    row_entry <- tibble(IVA,IVB,DV,
                        sc_GM,sc_A,sc_B,sc_AB)
    model_data <- rbind(model_data,row_entry)
  }
}
knitr::kable(model_data)
# generate plots
bar_graph <- ggplot(model_data,
                    aes(y=DV,
                        x=as.factor(IVA),
                        fill=as.factor(IVB)))+
  geom_bar(stat='identity', position='dodge')
line_graph <- ggplot(model_data,
                     aes(y=DV,
                         x=IVA,
                         linetype=as.factor(IVB)))+
  geom_line()+
  geom_point()
(bar_graph/line_graph)
```
This is a silly design where an interaction should, by definition, be impossible. There are two independent sources of money, and the total amount each subject receives is the sum of both sources. In this scenario, an interaction would occur only if some combination of IVA and IVB levels produced more or less than the sum of the two amounts. For example, if the subjects who received 5 dollars in level 2 of IVA and a 5 dollar bonus in level 2 of IVB were somehow paid more or less than 10 dollars in total, there would be an interaction. However, in our mock scenario we stipulate that 5+5 = 10, so we have created a situation where an interaction is not possible.
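One way to make "more or less than the sum of the levels" concrete is to subtract the grand mean and both main effects from every cell mean and look at what is left over. The sketch below does this for the payment example; `interaction_residuals()` is a hypothetical helper written for this illustration, not a function from the lab or from any package, and it assumes equal weighting of all cells.

```r
# Cell means for the payment example: rows = base pay, columns = bonus
pay <- c(0, 5, 10)
bonus <- c(0, 5, 10)
cell_means <- outer(pay, bonus, "+")

# Hypothetical helper: the part of each cell mean that is not explained
# by the grand mean plus the row and column (main) effects
interaction_residuals <- function(m) {
  row_eff <- rowMeans(m) - mean(m)
  col_eff <- colMeans(m) - mean(m)
  m - mean(m) - outer(row_eff, col_eff, "+")
}

# Purely additive payments leave nothing over in any cell
interaction_residuals(cell_means)

# Paying 3 extra dollars in a single cell breaks additivity,
# so the leftovers are no longer all zero
bumped <- cell_means
bumped[2, 2] <- bumped[2, 2] + 3
round(interaction_residuals(bumped), 2)
```

When every leftover value is zero, the pattern of means contains main effects only; any nonzero leftover indicates an interaction in the means.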
### Two main effects and an interaction
Let's consider an example from a classic demonstration in visual search. In a visual search task subjects are shown a visual display and asked to find a target among distractors.
For example, the task could be to find the T, and report whether it is rotated to the left or to the right. Some example displays are shown below.
```{r}
knitr::include_graphics('imgs/Factorial/visual_search.png')
```
The figure depicts a 3x2 design, with three levels of set-size (10, 20, 30) and two levels of popout (no popout vs. color popout).
A classic main effect in visual search involves set-size. In general, it takes longer to find a target as the number of distractors (set-size) increases.
However, it is possible to modulate the set-size effect. For example, the set-size effect would be very pronounced in the no color popout condition. In the color popout condition, by contrast, the target just "pops" out and is very easy to find. In this case, the number of distractors (set-size) has a very small or possibly no effect on search time.
Let's implement these ideas to produce expected patterns of main effects and interactions for this design. We assume it takes about 500ms to find a target, and that the set-size effect changes the time by about 10ms per distractor, which is 100ms for each step of 10 items in set-size.
```{r}
# define 2-factor score model
grand_mean <- 500
A <- c(-100,0,100)
B <- c(100,-100)
AB <- c(0,0,0,-100,0,-200)
# create design table
model_data <- tibble()
for(i in 1:length(A)){
  for(j in 1:length(B)){
    IVA <- i
    IVB <- j
    DV <- grand_mean+A[i]+B[j]+AB[(i-1)*length(B)+j]
    sc_GM <- grand_mean
    sc_A <- A[i]
    sc_B <- B[j]
    sc_AB <- AB[(i-1)*length(B)+j]
    row_entry <- tibble(IVA,IVB,DV,
                        sc_GM,sc_A,sc_B,sc_AB)
    model_data <- rbind(model_data,row_entry)
  }
}
knitr::kable(model_data)
# generate plots
bar_graph <- ggplot(model_data,
                    aes(y=DV,
                        x=as.factor(IVA),
                        fill=as.factor(IVB)))+
  geom_bar(stat='identity', position='dodge')
line_graph <- ggplot(model_data,
                     aes(y=DV,
                         x=IVA,
                         linetype=as.factor(IVB)))+
  geom_line()+
  geom_point()
(bar_graph/line_graph)
```
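The interaction in this example amounts to a difference in set-size slopes between the two popout conditions. The sketch below recomputes the cell means from the values in the chunk above and reports the change in search time per additional item in each condition. It assumes that level 1 of B is the no popout condition and level 2 is the color popout condition; the labels `no_popout` and `color_popout` and the objects `AB_matrix` and `slopes` are introduced here for illustration.

```r
# Same score model values as the chunk above
grand_mean <- 500
A <- c(-100, 0, 100)   # set-size effects for 10, 20, 30 items
B <- c(100, -100)      # assumed order: no popout, color popout
AB_matrix <- matrix(c(0, 0, 0, -100, 0, -200),
                    nrow = length(A), byrow = TRUE)

set_size <- c(10, 20, 30)

# Expected search times (ms): rows = set-size, columns = popout condition
cell_means <- grand_mean + outer(A, B, "+") + AB_matrix
dimnames(cell_means) <- list(paste0("size_", set_size),
                             c("no_popout", "color_popout"))
cell_means

# Slope between adjacent set-sizes: change in ms per additional item
slopes <- apply(cell_means, 2, function(rt) diff(rt) / diff(set_size))
slopes
# no_popout column: 10 10; color_popout column: 0 0
```

Under these assumptions, search time rises by 10ms per item without popout and does not change with set-size when the target pops out.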
## Concept II: Simulated power analysis
The following code block gives an example of simulating data for a 2x2 design that can be used to estimate power (proportion of experiments returning a significant result) for each main effect and interaction.
```{r}
# N per group
N <- 200
A_pvalue <- c()
B_pvalue <- c()
AB_pvalue <- c()
for(i in 1:1000){
  IVA <- rep(rep(c("1","2"), each=2),N)
  IVB <- rep(rep(c("1","2"), 2),N)
  DV <- c(replicate(N,c(rnorm(1,0,1), # means A1B1
                        rnorm(1,0,1), # means A1B2
                        rnorm(1,.2,1), # means A2B1
                        rnorm(1,.2,1) # means A2B2
                        )))
  sim_df <- data.frame(IVA,IVB,DV)
  aov_results <- summary(aov(DV~IVA*IVB, sim_df))
  A_pvalue[i]<-aov_results[[1]]$`Pr(>F)`[1]
  B_pvalue[i]<-aov_results[[1]]$`Pr(>F)`[2]
  AB_pvalue[i]<-aov_results[[1]]$`Pr(>F)`[3]
}
length(A_pvalue[A_pvalue<0.05])/1000
length(B_pvalue[B_pvalue<0.05])/1000
length(AB_pvalue[AB_pvalue<0.05])/1000
```
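The simulation above can be wrapped in a function so that the sample size and cell means are easy to vary. The following is a minimal sketch under the same assumptions as the chunk above (normal data, standard deviation of 1, equal N per cell); `sim_power_2x2()` and its arguments are hypothetical names chosen for this illustration, and the number of simulations is kept small so the example runs quickly.

```r
# Hypothetical helper: estimated power for A, B, and A:B in a 2x2
# between-subjects design with n_per_cell observations in each cell.
# cell_means must be in the order A1B1, A1B2, A2B1, A2B2.
sim_power_2x2 <- function(n_per_cell, cell_means, n_sims = 1000, alpha = 0.05) {
  IVA <- factor(rep(c("1", "1", "2", "2"), each = n_per_cell))
  IVB <- factor(rep(c("1", "2", "1", "2"), each = n_per_cell))
  # Each simulated experiment returns three p-values (A, B, A:B)
  p_values <- replicate(n_sims, {
    DV <- rnorm(4 * n_per_cell,
                mean = rep(cell_means, each = n_per_cell),
                sd = 1)
    summary(aov(DV ~ IVA * IVB))[[1]]$`Pr(>F)`[1:3]
  })
  # Proportion of simulated experiments with p < alpha for each effect
  power <- rowMeans(p_values < alpha)
  names(power) <- c("A", "B", "AB")
  power
}

set.seed(1)
power_est <- sim_power_2x2(n_per_cell = 200,
                           cell_means = c(0, 0, .2, .2),
                           n_sims = 200)
power_est

# Rough analytic cross-check for the main effect of A: with 200 per cell,
# each level of A has 400 observations, so the test resembles a
# two-sample t-test with n = 400 per group and a difference of .2 SD
analytic_A <- power.t.test(n = 400, delta = 0.2, sd = 1,
                           sig.level = 0.05)$power
analytic_A
```

The simulated power for A should land near the analytic value, while the values for B and the interaction should hover around the alpha level because no such effects were built into the cell means. Estimates from only 200 simulations are noisy, so increase `n_sims` for real use.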
## Lab 8 Generalization Assignment
Note: there was a technical glitch in this video, where I zoomed too close on the RStudio window. Most of the screencast is visible, but sometimes I am working on code that is out of view. As always, the .Rmd files for the solutions are on the GitHub repository for this course, so you can check those out too.
<iframe width="560" height="315" src="https://www.youtube.com/embed/tu-rYkuq1lI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
### Instructions
Your assignment instructions are the following:
1. Work inside the new R project for stats II that you created
2. Create a new R Markdown document called "Lab8.Rmd"
3. Use Lab8.Rmd to show your work attempting to solve the following generalization problems. Commit your work regularly so that it appears on your GitHub repository.
4. **For each problem, make a note about how much of the problem you believe you can solve independently without help**. For example, if you needed to watch the help video and are unable to solve the problem on your own without copying the answers, then your note would be 0. If you are confident you can complete the problem from scratch completely on your own, your note would be 100. It is OK to have all 0s, all 100s, or anything in between.
5. Submit your GitHub repository link for Lab 8 on Blackboard.
### Problems
1. Consider a 2x2 design. Assume the DV is measured from a normal distribution with mean 0, and standard deviation 1. Assume that the main effect of A causes a total shift of .5 standard deviations of the mean between the levels. Assume that level 1 of B is a control, where you expect to measure the standard effect of A. Assume that level 2 of B is an experimental factor intended to reduce the effect of A by .25 standard deviations.
A. Create a ggplot2 figure that depicts the expected results from this design (2 points)
Conduct simulation-based power analyses to answer the questions.
B. How many subjects are needed to detect the main effect of A with power = .8? (2 points)
C. How many subjects are needed to detect the interaction effect with power = .8? (2 points)
Bonus point question:
B1. Create a power curve showing how power for the interaction effect in this example is influenced by number of subjects. Choose a range of N from 25 to 800 (per cell) and run a simulation-based power analysis for increments of 25 subjects. Then plot the results using ggplot2 (2 points).
## References