---
title: "final_project_solutions"
---
# Part 1
## Part 1.1: Loading the data
```{r echo=TRUE, message=FALSE, warning=FALSE}
library(tidyverse)
# define the data folder
data_folder = '~/Dropbox/teaching/r-course22/data/final_project/'
# define lists of files
list_files = paste(data_folder, list.files(path=data_folder), sep="")
# apply the read_delim function to each element in the list
data_list = lapply(list_files, function(x) read_delim(x, delim='\t'))
# loop through all participants to add the participant number and the condition as columns;
# the condition is taken from the file name, since we have no other way to know which condition each participant was in
data = NULL
participant = 0
for (n in 1:(length(data_list))){
participant = participant + 1
cond_list = strsplit(list_files[n], "_")
condition = paste(cond_list[[1]][3], "_", cond_list[[1]][4], sep="")
data=rbind(data,cbind(participant=participant, condition=condition, data_list[[n]]))
}
```
```{r}
head(data)
```
As you can see, there are a few things to fix to make the dataset cleaner. It is now up to you to do the rest. The final data should consist of the following columns:
- participant
- block_number
- trial_number
- condition (use the labels in the description above)
- percent_coherence
- dots_direction
- response
- accuracy
- rt (change them to seconds, which means dividing them by 1000)
Remove the columns in the dataset that are not in this list, and rename the columns whose labels differ to the names in the list above.
```{r}
data = rename(data,
block_number = blkNum,
trial_number = trlNum,
percent_coherence = percentCoherence,
dots_direction = winningDirection,
accuracy = correct,
rt = RT)
data = mutate(data,
condition = recode(condition,
Norm_Trial = "FixedTrial_LowInformation",
Info_Trial = "FixedTrial_MediumInformation",
Optim_Trial = "FixedTrial_HighInformation",
Norm_Time = "FixedTime_LowInformation",
Info_Time = "FixedTime_MediumInformation",
Optim_Time = "FixedTime_HighInformation"),
rt = rt/1000)
data = select(data, -c("coherentDots", "numberofDots", "eventCount", "averageFrameRate"))
head(data)
```
## Part 1.2: Describing the data
- How many participants are in the dataset?
```{r}
summarize(data, n_distinct(participant))
```
- How many participants were there in each condition?
```{r}
summarize(group_by(data, condition), n_distinct(participant))
```
- Did the participants in the fixed time conditions perform more or fewer trials than the participants in the fixed trial conditions?
```{r}
data = mutate(data,
blocktype = recode(condition,
FixedTrial_LowInformation = "FixedTrial",
FixedTrial_MediumInformation = "FixedTrial",
FixedTrial_HighInformation = "FixedTrial",
FixedTime_LowInformation = "FixedTime",
FixedTime_MediumInformation = "FixedTime",
FixedTime_HighInformation = "FixedTime"))
trials_blocktype = left_join(summarize(group_by(data, participant), n_trials=n()),
distinct(data, participant, blocktype),
by="participant")
summarize(group_by(trials_blocktype, blocktype), mean=mean(n_trials))
```
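Because every condition label has the form `<block type>_<information level>`, the block type can also be extracted from the label itself instead of recoding the six levels one by one. A minimal base-R sketch on two of the labels (the vector `condition_labels` and the `_sketch` names are made up for illustration and are not part of the solution):

```r
# two of the six condition labels used above; this vector is for illustration only
condition_labels = c("FixedTrial_LowInformation", "FixedTime_HighInformation")
# keep everything before the underscore (the block type)
blocktype_sketch = sub("_.*$", "", condition_labels)
# keep everything after the underscore (the information level)
trialtype_sketch = sub("^.*_", "", condition_labels)
blocktype_sketch
trialtype_sketch
```

The explicit `recode` is more verbose, but it fails more visibly if a label is misspelled; the pattern-based version assumes that every label contains exactly one underscore.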
- Calculate a summary that includes the mean and SD of accuracy and response time for each of the 6 conditions, and use it to describe the overall performance across participants in these conditions (i.e., which conditions produced higher accuracy, and which produced faster responses? Do accuracy and speed always trade off?).
```{r}
summarize(group_by(data, condition),
          meanRT=mean(rt), sdRT=sd(rt),
          meanACC=mean(accuracy), sdACC=sd(accuracy))
```
- Calculate a summary that includes the average accuracy and average response time per participant, as well as the percentage of trials below 150 ms (too fast trials) and above 5000 ms (too slow trials). Are there any participants with more than 10% fast or slow trials?
```{r}
data = mutate(data,
              # rt is in seconds; "above 5000 ms" and "below 150 ms" are strict inequalities
              slow_trials = case_when(
                rt > 5 ~ 1,
                rt <= 5 ~ 0),
              fast_trials = case_when(
                rt < .15 ~ 1,
                rt >= .15 ~ 0))
participant_performance = summarize(group_by(data, participant),
                                    meanRT=mean(rt),
                                    meanACC=mean(accuracy),
                                    perc_slow_trials = mean(slow_trials)*100,
                                    perc_fast_trials = mean(fast_trials)*100)
# participants with more than 10% slow or fast trials
filter(participant_performance, perc_slow_trials > 10)
filter(participant_performance, perc_fast_trials > 10)
```
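The percentages in `participant_performance` rely on the fact that the mean of a 0/1 indicator is the proportion of ones. A small sketch with made-up response times (in seconds, not taken from the dataset) shows the same computation without the intermediate columns:

```r
# made-up response times in seconds, for illustration only
rt_example = c(0.10, 0.45, 0.62, 0.38, 6.20, 0.51, 0.12, 0.70, 0.49, 0.55)
# a comparison returns TRUE/FALSE; its mean is the proportion of TRUE values
perc_fast = mean(rt_example < .15) * 100
perc_slow = mean(rt_example > 5) * 100
perc_fast
perc_slow
```

Here two of the ten values are below 150 ms and one is above 5000 ms, so the two percentages are 20 and 10.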
## Part 1.3: Exclude participants
Exclude from the dataset the participants that have less than 60% accuracy. The trials with a response time less than 150 ms or greater than 5000 ms should also be excluded. Check that the accuracy variable only contains 0 and 1.
```{r}
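# participants with a mean accuracy below 60%
participants_to_exclude = filter(participant_performance, meanACC < .6)$participant
# %in% also works when there is no participant to exclude (a loop over 1:0 would not)
data = filter(data, !(participant %in% participants_to_exclude))
# remove the too fast (< 150 ms) and too slow (> 5000 ms) trials; rt is in seconds
data = filter(data, rt >= .15, rt <= 5)
# check that accuracy only contains 0 and 1
distinct(data, accuracy)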
```
## Part 1.4: Data visualization
Now it's time to visualize the data set.
- First, we want to have a look at how the accuracy and response time evolve across the blocks. For this purpose, you should make a 2-by-1 grid plot that depicts the response time (top panel) and accuracy (lower panel) across the blocks for each condition. This should recreate Figure 2 from the paper: https://link.springer.com/article/10.3758/s13423-016-1135-1/figures/2 (excluding the top panel). What trends do you observe?
```{r}
grouped_data = group_by(data, condition, block_number)
data_summary = summarise(grouped_data, meanRT = mean(rt), meanACC = mean(accuracy))
ggplot(data = data_summary, mapping = aes(x = block_number, y = meanACC, color=condition)) +
geom_line()
ggplot(data = data_summary, mapping = aes(x = block_number, y = meanRT, color=condition)) +
geom_line()
```
- Second, we want to make two separate plots, one for accuracy and one for response times, which show the average performance in the 6 conditions. The condition should be plotted on the x-axis and the performance (either accuracy or rt) on the y-axis. You can choose whether you would like to have bar plots or point plots. Add error bars representing confidence intervals.
```{r}
ggplot(data = data, mapping = aes(x = condition, y = accuracy)) +
stat_summary(fun = "mean", geom="point", size=3) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", size=1, width=.2) +
theme(axis.text.x = element_text(angle = 90))
ggplot(data = data, mapping = aes(x = condition, y = rt)) +
stat_summary(fun = "mean", geom="point", size=3) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", size=1, width=.2) +
theme(axis.text.x = element_text(angle = 90))
```
Alternatively:
```{r}
data = mutate(data,
trialtype = recode(condition,
FixedTrial_LowInformation = "LowInformation",
FixedTrial_MediumInformation = "MediumInformation",
FixedTrial_HighInformation = "HighInformation",
FixedTime_LowInformation = "LowInformation",
FixedTime_MediumInformation = "MediumInformation",
FixedTime_HighInformation = "HighInformation"))
ggplot(data = data, mapping = aes(x = trialtype, y = accuracy, color=blocktype)) +
stat_summary(fun = "mean", geom="point", size=3) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", size=1, width=.2) +
theme(axis.text.x = element_text(angle = 90))
ggplot(data = data, mapping = aes(x = trialtype, y = rt, color=blocktype)) +
stat_summary(fun = "mean", geom="point", size=3) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", size=1, width=.2) +
theme(axis.text.x = element_text(angle = 90))
```
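The error bars above are computed by `mean_cl_normal`, which ggplot2 delegates to the Hmisc package (so Hmisc needs to be installed) and which by default returns a 95% confidence interval based on the t distribution. As a rough check of what is being drawn, the same kind of interval can be computed by hand; the values below are made up and the variable names are only for illustration:

```r
# made-up response times in seconds, for illustration only
x = c(0.52, 0.61, 0.48, 0.75, 0.66, 0.58)
n = length(x)
# standard error of the mean
se = sd(x) / sqrt(n)
# 95% confidence interval: mean plus/minus the t quantile times the standard error
ci = mean(x) + c(-1, 1) * qt(.975, df = n - 1) * se
ci
```

The result should agree with `t.test(x)$conf.int`, which uses the same formula.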
## Part 1.5: Calculate the reward rate (optionally, do this for next week together with Part 2)
In the paper, they define the reward rate as:
$$\mathrm{RR} = \frac{PC}{MRT + ITI + FDT + (1 - PC) \times ET}$$
where MRT and PC refer to the mean correct response time and the probability of a correct response, respectively, ITI is the inter-trial interval (i.e., 100 ms), FDT is the feedback display time (i.e., 300 ms), and ET is the error timeout (i.e., 500 ms). Because `rt` is in seconds, these durations enter the formula as 0.1, 0.3, and 0.5.
By calculating the MRT and PC per block per participant (i.e., create a summary using `participant` and `block_number` as grouping variables), you can calculate the `reward_rate` by filling in the rest of the equation above.
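As a worked example of the formula, take the hypothetical values PC = 0.8 and MRT = 0.6 s: the denominator is 0.6 + 0.1 + 0.3 + 0.2 × 0.5 = 1.1 s, so the reward rate is 0.8 / 1.1 ≈ 0.73 correct responses per second. The helper `reward_rate_sketch` below is not part of the original solution; it only wraps the same arithmetic, with all times in seconds:

```r
# hypothetical helper (not part of the solution); all times are in seconds
reward_rate_sketch = function(PC, MRT, ITI = .1, FDT = .3, ET = .5) {
  PC / (MRT + ITI + FDT + (1 - PC) * ET)
}
# made-up values for one block of one participant
rr_example = reward_rate_sketch(PC = .8, MRT = .6)
rr_example
```

Since the function is vectorized, it could in principle be applied to the `PC` and `MRT` columns of a per-block summary inside `mutate`.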
When you are done, you can add the reward rate to the plot in Part 1.4 (so recreate the full Figure 2 from the paper), and also plot the mean reward rate per condition, as you did in Part 1.4 for accuracy and RT.
```{r}
grouped_data_pos = group_by(filter(data, accuracy > 0), participant, block_number)
grouped_data = group_by(data, participant, block_number)
data_summary = full_join(summarise(grouped_data_pos, MRT = mean(rt)),
summarise(grouped_data, PC = mean(accuracy)))
data_summary = full_join(data_summary,
distinct(data, participant, block_number, condition, blocktype, trialtype))
data_summary = mutate(data_summary,
reward_rate = PC/(MRT + .1 + .3 + (1 - PC)*.5))
```
```{r}
grouped_data_summary = group_by(data_summary, condition, block_number)
data_summary_summary = summarise(grouped_data_summary, meanRR = mean(reward_rate))
ggplot(data = data_summary_summary, mapping = aes(x = block_number, y = meanRR, color=condition)) +
geom_line()
```
```{r}
ggplot(data = data_summary, mapping = aes(x = trialtype, y = reward_rate, color=blocktype)) +
stat_summary(fun = "mean", geom="point", size=3) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", size=1, width=.2) +
theme(axis.text.x = element_text(angle = 90))
```
# Part 2: Multi-level regression and ANOVA