forked from laser-institute/aera-workshop
-
Notifications
You must be signed in to change notification settings - Fork 0
/
sna-demo.Rmd
228 lines (170 loc) · 6.4 KB
/
sna-demo.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
---
title: 'Social Network Analysis Demonstration'
author: "The LASER Team"
date: "`r format(Sys.Date(),'%B %e, %Y')`"
output:
html_document:
toc: true
toc_depth: 5
toc_float: yes
bibliography: lit/references.bib
csl: lit/apa.csl
---
Welcome to the \*social network analysis\* demo! To complete this, click the green arrows to the right of each code chunk.
## 1. Loading, Setting Up
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
In this section, we load packages with the `library()` function and read data using the `read_csv()` function.
- `d` refers to the data we loaded on teachers' relations
- `u` refers to "user"-level data (e.g., on teachers' years of experience)
```{r}
set.seed(0811)
library(tidyverse)
library(tidygraph)
library(ggraph)
library(here)
d <- read_csv(here('data', 'teacher-network-data-relations.csv'))
u <- read_csv(here('data', 'teacher-network-data-users.csv'))
```
#### [**Your Turn**]{style="color: green;"} **⤵**
Run the following two code chunks to take a _glimpse_ at your data. Below, add a few notes on what you _notice_ and _wonder_
```{r}
glimpse(d)
```
```{r}
glimpse(u)
```
What do you _notice_ and/or _wonder_ about this data? Add a note or two below!
-
-
## 2. Preparing Data
In this section, we prepare our data to be in edgelist format.
#### [**Your Turn**]{style="color: green;"} **⤵**
```{r}
d_long <- d %>%
pivot_longer(most_helpful_1:most_helpful_3, names_to = "nominee") %>%
mutate(nominee_rank = str_sub(nominee, start = -1),
nominee_rank = as.integer(nominee_rank)) %>%
select(-nominee) %>%
filter(!is.na(value))
d_long
```
What is different about the edgelist data now? Add one or more observations (you can add additional observations by adding more dashes):
-
-
## 3. Creating a Graph 'Object' and Preparing to Visualize the Network
This next step is key in that we use the `tbl_graph()` function to create a network "object"; if that sounds a bit vague, it should! An "object" refers to a type of data in R. Here, it's one that is specific to the packages we are using for social network analysis.
Here, we create a network object with only the edgelist.
```{r}
g <- tbl_graph(edges = d_long)
g
```
#### [**Your Turn**]{style="color: green;"} **⤵**
Let's create this object again, but also adding _nodes_ information, or the user-level information we also loaded earlier on.
```{r}
g <- tbl_graph(edges = d_long, nodes = u)
g
```
```{r}
g <- g %>%
mutate(popularity = centrality_degree(mode = 'in')) %>%
activate("edges") %>%
mutate(nominee_rank = as.factor(nominee_rank))
g
```
What do you notice about the `g` network object? How does it appear different from either the edgelist or the user-level data we loaded?
-
-
## 4. Creating and Refining a Network Visualization
Perhaps our question has to do with who is most central (and, possibly, the most influential) within our network.
Let's start with a simple visualization of our network using `geom_edge_fan()` and `geom_node_point()`.
```{r}
ggraph(g, layout = 'kk') +
geom_edge_fan() +
geom_node_point() +
theme_graph()
```
We can enhance this visualization in numerous ways, such as by:
- Sizing the points based on popularity, or in-degree centrality (how many times a person was a nominee)
- Coloring the points by subject
- Changing the hue of the edges based upon the order in which one was nominated
```{r}
ggraph(g, layout = 'kk') +
geom_edge_fan(aes(alpha = nominee_rank),
arrow = arrow(length = unit(4, 'mm')),
start_cap = circle(6, 'mm'),
end_cap = circle(6, 'mm')) +
geom_node_point(aes(size = popularity, color = subject)) +
theme_graph()
```
We might wish instead to size the points by years of experience, as we do below.
```{r}
ggraph(g, layout = 'kk') +
geom_edge_fan(aes(alpha = nominee_rank),
arrow = arrow(length = unit(4, 'mm')),
start_cap = circle(6, 'mm'),
end_cap = circle(6, 'mm')) +
geom_node_point(aes(size = years_of_experience, color = subject)) +
theme_graph()
```
We can use names (if anonymized or otherwise ethically appropriate for our analysis), as below.
```{r}
ggraph(g, layout = 'kk') +
geom_edge_fan(aes(alpha = nominee_rank),
arrow = arrow(length = unit(4, 'mm')),
start_cap = circle(6, 'mm'),
end_cap = circle(6, 'mm')) +
geom_node_label(aes(label = name, size = popularity)) +
theme_graph()
```
Lastly, we can identify sub-groups within our network and use color to indicate which individuals are a part of which sub-groups, as below.
```{r}
g %>%
activate(nodes) %>%
mutate(group = group_spinglass()) %>%
ggraph(layout = 'kk') +
geom_edge_fan(aes(alpha = nominee_rank),
arrow = arrow(length = unit(4, 'mm')),
start_cap = circle(6, 'mm'),
end_cap = circle(6, 'mm')) +
geom_node_label(aes(label = name, size = popularity, color = as.factor(group))) +
theme_graph() +
scale_color_discrete("Group", type = "qual")
```
#### [**Your Turn**]{style="color: green;"} **⤵**
Which visualization is most helpful for understanding who may be influential in the network? Why?
-
-
## 5. Calculating Descriptive Statistics
Finally, we can calculate a range of network statistics, as below.
```{r}
g <- g %>%
activate("nodes") %>%
rename(in_degree_centrality = popularity) %>%
mutate(out_degree = centrality_degree(mode = 'out')) %>%
mutate(betweenness_centrality = centrality_betweenness()) %>%
mutate(centrality_eigen = centrality_eigen())
g %>%
as_tibble() %>%
skimr::skim()
```
We could also group our data by either years of experience or subject to begin to understand differences in centrality (and, potentially, influence), as below.
```{r}
g %>%
as_tibble() %>%
group_by(subject) %>%
select(-name) %>%
summarize(mean_in_degree_centrality = mean(in_degree_centrality),
sd_in_degree_centrality = sd(in_degree_centrality))
g %>%
as_tibble() %>%
mutate(high_experience = if_else(years_of_experience > 5, 1, 0)) %>%
group_by(high_experience) %>%
summarize(mean_in_degree_centrality = mean(in_degree_centrality),
sd_in_degree_centrality = sd(in_degree_centrality))
```
#### [**Your Turn**]{style="color: green;"} **⤵**
Based on the descriptive statistics, what can we say is associated with an individual being more (or less) central in the network?
-
-