sna-demo.Rmd

---
title: 'Social Network Analysis Demonstration'
author: "The LASER Team"
date: "`r format(Sys.Date(),'%B %e, %Y')`"
output:
  html_document:
    toc: true
    toc_depth: 5
    toc_float: yes
bibliography: lit/references.bib
csl: lit/apa.csl
---

Welcome to the \*social network analysis\* demo! To complete this, click the green arrows to the right of each code chunk.

## 1. Loading, Setting Up

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

In this section, we load packages with the `library()` function and read data using the `read_csv()` function.

- `d` refers to the data we loaded on teachers' relations
- `u` refers to "user"-level data (e.g., on teachers' years of experience)

```{r}
set.seed(0811)

library(tidyverse)
library(tidygraph)
library(ggraph)
library(here)

d <- read_csv(here('data', 'teacher-network-data-relations.csv'))

u <- read_csv(here('data', 'teacher-network-data-users.csv'))
```

#### [**Your Turn**]{style="color: green;"} **⤵**

Run the following two code chunks to take a _glimpse_ at your data. Below, add a few notes on what you _notice_ and _wonder_

```{r}
glimpse(d)
```

```{r}
glimpse(u)
```

What do you _notice_ and/or _wonder_ about this data? Add a note or two below!

- 
- 

## 2. Preparing Data

In this section, we prepare our data to be in edgelist format. 

#### [**Your Turn**]{style="color: green;"} **⤵**

```{r}
d_long <- d %>% 
  pivot_longer(most_helpful_1:most_helpful_3, names_to = "nominee") %>% 
  mutate(nominee_rank = str_sub(nominee, start = -1),
         nominee_rank = as.integer(nominee_rank)) %>% 
  select(-nominee) %>% 
  filter(!is.na(value))

d_long
```
What is different about the edgelist data now? Add one or more observations (you can add additional observations by adding more dashes):

- 
- 

## 3. Creating a Graph 'Object' and Preparing to Visualize the Network

This next step is key in that we use the `tbl_graph()` function to create a network "object"; if that sounds a bit vague, it should! An "object" refers to a type of data in R. Here, it's one that is specific to the packages we are using for social network analysis. 

Here, we create a network object with only the edgelist. 

```{r}
g <- tbl_graph(edges = d_long)

g
```

#### [**Your Turn**]{style="color: green;"} **⤵**

Let's create this object again, but also adding _nodes_ information, or the user-level information we also loaded earlier on.

```{r}
g <- tbl_graph(edges = d_long, nodes = u)

g
```

```{r}
g <- g %>% 
  mutate(popularity = centrality_degree(mode = 'in')) %>% 
  activate("edges") %>% 
  mutate(nominee_rank = as.factor(nominee_rank))

g
```

What do you notice about the `g` network object? How does it appear different from either the edgelist or the user-level data we loaded?

- 
- 

## 4. Creating and Refining a Network Visualization

Perhaps our question has to do with who is most central (and, possibly, the most influential) within our network.

Let's start with a simple visualization of our network using `geom_edge_fan()` and `geom_node_point()`.

```{r}
ggraph(g, layout = 'kk') + 
  geom_edge_fan() +
  geom_node_point() +
  theme_graph()
```

We can enhance this visualization in numerous ways, such as by:

- Sizing the points based on popularity, or in-degree centrality (how many times a person was a nominee)
- Coloring the points by subject
- Changing the hue of the edges based upon the order in which one was nominated

```{r}
ggraph(g, layout = 'kk') + 
  geom_edge_fan(aes(alpha = nominee_rank), 
                arrow = arrow(length = unit(4, 'mm')),
                start_cap = circle(6, 'mm'),
                end_cap = circle(6, 'mm')) + 
  geom_node_point(aes(size = popularity, color = subject)) +
  theme_graph()
```
We might wish instead to size the points by years of experience, as we do below.

```{r}
ggraph(g, layout = 'kk') + 
  geom_edge_fan(aes(alpha = nominee_rank), 
                arrow = arrow(length = unit(4, 'mm')),
                start_cap = circle(6, 'mm'),
                end_cap = circle(6, 'mm')) + 
  geom_node_point(aes(size = years_of_experience, color = subject)) +
  theme_graph()
```

We can use names (if anonymized or otherwise ethically appropriate for our analysis), as below.

```{r}
ggraph(g, layout = 'kk') + 
  geom_edge_fan(aes(alpha = nominee_rank), 
                arrow = arrow(length = unit(4, 'mm')),
                start_cap = circle(6, 'mm'),
                end_cap = circle(6, 'mm')) + 
  geom_node_label(aes(label = name, size = popularity)) +
  theme_graph()
```
Lastly, we can identify sub-groups within our network and use color to indicate which individuals are a part of which sub-groups, as below. 

```{r}
g %>%
  activate(nodes) %>%
  mutate(group = group_spinglass()) %>% 
  ggraph(layout = 'kk') + 
  geom_edge_fan(aes(alpha = nominee_rank), 
                arrow = arrow(length = unit(4, 'mm')),
                start_cap = circle(6, 'mm'),
                end_cap = circle(6, 'mm')) + 
  geom_node_label(aes(label = name, size = popularity, color = as.factor(group))) +
  theme_graph() +
  scale_color_discrete("Group", type = "qual")
```

#### [**Your Turn**]{style="color: green;"} **⤵**

Which visualization is most helpful for understanding who may be influential in the network? Why?

- 
- 

## 5. Calculating Descriptive Statistics

Finally, we can calculate a range of network statistics, as below.

```{r}
g <- g %>% 
  activate("nodes") %>% 
  rename(in_degree_centrality = popularity) %>% 
  mutate(out_degree = centrality_degree(mode = 'out')) %>% 
  mutate(betweenness_centrality = centrality_betweenness()) %>% 
  mutate(centrality_eigen = centrality_eigen())

g %>% 
  as_tibble() %>% 
  skimr::skim()
```

We could also group our data by either years of experience or subject to begin to understand differences in centrality (and, potentially, influence), as below.

```{r}
g %>% 
  as_tibble() %>% 
  group_by(subject) %>% 
  select(-name) %>% 
  summarize(mean_in_degree_centrality = mean(in_degree_centrality),
            sd_in_degree_centrality = sd(in_degree_centrality))

g %>% 
  as_tibble() %>% 
  mutate(high_experience = if_else(years_of_experience > 5, 1, 0)) %>% 
  group_by(high_experience) %>% 
  summarize(mean_in_degree_centrality = mean(in_degree_centrality),
            sd_in_degree_centrality = sd(in_degree_centrality))
```

#### [**Your Turn**]{style="color: green;"} **⤵**

Based on the descriptive statistics, what can we say is associated with an individual being more (or less) central in the network?

- 
-