-
Notifications
You must be signed in to change notification settings - Fork 2
/
sgm.Rmd
137 lines (97 loc) · 5.29 KB
/
sgm.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
title: "Seeded Graph Matching"
author: "Ronak Mehta"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{SGM}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, warning=FALSE, message=FALSE}
require(graphstats)
require(igraph)
require(ggplot2)
```
# Background
Given two networks that we believe to have some correspondence, we may want to match the nodes of one network to those of the other; that is, we want to find the bijection between the vertex sets that minimizes the number of edge disagreements across the graph. An edge disagreement occurs when two vertices are adjacent on in one graph but their corresponding vertices in the other are not.
# Example 1: A Known Correspondence
Here, we produce a a simple random graph (adjacency matrix) A with n = 10 and p = 0.5. We then produce B by maintaining the first m = 3 rows (the seeds), and permuting the remaining n - m rows. The `sgm` function can be called in two ways.
* Unordered: Call `sgm` and supply `seeds` - an m by 2 matrix that contains the indices of the seeds in A and the indices of their corresponding nodes in B.
* Ordered: Call `sgm.ordered` and supply `m` - the number of seeds. It is assumed that the first m nodes in each adjacency matrix are the matched seeds, in order. We also supply `start` - a square n-m by n-m matrix that is the initial value of the permutation matrix for the non-seeds.
```{r, fig.width=4.5, fig.height=4}
# Number of seeds, total nubmer of vertices, and edge probability.
m <- 3
n <- 10
p <- 0.5
set.seed(12345)
# Sample matrix A, and permute to .
A <- matrix(as_adj(sample_sbm(n, p, n)), nrow = n)
B <- A[c(1:m, sample(n-m)+m),]
# P is the permutation matrix that will turn match B to A.
# Unordered call.
seeds <- matrix(cbind(1:m, 1:m), nrow = m)
P <- sgm(A, B, seeds)
# Ordered call.
start <- diag(n-m)[sample(n-m),]
P <- sgm.ordered(A, B, m, start)
# Display.
gs.plot.plot_matrix(P, title="Permutation Matrix", legend.name="Entries", ffactor = TRUE)
```
Here, we have the first m nodes along the diagonal, unpermuted. (NOTE: The matrix heatmaps in this demo have the diagonal ranging from the bottom left to the rop right.) By applying the returned matrix to B, we can assess the quality of the matching.
```{r, fig.width=4.5, fig.height=4}
# Apply permutation matrix.
B_matched <- P %*% B %*% t(P)
# Visualize matched graphs.
gs.plot.plot_matrix(A, title="Original A Matrix", legend.name="Entries", ffactor = TRUE)
gs.plot.plot_matrix(B_matched, title="Matched B Matrix", legend.name="Entries", ffactor = TRUE)
gs.plot.plot_matrix(B, title="Original B Matrix", legend.name="Entries", ffactor = TRUE)
```
We can see the similarity between A and the matched B, and the dissimilarity between the original and matched B. Below, we mark the edge disagreements of our approximate solution.
```{r, fig.width=5, fig.height=4}
gs.plot.plot_matrix(abs(A-B_matched), title="Disagreements", legend.name="Disagreement", ffactor = TRUE)
```
The overall connectivity of A is very closely recovered.
# Example 2: r-Correlated Stochastic Blockmodel
Here, we sample (A, B) from an r-Correlated SBM via a random dot-product graph, permute the matrix like before, and observe the results of Seeded Graph Matching.
```{r, fig.width=4.5, fig.height=4}
# Define latent vectors for 2-SBM RDPG in X.
n <- 10
X1 <- matrix(rep(c(0.85, 0), n/2), nrow = n/2, byrow = TRUE)
X2 <- matrix(rep(c(0.3,0.8), n/2), nrow = n/2, byrow = TRUE)
X <- rbind(X1, X2)
set.seed(6789)
# Pearson correlation coefficient.
r <- 0.75
# Sample r-SBM.
sampled_graphs <- rdpg.sample.correlated(X, r)
A <- as_adj(sampled_graphs[[1]], sparse = FALSE)
B <- as_adj(sampled_graphs[[2]], sparse = FALSE)
# Display overlap.
gs.plot.plot_matrix(A + B, title="A + B (Overlap)", legend.name="A_ij + B_ij")
```
These graphs are not exactly equal, but have high correlation. Their matrix addition, pictured above, has many 2 and 0 entries, and few 1 entries. We now choose the first m/2 = 1 nodes from each block of each graph to be the seeds.
```{r, fig.width=4.5, fig.height=4}
# Identify seeds.
m <- 2
seed_indices <- c(1:(m/2), 1:(m/2)+n/2)
seeds <- matrix(c(seed_indices, seed_indices), nrow = m)
# Set up permutation.
block1_permutation <- sample(n/2 - m/2) + m/2
block2_permutation <- sample(n/2 - m/2) + m/2 + n/2
B_permutation <- c(1:(m/2), block1_permutation, 1:(m/2)+n/2, block2_permutation)
# Redefine B matrix and match using unordered sgm.
B_p <- B[B_permutation,]
P <- sgm(A, B_p, seeds)
# Apply permutation matrix.
B_matched <- P %*% B_p %*% t(P)
# Display.
gs.plot.plot_matrix(A, title="Original A Matrix", legend.name="Entries", ffactor = TRUE)
gs.plot.plot_matrix(B_matched, title="Matched B Matrix", legend.name="Entries", ffactor = TRUE)
gs.plot.plot_matrix(B_p, title="Original B Matrix", legend.name="Entries", ffactor = TRUE)
```
As before, we mark the edge disagreements below.
```{r, fig.width=5, fig.height=4}
gs.plot.plot_matrix(abs(A-B_matched), title="Disagreements", legend.name="Disagreement", ffactor = TRUE)
```
While we allow more disagreements between A and matched B due to the random nature of the SBM, the within and between block structure of A is mirrored in B after seeded graph matching.