
Basic Graph Theory


This page introduces the bare minimum of graph theory concepts needed to understand the math behind graph convolution. It is NOT intended as an exhaustive resource, but rather as a practical jumpstart.

What is a graph?

A graph is an object described by a set of nodes and a set of edges that connect them:
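To make this concrete, here's a minimal sketch in Julia (the language this package lives in) of a graph stored as a node set plus an edge list. The five-node graph below is hypothetical, a stand-in for the figure, and the later sketches on this page reuse it.

```julia
# A graph is just a set of nodes plus a set of edges connecting them.
# Hypothetical five-node example (a stand-in for the figure):
nodes = [:A, :B, :C, :D, :E]
edges = [(:A, :B), (:B, :C), (:B, :E), (:C, :D), (:D, :E)]   # each pair is one edge
```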

This is a useful data structure for all kinds of things, such as the bridges of Königsberg, social networks, and many other applications. In this package, we're interested in using graphs to represent structures made of atoms, such as crystals and molecules. Think of those ball-and-stick models from high school chemistry class.

So how do we represent graphs in a computer so that we can do things like machine learning on them? We turn them into matrices! Read on, dear friend...

Adjacency matrix

First, we assign a number to each node, like so...

The first type of matrix (and, I would argue, the core one, since it contains all the information used to build the others we'll discuss) is the adjacency matrix. It is a square matrix whose size is equal to the number of nodes. The entry at (i,j) is 1 if node i is connected to node j, and 0 otherwise:
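Here's a sketch of that construction for the hypothetical example graph above, with its nodes numbered 1 through 5 (A=1, ..., E=5):

```julia
# Hypothetical example graph from above, with nodes numbered 1 through 5
n = 5
edges = [(1, 2), (2, 3), (2, 5), (3, 4), (4, 5)]

# Adjacency matrix: entry (i, j) is 1 if nodes i and j share an edge, 0 otherwise
A = zeros(Int, n, n)
for (i, j) in edges
    A[i, j] = 1
    A[j, i] = 1   # the graph is undirected, so A is symmetric
end
A
```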

Note that a node can also be connected to itself (a self-loop):
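In the adjacency matrix, a self-loop is just a 1 on the diagonal. Continuing the sketch above (the choice of node 1 for the loop is arbitrary):

```julia
A[1, 1] = 1   # a self-loop: node 1 is now connected to itself
```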

A few potential complications to this idea...

Weighted graphs

If, rather than only 1's or 0's, we allow a continuum of values in the adjacency matrix, this is known as a weighted graph. AtomGraph objects as defined in ChemistryFeaturization.jl ARE weighted graphs, and the weights are scaled (without loss of generality) such that the largest weight is equal to 1. For more on how weights are calculated, see the ChemistryFeaturization.jl documentation.
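As a rough sketch of what that scaling looks like (the weight values below are made up for illustration; see the ChemistryFeaturization.jl documentation for how real weights are computed):

```julia
# Hypothetical weighted adjacency matrix for a three-node graph
# (the values are made up; real AtomGraph weights come from the structure itself)
W = [0.0  2.5  0.0;
     2.5  0.0  1.0;
     0.0  1.0  0.0]

W ./= maximum(W)   # rescale so the largest weight is exactly 1
```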

Directed graphs

As introduced above, the adjacency matrix is symmetric (entry (i,j) is equal to entry (j,i)). While this IS always the case in the atomic graphs we work with, it need not be. A graph with a nonsymmetric adjacency matrix is known as a directed graph.
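A tiny illustration of the difference, using a hypothetical two-node graph:

```julia
using LinearAlgebra: issymmetric

A_undirected = [0 1;
                1 0]
issymmetric(A_undirected)   # true: the edge runs both ways

A_directed = [0 1;
              0 0]
issymmetric(A_directed)     # false: node 1 points to node 2, but not vice versa
```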

Degree matrix

The degree matrix of a graph is a diagonal matrix that describes how many edge terminations are at each node. So for our self-looped graph above, the degree matrix is:
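Sketching this with the hypothetical example adjacency matrix from above (self-loop included):

```julia
using LinearAlgebra

# Adjacency matrix of the hypothetical example graph, self-loop on node 1 included
A = [1 1 0 0 0;
     1 0 1 0 1;
     0 1 0 1 0;
     0 0 1 0 1;
     0 1 0 1 0]

# Degree matrix: a diagonal matrix built from the row sums of A
# (for a weighted graph, the same row sum adds up the weights instead)
D = Diagonal(vec(sum(A, dims = 2)))
```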

In a weighted graph, the degree matrix is constructed by summing the weights rather than just counting the nonzero entries in that row/column of the adjacency matrix.

Graph Laplacian

Okay, now we have all the pieces we need to understand the centerpiece of graph convolution, the Laplacian matrix. It is defined as the difference of the degree and adjacency matrices:
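Continuing with the hypothetical A and D from the sketch above:

```julia
L = D - A   # the graph Laplacian: degree matrix minus adjacency matrix
```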

So what's so special about this matrix? It seems like a pretty arbitrary definition at first glance. Well, the name should be fairly suggestive, as it's closely tied to the differential operator that comes up in, for example, the diffusion equation. The graph Laplacian matrix as an operator is, in fact, diffusional in nature. To get a sense for this, let's define a simple graph signal, just a single number at each node of the graph, and we'll start with a "delta spike" on node 5 and zeros everywhere else. Watch what happens when we operate on that signal with the Laplacian a couple of times...
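Here's a sketch of that experiment on the hypothetical example graph, using the L computed above:

```julia
# A "delta spike" graph signal: 1 on node 5, 0 everywhere else
x = [0, 0, 0, 0, 1]

L * x        # nonzero on node 5 and its neighbors: the spike has started to spread
L * (L * x)  # a second application spreads it to the neighbors' neighbors as well
```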

If you want a deeper dive into the magic of the graph Laplacian, this is a fun read! But the important thing to know is that this, in the most general sense, is how we do graph convolution! In real applications, the "graph signal" is a feature matrix rather than a vector, as we'll generally have a vector of features for each node, and these get stacked to form the matrix. You can convince yourself with some basic linear algebra that the result of this is the same as if you applied the convolutional operation (i.e., multiplying by the Laplacian) to a bunch of vectors individually and stacked the results.
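A quick check of that claim, with a random stand-in for the feature matrix (three features per node, on the hypothetical example graph and its L from above):

```julia
X = rand(5, 3)   # one row per node, one column per (random stand-in) feature

# Convolving the whole matrix at once gives the same result, up to floating-point
# rounding, as convolving each feature column separately and stacking the results
L * X ≈ hcat([L * X[:, k] for k in 1:3]...)   # true
```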

Remarks

A few other miscellaneous comments that might be useful to understand...

Equivariance to node indexing

We made a pretty arbitrary decision when assigning the numbers to the nodes of the graph up at the top. A comforting property of all we've done is that the results of convolution (i.e., the feature vector at each node) don't depend on how we assign those numbers! The adjacency, degree, Laplacian, and feature matrices will get rejiggered under node reindexing in just the right way for everything to work out! Don't believe me? Try out the example above with different node labels!
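Here's one way to try it without redrawing any pictures: relabel the nodes with a permutation matrix and check that the convolved features just get relabeled the same way. The graph and features below are the hypothetical ones from the earlier sketches.

```julia
using LinearAlgebra

# Hypothetical example graph and random features, as in the earlier sketches
A = [1 1 0 0 0;
     1 0 1 0 1;
     0 1 0 1 0;
     0 0 1 0 1;
     0 1 0 1 0]
D = Diagonal(vec(sum(A, dims = 2)))
L = D - A
X = rand(5, 3)

# A permutation matrix P encodes a relabeling of the nodes
perm = [3, 1, 5, 2, 4]               # some arbitrary relabeling
P = Matrix{Int}(I, 5, 5)[perm, :]

# Relabel the Laplacian and the features consistently...
L_perm = P * L * P'
X_perm = P * X

# ...and the convolved features are just the original ones, relabeled the same way
L_perm * X_perm ≈ P * (L * X)   # true
```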

Normalized Laplacian

A not-so-nice aspect of the graph Laplacian as we've defined it is that the magnitude of the graph signal can change upon repeated convolutions. In practice, we use the normalized Laplacian, which we compute using the inverse square root of the degree matrix (you can read more here). This keeps things a bit more well-behaved.
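As a sketch, with the hypothetical A and D from above (the degrees here are all nonzero, so the inverse square root is well defined):

```julia
using LinearAlgebra

# Continuing with the hypothetical A and D from the sketches above
D_invsqrt = Diagonal(1 ./ sqrt.(diag(D)))

# Normalized Laplacian: identity minus the symmetrically normalized adjacency matrix
L_norm = I - D_invsqrt * A * D_invsqrt

# ...which is the same as sandwiching the plain Laplacian between D^(-1/2) factors
L_norm ≈ D_invsqrt * (D - A) * D_invsqrt   # true
```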

How is this related to image convolution?

If you're familiar with image convolution (if not, there's a nice interactive tutorial here), it might not seem immediately obvious how this procedure is related. It turns out this is exactly the same thing, but in typical image convolution, we've imposed a very rigid structure on the underlying graph, namely that every node (pixel) has exactly the same sort of local structure (Cartesian neighbors). Graph convolution allows for a more generalized notion of which "pixels" are "neighbors."
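To see the connection, here's a sketch that builds the adjacency matrix of a tiny hypothetical "image": each pixel is a node, and its edges go to its Cartesian neighbors.

```julia
# A tiny hypothetical 3×4 "image", with each pixel as a node
nrows, ncols = 3, 4
idx(r, c) = (c - 1) * nrows + r          # map a pixel position to a node index

# Connect every pixel to its Cartesian neighbors (up/down/left/right)
A = zeros(Int, nrows * ncols, nrows * ncols)
for r in 1:nrows, c in 1:ncols
    r < nrows && (A[idx(r, c), idx(r + 1, c)] = A[idx(r + 1, c), idx(r, c)] = 1)
    c < ncols && (A[idx(r, c), idx(r, c + 1)] = A[idx(r, c + 1), idx(r, c)] = 1)
end
A   # every interior pixel ends up with exactly four neighbors
```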