
# Graph Convolutional Neural Networks


## Basic Architecture

The architecture of a typical graph convolutional neural network, or GCNN, has three parts (a minimal code sketch follows the list):

  1. Convolutional layers: any user-defined number of these in succession
  2. Pooling layers: Because input graphs may have differing numbers of nodes, but we want the eventual output to have a standardized length, we need a pooling procedure to reduce the variable-size output of the convolutional layers to a fixed length. Typically this is done with some kind of moving filter that takes (for example) the maximum or average over each window.
  3. Dense layers: Usually, the last few layers of the network are "standard" densely-connected layers.
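
To make these three parts concrete, here is a minimal, self-contained sketch in plain Julia. None of the names below come from AtomicGraphNets.jl; the node-mixing matrix is a stand-in for a real graph convolution, and mean pooling stands in for the windowed pooling described above. The point is only to show how a variable-size graph becomes a fixed-length prediction.

```julia
using LinearAlgebra, Statistics

# 1. "Convolutional" layer: M mixes nodes (stand-in for a real graph convolution),
#    W mixes features, and b is a per-feature bias broadcast over all nodes.
conv(W, X, M, b) = tanh.(W * X * M .+ b)          # X is (features × nodes)

# 2. Pooling: collapse the node dimension so graphs of any size yield a vector
#    of the same length (here a simple mean over nodes, not a moving window).
pool(X) = vec(mean(X; dims = 2))

# 3. Dense layer: an ordinary fully connected layer on the pooled vector.
dense(W, x, b) = tanh.(W * x .+ b)

# Toy forward pass for a 4-node graph with 8 features per node.
nfeat, nnodes = 8, 4
X = rand(nfeat, nnodes)
M = Matrix{Float64}(I, nnodes, nnodes)            # identity as a placeholder mixer
W1, b1 = rand(nfeat, nfeat), rand(nfeat)
W2, b2 = rand(1, nfeat), rand(1)

y = dense(W2, pool(conv(W1, X, M, b1)), b2)       # length-1 prediction
```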

AtomicGraphNets.jl provides a model builder for this architecture called Xie_model, in homage to Tian Xie, the original developer of cgcnn.py. However, there are some differences between how our models and theirs work, particularly in the details of the convolutional operation, as described below.

## Comparison of cgcnn.py with AtomicGraphNets.jl

The cgcnn.py package was the first major package to implement atomic graph convolutional networks. However, the "convolutional" operation it uses, while qualitatively similar, is not convolution by the strict definition involving the graph Laplacian. The package introduces two such operations. Adopting the notation that v represents node features and u edge features, and that i, j, and k index nodes, neighbors of those nodes, and edge multiplicities, respectively, the first "simple" operation is (equation 4 in [this paper](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301), also available on arXiv):
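
In LaTeX form (transcribed from the cited paper, so minor notational details may differ from the original typesetting):

$$
\mathbf{v}_i^{(t+1)} = g\!\left[\left(\sum_{j,k} \mathbf{v}_j^{(t)} \oplus \mathbf{u}_{(i,j)_k}\right) \mathbf{W}_c^{(t)} + \mathbf{v}_i^{(t)} \mathbf{W}_s^{(t)} + \mathbf{b}^{(t)}\right]
$$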

where the "circle-plus" indicates concatenation, and g is an activation function. Note that such an operation, which does not make use of the graph Laplacian, requires explicit computation of neighbor lists for every node, and that the convolutional weight matrix is of very large dimension due to the concatenation step. The authors found better performance with a somewhat more complicated operation (equation 5):
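
Again in LaTeX form (transcribed from the cited paper; the f and s subscripts distinguish the two weight/bias sets):

$$
\mathbf{v}_i^{(t+1)} = \mathbf{v}_i^{(t)} + \sum_{j,k} \sigma\!\left(\mathbf{z}_{(i,j)_k}^{(t)} \mathbf{W}_f^{(t)} + \mathbf{b}_f^{(t)}\right) \odot g\!\left(\mathbf{z}_{(i,j)_k}^{(t)} \mathbf{W}_s^{(t)} + \mathbf{b}_s^{(t)}\right),
\qquad
\mathbf{z}_{(i,j)_k}^{(t)} = \mathbf{v}_i^{(t)} \oplus \mathbf{v}_j^{(t)} \oplus \mathbf{u}_{(i,j)_k}
$$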

where z is a concatenation of the central node's features with each neighbor's node and edge features, and the "circle-dot" (⊙) indicates element-wise multiplication. This operation entails yet more trainable parameters, and neither operation is particularly performant, because the concatenation must be redone at every step of the forward pass. Compare this to the operation implemented in AtomicGraphNets.jl:
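
Schematically, with L the precomputed graph Laplacian and W a convolutional weight matrix (this rendering is a sketch based on the description below; the exact arrangement of weight terms in the package may differ):

$$
X^{(t+1)} = \mathrm{nz}\!\left[\, g\!\left( W^{(t)}\, X^{(t)}\, L + B^{(t)} \right) \right]
$$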

where X is a feature matrix constructed by stacking feature vectors, B is a bias matrix (stacked identical copies of the per-feature bias vector), and nz is the so-called "z-score normalization" or regularized norm operation, which we have found to improve stability. In addition, since the graph Laplacian need only be computed once (and is in fact stored as part of the AtomGraph type), the forward pass is much more computationally efficient. Since no concatenation occurs, the weight matrices are also smaller, meaning the model has fewer trainable parameters, with no sacrifice in accuracy that we have been able to observe, indicating comparable expressivity.
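
As a rough, stand-alone illustration of this style of operation in plain Julia (the names and the exact normalization here are illustrative assumptions, not the AtomicGraphNets.jl API; tanh stands in for whatever activation g is chosen):

```julia
using LinearAlgebra, Statistics

# "nz": z-score normalization over all entries of the feature matrix,
# i.e. subtract the overall mean and divide by the overall standard deviation.
nz(M) = (M .- mean(M)) ./ std(M)

# One Laplacian-based convolution: W mixes features, Lap mixes nodes, and
# broadcasting b adds identical copies of the per-feature bias to every node
# (the "bias matrix" B described above).
agn_conv(W, X, Lap, b) = nz(tanh.(W * X * Lap .+ b))

nfeat, natoms = 8, 5
X = rand(nfeat, natoms)                        # stacked per-atom feature vectors

A = Float64.(rand(Bool, natoms, natoms))       # toy adjacency, not a real crystal graph
A = (A + A') ./ 2; A[diagind(A)] .= 0          # symmetrize, drop self-loops
Lap = Diagonal(vec(sum(A; dims = 2))) - A      # graph Laplacian, computed once

W, b = rand(nfeat, nfeat), rand(nfeat)
Xout = agn_conv(W, X, Lap, b)                  # same (features × atoms) shape as X
```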

It is worth noting that one advantage of the cgcnn.py approach is that it allows for explicitly enumerating edge features. In the current version of AtomicGraphNets, the only "features" of graph edges are the weights. Convolutional operations that allow for edge features are under consideration for future versions.
