Author: Nihai V. Nayak (March 2020)
- Graph Neural Networks (GNN) is a type of neural network which learns the structure of a graph.
- Learning graph structure allows us to represent the nodes of the graph in the euclidean space which can be useful for several downstream machine learning tasks.
- Recent work on GNN has shown impressive performance on link prediction（链接预测）, graph classification（图分类）and semi-supervised tasks (Hamil- ton et al., 2017b).
- Graphs consist of a set of nodes V and edges E. fig.1
- They are normally represented with an unweighted adjacency matrix(邻接矩阵) A, which consists of 1s and 0s.
- The degree matrix(度矩阵) is a diagonal matrix （对角矩阵）that contains information about the degree of each node.
- the degree matrix for fig. 1 is as follows:
- from Figure 1 the neighbours of the node b are: where N(.) is the neighbour function. Likewise, the neighbor of the node a is:
- Therefore, we need a neural network that can deal with the varying number of neighbors.
03 Learning on Graphs
A basic GNN layer consists of two steps：
- AGGREGATE：For a given node, ==the AGGREGATE step applies a permutation invariant function to its neighbors to generate the aggregated（汇总的） node feature==
- COMBINE: For a given node, ==COMBINE step passes the aggregated node feature to a learnable layer to generate node embedding for the GNN layer==
The GNN layers are stacked together（堆叠在一起） to increase the receptive field of clustering. In Fig. 2 we have 2 GNN layers. This means that for every node, we aggregate over the 2-hop neighborhood.
AGGREGATE and COMBINE steps are sometimes called message passing phase and readout phase （读出）in the message passing neural network framework (Gilmer et al., 2017).
03.1 Formal Definition
- Consider the Graph G = (V,E), where V is the set of node with features Xv. There are two main steps - AGGREGATE and COMBINE at each l-th layer of the GNN:
04 Graph Convolutional Networks （GCN）
- GCN (Kipf and Welling, 2016) is a graph neural network technique that makes use of the symmetrically normalized graph laplacian（对称归一化图拉普拉斯算子） to compute the node embeddings.
- Each GCN block receives node features from the (l−1)th GCN block, i.e. the node features from the previous layer is passed to the next GCN layer, which allows for greater laplacian smoothing（拉普拉斯平滑） (Li et al., 2018).
Since we don’t always have access to the entire graph structure to compute the symmetrically normalized graph laplacian, GCN Aggregator has been interpreted as a Mean Aggregator（均值聚合器） (Xu et al., 2018a). Therefore, For each node v, the algorithm takes the mean of the neighborhood node features along with its feature (self-loop) to compute the aggregated node feature for the l-th layer. This formulation makes GCN an inductive graph neural network（归纳图神经网络）.
The aggregated node feature av is multiplied with a learnable weight matrix followed by a non-linear activation (ReLU) to get the node feature hv for the lth layer,
- Graphsage (Hamilton et al., 2017a) is a spatial graph neural network algorithm （空间图神经网络算法）that learns node embeddings by sampling and aggregating neighbors from multiple hops.
- Graphsage, unlike GCN, is an inductive learning algorithm（归纳学习算法） which means that ==it does not require the entire graph structure during learning==.
- In the original Graphsage paper, the authors describe multiple aggregators-Mean, Pooling and LSTM Aggregator.
- In this section, we’ll be describing the Pool Aggregator and the combine from the original paper.
- For simplicity, ==we do not sample the neighborhood nodes as well.==
- Graphsage makes use of a max-pooling aggregator to compute an aggregated node feature.
- All the neighborhood node features are passed through a linear layer.
- We then pass these features through a non-linear activation before applying max filter. Formally,
06 Graph Attention Network（GAT）
- Graph Attention Network (Veliˇckovi ́c et al., 2018) is a spatial graph neural network technique that uses ==Self Attention to aggregate the neighborhood node features.==
- Self Attention is equivalent to computing a weighted mean of the neighbourhood node features.
- The original paper uses K masked self attention modules to aggregate node features. For simplicity, we consider only one self attention block.
07 Related Works
- Apart from GCN, Graphsage, and GAT, there are few more works that frequently appear in the literature.
- SGN (Wu et al., 2019) simplifies GCN by removing non-linearity between the GCN layers. Furthermore, to achieve the same “receptive field”（感受野） of having K GCN layers, they take the graph laplacian to the power of K and show equivalent performance.
- Xu et al. (2018b) introduced jumping knowledge networks that aggregate node features generated from all the previous layers to compute the final vector using mean or max-pooling. This allows the model to learn better structure-aware representations（结构感知表示） from different neighborhood aggregations. The works described so far have not taken advantage of the relations in the graph.
- R-GCN (Schlichtkrull et al., 2018) and Directed-GCN (Marcheggiani and Titov, 2017) extend GCN to relational graphs.
- Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1263–1272. JMLR. org.
- Hamilton, W., Ying, Z., and Leskovec, J. (2017a). Inductive representation learning on large graphs. In Advances in Neural Information Processing Sys- tems, pages 1024–1034.
- Hamilton, W. L., Ying, R., and Leskovec, J. (2017b). Representation learning on graphs: Methods and applications. IEEE Data Eng. Bull., 40:52–74.
- Kipf, T. N. and Welling, M. (2016). Semi-supervised classification with graph convolutional networks.
- Li, Q., Han, Z., and Wu, X.-M. (2018). Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
- Marcheggiani, D. and Titov, I. (2017). Encoding sentences with graph convolu- tional networks for semantic role labeling. arXiv preprint arXiv:1703.04826.
- Schlichtkrull, M., Kipf, T. N., Bloem, P., Van Den Berg, R., Titov, I., and Welling, M. (2018). Modeling relational data with graph convolutional net- works. In European Semantic Web Conference, pages 593–607.
- Springer. Veliˇckovi ́c, P., Cucurull, G., Casanova, A., Romero, A., Li`o, P., and Bengio, Y. (2018). Graph Attention Networks. International Conference on Learning Representations.
- Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., and Weinberger, K. (2019). Simplifying graph convolutional networks. In International Conference on Machine Learning, pages 6861–6871.
- Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2018a). How powerful are graph neural networks?
- Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., and Jegelka, S. (2018b). Representation learning on graphs with jumping knowledge net- works. arXiv preprint arXiv:1806.03536.