A Tutorial on Graph Neural Networks

Transcription

A Tutorial on Graph Neural Networks
Graph Convolution, Attention, and SAmple and aggreGatE

Zhiming Xu (zhimingxu@smail.nju.edu.cn)
Department of Computing, The Hong Kong Polytechnic University
Data Exploration & Extracting Lab @ PolyU
October 15, 2020

Overview

- Introduction
- Graph Convolutional Networks
- GraphSAGE
- Graph Attention Network

Recap

- Graph: a data structure consisting of vertices(1) and edges. Denoting these by the sets V and E, respectively, a graph is written G = (V, E).
- Neural network: an interconnected group of neurons performing a series of computations.

[Figure 2: Example of a graph and a neural network. (a) A graph with six vertices and eight edges. (b) A neural network with one hidden layer.]

(1) The words "node" and "vertex" are used interchangeably in this tutorial.

Graph Neural Networks (GNNs)

- A type of neural network operating directly on graphs [1].
- The goal is to learn a state representation that contains information about each vertex's neighborhood.
- Notation used in this tutorial (Table 1):

  Notation                          Description
  R^m                               m-dimensional Euclidean space
  a, \mathbf{a}, A                  scalar, vector, matrix
  A                                 adjacency matrix
  X                                 (node) feature matrix
  D                                 degree matrix, D_{ii} = \sum_j A_{ij}
  I_N                               N-dimensional identity matrix
  h, H                              learned hidden vector, matrix
  W                                 neural network weight matrix
  \sigma, \cdot^\top, \|            non-linear activation, transpose, concatenation

The Operation of Convolution

- Convolution: an operation on two functions f and g that produces a third function f \star g.
- Convolutional Neural Network (CNN): a neural network built around the convolution operation, usually applied to images, where g is the grid (image) and f is called the filter.
- Convolution on graphs: graphs are not as regular as grids, so new methods are needed to generalize convolution to them.

[Figure: (a) An example of 2D convolution. (b) Convolution on graphs?]
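To make the grid case concrete, here is a minimal NumPy sketch of 2D convolution as used in CNNs (strictly speaking, cross-correlation). The toy 5x5 "image" and the 3x3 filter are illustrative assumptions, not code from the slides:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum element-wise products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 grid signal
kernel = np.array([[0., -1., 0.],                  # simple Laplacian-like filter
                   [-1., 4., -1.],
                   [0., -1., 0.]])
print(conv2d(image, kernel).shape)                 # (3, 3)
```

The filter slides over a regular grid; on a graph there is no such regular neighborhood structure, which is exactly the difficulty the next slides address.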

Generalize Convolution to Graphs

- Spectral convolution on a graph with signal x \in R^n in the Fourier domain:

      g_\theta \star x = U g_\theta U^\top x    (1)

  where
  1. L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^\top is the normalized graph Laplacian,
  2. U is the matrix of eigenvectors of the normalized graph Laplacian,
  3. U^\top x is the graph Fourier transform of x,
  4. g_\theta is the spectral convolutional filter; it can be seen as a function g_\theta(\Lambda) of the eigenvalues of L.
- Equation (1) is computationally expensive and therefore needs an efficient approximation.
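Equation (1) can be computed literally on a toy graph. The following NumPy sketch (toy adjacency matrix, random signal, and random diagonal filter are all assumptions, not the tutorial's code) builds the normalized Laplacian, eigendecomposes it, and applies U g_\theta U^\top x:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],        # toy undirected graph with 4 nodes
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
N = A.shape[0]
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L = np.eye(N) - d_inv_sqrt @ A @ d_inv_sqrt       # L = I_N - D^{-1/2} A D^{-1/2}

eigvals, U = np.linalg.eigh(L)                    # L = U diag(eigvals) U^T
x = np.random.randn(N)                            # graph signal: one scalar per node
g_theta = np.diag(np.random.randn(N))             # free spectral filter g_theta(Lambda)

y = U @ g_theta @ U.T @ x                         # Equation (1): U g_theta U^T x
print(y)
```

The full eigendecomposition costs O(N^3), which is why the approximations on the next slide are needed.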

Generalize Convolution to Graphs (cont.)

Approximations
1. K-th order Chebyshev polynomial:

      g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{\Lambda})    (2)

2. In Equation (1), substitute g_\theta with Equation (2):

      g_{\theta'} \star x \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{L}) x,
      \tilde{L} = \frac{2}{\lambda_{\max}} L - I_N    (3)

3. Limit the order K to 1, round \lambda_{\max} to 2, and reduce the parameters (\theta = \theta'_0 = -\theta'_1):

      g_{\theta'} \star x \approx \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x
                          \approx \theta (I_N + D^{-1/2} A D^{-1/2}) x    (4)
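A minimal sketch of the Chebyshev filtering in Equation (3), using the recurrence T_k(x) = 2 x T_{k-1}(x) - T_{k-2}(x). The path-graph Laplacian, signal, and coefficients below are toy assumptions:

```python
import numpy as np

def cheb_filter(L, x, theta, lam_max=2.0):
    """Apply sum_k theta[k] * T_k(L_tilde) @ x, with L_tilde = (2/lam_max) L - I."""
    N = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(N)
    Tx_prev, Tx = x, L_tilde @ x                  # T_0(L~) x and T_1(L~) x
    out = theta[0] * Tx_prev
    if len(theta) > 1:
        out += theta[1] * Tx
    for k in range(2, len(theta)):                # T_k = 2 L~ T_{k-1} - T_{k-2}
        Tx_prev, Tx = Tx, 2 * (L_tilde @ Tx) - Tx_prev
        out += theta[k] * Tx
    return out

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # 3-node path graph
d = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L = np.eye(3) - d @ A @ d
print(cheb_filter(L, np.ones(3), theta=[0.5, 0.3, 0.2]))
```

Only sparse matrix-vector products with L appear, so no eigendecomposition is required.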

Generalize Convolution to Graphs (cont.)

Renormalization
- In Equation (4), the eigenvalues of the term I_N + D^{-1/2} A D^{-1/2} lie in [0, 2]. Stacking layers with this operation might cause vanishing/exploding gradients.
- The renormalization trick is therefore introduced to alleviate this problem:

      I_N + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}

  where \tilde{A} = A + I_N and \tilde{D} is \tilde{A}'s degree matrix.
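The renormalization trick is a small preprocessing step. A NumPy sketch (dense matrices assumed for simplicity):

```python
import numpy as np

def renormalized_adjacency(A):
    """Return A_hat = D~^{-1/2} A~ D~^{-1/2} with A~ = A + I_N."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d_tilde = A_tilde.sum(axis=1)                 # degrees of A~
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

A = np.array([[0, 1], [1, 0]], dtype=float)       # toy 2-node graph
print(renormalized_adjacency(A))                  # [[0.5, 0.5], [0.5, 0.5]]
```

Because the self-loops guarantee every row of \tilde{A} is non-zero, the degree inverse is always well defined.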

Graph Convolutional Network (GCN)

Fast Approximate Graph Convolution
- Generalize from a scalar signal to node feature vectors:

      \theta \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} x \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta

- Propagation rule of the multi-layer Graph Convolutional Network [2]:

      H^{(l+1)} = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}),    H^{(0)} = X

- Two-layer example (calculate \hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} in advance):

      Z = f(X, A) = \mathrm{softmax}(\hat{A}\, \mathrm{ReLU}(\hat{A} X W^{(0)})\, W^{(1)})
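Putting the pieces together, here is a hedged NumPy sketch of the two-layer GCN forward pass Z = softmax(A_hat ReLU(A_hat X W0) W1). The random graph, features, and weights are placeholders for illustration, not the authors' implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_two_layer(A_hat, X, W0, W1):
    H = relu(A_hat @ X @ W0)            # first graph convolution layer
    return softmax(A_hat @ H @ W1)      # second layer + row-wise softmax

N, F, H_dim, C = 4, 5, 8, 3                        # nodes, features, hidden units, classes
rng = np.random.default_rng(0)
A = (rng.random((N, N)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                     # random undirected toy graph
A_tilde = A + np.eye(N)                            # renormalization trick
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt          # precomputed once, reused per layer
X = rng.standard_normal((N, F))
Z = gcn_two_layer(A_hat, X,
                  rng.standard_normal((F, H_dim)),
                  rng.standard_normal((H_dim, C)))
print(Z.shape)                                     # (4, 3); each row sums to 1
```

Precomputing A_hat once is what makes this "fast approximate" convolution: each layer is just two matrix multiplications and a nonlinearity.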

Graph Representation Learning

Goal
- Distill the high-dimensional information about a node and reduce it to a dense vector.
- Low-dimensional vector embeddings of nodes in large graphs are very useful for various downstream tasks.

Problems with using GCNs
- The whole graph can be large, making full-batch training computationally prohibitive; mini-batch training is slow and hard to converge.
- The full graph is needed, so training is transductive.
- Difficult to use on real-world dynamic graphs.

Graph SAmple and aggreGatE (GraphSAGE)

Sample a neighborhood and aggregate its information.

[Figure 4: Illustration of GraphSAGE forward propagation.(2)]

(2) http://snap.stanford.edu/graphsage/

GraphSAGE Embedding Generation

GraphSAGE forward propagation [3]

Result: node i's representation z_i after K iterations
  h_i^0 \leftarrow x_i, \forall i \in V
  for k = 1 \ldots K do
      for i \in V do
          h_{N_i}^k \leftarrow \mathrm{AGGREGATE}_k(\{h_j^{k-1}, \forall j \in N_i\})
          h_i^k \leftarrow \sigma(W^k \cdot [h_i^{k-1} \,\|\, h_{N_i}^k])
      end
      h_i^k \leftarrow h_i^k / \|h_i^k\|_2, \forall i \in V
  end
  z_i \leftarrow h_i^K, \forall i \in V
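A minimal NumPy sketch of this algorithm with a mean aggregator. The neighbor lists, weights, and activation below are toy assumptions, and the full algorithm samples a fixed-size neighborhood rather than using all neighbors:

```python
import numpy as np

def graphsage_forward(X, neighbors, Ws, sigma=np.tanh):
    """X: (N, d) node features; neighbors: dict node -> list of neighbor indices;
    Ws: list of K weight matrices, W_k of shape (2 * d_{k-1}, d_k)."""
    H = X.copy()                                            # h_i^0 = x_i
    for W in Ws:                                            # k = 1 .. K
        H_new = np.empty((H.shape[0], W.shape[1]))
        for i in range(H.shape[0]):
            h_ni = H[neighbors[i]].mean(axis=0)             # AGGREGATE_k (mean)
            h_cat = np.concatenate([H[i], h_ni])            # [h_i^{k-1} || h_{N_i}^k]
            H_new[i] = sigma(h_cat @ W)
        H = H_new / np.linalg.norm(H_new, axis=1, keepdims=True)  # L2-normalize
    return H                                                # z_i = h_i^K

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}    # toy neighbor lists
Ws = [rng.standard_normal((6, 5)), rng.standard_normal((10, 5))]
print(graphsage_forward(X, neighbors, Ws).shape)            # (4, 5)
```

Because each node's update only touches its (sampled) neighbors, the computation is inductive: unseen nodes can be embedded with the same learned weights.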

Parameter Learning of GraphSAGE

Graph-based loss function [3]:

      L_G(h_i) = -\log \sigma(h_i^\top h_j) - Q \cdot \mathbb{E}_{v \sim P_n(i)} \log \sigma(-h_i^\top h_v)

- j is a node that co-occurs near i on a fixed-length random walk.
- \sigma is the sigmoid function, \sigma(x) = 1 / (1 + \exp(-x)).
- P_n is a negative sampling distribution, and Q is the number of negative samples.

Based on the loss L_G, the parameters in Algorithm 1 are optimized with stochastic gradient descent.
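A NumPy sketch of this loss for a single positive pair and Q negative samples (toy embeddings assumed; the sum over the Q sampled negatives stands in for Q times the expectation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unsupervised_loss(z_i, z_j, z_negs):
    """z_i, z_j: embeddings of a pair that co-occurs on a random walk;
    z_negs: (Q, d) embeddings of Q negative samples drawn from P_n."""
    pos = -np.log(sigmoid(z_i @ z_j))               # pull co-occurring pair together
    neg = -np.log(sigmoid(-(z_negs @ z_i))).sum()   # push negatives apart (Monte-Carlo term)
    return pos + neg

rng = np.random.default_rng(0)
print(unsupervised_loss(rng.standard_normal(5),
                        rng.standard_normal(5),
                        rng.standard_normal((3, 5))))
```

Gradients of this scalar with respect to the weights in the forward pass are what stochastic gradient descent updates.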

Choice of Aggregator Functions

- Mean aggregator:

      h_i^k \leftarrow \sigma(W \cdot \mathrm{MEAN}(\{h_i^{k-1}\} \cup \{h_j^{k-1}, \forall j \in N_i\}))

- LSTM aggregator (applied to a random permutation \pi of the neighbors):

      h_i^k \leftarrow \mathrm{LSTM}(\pi(\{h_j, \forall j \in N_i\}))

- Pooling aggregator:

      h_i^k \leftarrow \max(\{\sigma(W_{\mathrm{pool}} h_j^k + b), \forall j \in N_i\})
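Hedged NumPy sketches of the mean and pooling aggregators (shapes, weights, and activation are toy assumptions; the LSTM aggregator is omitted because it needs a recurrent cell):

```python
import numpy as np

def mean_aggregate(h_i, h_neighbors, W, sigma=np.tanh):
    """sigma(W . MEAN({h_i} U {h_j, j in N_i}))."""
    m = np.vstack([h_i[None, :], h_neighbors]).mean(axis=0)
    return sigma(W @ m)

def pool_aggregate(h_neighbors, W_pool, b, sigma=np.tanh):
    """Element-wise max over sigma(W_pool h_j + b) across neighbors j."""
    return np.max(sigma(h_neighbors @ W_pool.T + b), axis=0)

rng = np.random.default_rng(0)
h_i = rng.standard_normal(4)
h_neigh = rng.standard_normal((3, 4))                       # 3 neighbors, 4-dim features
print(mean_aggregate(h_i, h_neigh, rng.standard_normal((6, 4))).shape)           # (6,)
print(pool_aggregate(h_neigh, rng.standard_normal((6, 4)), np.zeros(6)).shape)   # (6,)
```

Both are permutation-invariant over the neighbor set, which is why the LSTM aggregator needs the random permutation \pi to downplay its order sensitivity.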

Attention Mechanism

- The attention mechanism has achieved great success in sequence-based tasks.
- It can deal with variable-size inputs and focus on the most relevant parts by assigning them different weights.
- Attention applied within a single sequence is called self-attention.

[Figure 5: Attention visualization, generated by bertviz. (a) Self-attention. (b) Sentence-pair attention.]

Graph Attentional Layer

(Self-)attention mechanism on graphs
- Input: a set of node features H = \{h_1, h_2, \ldots, h_N\}, h_i \in R^d, where N is the number of nodes and d is the feature dimension.
- Output: a new set of node features H' = \{h'_1, h'_2, \ldots, h'_N\}, h'_i \in R^{d'}.
- Attention a : R^{d'} \times R^{d'} \rightarrow R with weight matrix W:

      e_{ij} = a(W h_i, W h_j)

  The e_{ij} are called attention coefficients.

Graph Attentional Layer (cont.)

Details of the attention a
- Masked: calculate e_{ij} only for j \in N_i, i.e., i's neighborhood.
- Normalized: use softmax to normalize across all j:

      \alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})}

- Implementation of a:

      \alpha_{ij} = \frac{\exp(\mathrm{LeakyReLU}(a^\top [W h_i \,\|\, W h_j]))}{\sum_{k \in N_i} \exp(\mathrm{LeakyReLU}(a^\top [W h_i \,\|\, W h_k]))}

  where LeakyReLU(x) = x for x \ge 0 and \alpha x for x < 0.
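A single-head NumPy sketch of these masked, normalized attention coefficients \alpha_{ij} (the toy features, weight matrix, attention vector, and neighborhood below are assumptions for illustration):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x >= 0, x, slope * x)

def attention_coefficients(H, W, a, neighbors_i, i):
    """H: (N, d) node features; W: (d', d) weights; a: (2*d',) attention vector;
    neighbors_i: indices of N_i. Returns alpha_ij for j in N_i."""
    Wh = H @ W.T                                          # transformed features, (N, d')
    e = np.array([leaky_relu(a @ np.concatenate([Wh[i], Wh[j]]))
                  for j in neighbors_i])                  # unnormalized scores e_ij
    e = np.exp(e - e.max())                               # numerically stable softmax
    return e / e.sum()                                    # alpha_ij over N_i, sums to 1

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 3))
print(attention_coefficients(H, rng.standard_normal((4, 3)),
                             rng.standard_normal(8), neighbors_i=[1, 2, 4], i=0))
```

The "masking" is simply that the softmax runs only over i's neighborhood rather than over all N nodes.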

Graph Attentional Layer (cont.)

Attention acts on the hidden representations
- Linear combination and activation:

      h'_i = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W h_j\Big)

- Multi-head attention, concatenation:

      h'_i = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in N_i} \alpha_{ij}^k W^k h_j\Big)

- Multi-head attention, average:

      h'_i = \sigma\Big(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N_i} \alpha_{ij}^k W^k h_j\Big)

Graph Attention Network (GAT)

Graph Attention Network propagation rule and illustration [4]:

      h'_i = \sigma\Big(\sum_{j \in N_i} \frac{\exp(\mathrm{LeakyReLU}(a^\top [W h_i \,\|\, W h_j]))}{\sum_{k \in N_i} \exp(\mathrm{LeakyReLU}(a^\top [W h_i \,\|\, W h_k]))}\, W h_j\Big)

[Figure 6: Left: the attention mechanism a. Right: multi-head attention on a graph.]
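Finally, a hedged NumPy sketch of one multi-head graph attentional layer with concatenated heads, operating on a dense adjacency matrix with self-loops. The toy sizes, random weights, and tanh activation are assumptions, not the authors' implementation:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x >= 0, x, slope * x)

def gat_layer(H, A, Ws, attn_vecs, sigma=np.tanh):
    """H: (N, d) features; A: (N, N) adjacency including self-loops;
    one (d', d) weight matrix and one (2*d',) attention vector per head."""
    heads = []
    for W, a in zip(Ws, attn_vecs):
        Wh = H @ W.T                                              # (N, d')
        d_prime = W.shape[0]
        # e_ij = LeakyReLU(a1^T Wh_i + a2^T Wh_j), with a = [a1; a2]
        e = leaky_relu(Wh @ a[:d_prime][:, None]
                       + (Wh @ a[d_prime:][:, None]).T)           # (N, N) score matrix
        e = np.where(A > 0, e, -np.inf)                           # mask to neighborhoods
        alpha = np.exp(e - e.max(axis=1, keepdims=True))
        alpha = alpha / alpha.sum(axis=1, keepdims=True)          # softmax over N_i
        heads.append(sigma(alpha @ Wh))                           # sum_j alpha_ij W h_j
    return np.concatenate(heads, axis=1)                          # concatenate K heads

rng = np.random.default_rng(0)
A = np.ones((4, 4))                                               # toy graph, self-loops included
H = rng.standard_normal((4, 3))
Ws = [rng.standard_normal((5, 3)) for _ in range(2)]              # K = 2 heads
attn = [rng.standard_normal(10) for _ in range(2)]
print(gat_layer(H, A, Ws, attn).shape)                            # (4, 2 * 5)
```

For a final prediction layer, averaging the heads (as on the previous slide) would replace the concatenation.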

Thank You for Your Attention

Q & A

References

[1] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2009. doi: 10.1109/TNN.2008.2005605.

[2] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings, OpenReview.net, 2017. [Online]. Available: https://openreview.net/forum?id=SJU4ayYgl.

References (cont.)

[3] W. L. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 December 2017, Long Beach, CA, USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett, Eds., 2017, pp. 1024–1034.

[4] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 – May 3, 2018, Conference Track Proceedings, OpenReview.net, 2018. [Online]. Available: https://openreview.net/forum?id=rJXMpikCZ.
