A Gentle Introduction To Graph Neural Networks - AiFrenz

Transcription

A gentle introduction to graph neural networks. Seongok Ryu, Department of Chemistry @ KAIST

Outline: Motivation, Graph Neural Networks, Applications of Graph Neural Networks

Motivation

Non-Euclidean data structure. Successful deep learning architectures, 1: convolutional neural networks, e.g. fully character-level neural machine translation. Lee, Jason, Kyunghyun Cho, and Thomas Hofmann. "Fully character-level neural machine translation without explicit segmentation." arXiv preprint arXiv:1610.03017 (2016).

Non-Euclidean data structure. Successful deep learning architectures, 2: recurrent neural networks, e.g. sentiment analysis and neural machine translation.

Non-Euclidean data structure. The data structures of the examples shown are regular: an image is values on pixels (a grid); a sentence has a sequential structure.

Non-Euclidean data structure. HOWEVER, there are lots of irregular data structures: social graphs (Facebook, Wikipedia), 3D meshes, molecular graphs. All you need is GRAPH!

Graph structure. A graph $G(X, A)$. $X$: nodes (vertices), e.g. individual people in a social network or atoms in a molecule; they represent the elements of a system.

Graph structure. A graph $G(X, A)$. $A$: adjacency matrix, the edges of the graph (connectivity, relationships); it represents the relationships or interactions between the elements of the system.
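To make the notation concrete, a graph $G(X, A)$ can be stored as two arrays. The following minimal sketch (my own illustration, not from the talk; the toy numbers are invented) builds a 4-node graph in Python:

```python
import numpy as np

# Node feature matrix X: one row per node (e.g. age/sex/nationality in a
# social graph, or atom type in a molecule). Shape (N, F).
X = np.array([[1.0, 0.0, 0.2],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.1],
              [0.0, 0.0, 0.9]])

# Adjacency matrix A: A[i, j] = 1 if nodes i and j are connected.
# A encodes the relationships/interactions between the elements.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
```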

Graph structure. For more detail, see Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).

Graph structure. Node features, e.g. in a social graph: Ariana Grande (25 years old, female, American, singer), Donald Trump (72 years old, male, American, president and businessman), Vladimir Putin (65 years old, male, Russian, president).

Graph structure. Edge features, e.g. bond types in a molecular graph: single bond, double bond, aromatic bond.

Learning relation and interaction. What can we do with graph neural networks? Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).

Learning relation and interaction. What can we do with graph neural networks? Node classification; link prediction; Node2Vec, Subgraph2Vec, Graph2Vec (embedding a node, substructure, or whole graph structure into a vector); learning physics laws from data. And you can do more amazing things with GNNs!

Graph neural networks

Overall architecture of graph neural networks. Updating node states: Graph Convolutional Network (GCN), Graph Attention Network (GAT), Gated Graph Neural Network (GGNN). Readout: permutation invariance on changing node orders. Graph auto-encoders. Practical issues: skip connection, inception, dropout.

Principles of graph neural network. Weights used in updating the hidden states of fully-connected nets, CNNs and RNNs. Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).

Overall neural network structure (case of supervised learning). Input node features $H_i^{(0)}$: raw node information. Final node states $H_i^{(L)}$: how the NN recognizes the nodes. Graph features $Z$.

Principles of graph neural network. Updates in a graph neural network. Edge update: relationships or interactions, sometimes called 'message passing' (e.g. the forces of a spring). Node update: aggregates the edge updates and uses them to update the node (e.g. the forces acting on the ball). Global update: an update for the global attribute (e.g. the net force and total energy of the physical system). A schematic sketch of these three updates follows below. Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).
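The three updates can be written as one schematic "GN block" in the spirit of Battaglia et al.; the sketch below is my own paraphrase (the function names and the toy update rules are assumptions, not the paper's code):

```python
import numpy as np

def gn_block(V, E, senders, receivers, u, phi_e, phi_v, phi_u):
    # Edge update ("message passing"): e'_k = phi_e(e_k, v_sender, v_receiver, u)
    E_new = phi_e(E, V[senders], V[receivers], u)
    # Node update: aggregate incoming edge updates per node, then update.
    agg = np.zeros((V.shape[0], E_new.shape[1]))
    np.add.at(agg, receivers, E_new)          # sum messages per receiving node
    V_new = phi_v(agg, V, u)
    # Global update: e.g. net force / total energy of the whole system.
    u_new = phi_u(E_new.sum(axis=0), V_new.sum(axis=0), u)
    return V_new, E_new, u_new

rng = np.random.default_rng(0)
V = rng.normal(size=(3, 4))                   # 3 nodes ("balls")
E = rng.normal(size=(2, 4))                   # 2 edges ("springs")
senders, receivers = np.array([0, 1]), np.array([1, 2])
V2, E2, u2 = gn_block(
    V, E, senders, receivers, np.zeros(4),
    phi_e=lambda e, vs, vr, g: np.tanh(e + vs + vr + g),  # toy edge update
    phi_v=lambda m, v, g: np.tanh(m + v + g),             # toy node update
    phi_u=lambda es, vs, g: np.tanh(es + vs + g),         # toy global update
)
```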

Principles of graph neural network. Weights used in updating the hidden states of a GNN: the weights are shared across all nodes in the graph, but each node is updated differently because its own neighbors' node features $H_j^{(l)}$ are reflected.

GCN: Graph Convolutional Network. http://tkipf.github.io/ (co-author Max Welling is famous for the variational autoencoder, VAE). Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).

GCN: Graph Convolutional Network. What the NN learns. Convolutional analogue: $X_i^{(l+1)} = \sigma\big(\sum_{j \in [i-k,\, i+k]} W_j^{(l)} X_j^{(l)} + b_j^{(l)}\big)$, i.e. a 1-D convolution sums over a window of neighboring positions.

GCN: Graph Convolutional Network. On a 4-node graph: $H_1^{(l+1)} = \sigma\big(H_1^{(l)} W^{(l)} + H_2^{(l)} W^{(l)} + H_3^{(l)} W^{(l)} + H_4^{(l)} W^{(l)}\big)$, i.e. node 1 aggregates its own state and those of its neighbors 2, 3 and 4.

GCN: Graph Convolutional Network. What the NN learns: $H_i^{(l+1)} = \sigma\big(\sum_{j \in N(i)} H_j^{(l)} W^{(l)}\big) = \sigma\big(A H^{(l)} W^{(l)}\big)$.
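In code this update is a few lines; here is a minimal NumPy sketch (unnormalized adjacency and a ReLU activation are my simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)     # adjacency of a 4-node graph
H = rng.normal(size=(4, 3))                   # node states H^(l)
W = rng.normal(size=(3, 8))                   # shared weight W^(l)

def gcn_layer(A, H, W):
    # H^(l+1) = sigma(A H^(l) W^(l)): multiplying by A sums each node's
    # neighbor states, so one weight matrix is shared by all nodes while
    # every node still gets a different update from its own neighborhood.
    return np.maximum(A @ H @ W, 0.0)

H_next = gcn_layer(A, H, W)                   # shape (4, 8)
```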

GCN: Graph Convolutional Network. Classification of nodes of citation networks and a knowledge graph; cross-entropy over the labeled examples: $\mathcal{L} = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$. Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).

GAT: Graph Attention Network. Attention revisited. What the NN learns. GCN: $H_i^{(l+1)} = \sigma\big(\sum_{j \in N(i)} H_j^{(l)} W^{(l)}\big) = \sigma\big(A H^{(l)} W^{(l)}\big)$.

GAT: Graph Attention Network. Attention revisited. What the NN learns: the convolution weight and the attention coefficient. GAT: $H_i^{(l+1)} = \sigma\big(\sum_{j \in N(i)} \alpha_{ij}^{(l)} H_j^{(l)} W^{(l)}\big)$. Velickovic, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).

GAT: Graph Attention Network. Attention mechanism in natural language, e.g. soft alignment in neural machine translation.

GAT: Graph Attention Network. Attention mechanism in natural language processing (Luong attention): $a_t(s) = \mathrm{align}(h_t, \bar h_s) = \mathrm{softmax}\big(\mathrm{score}(h_t, \bar h_s)\big)$, with $\mathrm{score}(h_t, \bar h_s) = h_t^\top \bar h_s$ (dot), $h_t^\top W_a \bar h_s$ (general), or $v_a^\top \tanh\big(W_a [h_t; \bar h_s]\big)$ (concat). Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).

GAT: Graph Attention Network. Attention revisited. What the NN learns: the convolution weight and the attention coefficient, $H_i^{(l+1)} = \sigma\big(\sum_{j \in N(i)} \alpha_{ij}^{(l)} H_j^{(l)} W^{(l)}\big)$ with $\alpha_{ij} = f(H_i W, H_j W)$. Velickovic, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).

GAT: Graph Attention Network. Attention revisited. What the NN learns: the convolution weight and the attention coefficient, $H_i^{(l+1)} = \sigma\big(\sum_{j \in N(i)} \alpha_{ij}^{(l)} H_j^{(l)} W^{(l)}\big)$ with $\alpha_{ij} = f(H_i W, H_j W)$. Velickovic, Petar, et al. (network analysis): $\alpha_{ij} = \mathrm{softmax}(e_{ij}) = \exp(e_{ij}) / \sum_{k \in N(i)} \exp(e_{ik})$ with $e_{ij} = \mathrm{LeakyReLU}\big(a^\top [H_i W, H_j W]\big)$. Seongok Ryu, et al. (molecular applications): $\alpha_{ij} = \tanh\big((H_i W)^\top C\, (H_j W)\big)$.
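A single attention head can be sketched as follows (my own NumPy paraphrase of the Velickovic et al. rule; self-loops are omitted for brevity and every node is assumed to have at least one neighbor):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(A, H, W, a):
    HW = H @ W                       # (N, F_out), shared linear projection
    f = HW.shape[1]
    # e_ij = LeakyReLU(a^T [H_i W || H_j W]); split a into its two halves.
    s_src = HW @ a[:f]               # contribution of node i
    s_dst = HW @ a[f:]               # contribution of node j
    e = leaky_relu(s_src[:, None] + s_dst[None, :])
    # alpha_ij = softmax over j in N(i): mask non-neighbors first.
    e = np.where(A > 0, e, -np.inf)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    # H_i' = sigma(sum_j alpha_ij H_j W); sigma = ReLU here.
    return np.maximum(alpha @ HW, 0.0)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
H_next = gat_layer(A, rng.normal(size=(3, 4)), rng.normal(size=(4, 8)),
                   rng.normal(size=16))
```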

GAT: Graph Attention Network. Multi-head attention. Average: $H_i^{(l+1)} = \sigma\big(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N(i)} \alpha_{ij}^{k} H_j^{(l)} W^{k}\big)$. Concatenation: $H_i^{(l+1)} = \big\Vert_{k=1}^{K}\, \sigma\big(\sum_{j \in N(i)} \alpha_{ij}^{k} H_j^{(l)} W^{k}\big)$. Velickovic, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).

GGNN: Gated Graph Neural Network. Message Passing Neural Network (MPNN) framework. The previous node state and the message state update the $i$-th hidden node state: $h_i^{(l+1)} = U_l\big(h_i^{(l)}, m_i^{(l+1)}\big)$. The message state is computed from the previous node state, the neighbor states and the edge states: $m_i^{(l+1)} = \sum_{j \in N(i)} M_l\big(h_i^{(l)}, h_j^{(l)}, e_{ij}\big)$. Wu, Zhenqin, et al. "MoleculeNet: a benchmark for molecular machine learning." Chemical Science 9.2 (2018): 513-530.

GGNN: Gated Graph Neural Network. A recurrent unit, in this case a GRU, updates the node states: $h_i^{(l+1)} = U\big(h_i^{(l)}, m_i^{(l+1)}\big) = \mathrm{GRU}\big(h_i^{(l)}, m_i^{(l+1)}\big)$. The update gate $z$ sets the updating rate of the temporal state: $h_i^{(l+1)} = z \odot \tilde h_i^{(l+1)} + (1 - z) \odot h_i^{(l)}$, mixing the temporal (candidate) node state $\tilde h_i^{(l+1)}$ with the previous node state $h_i^{(l)}$. Li, Yujia, et al. "Gated graph sequence neural networks." arXiv preprint arXiv:1511.05493 (2015).
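A minimal sketch of the GRU-style node update (the gate layout follows the formula above; the toy message $m = AH$ and the random weights are my assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_node_update(h, m, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(m @ Wz + h @ Uz)              # update gate: updating rate
    r = sigmoid(m @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(m @ Wh + (r * h) @ Uh)  # temporal (candidate) state
    return z * h_tilde + (1.0 - z) * h        # mix temporal and previous

rng = np.random.default_rng(0)
N, F = 4, 8
A = (rng.random((N, N)) < 0.5).astype(float)  # toy adjacency
H = rng.normal(size=(N, F))                   # node states h^(l)
params = [0.1 * rng.normal(size=(F, F)) for _ in range(6)]
H_next = gru_node_update(H, A @ H, *params)   # m_i aggregates the neighbors
```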

Readout: permutation invariance on changing node orders. Input node features $H_i^{(0)}$: raw node information. Final node states $H_i^{(L)}$: how the NN recognizes the nodes. Graph features $Z$.

Readout: permutation invariance on changing node orders. Figure from "Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction," available on ResearchGate [accessed 8 Sep 2018].

Readout: permutation invariance on changing node orders. Graph feature: $z_G = f\big(\{H_i^{(L)}\}\big)$. Node-wise summation: $z_G = \tau\big(\sum_{i \in G} \mathrm{MLP}\big(H_i^{(L)}\big)\big)$. Graph gathering: $z_G = \tau\big(\sum_{i \in G} \sigma\big(\mathrm{MLP}_1\big([H_i^{(L)}, H_i^{(0)}]\big)\big) \odot \mathrm{MLP}_2\big(H_i^{(L)}\big)\big)$, where $\tau$ is a ReLU activation and $\sigma$ a sigmoid activation. Gilmer, Justin, et al. "Neural message passing for quantum chemistry." arXiv preprint arXiv:1704.01212 (2017).
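Both readouts are order-invariant because they sum over nodes; a small sketch (the linear "MLPs" and random states are my assumptions for illustration):

```python
import numpy as np

def readout_sum(H_L, mlp):
    # z_G = sum_i MLP(H_i^(L)): summation is invariant to node ordering.
    return mlp(H_L).sum(axis=0)

def readout_gathering(H_L, H_0, mlp1, mlp2):
    # Gated gathering: a sigmoid gate built from [H^(L), H^(0)] weights
    # each node's contribution, then ReLU (tau) is applied to the sum.
    gate = 1.0 / (1.0 + np.exp(-mlp1(np.concatenate([H_L, H_0], axis=1))))
    return np.maximum((gate * mlp2(H_L)).sum(axis=0), 0.0)

rng = np.random.default_rng(0)
H_0 = rng.normal(size=(5, 4))                 # input node features H^(0)
H_L = rng.normal(size=(5, 4))                 # final node states H^(L)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(4, 4))
z_graph = readout_gathering(H_L, H_0, lambda x: x @ W1, lambda x: x @ W2)
# Permuting the node rows of H_L and H_0 together leaves z_graph unchanged.
```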

Graph Auto-Encoders (GAE). Clustering, link prediction, matrix completion and recommendation. Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016). https://github.com/tkipf/gae

Graph Auto-Encoders (GAE). Input graph $G(X, A)$ → Encoder → node features $q(Z \mid X, A)$ → Decoder → reconstructed adjacency matrix $\hat A = \sigma(Z Z^\top)$. Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016). https://github.com/tkipf/gae

Graph Auto-Encoders (GAE). Encoder (inference model): a two-layer GCN on the input graph $G(X, A)$, $\mathrm{GCN}(X, A) = \tilde A \,\mathrm{ReLU}\big(\tilde A X W_0\big)\, W_1$, where $\tilde A = D^{-1/2} A D^{-1/2}$ is the symmetrically normalized adjacency matrix. Variational inference: $q(Z \mid X, A) = \prod_{i=1}^{N} q(z_i \mid X, A)$ with $q(z_i \mid X, A) = \mathcal{N}\big(z_i \mid \mu_i, \mathrm{diag}(\sigma_i^2)\big)$. The decoder reconstructs the adjacency matrix as $\hat A = \sigma(Z Z^\top)$. Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016). https://github.com/tkipf/gae

Graph Auto-Encoders (GAE). Decoder (generative model): inner product between latent vectors, $p(A \mid Z) = \prod_{i=1}^{N} \prod_{j=1}^{N} p(A_{ij} \mid z_i, z_j)$ with $p(A_{ij} = 1 \mid z_i, z_j) = \sigma(z_i^\top z_j)$, i.e. $\hat A = \sigma(Z Z^\top)$ with $Z = \mathrm{GCN}(X, A)$; $\hat A_{ij}$ are the elements of the reconstructed $A$ and $\sigma$ is the sigmoid activation. Learning: $\mathcal{L} = \mathbb{E}_{q(Z \mid X, A)}\big[\log p(A \mid Z)\big] - \mathrm{KL}\big(q(Z \mid X, A) \,\|\, p(Z)\big)$, a reconstruction loss plus a KL divergence. Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016). https://github.com/tkipf/gae
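The decoder and the reconstruction part of the loss are only a few lines; a minimal sketch (the KL term and the GCN encoder are omitted, and the toy data is invented):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(Z):
    # Inner-product decoder: p(A_ij = 1 | z_i, z_j) = sigmoid(z_i^T z_j).
    return sigmoid(Z @ Z.T)

def recon_loss(A, Z, eps=1e-9):
    # Negative E_q[log p(A | Z)] as element-wise binary cross-entropy.
    A_hat = decode(Z)
    return -np.mean(A * np.log(A_hat + eps)
                    + (1 - A) * np.log(1 - A_hat + eps))

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 2))                   # latent node embeddings
A = (decode(Z) > 0.5).astype(float)           # a graph consistent with Z
print(recon_loss(A, Z))
```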

Practical Issues: Inception. GoogLeNet, winner of the 2014 ImageNet challenge.

Practical Issues: Inception. The Inception module.

Practical Issues: Inception. Parallel graph-convolution paths from the same input, e.g. $H_i^{(1),0} = \sigma\big(A H_j^{(0)} W^{(0)}\big)$ and $H_i^{(1),1} = \sigma\big(A H_j^{(0)} W^{(1)}\big)$, whose outputs are combined as in an Inception module.

Practical Issues: Inception. Makes the network wider and helps avoid vanishing gradients.

Practical Issues: Skip-connection. ResNet, winner of the 2015 ImageNet Challenge. Going deeeeeeeeper!

Practical Issues: Skip-connection. ResNet, winner of the 2015 ImageNet Challenge: $y = H_i^{(l+1)} + H_i^{(l)}$, i.e. the block output is added to its input so that gradients can flow through the identity path.

Practical Issues: Dropout. Dropout rate $p = 0.25$. A 4x4 dense layer has $4 \times 4 = 16$ parameters; masking the units on one side leaves $16 \times 0.75 = 12$ effective parameters, and masking both sides leaves $16 \times 0.75 \times 0.75 = 9$. For a dense network, dropout of the hidden states reduces the effective number of parameters from $N_w$ to $N_w (1 - p)^2$.

Practical Issues: Dropout. Hidden states in a graph neural network. Graph conv.: $H_i^{(l+1)} = \sigma\big(A H^{(l)} W\big)$. Example node states $H_1^{(l)}, \dots, H_4^{(l)}$: Node 1 (age 28, sex M, nationality Korean, job student), Node 2 (age 40, F, American, medical doctor), Node 3 (age 36, F, French, politician).

Practical Issues: Dropout. Hidden states in a graph neural network: one can mask individual node states ($H_1^{(l)}, \dots, H_4^{(l)}$) or mask information (features) of the node states, e.g. dropping the nationality or job columns. Many other options are possible; the proper method depends on your task.

Applications of graph neural networks

Network analysis: 1. node classification, 2. link prediction, 3. matrix completion. Molecular applications: 1. neural molecular fingerprint, 2. Quantitative Structure-Property Relationship (QSPR), 3. molecular generative model. Interacting physical system.

Network analysis. 1. Node classification: the karate club network. Karate club graph, colors denote communities obtained via modularity-based clustering (Brandes et al., 2008); GCN embedding (with random weights) for nodes in the karate club network. All figures and descriptions are taken from Thomas N. Kipf's blog (watch the video there). Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).

Network analysis. 1. Node classification: good node features lead to good node classification results. Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).

Network analysis. 1. Node classification: semi-supervised learning with a low label rate. Citation networks (Citeseer, Cora, Pubmed) and a bipartite graph (NELL). Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).

Network analysis. 1. Node classification: outperforms classical machine learning methods. Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).

Network analysis. 1. Node classification. Spectral graph convolution: $H^{(l+1)} = \sigma\big(\tilde D^{-1/2} \tilde A \tilde D^{-1/2} H^{(l)} W^{(l)}\big)$, with $\tilde A = A + I_N$ and $\tilde D_{ii} = \sum_j \tilde A_{ij}$. Spectral graph filtering: $g_\theta \star x = U g_{\theta'}(\Lambda) U^\top x$, where $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^\top$. Polynomial approximation (in this case, Chebyshev polynomials): $g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde \Lambda)$. Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).

Network analysis. 1. Node classification. Spectral graph convolution: $H^{(l+1)} = \sigma\big(\tilde D^{-1/2} \tilde A \tilde D^{-1/2} H^{(l)} W^{(l)}\big)$, $\tilde A = A + I_N$, $\tilde D_{ii} = \sum_j \tilde A_{ij}$. Linear approximation ($K = 1$): $g_{\theta'} \star x \approx \theta'_0 x + \theta'_1 (L - I_N) x = \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x$. Using a single parameter $\theta = \theta'_0 = -\theta'_1$: $g_\theta \star x = \theta\big(I_N + D^{-1/2} A D^{-1/2}\big) x$. Renormalization trick: $I_N + D^{-1/2} A D^{-1/2} \rightarrow \tilde D^{-1/2} \tilde A \tilde D^{-1/2}$. Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).
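The renormalization trick is easy to compute explicitly; this short sketch follows the Kipf & Welling definition directly:

```python
import numpy as np

def renormalized_adjacency(A):
    # A_tilde = A + I_N;  D_tilde_ii = sum_j A_tilde_ij;
    # returns D_tilde^{-1/2} A_tilde D_tilde^{-1/2}.
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = renormalized_adjacency(A)   # then H^(l+1) = sigma(A_hat H^(l) W^(l))
```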

Network analysis. 1. Node classification. Spectral graph convolution: $H^{(l+1)} = \sigma\big(\tilde D^{-1/2} \tilde A \tilde D^{-1/2} H^{(l)} W^{(l)}\big)$, $\tilde A = A + I_N$, $\tilde D_{ii} = \sum_j \tilde A_{ij}$. Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).

Network analysis. 2. Link prediction. Clustering, link prediction, matrix completion and recommendation. Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016). https://github.com/tkipf/gae

Network analysis. 2. Link prediction. Input graph $G(X, A)$ → Encoder → node features $q(Z \mid X, A)$ → Decoder → reconstructed adjacency matrix $\hat A = \sigma(Z Z^\top)$. Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016). https://github.com/tkipf/gae

Network analysis. 2. Link prediction. Trained on an incomplete version of the Cora, Citeseer and Pubmed datasets where parts of the citation links (edges) have been removed, while all node features are kept. Validation and test sets are formed from the previously removed edges and the same number of randomly sampled pairs of unconnected nodes (non-edges). Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016). https://github.com/tkipf/gae

Network analysis. 3. Matrix completion. Matrix completion can be applied to recommender systems. Berg, Rianne van den, Thomas N. Kipf, and Max Welling. "Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).

Network analysis. 3. Matrix completion. A rating matrix $M$ of shape $N_u \times N_v$, where $N_u$ is the number of users and $N_v$ the number of items. User $i$ rated item $j$, or the rating is unobserved ($M_{ij} = 0$). Matrix completion, i.e. recommendation, can be cast as a link prediction problem on a bipartite user-item interaction graph. Input graph: $G(\mathcal{W}, \mathcal{E}, \mathcal{R})$ with $\mathcal{W} = \mathcal{U} \cup \mathcal{V}$: user nodes $u_i \in \mathcal{U}$, $i \in \{1, \dots, N_u\}$, and item nodes $v_j \in \mathcal{V}$, $j \in \{1, \dots, N_v\}$; edges $(u_i, r, v_j) \in \mathcal{E}$ carry rating levels $r \in \{1, \dots, R\} = \mathcal{R}$. Berg, Rianne van den, Thomas N. Kipf, and Max Welling. "Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).

Network analysis. 3. Matrix completion. GAE for the link prediction task: take as input an $N \times D$ feature matrix $X$ and a graph adjacency matrix $A$, produce an $N \times E$ node embedding matrix $Z = [z_1^\top, \dots, z_N^\top]^\top = f(X, A)$, and decode $\hat A = g(Z)$, where $g$ takes pairs of node embeddings $(z_i, z_j)$ and predicts the respective entries $\hat A_{ij}$. Berg, Rianne van den, Thomas N. Kipf, and Max Welling. "Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).

Network analysis. 3. Matrix completion. GAE for the bipartite recommender graph $G(\mathcal{W}, \mathcal{E}, \mathcal{R})$. Encoder: $[U, V] = f(X, M_1, \dots, M_R)$, where $M_r \in \{0, 1\}^{N_u \times N_v}$ is the adjacency matrix associated with rating type $r \in \mathcal{R}$, and $U$, $V$ are matrices of user and item embeddings with shapes $N_u \times E$ and $N_v \times E$, respectively. Decoder: $\hat M = g(U, V)$, a rating matrix of shape $N_u \times N_v$. Compare: the link prediction GAE has $Z = f(X, A)$ and $\hat A = g(Z)$; the bipartite recommender GAE has $[U, V] = f(X, M_1, \dots, M_R)$ and $\hat M = g(U, V)$. Berg, Rianne van den, Thomas N. Kipf, and Max Welling. "Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).

Network analysis. 3. Matrix completion. GAE for the bipartite recommender graph $G(\mathcal{W}, \mathcal{E}, \mathcal{R})$: $[U, V] = f(X, M_1, \dots, M_R)$, $\hat M = g(U, V)$. Berg, Rianne van den, Thomas N. Kipf, and Max Welling. "Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).

Network analysis. 3. Matrix completion. Encoder: $u_i = \sigma(W h_i)$ is the final user embedding, with intermediate node state $h_i = \sigma\big(\mathrm{accum}\big(\sum_{j \in \mathcal{N}_i(1)} \mu_{j \to i, 1}, \dots, \sum_{j \in \mathcal{N}_i(R)} \mu_{j \to i, R}\big)\big)$ and message function from item $j$ to user $i$: $\mu_{j \to i, r} = \frac{1}{c_{ij}} W_r x_j$, where $c_{ij}$ is a normalization constant such as $|\mathcal{N}_i|$ or $\sqrt{|\mathcal{N}_i| |\mathcal{N}_j|}$. Berg, Rianne van den, Thomas N. Kipf, and Max Welling. "Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).

Network analysis. 3. Matrix completion. Decoder: $p(\hat M_{ij} = r) = \frac{e^{u_i^\top Q_r v_j}}{\sum_{s \in R} e^{u_i^\top Q_s v_j}}$, the probability that rating $\hat M_{ij}$ is $r$; the predicted rating is the expectation $\hat M_{ij} = g(u_i, v_j) = \mathbb{E}\big[r\big] = \sum_{r \in R} r\, p(\hat M_{ij} = r)$. Loss function: $\mathcal{L} = -\sum_{i,j} \sum_{r=1}^{R} I[r = M_{ij}] \log p(\hat M_{ij} = r)$, where $I[k = l] = 1$ when $k = l$ and 0 otherwise. Berg, Rianne van den, Thomas N. Kipf, and Max Welling. "Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).
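A sketch of the bilinear decoder and the expected rating (the shapes and the random parameters are my assumptions for illustration):

```python
import numpy as np

def predict_rating(u, v, Q):
    # p(M_ij = r) = softmax over r of u^T Q_r v, then take E[r].
    logits = np.array([u @ Qr @ v for Qr in Q])   # one logit per level r
    p = np.exp(logits - logits.max())
    p /= p.sum()
    ratings = np.arange(1, len(Q) + 1)            # rating levels 1..R
    return ratings @ p                            # expected rating

rng = np.random.default_rng(0)
E, R = 4, 5
Q = 0.1 * rng.normal(size=(R, E, E))              # one Q_r per rating level
u, v = rng.normal(size=E), rng.normal(size=E)     # user/item embeddings
print(predict_rating(u, v, Q))                    # a value between 1 and 5
```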

Network analysis. 3. Matrix completion. Berg, Rianne van den, Thomas N. Kipf, and Max Welling. "Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).

Molecular applications: which kinds of datasets exist? Bioactive molecules with drug-like properties: 1,828,820 compounds (ChEMBL, https://www.ebi.ac.uk/chembldb/). Drugs and targets, FDA approved, investigational and experimental: 7,713 drugs and 4,115 targets (DrugBank, https://drugbank.ca/).

Molecular applications: which kinds of datasets exist? Tox21 Data Challenge (@ Kaggle): 12 types of toxicity; molecule species (represented with SMILES) and toxicity labels are given, but the dataset is too small to train a deep learning model.

Molecular applications. 1. Neural molecular fingerprint. Hash functions have been used to generate molecular fingerprints. (*Molecular fingerprint: a vector representation of molecular structure.)

Molecular applications. 1. Neural molecular fingerprint. Such molecular fingerprints can easily be obtained with open-source packages, e.g. RDKit.
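For example, with RDKit a circular (Morgan/ECFP-like) fingerprint is a couple of lines (assuming RDKit is installed; the aspirin SMILES is just an example):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")          # aspirin
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
print(fp.GetNumOnBits())      # number of set bits in the 2048-bit vector
```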

Molecular applications. 1. Neural molecular fingerprint. More recently, neural fingerprints generated by graph convolutional networks have been widely used for more accurate molecular property prediction.

Molecular applications. 1. Neural molecular fingerprint. Duvenaud, David K., et al. "Convolutional networks on graphs for learning molecular fingerprints." Advances in Neural Information Processing Systems. 2015.


Molecular applications. 2. Quantitative Structure-Property Relationships (QSPR). Ryu, Seongok, Jaechang Lim, and Woo Youn Kim. "Deeply learning molecular structure-property relationships using graph attention neural network." arXiv preprint arXiv:1805.10988 (2018).


Molecular applications. 2. Quantitative Structure-Property Relationships (QSPR). Learning solubility of molecules. Final node states: the neural network recognizes several functional groups differently. Ryu, Seongok, Jaechang Lim, and Woo Youn Kim. "Deeply learning molecular structure-property relationships using graph attention neural network." arXiv preprint arXiv:1805.10988 (2018).

Molecular applications. 2. Quantitative Structure-Property Relationships (QSPR). Learning photovoltaic efficiency (a quantum mechanical phenomenon). Final node states: interestingly, the NN can also differentiate nodes according to their quantum mechanical characteristics. Ryu, Seongok, Jaechang Lim, and Woo Youn Kim. "Deeply learning molecular structure-property relationships using graph attention neural network." arXiv preprint arXiv:1805.10988 (2018).

Molecular applications. 2. Quantitative Structure-Property Relationships (QSPR). Learning solubility of molecules. Graph features shown in ascending order of solubility: the distance $\|Z_i - Z_j\|^2$ is small for similar molecules, i.e. similar molecules are located close together in the graph latent space.

Molecular applications. 3. Molecular generative model. Motivation: de novo molecular design. Chemical space is too huge: only about $10^8$ molecules have been synthesized as potential drug candidates, whereas it is estimated that there are $10^{23}$ to $10^{60}$ drug-like molecules. Virtual screening is therefore limited.

Molecular applications. 3. Molecular generative model. Motivation: de novo molecular design. Molecular graph vs. Simplified Molecular-Input Line-Entry System (SMILES): molecules can be represented as strings according to defined rules.

Molecular applications. 3. Molecular generative model. Motivation: de novo molecular design. Many SMILES-based generative models exist. Segler, Marwin H. S., et al. "Generating focused molecule libraries for drug discovery with recurrent neural networks." ACS Central Science 4.1 (2018): 120-131. Gómez-Bombarelli, Rafael, et al. "Automatic chemical design using a data-driven continuous representation of molecules." ACS Central Science 4.2 (2018): 268-276.

Molecular applications. 3. Molecular generative model. Motivation: de novo molecular design. The SMILES representation also has a fatal problem: small changes in structure can lead to quite different string expressions, which makes it difficult to reflect the topological information of molecules.

Molecular applications. 3. Molecular generative model. Literature:
Li, Y., Vinyals, O., Dyer, C., Pascanu, R., and Battaglia, P. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction tree variational autoencoder for molecular graph generation." arXiv preprint arXiv:1802.04364 (2018).
Liu, Qi, et al. "Constrained graph variational autoencoders for molecule design." arXiv preprint arXiv:1805.09076 (2018).
Simonovsky, Martin, and Nikos Komodakis. "GraphVAE: Towards generation of small graphs using variational autoencoders." arXiv preprint arXiv:1802.03480 (2018).
De Cao, Nicola, and Thomas Kipf. "MolGAN: An implicit generative model for small molecular graphs." arXiv preprint arXiv:1805.11973 (2018).

Molecular applications. 3. Molecular generative model. Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).

Molecular applications. 3. Molecular generative model. 1. Sample whether to add a new node of a particular type or to terminate; if a node type is chosen, 2. add a node of this type to the graph; 3. check if any further edges are needed to connect the new node to the existing graph; 4. if yes, select a node in the graph and add an edge connecting the new node to the selected one. (A toy sketch of this loop follows below.) Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
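As a toy sketch, the loop structure looks like this (the three probability functions stand in for the learned modules $f_{addnode}$, $f_{addedge}$ and $f_{nodes}$; the random stand-ins below are my assumptions, purely for illustration):

```python
import random

def generate_graph(p_add_node, p_add_edge, choose_partner, node_types):
    nodes, edges = [], []
    while True:
        t = p_add_node(nodes, edges, node_types)   # step 1: type or 'stop'
        if t == "stop":
            break
        v = len(nodes)
        nodes.append(t)                            # step 2: add node of type t
        while p_add_edge(nodes, edges, v):         # step 3: more edges needed?
            u = choose_partner(nodes, edges, v)    # step 4: pick an endpoint
            edges.append((u, v))
    return nodes, edges

# Toy stand-ins for the learned modules (purely illustrative):
nodes, edges = generate_graph(
    p_add_node=lambda n, e, types: random.choice(types + ["stop"]),
    p_add_edge=lambda n, e, v: len(n) > 1 and random.random() < 0.5,
    choose_partner=lambda n, e, v: random.randrange(v),
    node_types=["C", "N", "O"],
)
```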

Molecular applications. 3. Molecular generative model. Determine whether or not to add a node. Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).

Molecular applications. 3. Molecular generative model. If a node is added, determine whether to add edges between the current node and the other nodes. Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).


Molecular applications. 3. Molecular generative model. Graph propagation process: read out all node states and generate a graph feature. Node update: $h'_v = f_n(a_v, h_v)$ with incoming messages $a_v = \sum_{(u,v) \in \mathcal{E}} f_e(h_u, h_v, x_{u,v})$; graph feature: $h_G = \sum_{v \in \mathcal{V}} g_G^v \odot h_v$ with gates $g_G^v = \sigma\big(g_m(h_v)\big)$. Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).

Molecular applications. 3. Molecular generative model. Add node: a step to decide whether or not to add a node, $f_{\mathrm{addnode}}(G) = \mathrm{softmax}\big(f_{an}(h_G^{(T)})\big)$. If "yes", the new node vectors $h_v$ are carried over to the next step. Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).

Molecular applications. 3. Molecular generative model. Add edge: a step to add an edge to the new node, $f_{\mathrm{addedge}}(G, v) = \sigma\big(f_{ae}(h_G^{(T)}, h_v)\big)$, the probability of adding an edge to the newly created node $v$. If "yes", $f_{\mathrm{nodes}}(G, v) = \mathrm{softmax}\big(f_s(h_u^{(T)}, h_v)\big)$ gives a score for each node $u$ to connect the edge to. Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).

Molecular applications. 3. Molecular generative model. Objective function: the marginal likelihood is $p(G) = \sum_{\pi \in \mathcal{P}(G)} p(G, \pi) = \mathbb{E}_{q(\pi \mid G)}\big[p(G, \pi) / q(\pi \mid G)\big]$, where $\pi$ is a node ordering (permutation). Following all possible permutations is intractable, so orderings are sampled from data, $q(\pi \mid G) = p_{data}(\pi \mid G)$, giving the training objective $\mathbb{E}_{p_{data}(G, \pi)}\big[\log p(G, \pi)\big] = \mathbb{E}_{p_{data}(G)}\big[\mathbb{E}_{p_{data}(\pi \mid G)}[\log p(G, \pi)]\big]$. Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).


Interacting physical system. Interacting systems: nodes are particles, edges are (physical) interactions between particle pairs, and the latent code is the underlying interaction graph. Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).

Interacting physical system. Input graph: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with vertices $v \in \mathcal{V}$ and edges $e = (v, v') \in \mathcal{E}$. Node-to-edge ($v \to e$) and edge-to-node ($e \to v$) passes: $h^l_{(i,j)} = f^l_e\big([h^l_i, h^l_j, x_{(i,j)}]\big)$ and $h^{l+1}_j = f^l_v\big(\big[\sum_{i \in \mathcal{N}_j} h^l_{(i,j)}, x_j\big]\big)$, where $x_i$ are initial node features and $x_{(i,j)}$ initial edge features. Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).

Interacting physical system. Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).

Interacting physical system. Encoder: $h^1_j = f_{emb}(x_j)$, $h^1_{(i,j)} = f^1_e\big([h^1_i, h^1_j]\big)$, $h^2_j = f^1_v\big(\sum_{i \ne j} h^1_{(i,j)}\big)$, $h^2_{(i,j)} = f^2_e\big([h^2_i, h^2_j]\big)$, and $q_\phi(z_{ij} \mid x) = \mathrm{softmax}\big(f_{enc,\phi}(x)_{ij,1:K}\big) = \mathrm{softmax}\big(h^2_{(i,j)}\big)$. Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).

Interacting physical system. Decoder: $\tilde h^t_{(i,j)} = \sum_k z_{ij,k}\, \tilde f^k_e\big([h^t_i, h^t_j]\big)$, $\mathrm{MSG}^t_j = \sum_{i \ne j} \tilde h^t_{(i,j)}$, $h^{t+1}_j = \mathrm{GRU}\big([\mathrm{MSG}^t_j, x^t_j], h^t_j\big)$, $\mu^{t+1}_j = x^t_j + f_{out}\big(h^{t+1}_j\big)$, and $p\big(x^{t+1} \mid x^t, z\big) = \mathcal{N}\big(\mu^{t+1}, \sigma^2 I\big)$. Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).

Interacting physical system. Learning: $\mathcal{L} = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$, with $p_\theta(x \mid z) = \prod_{t=1}^{T} p_\theta\big(x^{t+1} \mid x^t, \dots, x^1, z\big) = \prod_{t=1}^{T} p_\theta\big(x^{t+1} \mid x^t, z\big)$ since the dynamics is Markovian, and the prior over $z_{ij}$ is a factorized uniform distribution over edge types. Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).
