Deep Learning On Graphs - Michigan State University

Transcription

Book Website: https://cse.msu.edu/~mayao4/dlg_book/

Deep Learning on Graphs
Yao Ma and Jiliang Tang


Contents

Preface

1 Deep Learning on Graphs: An Introduction
  1.1 Introduction
  1.2 Why Deep Learning on Graphs?
  1.3 What Content is Covered?
  1.4 Who Should Read the Book?
  1.5 Feature Learning on Graphs: A Brief History
    1.5.1 Feature Selection on Graphs
    1.5.2 Representation Learning on Graphs
  1.6 Conclusion
  1.7 Further Reading

PART ONE: FOUNDATIONS

2 Foundations of Graphs
  2.1 Introduction
  2.2 Graph Representations
  2.3 Properties and Measures
    2.3.1 Degree
    2.3.2 Connectivity
    2.3.3 Centrality
  2.4 Spectral Graph Theory
    2.4.1 Laplacian Matrix
    2.4.2 The Eigenvalues and Eigenvectors of the Laplacian Matrix
  2.5 Graph Signal Processing
    2.5.1 Graph Fourier Transform
  2.6 Complex Graphs
    2.6.1 Heterogeneous Graphs
    2.6.2 Bipartite Graphs
    2.6.3 Multi-dimensional Graphs
    2.6.4 Signed Graphs
    2.6.5 Hypergraphs
    2.6.6 Dynamic Graphs
  2.7 Computational Tasks on Graphs
    2.7.1 Node-focused Tasks
    2.7.2 Graph-focused Tasks
  2.8 Conclusion
  2.9 Further Reading

3 Foundations of Deep Learning
  3.1 Introduction
  3.2 Feedforward Networks
    3.2.1 The Architecture
    3.2.2 Activation Functions
    3.2.3 Output Layer and Loss Function
  3.3 Convolutional Neural Networks
    3.3.1 The Convolution Operation and Convolutional Layer
    3.3.2 Convolutional Layers in Practice
    3.3.3 Non-linear Activation Layer
    3.3.4 Pooling Layer
    3.3.5 An Overall CNN Framework
  3.4 Recurrent Neural Networks
    3.4.1 The Architecture of Traditional RNNs
    3.4.2 Long Short-Term Memory
    3.4.3 Gated Recurrent Unit
  3.5 Autoencoders
    3.5.1 Undercomplete Autoencoders
    3.5.2 Regularized Autoencoders
  3.6 Training Deep Neural Networks
    3.6.1 Training with Gradient Descent
    3.6.2 Backpropagation
    3.6.3 Preventing Overfitting
  3.7 Conclusion
  3.8 Further Reading

PART TWO: METHODS

4 Graph Embedding
  4.1 Introduction
  4.2 Graph Embedding on Simple Graphs
    4.2.1 Preserving Node Co-occurrence
    4.2.2 Preserving Structural Role
    4.2.3 Preserving Node Status
    4.2.4 Preserving Community Structure
  4.3 Graph Embedding on Complex Graphs
    4.3.1 Heterogeneous Graph Embedding
    4.3.2 Bipartite Graph Embedding
    4.3.3 Multi-dimensional Graph Embedding
    4.3.4 Signed Graph Embedding
    4.3.5 Hypergraph Embedding
    4.3.6 Dynamic Graph Embedding
  4.4 Conclusion
  4.5 Further Reading

5 Graph Neural Networks
  5.1 Introduction
  5.2 The General GNN Frameworks
    5.2.1 A General Framework for Node-focused Tasks
    5.2.2 A General Framework for Graph-focused Tasks
  5.3 Graph Filters
    5.3.1 Spectral-based Graph Filters
    5.3.2 Spatial-based Graph Filters
  5.4 Graph Pooling
    5.4.1 Flat Graph Pooling
    5.4.2 Hierarchical Graph Pooling
  5.5 Parameter Learning for Graph Neural Networks
    5.5.1 Parameter Learning for Node Classification
    5.5.2 Parameter Learning for Graph Classification
  5.6 Conclusion
  5.7 Further Reading

6 Robust Graph Neural Networks
  6.1 Introduction
  6.2 Graph Adversarial Attacks
    6.2.1 Taxonomy of Graph Adversarial Attacks
    6.2.2 White-box Attack
    6.2.3 Gray-box Attack
    6.2.4 Black-box Attack
  6.3 Graph Adversarial Defenses
    6.3.1 Graph Adversarial Training
    6.3.2 Graph Purification
    6.3.3 Graph Attention
    6.3.4 Graph Structure Learning
  6.4 Conclusion
  6.5 Further Reading

7 Scalable Graph Neural Networks
  7.1 Introduction
  7.2 Node-wise Sampling Methods
  7.3 Layer-wise Sampling Methods
  7.4 Subgraph-wise Sampling Methods
  7.5 Conclusion
  7.6 Further Reading

8 Graph Neural Networks on Complex Graphs
  8.1 Introduction
  8.2 Heterogeneous Graph Neural Networks
  8.3 Bipartite Graph Neural Networks
  8.4 Multi-dimensional Graph Neural Networks
  8.5 Signed Graph Neural Networks
  8.6 Hypergraph Neural Networks
  8.7 Dynamic Graph Neural Networks
  8.8 Conclusion
  8.9 Further Reading

9 Beyond GNNs: More Deep Models on Graphs
  9.1 Introduction
  9.2 Autoencoders on Graphs
  9.3 Recurrent Neural Networks on Graphs
  9.4 Variational Autoencoders on Graphs
    9.4.1 Variational Autoencoders for Node Representation Learning
    9.4.2 Variational Autoencoders for Graph Generation
  9.5 Generative Adversarial Networks on Graphs
    9.5.1 Generative Adversarial Networks for Node Representation Learning
    9.5.2 Generative Adversarial Networks for Graph Generation
  9.6 Conclusion
  9.7 Further Reading

PART THREE: APPLICATIONS

10 Graph Neural Networks in Natural Language Processing
  10.1 Introduction
  10.2 Semantic Role Labeling
  10.3 Neural Machine Translation
  10.4 Relation Extraction
  10.5 Question Answering
    10.5.1 The Multi-hop QA Task
    10.5.2 Entity-GCN
  10.6 Graph to Sequence Learning
  10.7 Graph Neural Networks on Knowledge Graphs
    10.7.1 Graph Filters for Knowledge Graphs
    10.7.2 Transforming Knowledge Graphs to Simple Graphs
    10.7.3 Knowledge Graph Completion
  10.8 Conclusion
  10.9 Further Reading

11 Graph Neural Networks in Computer Vision
  11.1 Introduction
  11.2 Visual Question Answering
    11.2.1 Images as Graphs
    11.2.2 Images and Questions as Graphs
  11.3 Skeleton-based Action Recognition
  11.4 Image Classification
    11.4.1 Zero-shot Image Classification
    11.4.2 Few-shot Image Classification
    11.4.3 Multi-label Image Classification
  11.5 Point Cloud Learning
  11.6 Conclusion
  11.7 Further Reading

12 Graph Neural Networks in Data Mining
  12.1 Introduction
  12.2 Web Data Mining
    12.2.1 Social Network Analysis
    12.2.2 Recommender Systems
  12.3 Urban Data Mining
    12.3.1 Traffic Prediction
    12.3.2 Air Quality Forecasting
  12.4 Cybersecurity Data Mining
    12.4.1 Malicious Account Detection
    12.4.2 Fake News Detection
  12.5 Conclusion
  12.6 Further Reading

13 Graph Neural Networks in Biochemistry and Healthcare
  13.1 Introduction
  13.2 Drug Development and Discovery
    13.2.1 Molecule Representation Learning
    13.2.2 Protein Interface Prediction
    13.2.3 Drug-Target Binding Affinity Prediction
  13.3 Drug Similarity Integration
  13.4 Polypharmacy Side Effect Prediction
  13.5 Disease Prediction
  13.6 Conclusion
  13.7 Further Reading

PART FOUR: ADVANCES

14 Advanced Topics in Graph Neural Networks
  14.1 Introduction
  14.2 Deeper Graph Neural Networks
    14.2.1 Jumping Knowledge
    14.2.2 DropEdge
    14.2.3 PairNorm
  14.3 Exploring Unlabeled Data via Self-supervised Learning
    14.3.1 Node-focused Tasks
    14.3.2 Graph-focused Tasks
  14.4 Expressiveness of Graph Neural Networks
    14.4.1 Weisfeiler-Lehman Test
    14.4.2 Expressiveness
  14.5 Conclusion
  14.6 Further Reading

15 Advanced Applications in Graph Neural Networks
  15.1 Introduction
  15.2 Combinatorial Optimization on Graphs
  15.3 Learning Program Representations
  15.4 Reasoning Interacting Dynamical Systems in Physics
  15.5 Conclusion
  15.6 Further Reading

Bibliography
Index

Preface

Graphs have been leveraged to denote data from various domains, ranging from social science and linguistics to chemistry, biology, and physics. Meanwhile, numerous real-world applications can be treated as computational tasks on graphs. For example, air quality forecasting can be regarded as a node classification task, friend recommendation in social networks can be solved as a link prediction task, and protein interface prediction can be regarded as a graph classification task. To better take advantage of modern machine learning models for these computational tasks, effectively representing graphs plays a key role. There are two major ways to extract features to represent graphs: feature engineering and representation learning. Feature engineering relies on hand-engineered features, which is time-consuming and often not optimal for given downstream tasks. Representation learning, in contrast, learns features automatically, requires minimal human effort, and is adaptive to given downstream tasks. Thus, representation learning on graphs has been extensively studied.

The field of graph representation learning has developed greatly over the past decades and can be roughly divided into three generations: traditional graph embedding, modern graph embedding, and deep learning on graphs. As the first generation of graph representation learning, traditional graph embedding has been investigated in the context of classic dimension reduction techniques on graphs such as IsoMap, LLE, and eigenmaps. Word2vec learns word representations from a large corpus of text, and the generated word representations have advanced many natural language processing tasks. The successful extensions of word2vec to the graph domain started the second generation of representation learning on graphs, i.e., modern graph embedding. Given the huge success of deep learning techniques in representation learning in the domains of images and text, efforts have been

made to generalize them to graphs, which have opened a new chapter of graph representation learning, i.e., deep learning on graphs.

More and more evidence has demonstrated that the third generation of graph representation learning, especially graph neural networks (GNNs), has tremendously facilitated computational tasks on graphs, including both node-focused and graph-focused tasks. The revolutionary advances brought by GNNs have also immensely contributed to the depth and breadth of the adoption of graph representation learning in real-world applications. For the classical application domains of graph representation learning, such as recommender systems and social network analysis, GNNs result in state-of-the-art performance and bring them into new frontiers. Meanwhile, new application domains of GNNs have been continuously emerging, such as combinatorial optimization, physics, and healthcare. These wide applications of GNNs enable diverse contributions and perspectives from disparate disciplines and make this research field truly interdisciplinary.

Graph representation learning is a rapidly growing field. It has attracted significant attention from different domains and has consequently accumulated a large body of literature. Thus, it is a propitious time to systematically survey and summarize this field. This book is our diligent attempt to achieve this goal by taking advantage of our many years of teaching and research experience in this field. In particular, we aim to help researchers acquire essential knowledge of graph representation learning and its wide range of applications, and to understand its advances and new frontiers.

An Overview of the Book. This book provides a comprehensive introduction to graph representation learning, with a focus on deep learning on graphs, especially GNNs. It consists of four parts: Foundations, Methods, Applications, and Advances. The Foundations part introduces the necessary background and basic concepts of graphs and deep learning. Topics covered by the Methods part include modern graph embedding, GNNs for both simple and complex graphs, the robustness and scalability issues of GNNs, and deep graph models beyond GNNs. Each of these topics is covered by a chapter with fundamental concepts and technical details on representative algorithms. The Applications part presents GNN applications in the most typical domains, including natural language processing, computer vision, data mining, and biochemistry and healthcare. For each domain, one chapter covers the most representative sub-fields advanced by GNNs. New emerging methods and application domains are discussed in the Advances part. For each chapter, further reading is included at the end for readers who are interested in more advanced topics and recent trends.

Target Audience. This book is designed to be as self-contained as possible,

though a basic background in graph theory, calculus, linear algebra, probability, and statistics can help readers better understand its technical details. Thus, it is suitable for a wide range of readers with diverse backgrounds and different purposes of reading. This book can serve as both a learning tool and a reference for students at the senior undergraduate or graduate levels who want to obtain a comprehensive understanding of this research field. Researchers who wish to pursue this research field can consider this book as a starting point. Project managers and practitioners can learn from the GNN applications in the book how to adopt GNNs in their products and platforms. Researchers outside of computer science can find in this book an extensive set of examples of how to apply GNNs to different disciplines.

Yao Ma and Jiliang Tang
East Lansing, MI
August 2020

1 Deep Learning on Graphs: An Introduction

1.1 Introduction

We start this chapter by answering a few questions about the book. First, we discuss why we should pay attention to deep learning on graphs. In particular, why do we represent real-world data as graphs, why do we want to bridge deep learning with graphs, and what are the challenges for deep learning on graphs? Second, we introduce what content this book will cover. Specifically, which topics will we discuss, and how do we organize them? Third, we provide guidance about who should read this book, especially what our target audience is and how readers with different backgrounds and purposes of reading should approach it. To better understand deep learning on graphs, we briefly review its history under the more general context of feature learning on graphs.

1.2 Why Deep Learning on Graphs?

Since data from real-world applications have very diverse forms, from matrices and tensors to sequences and time series, a natural question is why we attempt to represent data as graphs. There are two primary motivations. First, graphs provide a universal representation of data. Data from many systems across various areas can be explicitly denoted as graphs, such as social networks, transportation networks, protein-protein interaction networks, knowledge graphs, and brain networks. Meanwhile, as indicated by Figure 1.1, numerous other types of data can be transformed into the form of graphs (Xu, 2017). Second, a vast number of real-world problems can be addressed as a small set of computational tasks on graphs. Inferring nodes' attributes, detecting anomalous nodes (e.g., spammers or terrorists), identifying genes relevant to diseases, and suggesting medicines to patients can be summarized as the problem of node classification (Bhagat et al., 2011). Recommendations, polypharmacy side effect prediction, drug-target interaction identification, and knowledge graph completion are essentially the problem of link prediction (Liben-Nowell and Kleinberg, 2007).

[Figure 1.1: Representing real-world data as graphs. The figure is reproduced from (Xu, 2017). Solid lines denote lossless representations, while dotted lines indicate lossy representations. Note that we replace "network" in the original figure with "graph". Raw data types such as pairwise, weighted pairwise, directed pairwise, and temporal pairwise relations, matrices, tensors, sequences, and time series map to simple, weighted, directed, temporal, dynamic, hyper, and higher-order graphs.]

Nodes on graphs are inherently connected, which suggests that nodes are not independent. However, traditional machine learning techniques often assume that data is independent and identically distributed. Thus, they are not suitable to directly tackle computational tasks on graphs. There are two main directions to develop solutions. As shown in Figure 1.2, we will use node classification as an illustrative example to discuss these two directions. One direction is to build a new mechanism specific to graphs. The classification problem designed for graphs is known as collective classification (Sen et al., 2008), as demonstrated in Figure 1.2a. Different from traditional classification, for a node, collective classification considers not only the mapping between its features and its label but also the mapping for its neighborhood. The other direction is to flatten a graph by constructing a set of features to denote its nodes, where traditional classification techniques can be applied, as illustrated in Figure 1.2b. This direction can take advantage of traditional machine learning techniques; thus, it has become increasingly popular and dominant. The key to the success of this direction is how to construct a set of features for nodes (or node representations). Deep learning has been proven to be powerful in representation learning and has greatly advanced various domains such as computer vision, speech recognition, and natural language processing. Therefore, bridging deep learning with graphs presents unprecedented opportunities. However, deep learning on graphs also faces immense challenges. First, traditional deep learning has been designed for regular structured data such as images and sequences, while graphs are irregular: nodes in a graph are unordered and can have distinct neighborhoods. Second, the structural information for regular data is simple, while that for graphs is complicated, especially given that there are various types of complex graphs (as shown in Figure 1.1) and that nodes and edges can be associated with rich side information; thus, traditional deep learning is not sufficient to capture such rich information. A new research field has been cultivated: deep learning on graphs, which embraces unprecedented opportunities and immense challenges.

1.3 What Content is Covered?

The high-level organization of the book is demonstrated in Figure 1.3. The book consists of four parts to best accommodate our readers with diverse backgrounds and purposes of reading.
Part ONE introduces basic concepts; Part TWO discusses the most established methods; Part THREE presents the most typical applications; and Part FOUR describes advances in methods and applications that tend to be important and promising for future research. For each chapter, we first motivate the content that will be covered, then present the material with compelling examples and technical details, and finally provide more relevant content as further reading. Next, we briefly elaborate on each chapter.

Part ONE: Foundations. These chapters focus on the basics of graphs and deep learning that will lay the foundations for deep learning on graphs. In Chapter 2, we introduce the key concepts and properties of graphs, the Graph Fourier Transform, graph signal processing, and formally define various types of complex graphs and computational tasks on graphs. In Chapter 3, we discuss a variety of basic neural network models, key approaches to train deep models, and practical techniques to prevent overfitting during training.

[Figure 1.2: Two major directions to develop solutions for node classification on graphs: (a) collective classification; (b) traditional classification with node feature extraction.]

[Figure 1.3: The high-level organization of the book: 1. Introduction; Part One: Foundations (2. Foundations of Graphs; 3. Foundations of Deep Learning); Part Two: Methods (4. Graph Embedding; 5. Graph Neural Networks; 6. Robust Graph Neural Networks; 7. Scalable Graph Neural Networks; 8. Graph Neural Networks on Complex Graphs; 9. Beyond GNNs: More Deep Models on Graphs); Part Three: Applications (10. Graph Neural Networks in Natural Language Processing; 11. Graph Neural Networks in Computer Vision; 12. Graph Neural Networks in Data Mining; 13. Graph Neural Networks in Biochemistry and Healthcare); Part Four: Advances (14. Advanced Methods in Graph Neural Networks; 15. Advanced Applications in Graph Neural Networks).]

Part TWO: Methods. These chapters cover the most established methods of deep learning on graphs from the basic to advanced settings. In Chapter 4, we introduce a general graph embedding framework from the information-preserving perspective, provide technical details on representative algorithms to preserve numerous types of information on graphs, and present embedding approaches specifically designed for complex graphs. A typical graph neural network (GNN) model consists of two important operations, i.e., the graph filtering operation and the graph pooling operation. In Chapter 5, we review state-of-the-art graph filtering and pooling operations and discuss how to learn the parameters of GNNs for a given downstream task. As generalizations of traditional deep models to graphs, GNNs inherit the drawbacks of traditional deep models and are vulnerable to adversarial attacks. In Chapter 6, we focus on concepts and definitions of graph adversarial attacks and detail representative adversarial attack and defense techniques. GNNs perform a recursive expansion of neighborhoods across layers, and the expansion of the neighborhood for a single node can rapidly involve a large portion of the graph or even the whole graph. Hence, scalability is a pressing issue for GNNs. We detail representative techniques to scale GNNs in Chapter 7. In Chapter 8, we discuss GNN models that have been designed for more complicated graphs. To enable deep learning techniques to advance more tasks on graphs under wider settings, we introduce numerous deep graph models beyond GNNs in Chapter 9.

Part THREE: Applications. Graphs provide a universal representation for real-world data; thus, methods of deep learning on graphs have been applied to various fields. These chapters present the most typical applications of GNNs, including Natural Language Processing in Chapter 10, Computer Vision in Chapter 11, Data Mining in Chapter 12, and Biochemistry and Healthcare in Chapter 13.
Part FOUR: Advances. These chapters focus on recent advances in both methods and applications. In Chapter 14, we introduce advanced topics in GNNs such as expressiveness, depth, fairness, interpretability, and self-supervised learning. We discuss more areas to which GNNs have been applied, including Combinatorial Optimization, Physics, and Program Representation, in Chapter 15.

1.4 Who Should Read the Book?

[Figure 1.4: The guidance on how to read this book, shown as a chapter dependency diagram. Note that the number in each circle indicates the corresponding chapter as shown in Figure 1.3.]

This book is easily accessible to readers with a computer science background. Basic knowledge of calculus, linear algebra, probability, and statistics can help readers better understand its technical details. This book has a wide range of target audiences. One target audience is senior undergraduate and graduate students interested in data mining, machine learning, and social network analysis. This book can be used independently for a graduate seminar course on deep learning on graphs. It can also be utilized as part of a course. For example, Part TWO and Part FOUR can be considered advanced topics in courses on data mining, machine learning, and social network analysis, and Part THREE can be used as advanced methods in solving traditional tasks

in computer vision, natural language processing, and healthcare. Practitioners and project managers who want to learn the basics and see tangible examples of deep learning on graphs, and to adopt graph neural networks into their products and platforms, are also our target audience. Graph neural networks have been applied to benefit numerous disciplines beyond computer science. Thus, another target audience is researchers who do not have a computer science background but want to use graph neural networks to advance their disciplines.

Readers with different backgrounds and purposes of reading should go through the book differently. The suggested guidance on how to read this book is demonstrated in Figure 1.4. If readers aim to understand graph neural network methods on simple graphs (Chapter 5), knowledge of the foundations of graphs and deep learning and of graph embedding is necessary (Chapters 2, 3, and 4). Suppose readers want to apply graph neural networks to advance healthcare (Chapter 13). In that case, they should first read the prerequisite materials: the foundations of graphs and deep learning, graph embedding, and graph neural networks on simple and complex graphs (Chapters 2, 3, 4, 5, and 8). For Part THREE, we assume that readers have the necessary background in the corresponding application field. Besides, readers should feel free to skip some chapters if they are already equipped with the corresponding background. Suppose readers know the foundations of graphs and deep learning. In that case, they can skip Chapters 2 and 3 and only read Chapters 4 and 5 to understand GNNs on simple graphs.

1.5 Feature Learning on Graphs: A Brief History

As aforementioned, to take advantage of traditional machine learning for computational tasks on graphs, it is desired to find vector node representations.
As shown in Figure 1.5, there are mainly two ways to achieve this goal: feature engineering and feature learning. Feature engineering relies on hand-designed features such as node degree statistics, while feature learning learns node features automatically. On the one hand, we often do not have prior knowledge about which features are essential, especially for a given downstream task; thus, features from feature engineering could be suboptimal for the downstream task, and the process requires an immense amount of human effort. On the other hand, feature learning learns features automatically, and the downstream task can guide the process. Consequently, the learned features are likely to be suitable for the downstream task and often obtain better performance than those from feature engineering. Meanwhile, the process requires minimal human intervention and can be easily adapted to new tasks. Thus, feature learning on graphs has been extensively studied, and various types of feature learning techniques have been proposed to meet different requirements and scenarios. We roughly divide these techniques into feature selection on graphs, which aims to remove irrelevant and redundant node features, and representation learning on graphs, which targets generating a set of new node features. In this section, we briefly review these two groups of techniques to provide a general and historical context for readers to understand deep learning on graphs.

[Figure 1.5: Node feature extraction, divided into feature engineering and feature learning; feature learning includes shallow models and deep models for representation learning.]

1.5.1 Feature Selection on Graphs

Real-world data is often high-dimensional, and there exist noisy, irrelevant, and redundant features (or dimensions), particularly when considering a given

task. Feature selection aims to automatically select a small subset of features that have minimal redundancy but maximal relevance to the target, such as the class labels under the supervised setting. In many applications, the original features are crucial for knowledge extraction and model interpretation. For example, in genetic analysis for cancer study, in addition to differentiating cancerous tissues, it is of greater importance to identify the genes (i.e., original features) that induce cancerogenesis. In these demanding applications, feature selection is particularly preferred since it maintains the original features, whose semantics usually provide critical insights into the learning problem. Traditional feature selection assumes that data instances are independent and identically distributed (i.i.d.). However, data samples in many applications are embedded in graphs and are thus inherently not i.i.d. This has motivated the research area of feature selection on graphs. Given a graph G = {V, E}, where V is the node set and E is the edge set, assume that each node is originally associated with a set of d features F = {f1, f2, ..., fd}. Feature selection on graphs aims to select K features from F to denote each node, where K ≪ d. The problem was first investigated under the supervised setting in (Tang and Liu, 2012a; Gu and Han, 2011). A linear classifier is employed to map the selected features to the class labels, and a graph regularization term is introduced to capture structural information for feature selection. In particular, the term aims to ensure that connected nodes can be mapped into similar labels with the selected features. Then, the problem was further studied under the unsupervised setting (Wei et al., 2016, 2015; Tang and Liu, 2012b).
In (Tang and Liu, 2012b), pseudo-labels are extracted from the structural information and serve as the supervision to guide the feature selection process. In (Wei et al., 2016), both the node content and structural information are assumed to be generated from a set of high-quality features that can be obtained by maximizing the likelihood of the generation
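The supervised setting described above can be sketched in code. The following is a minimal illustrative example only, not the algorithm of (Tang and Liu, 2012a) or (Gu and Han, 2011), which learn a linear classifier with a graph regularizer: here each feature is simply scored by its relevance to the labels (squared correlation) minus a Laplacian smoothness penalty that grows when connected nodes disagree on that feature, and the top-K features are kept. The function names and the weighting parameter `alpha` are our own illustrative choices.

```python
import numpy as np

def graph_feature_scores(X, y, A, alpha=0.5):
    """Score each feature by label relevance minus a graph-smoothness penalty.

    X: (n, d) node-feature matrix; y: (n,) labels; A: (n, n) adjacency matrix.
    Illustrative sketch only; the cited methods instead learn a linear
    classifier with a graph regularization term.
    """
    n, d = X.shape
    L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian
    yc = y - y.mean()
    scores = np.zeros(d)
    for j in range(d):
        f = X[:, j] - X[:, j].mean()
        # relevance: squared correlation between the feature and the labels
        denom = np.linalg.norm(f) * np.linalg.norm(yc)
        relevance = (f @ yc / denom) ** 2 if denom > 0 else 0.0
        # f @ L @ f equals the sum of (f_i - f_j)^2 over edges: it is small
        # when connected nodes take similar values for this feature
        smooth = (f @ L @ f) / (f @ f) if f @ f > 0 else 0.0
        scores[j] = relevance - alpha * smooth
    return scores

def select_features(X, y, A, K, alpha=0.5):
    """Return the indices of the K highest-scoring features (K << d)."""
    return np.argsort(-graph_feature_scores(X, y, A, alpha))[:K]
```

On a toy graph with two connected components whose labels follow the components, a feature aligned with the labels is both relevant and smooth over the graph, so it is selected ahead of a feature that varies arbitrarily across edges.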
