Graph and Link Analysis: Discovering Network Relationships


Graph and Link Analysis: Discovering Network Relationships in Big Data
Xavier Lopez, Ph.D., Senior Director, Product Management
Zhe Wu, Ph.D., Architect, Development
September 19, 2016
Copyright 2016, Oracle and/or its affiliates. All rights reserved. Confidential – Oracle Internal/Restricted/Highly Restricted

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Program Agenda
1. Graph Data Management and Analysis: Usage & Use Cases
2. Oracle Big Data Spatial and Graph
3. In-Memory Analyst (PGX)
4. What’s New
5. Demos

Relational Model vs. Property Graph Model
[Figure: Relational Model vs. Graph Model. Courtesy: Tom Sawyer, 2016]

The Property Graph Data Model
• A set of vertices (or nodes)
  – each vertex has a unique identifier.
  – each vertex has a set of in/out edges.
  – each vertex has a collection of key-value properties.
• A set of edges (or links)
  – each edge has a unique identifier.
  – each edge has a head/tail vertex.
  – each edge has a label denoting the type of relationship between two vertices.
  – each edge has a collection of key-value properties.
[Figure: example property graph with vertices marko (age 29), vadas (age 27), josh (age 32), peter (age 35), lop (lang java) and ripple (lang java), connected by "knows" and "created" edges that each carry a weight property]
Source: …lueprints/wiki/Property-Graph-Model
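A minimal Java sketch can make the model concrete: every vertex and edge has a unique id and a key-value property map, every edge has a label plus tail and head vertices, and every vertex keeps its in/out edge lists. The class `SimplePropertyGraph` and its methods are hypothetical illustrations, not the Oracle or Blueprints API. The edge wiring of the example (who knows whom, who created what) is assumed from the well-known TinkerPop sample graph that the figure appears to reproduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory property graph (hypothetical class, not the Oracle or Blueprints API).
class SimplePropertyGraph {
    static final class Vertex {
        final long id;                                      // unique identifier
        final Map<String, Object> props = new HashMap<>();  // key-value properties
        final List<Edge> out = new ArrayList<>();           // edges leaving this vertex
        final List<Edge> in = new ArrayList<>();            // edges arriving at this vertex
        Vertex(long id) { this.id = id; }
    }

    static final class Edge {
        final long id;                                      // unique identifier
        final Vertex tail;                                  // vertex the edge starts from
        final Vertex head;                                  // vertex the edge points to
        final String label;                                 // type of relationship
        final Map<String, Object> props = new HashMap<>();  // key-value properties
        Edge(long id, Vertex tail, Vertex head, String label) {
            this.id = id; this.tail = tail; this.head = head; this.label = label;
        }
    }

    final Map<Long, Vertex> vertices = new HashMap<>();
    final Map<Long, Edge> edges = new HashMap<>();

    Vertex addVertex(long id, Map<String, Object> props) {
        if (vertices.containsKey(id)) throw new IllegalArgumentException("duplicate vertex id " + id);
        Vertex v = new Vertex(id);
        v.props.putAll(props);
        vertices.put(id, v);
        return v;
    }

    Edge addEdge(long id, long tailId, long headId, String label, double weight) {
        if (edges.containsKey(id)) throw new IllegalArgumentException("duplicate edge id " + id);
        Vertex tail = vertices.get(tailId);
        Vertex head = vertices.get(headId);
        if (tail == null || head == null) throw new IllegalArgumentException("unknown endpoint");
        Edge e = new Edge(id, tail, head, label);
        e.props.put("weight", weight);
        tail.out.add(e);   // register the edge on both endpoints
        head.in.add(e);
        edges.put(id, e);
        return e;
    }

    // "name" property of every vertex reached from vertexId over out-edges with this label.
    List<Object> neighborNames(long vertexId, String label) {
        List<Object> names = new ArrayList<>();
        for (Edge e : vertices.get(vertexId).out) {
            if (e.label.equals(label)) names.add(e.head.props.get("name"));
        }
        return names;
    }

    // Example graph; the wiring is assumed from the TinkerPop sample graph.
    static SimplePropertyGraph example() {
        SimplePropertyGraph g = new SimplePropertyGraph();
        g.addVertex(1, Map.of("name", "marko", "age", 29));
        g.addVertex(2, Map.of("name", "vadas", "age", 27));
        g.addVertex(3, Map.of("name", "lop", "lang", "java"));
        g.addVertex(4, Map.of("name", "josh", "age", 32));
        g.addVertex(5, Map.of("name", "ripple", "lang", "java"));
        g.addVertex(6, Map.of("name", "peter", "age", 35));
        g.addEdge(7, 1, 2, "knows", 0.5);
        g.addEdge(8, 1, 4, "knows", 1.0);
        g.addEdge(9, 1, 3, "created", 0.4);
        g.addEdge(10, 4, 5, "created", 1.0);
        g.addEdge(11, 4, 3, "created", 0.4);
        g.addEdge(12, 6, 3, "created", 0.2);
        return g;
    }
}
```

Registering each edge on both endpoints is what lets a traversal follow "in" as cheaply as "out"; for instance, vertex lop can be asked directly for its three incoming "created" edges.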

How graph analysis enhances business intelligence
• Answers from tabular aggregation
  – Who spends the most?
  – Who buys the highest-margin goods?
  – Who is most consistently a top contributor?
  Tabular questions: well suited to SQL-like tools.
• Answers from graph connectivity
  – Who’s most influential?
  – Which supplier do I depend on the most?
  – What is the right product mix for millennials?
  Graph questions: we need something different!

How is graph analysis important to business?
• What patterns are there in fraudulent behavior?
• Which supplier am I most dependent upon?
• Who is the most influential customer?
• Do my products appeal to certain communities?
• What targeted products or services should I recommend to customers?

Graph Use Case Scenarios
• Fraud detection
  – Find parties in insurance data who are on both sides of multiple claims and who live near each other
• Internet of Things
  – Manage a graph of interconnected devices and predict the effect of a disruption across the network
• Cyber security
  – Find entry points and affected machines
• Border control
  – Analyze the flight history of a suspicious passenger; identify his co-travelers, his co-travelers’ co-travelers, …
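The border-control scenario (co-travelers, then co-travelers’ co-travelers) is a bounded breadth-first search. The sketch below assumes a hypothetical in-memory adjacency map in which two passengers are linked if they shared a flight; the class, method and sample passengers are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: passengers are vertices, an undirected "flew with" edge links two
// passengers who shared a flight, and breadth-first search collects everyone within maxHops.
class CoTravelers {
    // Maps each passenger within maxHops of start (excluding start) to its hop distance.
    static Map<String, Integer> withinHops(Map<String, List<String>> flewWith, String start, int maxHops) {
        Map<String, Integer> dist = new LinkedHashMap<>();
        dist.put(start, 0);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String p = queue.poll();
            int d = dist.get(p);
            if (d == maxHops) continue;                    // do not expand beyond the hop limit
            for (String q : flewWith.getOrDefault(p, List.of())) {
                if (!dist.containsKey(q)) {                // first visit = fewest hops
                    dist.put(q, d + 1);
                    queue.add(q);
                }
            }
        }
        dist.remove(start);
        return dist;
    }

    // Made-up sample: suspect S flew with A and B; A flew with C; C flew with D.
    static Map<String, List<String>> sample() {
        return Map.of(
            "S", List.of("A", "B"),
            "A", List.of("S", "C"),
            "B", List.of("S"),
            "C", List.of("A", "D"),
            "D", List.of("C"));
    }
}
```

With `maxHops = 2`, the sample returns A and B at one hop and C at two hops; D, three hops away, is left out.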

Graph Analysis in Business
• Product Recommendation: recommend the most similar item purchased by similar people
• Influencer Identification: find the people who are central in the given network, e.g. influencer marketing
• Community Detection: identify groups of people who are close to each other, e.g. target-group marketing
• Graph Pattern Matching: find all the sets of entities that match a given pattern, e.g. fraud detection
[Figure: customer–item purchase records; communication stream (e.g. tweets)]
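"Recommend the most similar item purchased by similar people" can be sketched on a bipartite customer–item purchase graph with a simple neighborhood-based scheme. Customers who share items with the target count as similar, and every item they bought that the target has not bought is scored by the number of shared items. This is one common technique chosen for illustration; the class, method and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical neighborhood-based sketch over a customer-item purchase graph.
class CoPurchaseRecommender {
    // purchases: customer -> set of items bought. Returns up to k recommended items.
    static List<String> recommend(Map<String, Set<String>> purchases, String customer, int k) {
        Set<String> mine = purchases.getOrDefault(customer, Set.of());
        Map<String, Integer> score = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : purchases.entrySet()) {
            if (other.getKey().equals(customer)) continue;
            int shared = 0;                                  // similarity = number of common items
            for (String item : other.getValue()) if (mine.contains(item)) shared++;
            if (shared == 0) continue;                       // not a similar customer
            for (String item : other.getValue()) {
                if (!mine.contains(item)) score.merge(item, shared, Integer::sum);
            }
        }
        List<String> items = new ArrayList<>(score.keySet());
        // Highest score first; ties broken by item name so results are deterministic.
        items.sort((a, b) -> {
            int c = Integer.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(items.subList(0, Math.min(k, items.size())));
    }
}
```

For a customer who bought a tent and a stove, a neighbor who bought both plus a lantern (two shared items) outweighs a neighbor who bought the tent plus a kayak (one shared item), so the lantern ranks ahead of the kayak.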

Program Agenda
1. Graph Data Management and Analysis
2. Oracle Big Data Spatial and Graph: Architecture & Features
3. In-Memory Analyst (PGX)
4. What’s New
5. Demos

Oracle Big Data Spatial and Graph: Property Graph Architecture
[Figure: property graph architecture diagram]
• Graph analytics: Parallel In-Memory Graph Analytics (PGX), Java APIs
• Access layer: Apache Blueprints & Lucene/SolrCloud; Java, Groovy, Python APIs; REST/Web Service
• Storage: Apache HBase, Oracle NoSQL Database
• Property graph formats supported: GraphML, GML, GraphSON, flat files
• Other data sources: RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON), CSV, relational data sources

Property Graph Workflow
• Graph data management
  – Transform and load relational data (or files) into a graph schema
• Analysis and exploration (in-memory analysis engine)
  – Data scientists try different ideas (algorithms) on the data
  – Flexible, interactive, iterative, small-scale (sampled), …
• Production
  – Operational queries and reporting
[Figure: data entities → graph persistence → graph query and analysis]

Graph Construction: Convert from Relational to Flat Files
[Figure: sample rows from table EmployeeTab]
• Two key Java APIs:
  – OraclePropertyGraphUtils.convertRDBMSTable2OPV
  – …
• Key steps:
  – Column mapping
  – Data type definition
  – Conversion
• Example output (.opv file):
    1101,name,1,Jean,,
    …
    1102,age,2,,21,
    1102,salary,4,,50.0,
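The three key steps can be sketched as follows, assuming the six comma-separated fields visible in the example output (vertex ID, key, type code, string value, numeric value, and a trailing field left empty) and only the type codes that appear there: 1 for a string, 2 for an integer, 4 for a double. The class, the column names and the sample row are hypothetical (the row Mary, 21, 50.0 is inferred from the slide's partly garbled table); this is not the signature or behavior of OraclePropertyGraphUtils.convertRDBMSTable2OPV.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not OraclePropertyGraphUtils): converts one relational row into
// .opv-style lines with the six-field layout seen in the sample output:
//   vertexId,key,typeCode,stringValue,numericValue,<empty trailing field>
// Only type codes 1 (string), 2 (integer) and 4 (double) are handled.
// Values containing commas or newlines are NOT escaped here.
class OpvSketch {
    static List<String> rowToOpv(long vertexId, Map<String, Object> row, Map<String, String> columnToKey) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> m : columnToKey.entrySet()) {   // step 1: column mapping
            Object v = row.get(m.getKey());
            if (v == null) continue;                                    // skip NULL or missing columns
            String prefix = vertexId + "," + m.getValue() + ",";
            if (v instanceof String) {                                  // step 2: data type definition
                lines.add(prefix + "1," + v + ",,");                    // step 3: conversion
            } else if (v instanceof Integer) {
                lines.add(prefix + "2,," + v + ",");
            } else if (v instanceof Double) {
                lines.add(prefix + "4,," + v + ",");
            } else {
                throw new IllegalArgumentException("no type code in this sketch for " + v.getClass());
            }
        }
        return lines;
    }

    // Hypothetical column names for an employee table; iteration order follows insertion.
    static Map<String, String> employeeMapping() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("ENAME", "name");
        m.put("AGE", "age");
        m.put("SALARY", "salary");
        return m;
    }
}
```

Keeping the column-to-key mapping in a `LinkedHashMap` fixes the order in which properties are written, which makes the generated file reproducible.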

Data Access (APIs)
• Blueprints 2.3.0, Gremlin 2.3.0, Rexster 2.3.0
• Groovy shell for accessing property graph data
• REST APIs (through Rexster integration)
• PGQL (Property Graph Query Language)
9/28/2016

Text Search through Apache Lucene/SolrCloud
• Integration with Apache Lucene & SolrCloud
• Supports manual and automatic indexing of graph elements
• Manual index:
    oraclePropertyGraph.createIndex("my index", Vertex.class);
    Index<Vertex> indexVertices = oraclePropertyGraph.getIndex("my index", Vertex.class);
    indexVertices.put("key", "value", myVertex);
• Auto index:
    oraclePropertyGraph.createKeyIndex("name", Edge.class);
    oraclePropertyGraph.getEdges("name", "*hello*world");
• Enables queries to use syntax like "*oracle* or *graph*"
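The wildcard semantics of a query such as "*hello*world" can be illustrated with a hypothetical brute-force scan: `*` matches any run of characters, `?` matches exactly one, and everything else is literal, so in this sketch the pattern matches any value that contains "hello" and ends with "world". Lucene and SolrCloud instead answer such queries from an inverted index built over analyzed (tokenized) text, so real results can differ; the class and methods below are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical brute-force stand-in for a wildcard text query (no index, raw values only).
class WildcardFilter {
    // '*' matches any run of characters, '?' exactly one; everything else is literal.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char ch : wildcard.toCharArray()) {
            if (ch == '*' || ch == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));   // escape regex metacharacters
                    literal.setLength(0);
                }
                regex.append(ch == '*' ? ".*" : ".");
            } else {
                literal.append(ch);
            }
        }
        if (literal.length() > 0) regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Ids of elements whose property value matches the whole wildcard pattern.
    static List<Long> matching(Map<Long, String> values, String wildcard) {
        Pattern p = toRegex(wildcard);
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<Long, String> e : values.entrySet()) {
            if (p.matcher(e.getValue()).matches()) hits.add(e.getKey());
        }
        Collections.sort(hits);
        return hits;
    }
}
```

Quoting the literal runs matters: without it, a pattern such as "a.c" would also match "abc", because `.` is a regex metacharacter.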

Support for Cytoscape Open Source Visualization


Parallel In-Memory Graph Analyst
• An in-memory, parallel framework for fast graph analytics
  – Reads a graph from NoSQL or HBase
  – Handles analytic workloads while the data access layer handles transactional workloads
  – Supports multiple users and graphs
  – Dozens of graph analysis algorithms … in NoSQL or HBase

Social Network Analysis Algorithms (1)
• Ranking: closenessCentralityUnitLength, degreeCentrality, eigenvectorCentrality, Hyperlink-Induced Topic Search (HITS), inDegreeCentrality, nodeBetweennessCentrality, outDegreeCentrality, pagerank, personalizedPagerank, randomWalkWithRestart, approximatePagerank, weighted Pagerank
• Structure evaluation: Conductance, countTriangles, inDegreeDistribution, outDegreeDistribution, partitionConductance, partitionModularity, sparsify, K-Core
• Community detection: communitiesLabelPropagation
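PageRank, listed under Ranking, can be sketched with textbook power iteration. This sequential version is for illustration only and is not the parallel PGX implementation. The damping factor (commonly 0.85), the convergence tolerance, and the handling of vertices without out-links (their rank is spread evenly over all vertices) are assumptions of the sketch.

```java
import java.util.Arrays;

// Textbook power-iteration PageRank (a sequential sketch, not the PGX implementation).
// out[v] lists the vertices that v links to.
class PageRankSketch {
    static double[] pageRank(int[][] out, double damping, int maxIter, double tol) {
        int n = out.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);                         // start from a uniform distribution
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            double dangling = 0.0;                          // rank held by vertices with no out-links
            for (int v = 0; v < n; v++) {
                if (out[v].length == 0) { dangling += rank[v]; continue; }
                double share = rank[v] / out[v].length;     // split rank evenly over out-links
                for (int w : out[v]) next[w] += share;
            }
            double change = 0.0;
            for (int v = 0; v < n; v++) {
                // random jump + damped link mass + dangling mass spread over all vertices
                next[v] = (1 - damping) / n + damping * (next[v] + dangling / n);
                change += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (change < tol) break;                        // total (L1) change small: converged
        }
        return rank;
    }
}
```

Because the random-jump and dangling terms redistribute exactly the mass that is not passed along links, the scores keep summing to 1. For example, in `pageRank(new int[][]{{}, {0}, {0}}, 0.85, 100, 1e-12)` vertex 0, which both other vertices link to, ends up with the highest score.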

Social Network Analysis Algorithms (2)
• Pathfinding: fattestPath, shortestPathBellmanFord, shortestPathBellmanFordReverse, shortestPathDijkstra, shortestPathDijkstraBidirectional, shortestPathFilteredDijkstra, shortestPathFilteredDijkstraBidirectional, shortestPathHopDist, shortestPathHopDistReverse
• Recommendation: salsa, personalizedSalsa, whomToFollow
• Classic (connected components): sccKosaraju, sccTarjan, wcc
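Among the pathfinding algorithms, Dijkstra's is the textbook baseline for graphs with non-negative edge weights. A sequential sketch with a binary heap follows; the class and helper names are hypothetical, and this is not the PGX shortestPathDijkstra implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Textbook Dijkstra with a binary heap (a sequential sketch, not PGX's shortestPathDijkstra).
// Requires non-negative edge weights.
class DijkstraSketch {
    static final class Edge {
        final int to;
        final double weight;
        Edge(int to, double weight) { this.to = to; this.weight = weight; }
    }

    // Builds adjacency lists for n vertices from rows of {from, to, weight}.
    static List<List<Edge>> graph(int n, double[][] edges) {
        List<List<Edge>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (double[] e : edges) adj.get((int) e[0]).add(new Edge((int) e[1], e[2]));
        return adj;
    }

    // Shortest distance from src to every vertex; unreachable vertices stay at +infinity.
    static double[] shortestPaths(List<List<Edge>> adj, int src) {
        double[] dist = new double[adj.size()];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[src] = 0.0;
        PriorityQueue<double[]> pq = new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        pq.add(new double[]{0.0, src});                     // entries are {distance, vertex}
        while (!pq.isEmpty()) {
            double[] top = pq.poll();
            int u = (int) top[1];
            if (top[0] > dist[u]) continue;                 // stale entry: a shorter path was found
            for (Edge e : adj.get(u)) {
                double candidate = dist[u] + e.weight;
                if (candidate < dist[e.to]) {               // relax edge u -> e.to
                    dist[e.to] = candidate;
                    pq.add(new double[]{candidate, e.to});
                }
            }
        }
        return dist;
    }
}
```

In a graph with edges 0→1 (4), 0→2 (1), 2→1 (2) and 1→3 (1), the detour through vertex 2 gives vertex 1 a distance of 3 rather than 4, and vertex 3 a distance of 4. Skipping stale heap entries avoids a decrease-key operation, which `java.util.PriorityQueue` does not offer.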

“No Coding” Graph Analysis
• Degree centrality: heroInfluence = analyst.inDegreeCentrality()
• PageRank: heroPR = analyst.pageRank().topK(15)
• Betweenness centrality: b = analyst.betweenness().topK(15)
• Community detection: coms = analyst.communities()
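Each call on this slide computes a per-vertex score and then keeps the top k. A plain-Java sketch of those two steps for in-degree centrality follows. `DegreeTopK` is a hypothetical class, not the PGX analyst API, and breaking ties by vertex name is an arbitrary choice made so the output is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "score every vertex, then keep the top k" (not the PGX API).
class DegreeTopK {
    // In-degree centrality: count how many edges point at each vertex. Each edge is {from, to}.
    static Map<String, Integer> inDegree(List<String[]> edges) {
        Map<String, Integer> deg = new HashMap<>();
        for (String[] e : edges) {
            deg.putIfAbsent(e[0], 0);            // sources with no incoming edges still score 0
            deg.merge(e[1], 1, Integer::sum);
        }
        return deg;
    }

    // The k highest-scoring vertices, ties broken by name.
    static List<String> topK(Map<String, Integer> scores, int k) {
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int c = Integer.compare(scores.get(b), scores.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }
}
```

Separating scoring from selection mirrors the slide's chaining: any other score map, such as PageRank values, could be passed to `topK` unchanged.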

Computational Analytics: Built-in Package
Rich set of built-in parallel graph algorithms and parallel graph mutation operations
• Detecting components and communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Soman and Narang’s Sparsification
• Evaluating community structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting)
• Link prediction: Adamic-Adar, SALSA (Twitter’s Who-to-follow)
• Ranking and walking: PageRank, Personalized PageRank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, random walking and sampling (w/ variants)
• Path-finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s
• Other classics: Vertex Cover, Minimum Spanning Tree (Prim’s)
• Graph mutation: Create Undirected Graph, Create Bipartite Graph (e.g. left set “a,b,e”), Sort-By-Degree (Renumbering), Filtered Subgraph, Simplify Graph
[Figure: small example graphs on vertices a to i showing the original graph and the result of each mutation operation]
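Adamic-Adar, which appears in this package, is a link-prediction measure: for a candidate pair x, y it sums 1 / ln(|N(z)|) over every common neighbor z, so shared neighbors with few connections count for more than hubs. A sketch over an undirected graph follows; the class and method names are hypothetical, and this is not the PGX implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the Adamic-Adar link-prediction score on an undirected graph.
class AdamicAdar {
    // Sum of 1 / ln(degree(z)) over common neighbors z of x and y.
    static double score(Map<String, Set<String>> adj, String x, String y) {
        Set<String> ny = adj.getOrDefault(y, Set.of());
        double s = 0.0;
        for (String z : adj.getOrDefault(x, Set.of())) {
            if (ny.contains(z)) {
                // A common neighbor is adjacent to both x and y, so its degree is >= 2 and ln > 0.
                s += 1.0 / Math.log(adj.get(z).size());
            }
        }
        return s;
    }

    // Builds symmetric adjacency sets from an undirected edge list of {u, v} pairs.
    static Map<String, Set<String>> undirected(String[][] edges) {
        Map<String, Set<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }
}
```

With edges a–c, b–c, a–d, b–d and d–e, the pair (a, b) shares c (degree 2) and d (degree 3), giving 1/ln 2 + 1/ln 3 ≈ 2.35. A pair with no common neighbor, such as (c, e), scores 0.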


What’s New: Property Graph Features
Big Data Spatial and Graph 2.0: faster, more powerful and scalable
• Integration with Apache Spark
• Vertex label support
• PGQL: declarative graph query language
• Node.js client support
• Distributed in-memory graph analysis
• Many new SNA algorithms
• Hortonworks 2.4; Apache Solr 5.2x
• Data type support: long, char, byte, short, spatial
• Conversion of CSV and relational data to graph
• and many more

Oracle Differentiators: Graph
• Complete, supported graph solution:
  – Storage: NoSQL, HBase, RDBMS back ends
  – Data access: Blueprints, Java, Property Graph Query Language (PGQL)
  – Rich graph analytics: 40 pre-built, in-memory graph algorithms
• Scalable:
  – Analyze a 20-30 billion edge graph in memory on a single BDA node
  – Persist extremely large graphs on disk
• Security: Secure NoSQL, Kerberos CDH
• 10-50x faster than graph analysis competitors


Resources on Big Data Spatial and Graph
• Oracle Big Data Spatial and Graph on …l-and-graph
• OTN product page (white papers, software downloads, …): …dgraph
• Oracle Big Data Lite Virtual Machine, a free sandbox to get …a-appliance/oracle-bigdatalite-2104726.html
• Hands-On Lab for Big Data Spatial: tinyurl.com/BDSG-HOL
• Blog (examples, tips & tricks): blogs.oracle.com/bigdataspatialgraph
• @OracleBigData, @SpatialHannes, @JeanIhm
