Many Graphs for Many Uses: An Introduction to RDF, Property, and Network Graphs in Oracle Database 12c
Xavier Lopez, Senior Director, Oracle
Zhe Wu, Architect
Copyright 2015 Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Program Agenda
1. Oracle Spatial and Graph (RDF Graph)
2. Oracle Spatial and Graph (Spatial Graph)
3. Big Data Spatial and Graph (Property Graph)

Linked Data
RDF Graph Database

Graph Data Models
- Property Graph Model (Social Network Analysis)
  – Use cases: graph data management, social network analysis, entity analytics
  – Industry domains: public safety, social media search, marketing (sentiment)
- Network Data Model (Spatial Network Analysis)
  – Use cases: network path analysis, transportation modeling
  – Industry domains: logistics, transportation, utilities, telcoms
- RDF Data Model (Linked Data / Metadata Layer)
  – Use cases: data federation, knowledge representation, Semantic Web
  – Industry domains: life sciences, health care, publishing, finance

RDF Graph: A Purpose-built Graph Model
Linked Data
- Unified metadata model for distributed data sources
- Flexible metadata product vocabularies and ontologies
- Validates semantic and structural consistency

Linked Data in Support of Federated Query
Graph-based Metadata Layer
  – W3C standard, flexible model for sparse and evolving data
  – Common vocabulary enables data integration & app development
  – Relational data stays in place, apps don’t need to change
[Diagram labels: Mid-Tier Server (Application 1, Application 2, Application 3); SPARQL; SQL; Metadata Catalog (RDF Graph, Inventory Graph, Sales Graph); HR Schema, Inventory Schema, Sales Schema; Database Server (HR Database, Inventory Database, Sales Database)]

Integration and Discovery of Open and Proprietary Data
[Diagram labels: Apps; RDF Graph (metadata, ontologies); Instance Data (LIMS Database, Genomic Data, Molecular Models)]
RDF graph is an enterprise metadata framework. The metadata graph associates underlying instance data to other data resources based on their semantics. This linking of resources enables interoperability between apps that exchange data.

Enterprise Information Harmonization
Industries
- Life Sciences
- Health Care
- Finance
- Media
- Networks & Communications
- Defense & Intelligence
[Customer logo: Hutchison 3G Austria]

Consolidated Knowledge Layer
Business Challenge
- Link database information on genes, proteins, metabolic pathways, compounds, ligands, etc. to original sources.
- Increase productivity for accessing, sharing, searching, navigating, cross-linking, analyzing internal/external data
Solution
- Semantic integration layer using RDF graph
- Rich domain-specific terminology (biology, chemistry and medicine): 1.6 M terms
- Terminology Hub: 8 GB of referential data (ontologies) that cross-reference various data repositories.

EU Publications Office
Linked Metadata Platform for European Union
Objectives
- Common metadata model supports:
  – Search and discovery of EU Publications
  – Multiple domains and languages
Solution
- Validate and tag EU law, tenders, and publicity to standardized vocabularies
- Unified RDF graph metadata model
- Supports discovery of content through user’s terminology and language
- Provides variety of dissemination modes
Benefits
- Evolving data model that flexibly supports a variety of business use cases
- Scalability:
  – Over a billion RDF triples in Oracle Graph DB
  – 2.5 TB of compressed data in Oracle DB
  – Links to 3.9 TB (60M) files of EU pubs
- Reliability and maintainability
  – Oracle ASM (Automatic Storage Management)
  – Two failover systems

National Intelligence Agency
National Intelligence Scenario
[Diagram labels: Data Sources (Enterprise Data, Spatial images, Documents, Contents Repository, Databases, Web resources, Blogs, Mails, news, RSS feeds); Information Extraction (Feature Extraction, Term Extraction); Extracted Entities & Relationships; RDF; Intelligence Ontologies; SQL/SPARQL; Search, Presentation, Report, Visualization, Query]

Oracle Database 12c RDF Semantic Graph
Database
- Exadata ready
- Compression & partitioning
- Parallel load, inference, query
- High availability
- Label security: triple-level
- W3C standards compliance
- Semantic Indexing of text
- Enterprise Manager
Load / Storage
- Native RDF graph data store
- Manages trillions of triples
- Optimized storage architecture
Query
- SPARQL-Jena/Joseki/Fuseki
- SQL/graph query, B-tree indexing
- Ontology assisted SQL query
Reasoning
- RDFS, OWL2 RL, EL, SKOS
- User-defined rules
- Incremental, parallel reasoning
- User-defined inferencing
- Plug-in architecture
Analytics
- Semantic indexing framework
- Integration with OBIEE, Oracle R Enterprise, Oracle Data Mining

Viewing / Transforming Relational Data to RDF
RDB to RDF Mapping
- RDF views on relational tables
- Enables SPARQL query on distributed resources
- Views: Automatic and custom
- Aligns with W3C RDB2RDF standard
- No duplication of data and storage
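To make the idea of an RDF view on a relational table concrete, the following is a minimal sketch that turns one row into triples on demand, without storing them. It loosely follows the row-to-triple pattern of the W3C Direct Mapping; it is not Oracle's RDF view API. The class name `RdfViewSketch`, the base IRI, and the table and column names are all assumptions made for illustration, and literals are emitted without escaping or datatypes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: present one relational row as RDF triples, generated on demand.
final class RdfViewSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // base, table, and keyColumn are illustrative inputs; row maps column name to value.
    static List<String> rowToTriples(String base, String table, String keyColumn,
                                     Map<String, String> row) {
        // The subject IRI is derived from the table name and the primary key value.
        String subject = "<" + base + table + "/" + keyColumn + "=" + row.get(keyColumn) + ">";
        List<String> triples = new ArrayList<>();
        // One triple states that the row is an instance of the table's class.
        triples.add(subject + " <" + RDF_TYPE + "> <" + base + table + "> .");
        // One triple per non-null column, with a predicate named after the column.
        for (Map.Entry<String, String> col : row.entrySet()) {
            if (col.getValue() == null) {
                continue; // a SQL NULL produces no triple
            }
            triples.add(subject + " <" + base + table + "#" + col.getKey() + "> \""
                    + col.getValue() + "\" .");
        }
        return triples;
    }
}
```

Because the triples are computed from the row each time they are requested, the relational data stays where it is, which is the point the slide makes about avoiding duplicated storage.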

RDF Semantic Graph: Graphical Tools
- Graph Visualization: Cytoscape
- Ontology Modeling: RDF Studio, Protégé
- SQL Developer

Core Inferencing Features
- Forward-chaining based inference engine in the database
  – Removes on-the-fly reasoning and results in fast query times
- Native rulebases: RDFS, OWL 2 RL, OWL 2 EL, SKOS
  – SNOMED (subset of OWL 2 EL)
- Validation of inferred data
- Proof generation
- User defined inferencing
  – Temporal reasoning, Spatial reasoning
- Ladder Based Inference
  – Fine grained security for inference graph
- Integration with external OWL 2 reasoners (TrOWL)
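The forward-chaining idea above can be sketched in a few lines: apply rules to the stored triples until nothing new is derived, then answer queries from the materialized result. The sketch below covers only two RDFS entailment rules (rdfs9 and rdfs11) with a naive nested loop. The class `ForwardChainingSketch` and the `ex:` terms are assumptions for illustration, not the database's inference engine.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: forward-chaining materialization of two RDFS rules.
final class ForwardChainingSketch {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    // A triple is held as a three-element list: subject, predicate, object.
    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // Applies the rules until a fixpoint is reached and returns the
    // asserted triples together with every inferred triple.
    static Set<List<String>> materialize(Set<List<String>> asserted) {
        Set<List<String>> closure = new HashSet<>(asserted);
        boolean changed = true;
        while (changed) {
            Set<List<String>> derived = new HashSet<>();
            for (List<String> a : closure) {
                for (List<String> b : closure) {
                    // b must be (B subClassOf C) and a's object must be B.
                    if (!b.get(1).equals(SUBCLASS) || !a.get(2).equals(b.get(0))) {
                        continue;
                    }
                    if (a.get(1).equals(SUBCLASS)) {
                        // rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                        derived.add(triple(a.get(0), SUBCLASS, b.get(2)));
                    } else if (a.get(1).equals(TYPE)) {
                        // rdfs9: (x type B), (B subClassOf C) => (x type C)
                        derived.add(triple(a.get(0), TYPE, b.get(2)));
                    }
                }
            }
            // addAll returns false once no new triple was produced.
            changed = closure.addAll(derived);
        }
        return closure;
    }
}
```

Since all entailments are computed ahead of time, a later query is a plain lookup in the closure, which is why the slide credits forward chaining with fast query times.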

Manageability of RDF Semantic Graph
Built in support from Oracle Database utilities and tools
Ingest / Replicate / Recover
- Bulk load:
  – Apache Jena bulk loader
  – Oracle external tables & SQL*Loader (Direct Path) w/ PL/SQL Bulk Load API
- Replicate & recover:
  – Data Guard: physical standby
  – Data Pump: staging tables
  – Recovery Manager: RMAN
Tune / Analyze
- Tune load / query / inference:
  – Parallelism
  – B-tree indexing triple/quad
  – Typed literals indexing
  – SPARQL query hints
  – Statistics gathering
  – Dynamic Sampling
- Analyze performance:
  – Enterprise Manager: view optimizer plans, monitor execution / resource usage
Manage
- Control query execution: in database & Jena client
- Create & monitor graph w/ SQL Developer:
  – Semantic Network
  – Models, virtual models
  – B-tree indexes
  – Rule bases
  – Entailments
  – Security data labels
  – Semantic index policies

World’s Fastest Big Data Graph Benchmark
1 Trillion Triple RDF Benchmark with Oracle Spatial and Graph
Oracle Database 12c can load, query and inference millions of RDF graph edges per second
- World’s fastest data loading performance
- World’s fastest query performance
- World’s fastest inference performance
- Massive scalability: 1.08 trillion edges
- Platform: Oracle Exadata X4-2 Database Machine
- Source: w3.org/wiki/LargeTripleStores, 9/26/2014
[Bar chart: millions of triples per second for Query, Load, and Inference; plotted values 1.42, 1.52, and 1.13 on an axis from 0.00 to 2.00]

What Sets Oracle RDF Triple Database Apart?
- Scalability: Trillions of triples
- Transactional: Concurrent loading and updates with ACID properties
- Security: OLS security labels at the triple level
- Standards based: W3C, SPARQL, RDF, OWL, REST, JSON
- Manageable: Use existing DB tools, utilities and expertise
- Multi-type support: graph, relational, geospatial

Upcoming RDF Graph Features
- SPARQL 1.1 completion and performance
- Major enhancements in Jena adapter
- Performance improvements in querying RDF views of relational data
- Improved visualization and management of RDF using graphical tools


Oracle Spatial
Network Graph

Oracle Spatial and Graph
- Complete
- Open
- Integrated
- Most Widely Used

Graph Features – Network Data Model Graph
- A storage model to represent graphs and networks
  – Graph tables consist of links and nodes
  – Explicitly stores and maintains connectivity of the network graph
  – Attributes at link and node level
  – Logical or spatial graphs
  – Directed and undirected graphs with or without cost
- Java API to perform analysis in memory
  – Loads and retains only the partitions needed
  – Dynamic costs
  – Multi-level / priority routing
  – Shortest path, within cost, nearest neighbors
  – Traveling salesman, spanning tree, ...
  – Multiple cost support in path/subpath analysis
- Can logically partition the network graph
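As an illustration of the kind of analysis listed above, here is a minimal shortest-path sketch over a list of directed, costed links, using Dijkstra's algorithm. It is a generic textbook implementation, not the Network Data Model Java API. The `NetworkPathSketch` and `Link` names are assumptions, and link costs are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: shortest-path cost over a links-and-nodes network.
final class NetworkPathSketch {
    // A directed link with a cost, comparable to one row of a link table.
    static final class Link {
        final int from;
        final int to;
        final double cost;

        Link(int from, int to, double cost) {
            this.from = from;
            this.to = to;
            this.cost = cost;
        }
    }

    // Dijkstra's algorithm. Returns the minimum total cost from start to end,
    // or positive infinity when end cannot be reached.
    static double shortestCost(List<Link> links, int start, int end) {
        // Build the adjacency index: node id -> outgoing links.
        Map<Integer, List<Link>> outgoing = new HashMap<>();
        for (Link l : links) {
            outgoing.computeIfAbsent(l.from, k -> new ArrayList<>()).add(l);
        }
        Map<Integer, Double> settled = new HashMap<>();
        // Queue entries are {cost so far, node id}, ordered by cost.
        PriorityQueue<double[]> queue =
                new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        queue.add(new double[] {0.0, start});
        while (!queue.isEmpty()) {
            double[] head = queue.poll();
            int node = (int) head[1];
            if (settled.containsKey(node)) {
                continue; // a cheaper route to this node was already settled
            }
            settled.put(node, head[0]);
            if (node == end) {
                return head[0];
            }
            for (Link l : outgoing.getOrDefault(node, Collections.<Link>emptyList())) {
                if (!settled.containsKey(l.to)) {
                    queue.add(new double[] {head[0] + l.cost, l.to});
                }
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

An undirected network can be represented in this sketch by adding each link twice, once per direction.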

Network Data Model
A purpose-built graph model
- Road and Multimodal Networks
- Drive Time Polygon Analysis
- Trade Area Management
- Service Delivery Optimization
- Water, Gas, Electric Utility, Network Applications
[Figure label: Oracle Spatial and Graph]

Directorate Data Service, UK
Illustrations: Transport for London

COTRAL, larger Rome region, Italy
- City Service: Proactive info on route changes, optimal route planning
- City Operation: Geographical Motion Monitoring and Management with MapViewer
- City Infrastructure: Oracle Spatial Database stores real time bus info
- Sustainable City: Bus position, speed, engine status, # passengers, fuel
Illustrations: COTRAL

Istanbul Municipality
- Diverse Transport Modes
  – Buses
  – Tramways
  – Metro
  – Trains
- But also
  – Ferries
  – Private mini-buses
  – Shared taxis

Garmin Connect Fitness Data Portal
Objectives
- Match user’s fitness activities to popular routes
- Create leader boards for popular routes
Solution
- Stores and simplifies processing of more than 5 billion miles of user activities
- Utilizes parallel processing, DB partitioning and pipelined table functions to analyze the data on Oracle Exadata
- Matches user’s activity to a segment using LRS

Consistent Vision
- Use the unique qualities of Oracle technology to create a Geospatial Data Platform to address challenges being faced by “GIS” systems & users
- Make geospatial information available to every operational system, every application, every analysis tool in the way that is appropriate to the developer or the application
- So users get the most value from their investments

Oracle Big Data Spatial and Graph
Property Graph


The Property Graph Data Model
- A set of vertices (or nodes)
  – each vertex has a unique identifier.
  – each vertex has a set of in/out edges.
  – each vertex has a collection of key-value properties.
- A set of edges (or links)
  – each edge has a unique identifier.
  – each edge has a head/tail vertex.
  – each edge has a label denoting type of relationship between two vertices.
  – each edge has a collection of key-value properties.
[Figure: example property graph with person vertices (name “marko”, “vadas”, “josh”, “peter”, each with an age) and software vertices (name “lop”, “ripple”, lang “java”), connected by weighted “knows” and “created” edges]
Source: ...lueprints/wiki/Property-Graph-Model
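The definition above maps directly onto a small data structure. The following is a minimal in-memory sketch, not the Oracle or Blueprints API: the class `PropertyGraphSketch`, its methods, and the specific ids and weights in `sample()` are assumptions chosen for illustration, with names borrowed from the slide's example figure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the property graph model: identified vertices and
// labeled, directed edges, each carrying key-value properties.
final class PropertyGraphSketch {
    static final class Vertex {
        final long id;
        final Map<String, Object> props = new HashMap<>();
        final List<Edge> out = new ArrayList<>(); // edges whose tail is this vertex
        final List<Edge> in = new ArrayList<>();  // edges whose head is this vertex

        Vertex(long id) {
            this.id = id;
        }
    }

    static final class Edge {
        final long id;
        final Vertex tail; // the edge leaves this vertex
        final Vertex head; // the edge arrives at this vertex
        final String label;
        final Map<String, Object> props = new HashMap<>();

        Edge(long id, Vertex tail, Vertex head, String label) {
            this.id = id;
            this.tail = tail;
            this.head = head;
            this.label = label;
        }
    }

    private final Map<Long, Vertex> vertices = new HashMap<>();
    private final Map<Long, Edge> edges = new HashMap<>();

    Vertex addVertex(long id, String name) {
        if (vertices.containsKey(id)) {
            throw new IllegalArgumentException("duplicate vertex id " + id);
        }
        Vertex v = new Vertex(id);
        v.props.put("name", name);
        vertices.put(id, v);
        return v;
    }

    Edge addEdge(long id, Vertex tail, Vertex head, String label, double weight) {
        if (edges.containsKey(id)) {
            throw new IllegalArgumentException("duplicate edge id " + id);
        }
        Edge e = new Edge(id, tail, head, label);
        e.props.put("weight", weight);
        edges.put(id, e);
        tail.out.add(e);
        head.in.add(e);
        return e;
    }

    // Names of the vertices reached from vertexId over out-edges with the given label.
    List<String> namesReached(long vertexId, String label) {
        List<String> names = new ArrayList<>();
        for (Edge e : vertices.get(vertexId).out) {
            if (e.label.equals(label)) {
                names.add((String) e.head.props.get("name"));
            }
        }
        return names;
    }

    // Illustrative data only: ids and weights here are assumptions.
    static PropertyGraphSketch sample() {
        PropertyGraphSketch g = new PropertyGraphSketch();
        Vertex marko = g.addVertex(1, "marko");
        Vertex vadas = g.addVertex(2, "vadas");
        Vertex lop = g.addVertex(3, "lop");
        Vertex josh = g.addVertex(4, "josh");
        g.addEdge(7, marko, vadas, "knows", 0.5);
        g.addEdge(8, marko, josh, "knows", 1.0);
        g.addEdge(9, marko, lop, "created", 0.4);
        return g;
    }
}
```

Keeping both `in` and `out` edge lists on each vertex mirrors the "set of in/out edges" bullet and makes traversal in either direction a simple list scan.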

Modeling and Analyzing The Internet of Things
- Cyber-Security
- Critical / Alternate Path Analysis
- Community Detection
- Network Monitoring
- Predictive Analysis
- Multiple System Impact Analysis
[Figure labels: Transportation, Utilities, Finance]

Property Graph: Usage Scenarios
- Insurance fraud detection
  – Find parties in insurance data who are on both sides of multiple claims, who live near each other
- Software Code Analysis
  – Score the risk of individual change and determine need for review, probability of affecting schedule
- Border Control
  – Analyze flight histories of a suspicious passenger. Identify his co-travelers, co-travelers’ co-travelers, ...
- Electrical Grid
  – Determine the effect of an outage across network
- Network intrusion forensics
  – Find entry points and affected machines

Common Graph Analysis Use Cases
- Product Recommendation: Recommend the most similar item purchased by similar people
- Influencer Identification: Find out people that are central in the given network, e.g. influencer marketing
- Community Detection: Identify group of people that are close to each other, e.g. target group marketing
- Graph Pattern Matching: Find out all the sets of entities that match to the given pattern, e.g. fraud detection
[Figure labels: customer, items, Purchase Record; Communication Stream (e.g. tweets)]

Property Graph Workflow
- Graph Data Management
  – Raw business data is converted to a graph schema
  – In-database graph queries using SQL (useful for breadth-first search)
- Analysis and Exploration (in-memory analysis engine)
  – Data scientists try different ideas (algorithms) on the data
  – Flexible, interactive, iterative, small-scale (sampled), ...
[Diagram labels: Data Entities; Graph Persistence (RDBMS); Graph Query and Analysis]

Architecture of Property Graph Support
- Graph Analytics
  – Parallel In-Memory Graph Analytics (PGX)
  – Java APIs
- Graph Data Access Layer (DAL)
  – Blueprints & Lucene/SolrCloud
  – Java, Groovy, Python, REST/Web Service
  – Java APIs/JDBC/SQL/PLSQL
- Scalable and Persistent Storage Management
  – Apache HBase
  – Oracle NoSQL Database
- Property Graph formats
  – GraphML
  – GML
  – GraphSON
  – Flat Files

In-Memory Parallel Graph Analytics
- An in-memory, parallel framework for fast graph analytics
  – Read a graph from Oracle Database (through data access layer)
  – SQL parallel query support on PG data
  – Handles analytic workloads while the data access layer handles transactional workloads
  – Supports concurrent sessions and ...
[Diagram labels: Graph Analyst; Analyst; Transactional Request; SQL; Data Storage; (delta update)]

Two Kinds of Graph Workloads
Computational Graph Analytics
- Compute certain values on nodes and edges
- While (repeatedly) traversing or iterating on the graph
- In certain procedural ways
- Examples: Connected Components, Modularity, Conductance, Shortest Path, Spanning Tree, Pagerank, Clustering Coefficient, Centrality, Coloring
Graph Pattern Matching
- Given a description of a pattern
- Find every sub-graph that matches it
[Figure: pattern of vertices linked by “friend” edges]
In-Memory Analyst supports both kinds, as well as combinations of the two
11/3/2015
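The difference between the two workload kinds can be shown on a tiny undirected "friend" graph. In the sketch below, `countComponents` is a computational analytic (it iterates over edges to compute a value for the whole graph), while `twoHopMatches` is a pattern match (it finds every a-b-c chain of friend edges). Both are generic illustrations rather than the In-Memory Analyst API; the class and method names are assumptions, and the input is assumed to have no self-loops or duplicate edges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch contrasting a computational analytic with a pattern match.
final class GraphWorkloadSketch {
    // Computational analytics: count connected components with union-find.
    // Vertices are numbered 0..n-1; each edge is a two-element array {u, v}.
    static int countComponents(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        int components = n;
        for (int[] e : edges) {
            int a = find(parent, e[0]);
            int b = find(parent, e[1]);
            if (a != b) {
                parent[a] = b; // merge the two components
                components--;
            }
        }
        return components;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving keeps trees shallow
            x = parent[x];
        }
        return x;
    }

    // Pattern matching: every chain a - b - c of two friend edges.
    // Each match is reported once, as {a, b, c} with a < c.
    static List<int[]> twoHopMatches(int n, int[][] edges) {
        List<List<Integer>> adjacent = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adjacent.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adjacent.get(e[0]).add(e[1]);
            adjacent.get(e[1]).add(e[0]);
        }
        List<int[]> matches = new ArrayList<>();
        for (int b = 0; b < n; b++) {
            for (int a : adjacent.get(b)) {
                for (int c : adjacent.get(b)) {
                    if (a < c) {
                        matches.add(new int[] {a, b, c});
                    }
                }
            }
        }
        return matches;
    }
}
```

A combined workload, as the slide mentions, could for example run the pattern match only inside components above a certain size.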

In-Memory Parallel Graph Analytics: APIs, Formats, Analytics, Search
- Optimized built-in graph analytics and search
  – 30 parallel algorithm implementations
  – Clustering, ranking, path finding, recommendation, and more
  – Graph pattern search
- Client Shell, Groovy integration
- Graph format (in addition to those supported by the data access layer)
  – EBin, Adjacency list, Edge List
- J2EE container support (WLS, Tomcat)

Computational Analytics: Built-in Package
Rich set of built-in parallel graph algorithms and parallel graph mutation operations
- Detecting Components and Communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Soman and Narang’s Sparsification
- Evaluating Community Structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting), Adamic-Adar
- Link Prediction: SALSA (Twitter’s Who-to-follow)
- Ranking and Walking: Pagerank, Personalized Pagerank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)
- Path-Finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s
- Other Classics: Vertex Cover, Minimum Spanning-Tree (Prim’s)
[Figure: the original graph and the results of the mutation operations Create Undirected Graph, Sort-By-Degree (Renumbering), Create Bipartite Graph (Left Set: “a,b,e”), Filtered Subgraph, Simplify Graph]

Text Search through Apache Lucene/Solr Cloud
- Integration with Apache Lucene/Solr
- Support manual and auto indexing of Graph elements
  – Manual index:
      oraclePropertyGraph.createIndex("my index", Vertex.class);
      indexVertices = oraclePropertyGraph.getIndex("my index", Vertex.class);
      indexVertices.put("key", "value", myVertex);
  – Auto index:
      oraclePropertyGraph.createKeyIndex("name", Edge.class);
      oraclePropertyGraph.getEdges("name", "*hello*world");
- Enables queries to use syntax like “*oracle* or *graph*”

The Big Picture – Oracle Big Data Management System
[Diagram labels:
 Sources
 Ingest and integration: Apache Flume, Oracle GoldenGate, Oracle Event Processing, Oracle Data Integrator
 Data Reservoir (Big Data Appliance): Cloudera Hadoop, Oracle NoSQL, Oracle R Distribution, Oracle Big Data Spatial and Graph
 Between reservoir and warehouse: Oracle Big Data Connectors, Oracle Big Data SQL
 Data Warehouse (Exadata): Oracle Database In-Memory, Multi-tenant, Oracle Industry Models, Oracle Advanced Analytics, Oracle Spatial and Graph]
