ISWC 2017 Tutorial: Semantic Data Management in Practice, Part 2


ISWC 2017 Tutorial: Semantic Data Management in Practice
Part 2: Storage and Querying
Olaf Hartig (Linköping University) and Olivier Curé (University of Paris-Est Marne-la-Vallée)

Goals
– Achieve an initial understanding of the RDF database management ecosystem
– Understand the differences between 7 identified production-ready stores
ISWC 2017 Tutorial: Semantic Data Management in Practice. Olaf Hartig and Olivier Curé. Part 2 – Storage and Querying

Overview
– RDF storage
– Seven production-ready RDF stores
– Ontology Based Data Access
– Demo
– APIs

RDF Storage
Although most production-ready RDF stores support ACID properties, they are best considered as
– OLAP (online analytical processing) systems,
– not OLTP (online transaction processing) systems.
This implies that updates are performed in batches
– mainly due to reasoning (see Section 5).
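One way to see why batching pays off when a store materializes inferences: the sketch below recomputes an RDFS closure once per loaded batch, so loading many triples together costs one materialization instead of one per triple. It is a minimal sketch assuming forward chaining over only two RDFS rules (subclass transitivity and type inheritance); `BatchStore`, `materialize` and the `ex:` terms are hypothetical names, not any product's API.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def materialize(asserted: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS rules until no new triple is derived (fixpoint)."""
    closure = set(asserted)
    while True:
        sub = [(s, o) for (s, p, o) in closure if p == SUBCLASS_OF]
        derived = set()
        # rdfs11: subClassOf is transitive
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    derived.add((a, SUBCLASS_OF, d))
        # rdfs9: an instance of a subclass is an instance of its superclass
        for x, p, c in closure:
            if p == RDF_TYPE:
                for a, b in sub:
                    if a == c:
                        derived.add((x, RDF_TYPE, b))
        if derived <= closure:
            return closure
        closure |= derived


class BatchStore:
    """Hypothetical store that re-materializes all inferences after every batch."""

    def __init__(self) -> None:
        self.asserted: Set[Triple] = set()
        self.closure: Set[Triple] = set()
        self.materializations = 0

    def load_batch(self, batch: Iterable[Triple]) -> None:
        self.asserted |= set(batch)
        self.closure = materialize(self.asserted)  # full recomputation per batch
        self.materializations += 1

    def ask(self, triple: Triple) -> bool:
        return triple in self.closure


data = [
    ("ex:Thermometer", SUBCLASS_OF, "ex:Sensor"),
    ("ex:Sensor", SUBCLASS_OF, "ex:Device"),
    ("ex:t1", RDF_TYPE, "ex:Thermometer"),
]

batched = BatchStore()
batched.load_batch(data)  # one materialization for three triples

one_by_one = BatchStore()
for t in data:
    one_by_one.load_batch([t])  # three materializations

print(batched.ask(("ex:t1", RDF_TYPE, "ex:Device")))  # True
print(batched.materializations, one_by_one.materializations)  # 1 3
```

Both stores end up with the same closure; only the amount of reasoning work differs. A real store would more likely maintain inferences incrementally, and full recomputation is used here just to make the per-update cost easy to see.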

RDF Storage
RDF is a logical data model and thus does not impose any physical storage solution.
Existing RDF stores are either
– based on an existing database management system:
   relational, e.g., PostgreSQL
   NoSQL, e.g., Cassandra
– designed from scratch, e.g., as a graph store.
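A common way to put RDF on top of a relational DBMS is a single three-column "triple table", with SPARQL basic graph patterns compiled into SQL self-joins. The sketch below assumes that layout (others exist, such as property tables or one table per predicate) and uses SQLite as a stand-in for a system like PostgreSQL; `create_store`, `bgp_to_sql` and the `ex:Sensor` type are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Pattern = Tuple[str, str, str]  # terms starting with "?" are variables


def create_store() -> sqlite3.Connection:
    """A single 'triple table' in an in-memory relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
        " PRIMARY KEY (s, p, o))"
    )
    conn.execute("CREATE INDEX triples_pos ON triples (p, o, s)")  # second access path
    return conn


def bgp_to_sql(patterns: List[Pattern]) -> Tuple[str, list]:
    """Translate a basic graph pattern into one SQL query with a self-join per pattern."""
    bound = {}  # variable -> first column reference that binds it
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in bound:
                    where.append(f"{ref} = {bound[term]}")  # shared variable -> join
                else:
                    bound[term] = ref
            else:
                where.append(f"{ref} = ?")  # constant -> parameterized filter
                params.append(term)
    columns = ", ".join(f'{ref} AS "{var[1:]}"' for var, ref in bound.items()) or "1"
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


conn = create_store()
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("http://njh.me/ssn#QBE01", "rdf:type", "ex:Sensor"),
        ("http://njh.me/ssn#QBE01", "geo:lat", "48.83"),
        ("http://njh.me/ssn#QBE01", "geo:lon", "2.21"),
    ],
)
sql, params = bgp_to_sql([("?s", "rdf:type", "ex:Sensor"), ("?s", "geo:lat", "?lat")])
print(conn.execute(sql, params).fetchall())
# [('http://njh.me/ssn#QBE01', '48.83')]
```

Every extra triple pattern adds another self-join on the same table, which is one reason relational-backed stores invest heavily in indexes and join ordering.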

RDF Stores Taxonomy (figure)

RDF Store Ecosystem (figure)

RDF Distributed Data Management
RDF storage is part of Big Data:
– distribution of RDF triples over a cluster of machines


7 Production-Ready Systems
They all support
– ACID transactions
– Replication (mostly master-slave, some master-master)
– Partitioning (range, hashing)
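The two partitioning strategies named above can be sketched for RDF as follows. The sketch assumes the subject is the partitioning key (a common but not universal choice, since it keeps all triples about one resource on the same machine); the function names and the `ex:` terms are hypothetical and do not describe the internals of any of the seven systems.

```python
import bisect
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def hash_partition(subject: str, n_machines: int) -> int:
    """Hash partitioning: a stable hash of the subject, modulo the cluster size."""
    # hashlib instead of hash(): Python's str hash is randomized per process.
    digest = hashlib.sha1(subject.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_machines


def range_partition(subject: str, split_points: List[str]) -> int:
    """Range partitioning: machine i holds subjects in [split_points[i-1], split_points[i])."""
    return bisect.bisect_right(split_points, subject)


def distribute(triples: List[Triple], assign: Callable[[str], int]) -> Dict[int, List[Triple]]:
    """Place each triple on the machine chosen from its subject."""
    machines: Dict[int, List[Triple]] = defaultdict(list)
    for triple in triples:
        machines[assign(triple[0])].append(triple)
    return dict(machines)


triples = [
    ("ex:QBE01", "geo:lat", "48.83"),
    ("ex:QBE01", "geo:lon", "2.21"),
    ("ex:QBE02", "rdf:type", "ex:Sensor"),  # hypothetical second sensor
]
by_hash = distribute(triples, lambda s: hash_partition(s, 4))
by_range = distribute(triples, lambda s: range_partition(s, ["ex:QBE02"]))
```

Hashing tends to spread subjects evenly, while range partitioning keeps neighbouring keys together (useful for ordered scans) at the risk of skew. With subject-based placement, star-shaped queries about one resource can be answered without cross-machine joins.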

Data Models and Querying
(Figure: example resource http://njh.me/ssn#QBE01 with type, geo lat 48.83, geo lon 2.21 and comment.)

Data Models and Querying
Some of these systems support other data models:
– XML for MarkLogic and Virtuoso
– Property graph for GraphDB, Blazegraph and Stardog
– Relational for Virtuoso and Oracle
– Document for MarkLogic
Hence query languages other than SPARQL 1.1 can be supported:
– Gremlin for property graphs, XQuery for XML, SQL for relational data, Prolog
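To give an intuition for how the same data can be exposed both as RDF and as a property graph, the sketch below applies one simple mapping convention: literal-valued triples become key/value properties on a node, IRI-valued triples become labelled edges. This convention is an assumption for illustration, not the documented behavior of GraphDB, Blazegraph or Stardog; it reuses the example resource from the slide, with a hypothetical class `ex:Sensor`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """An RDF literal; plain Python strings stand for IRIs in this sketch."""
    value: str


Term = Union[str, Literal]


@dataclass
class PropertyGraph:
    nodes: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)  # node -> key -> values
    edges: List[Tuple[str, str, str]] = field(default_factory=list)     # (source, label, target)


def rdf_to_property_graph(triples: List[Tuple[str, str, Term]]) -> PropertyGraph:
    """Literal-valued triples become node properties; IRI-valued triples become edges."""
    graph = PropertyGraph()
    for s, p, o in triples:
        properties = graph.nodes.setdefault(s, {})
        if isinstance(o, Literal):
            properties.setdefault(p, []).append(o.value)  # lists keep multi-valued properties
        else:
            graph.nodes.setdefault(o, {})
            graph.edges.append((s, p, o))
    return graph


qbe01 = "http://njh.me/ssn#QBE01"
graph = rdf_to_property_graph([
    (qbe01, "rdf:type", "ex:Sensor"),  # hypothetical class IRI
    (qbe01, "geo:lat", Literal("48.83")),
    (qbe01, "geo:lon", Literal("2.21")),
])
print(graph.nodes[qbe01])  # {'geo:lat': ['48.83'], 'geo:lon': ['2.21']}
print(graph.edges)         # [('http://njh.me/ssn#QBE01', 'rdf:type', 'ex:Sensor')]
```

Once data is in this shape, a traversal language such as Gremlin can walk the edges while the literal values sit on the nodes.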

License
Some of these systems have free editions, but with some feature or usage limitations:
– MarkLogic's developer license is free for up to 1 TB and 10 months maximum
– Stardog: Community edition (max. 10 databases with 25M triples per database, 4 users), Developer edition (no limits, but a 30-day trial)
– AllegroGraph: the Free and Developer editions are restricted to 5M and 50M triples respectively
– Virtuoso and GraphDB: free, but without clustering and replication
– Blazegraph: free for a single machine
All systems have commercial editions (Oracle is commercial only).

Summary of production-ready systems

| Triple store | Full-text search | Cloud ready | Extra features |
|---|---|---|---|
| AllegroGraph | Integrated Solr | AMI | |
| Blazegraph | Integrated Solr | AMI | Reification done right |
| GraphDB | Integrated Solr, Elasticsearch (Enterprise edition) | AMI | RDF ranking |
| MarkLogic | Integrated | AMI | With XQuery, JavaScript |
| Oracle | Integrated | | Inline in SQL |
| Stardog | Integrated Lucene | AMI | Integrity constraints, explanations |
| Virtuoso | Integrated | AMI | Inline in SQL |

AMI: Amazon Machine Image.


OBDA (Ontology Based Data Access)
An alternative that is relevant when you have an existing (relational) database and want to reason over it using an ontology:
– The ontology models the domain, hides the structure of the data sources and enriches incomplete data
– The ontology is connected to the data sources via mappings that relate concepts and properties to SQL views over the sources
– Queries, expressed in SPARQL, are translated into the sources' query language (usually SQL)
– The state-of-the-art system is Ontop
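The rewrite-then-unfold idea behind OBDA can be sketched for the simplest case, a query asking for all instances of one class: the ontology is used to find every class whose instances also belong to the queried class, and each mapped class is then replaced by its SQL view. This is a minimal sketch under hypothetical assumptions (the schema, the `ex:` terms and the function names are invented for illustration); Ontop supports far richer ontologies, queries and mappings, for example mappings written in the W3C R2RML language.

```python
import sqlite3
from typing import List, Set, Tuple

# Hypothetical relational source that already exists.
SCHEMA = """
CREATE TABLE thermometer (id TEXT PRIMARY KEY, site TEXT);
CREATE TABLE hygrometer  (id TEXT PRIMARY KEY, site TEXT);
"""

# Ontology (TBox): subclass axioms as (subclass, superclass) pairs.
SUBCLASS_AXIOMS: List[Tuple[str, str]] = [
    ("ex:Thermometer", "ex:Sensor"),
    ("ex:Hygrometer", "ex:Sensor"),
]

# Mappings: each ontology class is defined by an SQL view over the sources.
CLASS_MAPPINGS = {
    "ex:Thermometer": "SELECT id AS s FROM thermometer",
    "ex:Hygrometer": "SELECT id AS s FROM hygrometer",
}


def entailed_subclasses(cls: str) -> Set[str]:
    """All classes whose instances are entailed to be instances of cls (cls included)."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_AXIOMS:
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def rewrite_class_query(cls: str) -> str:
    """Rewrite 'SELECT ?s WHERE { ?s a cls }' with the ontology, then unfold the mappings into SQL."""
    views = [CLASS_MAPPINGS[c] for c in sorted(entailed_subclasses(cls)) if c in CLASS_MAPPINGS]
    return " UNION ".join(views) if views else "SELECT NULL AS s WHERE 0"


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO thermometer VALUES ('t1', 'A')")
conn.execute("INSERT INTO hygrometer VALUES ('h1', 'B')")

sql = rewrite_class_query("ex:Sensor")  # ex:Sensor itself has no mapping
print(sql)
print(sorted(row[0] for row in conn.execute(sql)))  # ['h1', 't1']
```

Note that no triples are materialized: the data stays in the relational tables, and the reasoning happens at query time through the rewriting.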


Demo
With Blazegraph (v2.1.4):
– Website: https://www.blazegraph.com/
– Download: files/bigdata/2.1.4/blazegraph.jar/download
– Start: java -server -Xmx4g -jar blazegraph.jar
– Open: http://localhost:9999/blazegraph
And an extract of our sensor database instantiating the Semantic Sensor Network (SSN) ontology.
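Once Blazegraph is running, it can also be queried over HTTP via the SPARQL 1.1 Protocol instead of the web interface. The sketch below assumes the default endpoint path `http://localhost:9999/blazegraph/sparql` and that the sensor data uses the W3C Basic Geo `lat` property; check both against your installation and data. The helper names are hypothetical, and only `select` needs a running server.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: default endpoint of a local Blazegraph started as above.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

QUERY = """
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?sensor ?lat WHERE { ?sensor geo:lat ?lat } LIMIT 10
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """SPARQL 1.1 Protocol query via GET, asking for the JSON results format."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_results(payload: bytes) -> List[Dict[str, str]]:
    """Turn SPARQL JSON results into {variable: value} rows (unbound variables omitted)."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in doc["results"]["bindings"]
    ]


def select(query: str, endpoint: str = ENDPOINT) -> List[Dict[str, str]]:
    """Send the query and return the parsed rows (requires the server to be running)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return parse_results(response.read())
```

With the server running and the sensor extract loaded, `select(QUERY)` returns one dictionary per solution, keyed by the variable names `sensor` and `lat`.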


Available APIs
Two popular Java APIs to process and handle RDF data and SPARQL queries are:
– RDF4J (formerly Sesame)
– Apache Jena
They both provide
– a JDBC-like API and a REST-like API
– storing, querying and reasoning capabilities
