Introduction To Linked Data - PlanetData


Introduction to Linked Data
Marko Grobelnik, Andreas Harth, Dumitru Roman
Big Linked Data Tutorial, Semantic Days 2012

Tutorial Agenda
Introduction to Linked Data (45 m – 60 m), Andreas
Consuming Norwegian Linked Data (30 m), Titi
Large Scale Linked Data Management (30 m), Andreas
Big Data Intro and Analytics (60 m – 90 m), Marko
Questions & Answers Session (30 m), all
Marko Grobelnik, Andreas Harth, Dumitru Roman, Big Linked Data

Introduction to Linked Data (Andreas)
Motivation
Linked Data Principles (Web Architecture and RDF, Resource Description Framework)
SPARQL RDF Query Language
Ontology Languages
RDF Vocabulary Description Language (RDFS)
Web Ontology Language (OWL)
Application Architectures
Summary

MOTIVATION

Motivation
With the increased use of computers, more and more data is being stored
Organisations rely on data for business decisions
Data drives policy decisions in government
Individuals rely on data from the Web for information and communication
Data volumes explode
More and more data available on the Web is represented in Semantic Web standards
Linking Open Data (LOD) initiative
Semantic Web technologies facilitate the integration of data from multiple sources
Combining data from multiple sources enables insights

Linked Data on the Web
(Figure: Linking Open Data cloud diagrams, snapshots from 2007-10, 2007-11, 2008-02, 2008-03, 2008-09, 2009-03, 2009-07, 2010-09 and 2011-09)

Types of Data in the Linking Open Data Cloud (Sept 2010)

Scenario Overview
Semantic technologies facilitate access to data: 1. Query? 2. Answer!
Q: data about Berlin? Q: famous people that died in Berlin?
Q: data about Hegel? Q: Hegel’s publications?
Q: data about Marlene Dietrich? Q: Dietrich’s songs?

DBpedia
Linked Data version of Wikipedia
Scripts that extract data (text, links, infoboxes) from Wikipedia
Published as Linked Data
Interlinking hub in the Linked Data cloud
Georg Wilhelm Friedrich Hegel: http://dbpedia.org/resource/Georg_Wilhelm_Friedrich_Hegel
Marlene Dietrich: http://dbpedia.org/resource/Marlene_Dietrich

BBC Music
Data about BBC (radio) programmes, artists, songs
Combination of BBC-internal data (playlists), MusicBrainz (artists, albums) and Wikipedia (artists)
Underpinning the BBC Music website
Data published according to Linked Data principles
Marlene Dietrich: …a-b83f-49ca-883c-02b20c7a9dd5.rdf#artist

Virtual International Authority File (VIAF)
Joint project of national libraries and related organisations
21 institutions, among them the Library of Congress, Deutsche Nationalbibliothek, Bibliothèque nationale de France
Provides access to “authority files”
Matching and interlinking collections from …
Georg Wilhelm Friedrich Hegel: …/89774942/
Marlene Dietrich: http://viaf.org/viaf/97773925/

LINKED DATA PRINCIPLES

Semantic Technologies
Semantic Web technologies, standardised by the W3C, are mature:
RDF recommendation in 1999, update in 2004
RDFa (RDF in XHTML) recommendation in 2008
RDFS recommendation in 2004
SPARQL recommendation in 2008
OWL recommendation in 2004, update in 2009
Linked Data is a subset of the Semantic Web stack, including web architecture:
IRI (IETF RFC 3987, 2005)
HTTP (IETF RFC 2616, 1999)

Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.

1. Use URIs as Names for Things
Use a unique identifier to denote things
URIs are defined in RFC 3986 (which obsoletes RFC 2396)
Hegel, Georg Wilhelm Friedrich:
http://dbpedia.org/resource/Georg_Wilhelm_Friedrich_Hegel
http://viaf.org/viaf/89774942/
Hegel, Georg Wilhelm Friedrich: Gesammelte Werke / Vorlesungen über die Logik:
urn:isbn:978-3-7873-1964-0

Names for Things

2. Use HTTP URIs
Enables “lookup” of URIs via the Hypertext Transfer Protocol (HTTP)
Piggy-backs on the hierarchical Domain Name System to guarantee uniqueness of identifiers
Uses established HTTP infrastructure
Connects the logical level (thing) with the physical level (source)
Important: distinction between “thing URI” and “source URI” (“other resource” vs. “information resource”)

Information Resources vs. Other Resources
Marlene Dietrich, the person, vs. a file containing data about Marlene Dietrich
Name? Creator? Birth date? Last change date? License? Copyright?

Correspondence between thing-URI and source-URI (“hash URIs”)
User agent: HTTP GET …83f-49ca-883c-02b20c7a9dd5.rdf#artist
Web server: returns RDF from …b83f-49ca-883c-02b20c7a9dd5.rdf
The fragment (#artist) is not sent to the server, so the source URI is the thing URI without its fragment.
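Because the fragment never reaches the server, a client can compute the source URI of a hash URI locally. A minimal Python sketch using the standard library's `urllib.parse.urldefrag`; the `example.org` URI is a hypothetical stand-in for the truncated BBC URI above, and the helper names are illustrative.

```python
from urllib.parse import urldefrag


def source_uri(thing_uri: str) -> str:
    """Return the source (document) URI for a thing URI.

    For a hash URI this strips the fragment; HTTP clients never send the
    fragment, so looking up the thing URI means fetching the enclosing document.
    """
    document, _fragment = urldefrag(thing_uri)
    return document


def is_hash_uri(uri: str) -> bool:
    """True if the URI carries a non-empty fragment identifier, e.g. '#artist'."""
    return urldefrag(uri).fragment != ""


# Hypothetical thing URI following the pattern on the slide.
thing = "http://example.org/music/artists/1234.rdf#artist"
print(source_uri(thing))   # http://example.org/music/artists/1234.rdf
print(is_hash_uri(thing))  # True
```

One document can thus describe several things (e.g. `#artist` and other fragments) with a single HTTP lookup.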

Hypertext Transfer Protocol (HTTP)
REQUEST
curl -H "Accept: application/rdf+xml" …-49ca-883c-02b20c7a9dd5.rdf#artist
GET …5.rdf HTTP/1.1
User-Agent: curl/7.25.0
Host: bbc.co.uk
Accept: application/rdf+xml
RESPONSE
HTTP/1.1 200 OK
Date: Tue, 08 May 2012 07:12:19 GMT
Server: Apache/2.2.3 (Red Hat)
Content-Type: application/rdf+xml
Content-Length: 1956
[data not shown]
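The request above can be reproduced by hand: the request line carries only the path (no fragment), and the `Accept` header asks for RDF/XML instead of HTML. A minimal sketch that builds the raw request text with the standard library; it assumes a plain `http` URI without user info, omits the `User-Agent` header, and uses a hypothetical `example.org` URI.

```python
from urllib.parse import urlsplit


def build_get_request(uri: str, accept: str = "application/rdf+xml") -> str:
    """Build the raw text of an HTTP/1.1 GET request for `uri` (sketch).

    Assumes a plain http URI without user info. The fragment is dropped
    because it is never transmitted to the server.
    """
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        f"Accept: {accept}",  # ask for RDF/XML rather than HTML
        "",  # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines)


print(build_get_request("http://example.org/music/artists/1234.rdf#artist"))
```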

Correspondence between thing-URI and source-URI (“slash URIs”)
User agent: HTTP GET http://dbpedia.org/resource/Marlene_Dietrich
Web server: 303 (See Other) redirect
User agent: HTTP GET http://dbpedia.org/data/Marlene_Dietrich (RDF) or http://dbpedia.org/page/Marlene_Dietrich (HTML)
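On the server side, the slash-URI pattern boils down to answering a lookup of the thing URI with a 303 redirect to a document URI chosen by content negotiation. The following is a hypothetical handler that only mirrors the URI layout shown on the slide (`/resource/` to `/data/` for RDF, `/page/` for HTML); DBpedia's actual server logic is not reproduced, and the `Accept` parsing is deliberately naive (substring test, q-values ignored).

```python
RESOURCE = "http://dbpedia.org/resource/"
DATA = "http://dbpedia.org/data/"
PAGE = "http://dbpedia.org/page/"

# Media types treated as "wants RDF" in this sketch.
RDF_TYPES = ("application/rdf+xml", "text/turtle")


def negotiate(thing_uri: str, accept: str) -> tuple[int, str]:
    """Return (status code, Location header) for a GET on a slash thing URI."""
    if not thing_uri.startswith(RESOURCE):
        return 404, ""
    name = thing_uri[len(RESOURCE):]
    # 303 See Other: the thing itself cannot be sent, only a document about it.
    if any(media in accept for media in RDF_TYPES):
        return 303, DATA + name
    return 303, PAGE + name


print(negotiate(RESOURCE + "Marlene_Dietrich", "application/rdf+xml"))
print(negotiate(RESOURCE + "Marlene_Dietrich", "text/html"))
```

Compared with hash URIs, this costs an extra round trip per lookup but allows one document per thing.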

3. Provide Useful Information
When somebody looks up a URI, return data using the standards (RDF*, SPARQL)
Resource Description Framework: a format for encoding graph-structured data (with URIs to identify nodes/vertices and links/edges)

Resource Description Framework
Directed, labeled graph
triple(subject, predicate, object)
subject: URI (or blank node)
predicate: URI
object: URI (or blank node) or RDF literal (string, integer, date, …)
RDF/XML is the most widely deployed serialisation
Other serialisations are possible (N-Triples, Turtle, Notation3, …)
Quadruples (or quads) are used as the internal representation when integrating data
quad(subject, predicate, object, context)
context: URI (used to store the origin of the triple)
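The positional rules above (subject: URI or blank node; predicate: URI; object: any term) and the quad extension can be made concrete in a few lines. A minimal sketch, not a full RDF library: the class names are hypothetical, the serialiser emits N-Triples lines (and N-Quads lines, which append the context), and only the common string escapes are handled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: str = ""      # language tag, e.g. "no"
    datatype: str = ""  # datatype IRI, e.g. for integers or dates


Term = Union[IRI, BNode, Literal]


def term_to_nt(term: Term) -> str:
    """Serialise one RDF term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    if isinstance(term, BNode):
        return f"_:{term.label}"
    text = (term.lexical.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    if term.lang:
        return f'"{text}"@{term.lang}'
    if term.datatype:
        return f'"{text}"^^<{term.datatype}>'
    return f'"{text}"'


def triple_to_nt(s: Term, p: Term, o: Term) -> str:
    """One N-Triples line; rejects terms in positions RDF does not allow."""
    if not isinstance(s, (IRI, BNode)):
        raise ValueError("subject must be an IRI or blank node")
    if not isinstance(p, IRI):
        raise ValueError("predicate must be an IRI")
    return f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ."


def quad_to_nq(s: Term, p: Term, o: Term, context: IRI) -> str:
    """N-Quads line: the triple plus the IRI of the source it came from."""
    return triple_to_nt(s, p, o)[:-1] + f"{term_to_nt(context)} ."


hegel = IRI("http://dbpedia.org/resource/Georg_Wilhelm_Friedrich_Hegel")
rdf_type = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
person = IRI("http://xmlns.com/foaf/0.1/Person")
print(triple_to_nt(hegel, rdf_type, person))
```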

RDF Example
dbpedia:Georg_Wilhelm_Friedrich_Hegel rdf:type foaf:Person .
dbpedia:Georg_Wilhelm_Friedrich_Hegel rdf:type yago:PoliticalPhilosophers .
dbpedia:Georg_Wilhelm_Friedrich_Hegel rdfs:comment "Georg Wilhelm Friedrich Hegel var en tysk filosof."@no .
dbpedia:Georg_Wilhelm_Friedrich_Hegel dbpedia-owl:influenced dbpedia:Francis_Fukuyama .
dbpedia:Georg_Wilhelm_Friedrich_Hegel dbpedia-owl:influenced dbpedia:Friedrich_Nietzsche .
(The Norwegian comment reads: “Georg Wilhelm Friedrich Hegel was a German philosopher.”)

Merging Data with RDF
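In the simplest case, merging RDF data from several sources is a set union of triples: statements that use the same URIs connect automatically, and identical statements collapse. A minimal sketch, assuming no blank nodes (those would have to be renamed apart before merging) and using hypothetical `example.org` source URIs; keeping quads (subject, predicate, object, context) preserves where each triple came from.

```python
Triple = tuple[str, str, str]
Quad = tuple[str, str, str, str]


def merge(sources: dict[str, set[Triple]]) -> tuple[set[Triple], set[Quad]]:
    """Merge triples from several sources.

    Returns the merged graph (set union; shared URIs join up automatically)
    and the quads, whose fourth element records the source of each triple.
    Assumes no blank nodes.
    """
    graph: set[Triple] = set()
    quads: set[Quad] = set()
    for context, triples in sources.items():
        for s, p, o in triples:
            graph.add((s, p, o))
            quads.add((s, p, o, context))
    return graph, quads


def sources_of(quads: set[Quad], triple: Triple) -> set[str]:
    """All contexts (sources) that asserted the given triple."""
    return {c for s, p, o, c in quads if (s, p, o) == triple}


HEGEL = "dbpedia:Georg_Wilhelm_Friedrich_Hegel"
sources = {  # hypothetical sources
    "http://example.org/source-a": {
        (HEGEL, "rdf:type", "foaf:Person"),
        (HEGEL, "dbpedia-owl:influenced", "dbpedia:Friedrich_Nietzsche"),
    },
    "http://example.org/source-b": {
        (HEGEL, "rdf:type", "foaf:Person"),
        (HEGEL, "rdfs:label", '"Hegel"'),
    },
}
graph, quads = merge(sources)
print(len(graph), len(quads))  # 3 4
```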

4. Link to Other URIs
Enable people (and machines) to jump from server to server
External links vs. internal links (for any predicate)
Special owl:sameAs links to denote equivalence of identifiers (useful for data merging)
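Since owl:sameAs is symmetric and transitive, the identifiers linked by it fall into equivalence classes, which a union-find structure tracks cheaply; rewriting every URI to one canonical member of its class ("smushing") then merges the data. A minimal sketch: the class and function names are hypothetical, the canonical member is simply the lexicographically smallest URI, and the VIAF label triple is invented for illustration.

```python
class SameAsIndex:
    """Union-find over URIs linked by owl:sameAs (symmetric and transitive)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, uri: str) -> str:
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[uri] != root:  # path compression
            self.parent[uri], uri = root, self.parent[uri]
        return root

    def add_same_as(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smallest URI as canonical so the result does not
            # depend on the order in which links are added.
            self.parent[max(ra, rb)] = min(ra, rb)

    def canonical(self, term: str) -> str:
        return self.find(term) if term in self.parent else term


def smush(triples: set[tuple[str, str, str]], index: SameAsIndex) -> set[tuple[str, str, str]]:
    """Rewrite subjects and objects to their canonical identifiers."""
    return {(index.canonical(s), p, index.canonical(o)) for s, p, o in triples}


index = SameAsIndex()
index.add_same_as("http://viaf.org/viaf/97773925/", "http://dbpedia.org/resource/Marlene_Dietrich")
triples = {
    ("http://viaf.org/viaf/97773925/", "rdfs:label", '"Dietrich, Marlene"'),  # hypothetical
    ("http://dbpedia.org/resource/Marlene_Dietrich", "rdf:type", "foaf:Person"),
}
merged = smush(triples, index)
print(sorted(merged))
```

The flip side of transitivity is that a single wrong owl:sameAs link conflates two entire equivalence classes.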

Equivalences via owl:sameAs
(Figure: example identifiers …edia.org/resource/Georg_Wilhelm_Friedrich…, …org/resource/Marlene…, …ia.org/resource/Marlene_Dietrich, …87186835223032381 - Berlin, …28088573361 - Gaspe Peninsula (Quebec) (?))

SPARQL PROTOCOL AND RDF QUERY LANGUAGE

SPARQL
SPARQL Protocol and RDF Query Language
Query language for RDF graphs: “SQL for RDF”
The SPARQL specification consists of:
Query language
Result formats (representation of results in RDF and XML)
Query protocol (mechanisms to pose queries and retrieve results)

Simple Query Example
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  ?s dct:subject <http://dbpedia.org/resource/Category:People_from_Stavanger> .
  ?s rdfs:label ?name .
}
The main part is the query pattern (WHERE clause), written in Turtle syntax for RDF
Query patterns may contain variables (?s, ?name)
Shortcuts for URIs (PREFIX)
Query results via selection of variables (SELECT)
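The WHERE clause above is a basic graph pattern: each triple pattern is matched against the data, and solutions for the individual patterns are joined on shared variables (here ?s). A minimal sketch of that evaluation strategy (nested-loop join) over an in-memory list of triples; the data is a small hand-made excerpt mirroring the result table, with prefixed names written as plain strings.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant does not match
    return extended


def evaluate_bgp(patterns: list[Triple], triples: list[Triple]) -> list[Binding]:
    """Evaluate a basic graph pattern by joining the matches of each pattern."""
    solutions: list[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


STAVANGER = "<http://dbpedia.org/resource/Category:People_from_Stavanger>"
data = [  # hand-made excerpt
    ("dbpedia:Erik_Nevland", "dct:subject", STAVANGER),
    ("dbpedia:Erik_Nevland", "rdfs:label", '"Erik Nevland"@no'),
    ("dbpedia:Jan_Simonsen", "dct:subject", STAVANGER),
    ("dbpedia:Jan_Simonsen", "rdfs:label", '"Jan Simonsen"@no'),
    ("dbpedia:Berlin", "rdfs:label", '"Berlin"@no'),
]
query = [("?s", "dct:subject", STAVANGER), ("?s", "rdfs:label", "?name")]
for row in evaluate_bgp(query, data):
    print(row["?s"], row["?name"])
```

Real SPARQL engines use indexes and join reordering instead of scanning all triples per pattern.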

Query Results
Table with one row per result:
?s | ?name
http://dbpedia.org/resource/Erik_Nevland | "Erik Nevland"@no
http://dbpedia.org/resource/Jan_Simonsen | "Jan Simonsen"@no
http://dbpedia.org/resource/Laila_Goody | "Laila Goody"@no
http://dbpedia.org/resource/Henriette_Henriksen | "Henriette Henriksen"@no
http://dbpedia.org/resource/Guri_Hjeltnes | "Guri Hjeltnes"@no
http://dbpedia.org/resource/Johan_E._Holand | "Johan E. Holand"@no
http://dbpedia.org/resource/Kristian_Valen | "Kristian Valen"@no

Further Functionality
Optional triple patterns (e.g., return the name and, optionally, the birth date if available)
Unions (e.g., return materials scientists and also physicists)
Filters (e.g., only return scientists born before 1970)
Result formats (e.g., return RDF triples instead of a results table)
Solution modifiers (e.g., sort results, only return unique results)
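These features operate on solution mappings (variable to value). OPTIONAL behaves like a left outer join, UNION concatenates solutions, a FILTER on an unbound variable eliminates the solution, and DISTINCT/ORDER BY post-process the list. A simplified sketch of those semantics (it ignores filter expressions inside OPTIONAL and SPARQL's error handling details); the `ex:` data is hypothetical.

```python
Binding = dict[str, str]


def compatible(a: Binding, b: Binding) -> bool:
    """Two solutions are compatible if they agree on all shared variables."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def left_join(left: list[Binding], right: list[Binding]) -> list[Binding]:
    """OPTIONAL: extend each left solution where possible, else keep it unchanged."""
    result = []
    for sol in left:
        extensions = [{**sol, **r} for r in right if compatible(sol, r)]
        result.extend(extensions if extensions else [sol])
    return result


def union(a: list[Binding], b: list[Binding]) -> list[Binding]:
    """UNION: solutions from either alternative."""
    return a + b


def distinct(solutions: list[Binding]) -> list[Binding]:
    """DISTINCT: drop duplicate solutions, keeping first occurrences."""
    seen, result = set(), []
    for sol in solutions:
        key = tuple(sorted(sol.items()))
        if key not in seen:
            seen.add(key)
            result.append(sol)
    return result


names = [{"?s": "ex:alice", "?name": "Alice"}, {"?s": "ex:bob", "?name": "Bob"}]
births = [{"?s": "ex:alice", "?birth": "1965"}]

rows = left_join(names, births)  # Bob is kept without ?birth
# FILTER: a solution with unbound ?birth does not pass the test.
born_before_1970 = [r for r in rows if "?birth" in r and int(r["?birth"]) < 1970]
ordered = sorted(distinct(union(rows, rows)), key=lambda r: r["?name"])  # ORDER BY ?name
print(rows)
```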

Benefits of Linked Data
Explicit, simple data representation: a common data representation (Resource Description Framework, RDF) hides underlying technologies and systems
Distributed system: decentralised, distributed ownership and control facilitates adoption and scalability
Cross-referencing: allows for linking and referencing of existing data via reuse of URIs
Loose coupling with a common language layer: large-scale systems require loose coupling, via HTTP as the common access protocol
Ease of publishing and consumption: simple and easy-to-use systems and technologies facilitate uptake
Incremental data integration: start with merged RDF graphs and provide mappings as you go

Challenges (I)
Ramp-up cost for data conversion: may be alleviated by semi-automatic mappings and adequate tool support for manual conversion
Integrated data may be messy at first, but can be refined as the need arises
Distributed creation and loose coordination may result in inconsistencies, which can be detected, diagnosed, and fixed with appropriate tools

The Pedantic Web Group
Get the community to contact publishers about errors/issues as they arise
Get involved: http://pedantic-web.org/ (137 members!)
Acknowledgements to: Aidan Hogan, Alex Passant, Me, Antoine Zimmermann, Axel Polleres, Michael Hausenblas, Richard Cyganiak, Stéphane Corlosquet

Challenges (II)
Often very much oriented towards individuals (instance data)
Few possibilities for expressing schema knowledge
Different data sources have different ways of representing the same facts
Ontology languages (RDFS, OWL) address that drawback
RDFS and OWL are layered on top of RDF

ONTOLOGY LANGUAGES

Ontology in Philosophy
The term exists only in the singular (there are no “ontologies”)
Ontology is concerned with the study of the nature of being, existence or reality as such
Discussed by Aristotle (Socrates), Thomas Aquinas, Descartes, Kant, Hegel, Wittgenstein, Heidegger, Quine, …

Ontology in Informatics
“An ontology is a formal specification of a shared conceptualisation of a domain of interest”
formal specification: interpretable by machines
shared: based on consensus
conceptualisation: describes terminology
domain of interest: covers a specific topic
Studer, Benjamins and Fensel (1998), based on Gruber (1993) and Borst (1997)

Schema Knowledge
RDF provides a universal mechanism for representing facts using triples
It is possible to describe individuals and their relations
Required: describing generic sets of individuals (classes), e.g., people, chemical compounds, organisations, …
Required: specifying logical connections between individuals, classes and properties to describe their meaning, e.g., “researchers write papers”, “materials are chemical compounds”
In database-speak: schema knowledge

Schema Knowledge with RDFS
RDF Vocabulary Description Language (RDFS)
Allows for the specification of schema (also: terminological) knowledge
RDFS is a special RDF vocabulary (every RDFS document is an RDF document)
The RDFS vocabulary is generic: it allows one to specify the semantics of other vocabularies (and as such is a kind of “metavocabulary”)
Thus, RDFS is an ontology language (but a lightweight one)
“A little semantics goes a long way” (Hendler, 1997)

Classes and Instances
The property rdf:type declares the subject of a triple to be of the type denoted by the object
The object of the triple is interpreted as the identifier of a class, which contains the resource denoted by the subject of the triple
Example: “The individual Hegel is of type Person”
dbpedia:Georg_Wilhelm_Friedrich_Hegel rdf:type foaf:Person .
Class membership is not exclusive:
Example:
dbpedia:Georg_Wilhelm_Friedrich_Hegel rdf:type yago:PoliticalPhilosophers .
Instances and classes both use the same syntax for URIs, so there is no syntactical distinction

Subclasses - Motivation
Given the triple
dbpedia:Georg_Wilhelm_Friedrich_Hegel rdf:type yago:PoliticalPhilosophers .
and a query for all foaf:Person instances, we do not get any results
We could add the triple
dbpedia:Georg_Wilhelm_Friedrich_Hegel rdf:type foaf:Person .
but that would solve the problem for only one instance

Subclasses
Solution: make one statement saying that every political philosopher is a person
That is, every instance of class yago:PoliticalPhilosophers is also an instance of class foaf:Person
Realised via the rdfs:subClassOf property
Example: “The class of political philosophers is a subclass of the class of persons”
yago:PoliticalPhilosophers rdfs:subClassOf foaf:Person .

Subclasses
rdfs:subClassOf is reflexive, that is, every class is a subclass of itself
Example:
yago:PoliticalPhilosophers rdfs:subClassOf yago:PoliticalPhilosophers .
Two classes can be equated via reciprocal subclass relations:
Example:
dbpedia:Person rdfs:subClassOf foaf:Person .
foaf:Person rdfs:subClassOf dbpedia:Person .

Class Hierarchies
Typically, ontologies contain not just single subclass relations but class hierarchies
Example:
yago:PoliticalPhilosophers rdfs:subClassOf yago:Philosophers .
yago:Philosophers rdfs:subClassOf dbpedia:Person .
dbpedia:Person rdfs:subClassOf dbpedia:Mammal .
Transitivity of rdfs:subClassOf is part of the RDFS semantics, which means, e.g., that the following holds:
Example:
yago:Philosophers rdfs:subClassOf dbpedia:Mammal .
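The subclass reasoning from the last few slides can be implemented by forward chaining until nothing new is derived. A minimal sketch covering three RDFS entailment patterns: transitivity of rdfs:subClassOf (rdfs11), reflexivity for classes that occur in subClassOf triples (a restricted form of rdfs10), and type propagation to superclasses (rdfs9). Terms are plain prefixed-name strings; this is an illustration, not a complete RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

Triple = tuple[str, str, str]


def rdfs_subclass_closure(triples: set[Triple]) -> set[Triple]:
    """Forward-chain rdfs11, restricted rdfs10 and rdfs9 to a fixpoint."""
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == SUBCLASS_OF}
        new: set[Triple] = set()
        for a, b in sub:
            new.add((a, SUBCLASS_OF, a))  # reflexivity
            new.add((b, SUBCLASS_OF, b))
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS_OF, d))  # transitivity (rdfs11)
        for x, p, c in closure:
            if p == RDF_TYPE:
                for a, b in sub:
                    if a == c:
                        new.add((x, RDF_TYPE, b))  # instance of superclass (rdfs9)
        if new <= closure:
            return closure
        closure |= new


data = {
    ("yago:PoliticalPhilosophers", SUBCLASS_OF, "yago:Philosophers"),
    ("yago:Philosophers", SUBCLASS_OF, "dbpedia:Person"),
    ("dbpedia:Person", SUBCLASS_OF, "dbpedia:Mammal"),
    ("dbpedia:Georg_Wilhelm_Friedrich_Hegel", RDF_TYPE, "yago:PoliticalPhilosophers"),
}
closure = rdfs_subclass_closure(data)
print(("yago:Philosophers", SUBCLASS_OF, "dbpedia:Mammal") in closure)  # True
print(sorted(s for s, p, o in closure if p == RDF_TYPE and o == "dbpedia:Person"))
```

With the closure materialised, a plain query for all instances of a class also returns the instances of its subclasses.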

Further RDFS Primitives
Property hierarchies via rdfs:subPropertyOf
Domain and range of properties via rdfs:domain and rdfs:range (used to infer types rather than to reject data)
Lists and collections
Reification (statements about statements)
Annotations via rdfs:label or rdfs:comment
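Under the RDFS semantics, rdfs:domain and rdfs:range do not reject data; they add type statements (entailment patterns rdfs2 and rdfs3), and rdfs:subPropertyOf copies statements to the super-property (rdfs7). A minimal forward-chaining sketch; the axioms about dbpedia-owl:influenced and the property ex:relatedTo are hypothetical, and subPropertyOf transitivity between the properties themselves (rdfs5) is not materialised, although chained axioms still propagate to instance data across iterations.

```python
Triple = tuple[str, str, str]

RDF_TYPE = "rdf:type"
DOMAIN, RANGE, SUBPROP = "rdfs:domain", "rdfs:range", "rdfs:subPropertyOf"


def is_literal(term: str) -> bool:
    return term.startswith('"')


def apply_property_axioms(triples: set[Triple]) -> set[Triple]:
    """Forward-chain rdfs2 (domain), rdfs3 (range) and rdfs7 (subPropertyOf)."""
    result = set(triples)
    while True:
        axioms = [(s, p, o) for s, p, o in result if p in (DOMAIN, RANGE, SUBPROP)]
        new: set[Triple] = set()
        for s, p, o in result:
            for prop, kind, target in axioms:
                if prop != p:
                    continue
                if kind == DOMAIN:
                    new.add((s, RDF_TYPE, target))
                elif kind == RANGE and not is_literal(o):
                    new.add((o, RDF_TYPE, target))  # literals cannot be subjects
                elif kind == SUBPROP:
                    new.add((s, target, o))
        if new <= result:
            return result
        result |= new


data = {
    ("dbpedia-owl:influenced", DOMAIN, "foaf:Person"),      # hypothetical axiom
    ("dbpedia-owl:influenced", RANGE, "foaf:Person"),       # hypothetical axiom
    ("dbpedia-owl:influenced", SUBPROP, "ex:relatedTo"),    # hypothetical axiom
    ("dbpedia:Georg_Wilhelm_Friedrich_Hegel", "dbpedia-owl:influenced", "dbpedia:Friedrich_Nietzsche"),
}
inferred = apply_property_axioms(data)
print(("dbpedia:Friedrich_Nietzsche", RDF_TYPE, "foaf:Person") in inferred)  # True
```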

RDFS Summary
RDFS can be used to describe semantic aspects of specific domains
On the basis of RDFS, it is possible to infer implicit knowledge
However, the primitives of RDFS have limited expressivity

Web Ontology Language OWL
A fragment of first-order logic
Five variants: OWL EL, OWL RL, OWL QL, OWL DL, OWL Full
OWL DL is decidable and corresponds to the description logic SROIQ(D)
OWL documents are RDF documents
The three building blocks are:
Classes (comparable to classes in RDFS)
Individuals (comparable to instances in RDFS)
Roles (comparable to properties in RDFS)
OWL contains primitives to specify elaborate expressions, e.g., that two classes are disjoint
OWL allows for complex reasoning tasks such as consistency checking, but these may be computationally expensive
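To give a feel for what a consistency check looks for, here is a deliberately tiny sketch that only detects one kind of clash: an individual explicitly typed with two classes declared owl:disjointWith. A real OWL reasoner would also consider inferred types and many other constructs, which is where the computational cost comes from. All `ex:` names are hypothetical.

```python
Triple = tuple[str, str, str]


def disjointness_violations(triples: set[Triple]) -> list[tuple[str, str, str]]:
    """Report (individual, class1, class2) for individuals in two disjoint classes.

    Only explicit rdf:type triples are checked.
    """
    disjoint = sorted((s, o) for s, p, o in triples if p == "owl:disjointWith")
    types: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    violations = []
    for individual in sorted(types):
        for a, b in disjoint:
            if a in types[individual] and b in types[individual]:
                violations.append((individual, a, b))
    return violations


data = {
    ("ex:Person", "owl:disjointWith", "ex:Organisation"),
    ("ex:hegel", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Person"),
    ("ex:acme", "rdf:type", "ex:Organisation"),
}
print(disjointness_violations(data))  # [('ex:acme', 'ex:Person', 'ex:Organisation')]
```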

Equivalence
OWL allows for the specification of equivalence, which is needed in data integration scenarios
Between individuals: owl:sameAs
Example:
<http://viaf.org/viaf/97773925/> owl:sameAs <http://dbpedia.org/resource/Marlene_Dietrich> .
Between properties: owl:equivalentProperty
Between classes: owl:equivalentClass
Example:
dbpedia:Person owl:equivalentClass foaf:Person .
However, equivalences are often only implicit in the data

Inverse Functional Properties
It is possible to define “uniquely identifying properties”, useful for object consolidation
E.g., (hypothetical) from
ex:passportNo rdf:type owl:InverseFunctionalProperty .
and
dbpedia:Marlene_Dietrich ex:passportNo "12033-89-5" .
freebase:en.marlene_dietrich ex:passportNo "12033-89-5" .
it follows that
dbpedia:Marlene_Dietrich owl:sameAs freebase:en.marlene_dietrich .
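The inference above generalises directly: group subjects by (inverse functional property, value), and every group with more than one subject yields owl:sameAs statements. A minimal sketch over plain prefixed-name strings, run on the slide's hypothetical passport example.

```python
Triple = tuple[str, str, str]

IFP_CLASS = "owl:InverseFunctionalProperty"


def infer_same_as(triples: set[Triple]) -> set[Triple]:
    """Derive owl:sameAs triples from inverse functional properties.

    If p is inverse functional and two subjects share a value for p,
    both subjects denote the same individual.
    """
    ifps = {s for s, p, o in triples if p == "rdf:type" and o == IFP_CLASS}
    subjects_by_value: dict[tuple[str, str], set[str]] = {}
    for s, p, o in triples:
        if p in ifps:
            subjects_by_value.setdefault((p, o), set()).add(s)
    same_as: set[Triple] = set()
    for subjects in subjects_by_value.values():
        ordered = sorted(subjects)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                same_as.add((a, "owl:sameAs", b))
    return same_as


data = {  # hypothetical example from the slide
    ("ex:passportNo", "rdf:type", IFP_CLASS),
    ("dbpedia:Marlene_Dietrich", "ex:passportNo", '"12033-89-5"'),
    ("freebase:en.marlene_dietrich", "ex:passportNo", '"12033-89-5"'),
}
print(infer_same_as(data))
```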

Further OWL Primitives
Property characteristics: inverse properties, symmetric properties
Property cardinality: minimum cardinality, maximum cardinality
Class restrictions
Property chains

LINKED DATA APPLICATION ARCHITECTURES

Data Integration System Architecture
(Figure: a query (?) and its answer (!) pass through an integration layer, which accesses Source 1, Source 2, …, Source n via Wrapper 1, Wrapper 2, …, Wrapper n)

Semantic Web Components

Linked Data: Minimal Components
(Figure: 1. Query? 2. Answer!)

Architecture Styles
Warehousing/Crawl-Index-Serve: 0. Crawl and index; 1. Query?; 2. Answer!
Virtual Integration/Distributed Querying: 1. Query?; 2. Answer!
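In the warehousing style, step 0 is a crawler that dereferences URIs, follows the URIs found in the returned RDF, and stores everything as quads; queries are then answered from the local index only. A minimal breadth-first sketch; the in-memory `WEB` dictionary and its `.example` URIs are hypothetical stand-ins for HTTP lookups, and link following is limited to subject and object URIs. A virtual-integration system would instead perform such lookups at query time.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag

Triple = tuple[str, str, str]
Quad = tuple[str, str, str, str]


def crawl(seeds: Iterable[str], fetch: Callable[[str], list[Triple]],
          max_docs: int = 100) -> set[Quad]:
    """Breadth-first Linked Data crawl: dereference, follow links, index quads."""
    index: set[Quad] = set()
    seen: set[str] = set()
    queue = deque(urldefrag(uri).url for uri in seeds)
    while queue and len(seen) < max_docs:
        doc = queue.popleft()
        if doc in seen:
            continue
        seen.add(doc)
        for s, p, o in fetch(doc):
            index.add((s, p, o, doc))  # context = source document
            for term in (s, o):
                if term.startswith(("http://", "https://")):
                    target = urldefrag(term).url  # hash URI -> document URI
                    if target not in seen:
                        queue.append(target)
    return index


WEB = {  # hypothetical documents
    "http://a.example/hegel.rdf": [
        ("http://a.example/hegel.rdf#me", "ex:influenced", "http://b.example/nietzsche.rdf#me"),
    ],
    "http://b.example/nietzsche.rdf": [
        ("http://b.example/nietzsche.rdf#me", "ex:name", '"Friedrich Nietzsche"'),
    ],
}
index = crawl(["http://a.example/hegel.rdf#me"], lambda doc: WEB.get(doc, []))
# Serve: answer a query from the local index only.
print(sorted(o for s, p, o, c in index if p == "ex:name"))
```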

Basic Application: Entity Browsing
Warehousing/Crawl-Index-Serve: SWSE, Falcons, Sindice, Watson, FactForge
Virtual Integration/Distributed Querying: Tabulator, Disco, Zitgist

SUMMARY

Summary
The Linked Data Web is a large, decentralised, complex system built on simple principles:
identify resources via HTTP URIs
provide RDF that links to other URIs upon lookup
The current trend around Linked Data allows for a re-think of components in the Semantic Web layer cake
Data publishers and consumers coordinate little
The Web of Data grows rapidly and covers a large variety of domains
Algorithms operate over a common access protocol and data model
Ontology languages provide integration and mapping between disparate sources
First commercial applications are emerging

Attribution
Slides from my SWT-2 lectures and the WWW 2010 SILD tutorial
Slides about RDFS and OWL adapted from the SWT-1 lecture (Rudolph, Kroetzsch, Harth)
Linking Open Data cloud diagrams by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/
Images of Berlin, Hegel and Dietrich via Wikipedia
Hendler 97: http://www.cs.rpi.edu/~hendler/LittleSemanticsWeb.html
Borst 97: “Construction of Engineering Ontologies”, Ph.D. Thesis, University of Twente, 1997.
Studer, Benjamins, Fensel 98: “Knowledge Engineering: Principles and Methods”, DKE 25(1-2):161-198.
Gruber 93: “Towards principles for the design of ontologies used for knowledge sharing”, Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer.
