Next Generation Query And Transformation - Datypic

Next Generation Query and Transformation Standards
Priscilla Walmsley
Managing Director

Agenda
The query and transformation landscape
Querying XML with XQuery
Transforming XML with XSLT
Shared components
Decision points
2005 Datypic http://www.datypic.com

Querying and Transforming XML
Querying: extracting data of interest
Transformation: changing the structure of data
Sometimes it's hard to tell them apart:
  – "Give me just the b elements, but call them x in the results"
  – "Change all the b elements to x elements, and ignore the rest"
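
The "b elements as x" request can be sketched as a short XQuery. This is a minimal illustration only: the inline `a`/`b`/`c` input document is a hypothetical stand-in, not data from the talk.

```xquery
xquery version "1.0";

(: Hypothetical input, inlined so the sketch is self-contained :)
let $doc :=
  <a>
    <b>one</b>
    <c>two</c>
    <b>three</b>
  </a>
(: Select only the b children and rebuild each one as an x element,
   copying its attributes and content. The c element is ignored. :)
for $b in $doc/b
return <x>{ $b/@*, $b/node() }</x>
```

Run against this input, the query should return two `x` elements containing `one` and `three`. Whether you call that a query or a transformation is exactly the ambiguity the slide points at.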

W3C Standards for Querying/Transformation
[Diagram: overlapping scope of the four standards]
XPath 1.0: Path Expressions, Comparison Expressions, Some Built-In Functions
XPath 2.0: Conditional Expressions, Arithmetic Expressions, Quantified Expressions, Built-In Functions & Operators, Data Model
XQuery 1.0: FLWOR Expressions, XML Constructors, Query Prolog, User-Defined Functions
XSLT 2.0: Stylesheets, Templates, XML Constructors, User-Defined Functions

Querying XML with XQuery

XQuery 1.0
    for $i1 in doc("input1.xml")//item/@dept
    for $i2 in doc("input2.xml")//product
    where $i1 = $i2/@dept
    order by $i1
    return <dep name="{$i1}" quant="{sum($i2/@quant)}"/>
[Diagram: XML Input 1 and XML Input 2 are parsed; the XQuery processor analyzes and evaluates the query; the XML output is serialized (or passed on).]

XML Input
Could be data that is:
  – a textual XML document on a file system
  – retrieved from a Web service
  – stored in an XML database
  – stored in a relational database
  – created in memory by program code
Can take the form of:
  – a single XML document
  – a collection of several documents
  – a fragment of a document (e.g. a sequence of elements)

XQuery 1.0 Capabilities
  – selecting elements/attributes from XML input documents
  – joining data from multiple sources
  – adding new elements/attributes to results
  – performing calculations
  – sorting results
    for $i1 in doc("order.xml")//item
    for $i2 in doc("catalog.xml")//product
    where $i1/@num = $i2/prodNum
    order by $i1/@num, $i1/quantity
    return <item number="{$i1/@num}"
                 name="{$i2/prodName}"
                 salePrice="{min($i2/price) - $i2/discount}"/>
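
To try the capabilities query without the two input files, the documents can be inlined. The sketch below makes several assumptions: the order and catalog contents are invented for illustration, and the sale price is assumed to be the price minus the discount.

```xquery
xquery version "1.0";

(: Hypothetical stand-ins for order.xml and catalog.xml :)
let $order :=
  <order>
    <item num="2"><quantity>1</quantity></item>
    <item num="1"><quantity>3</quantity></item>
  </order>
let $catalog :=
  <catalog>
    <product>
      <prodNum>1</prodNum><prodName>Hat</prodName>
      <price>20</price><discount>5</discount>
    </product>
    <product>
      <prodNum>2</prodNum><prodName>Scarf</prodName>
      <price>30</price><discount>0</discount>
    </product>
  </catalog>
for $i1 in $order/item                 (: select :)
for $i2 in $catalog/product
where $i1/@num = $i2/prodNum           (: join :)
order by $i1/@num                      (: sort :)
return <item number="{$i1/@num}"
             name="{$i2/prodName}"
             salePrice="{min($i2/price) - $i2/discount}"/>  (: construct and calculate :)
```

With this data the result should be two `item` elements, number 1 (Hat, salePrice 15) followed by number 2 (Scarf, salePrice 30).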

XQuery Use Cases

Search and Browse
"What hotels in New York allow pets and have Internet access?"
[Diagram labels:]
  – Happy User
  – Built-In User Interface / Custom User Interface
  – Built-In XQuery Processor
  – Semi-Structured XML Content (Poetry Manuscripts, Medical Journals, Hotel Reviews)
  – "Native" XML DBMS, e.g. MarkLogic, Berkeley DB, eXist

"XML-izing" Data for Web Services
"What is the status of my order?"
[Diagram labels:]
  – Happy User
  – Order Inquiry Web Service
  – Built-In XQuery Front-End
  – Structured Data (Orders, Product Prices, Customer Information)
  – Relational DBMS, e.g. SQL Server, Oracle, DB2

Integrating Disparate Data Sources
[Diagram: DataDirect Technologies]

Anything, really.
Anywhere in application code you would currently use XPath, or XSLT, or DOM, e.g.:
  – to narrow down results returned from a Web service
  – in a pipeline process to split or subset an XML document
  – to manipulate or create a configuration file stored as XML

XQuery Features

Features of XQuery
Compact syntax
Typing and schema support
Reusable function libraries
Designed with today's XML in mind

Compact, Intuitive Syntax
Easy to learn and use
Less verbose than XSLT
  – but much more powerful than straight XPath
Does not require a hard-core programming background
Ideal for embedding into programming languages

Embedding in Java
XQJ: XQuery API for Java
  – proposed Java standard for invoking queries and processing the results
  – the "JDBC of XML"
    XQExpression expr = conn.createExpression();
    String qy = "for $p in doc('cat.xml')//product " +
                "return ($p/name)";
    XQResultSequence result = expr.executeQuery(qy);
    while (result.next()) {
        String str = result.getString();
        System.out.println("Product name: " + str);
    }
    result.close(); expr.close(); conn.close();

Typing and Schema Support
Typing allows for identification of query errors
Optional schema support
  – can associate a schema with a query or input document
  – the schema defines the rules for the input or output XML: names of elements/attributes, hierarchical structure, number of occurrences, data types

Benefits of Using Schemas
Better identification of static errors
  – allows discovery of errors in the query that were not otherwise apparent
  – especially important when new versions of the input XML vocabulary come along
Query optimization
Validity of query inputs and results
  – makes them more predictable
Special processing based on type
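
"Special processing based on type" can be sketched with a `typeswitch` expression. The example below deliberately uses only built-in types so that it runs without a schema. The function name `local:describe` and the sample values are hypothetical.

```xquery
xquery version "1.0";

(: Hypothetical helper: choose a branch by the dynamic type of the value :)
declare function local:describe($v as item()) as xs:string {
  typeswitch ($v)
    case $i as xs:integer return concat("integer: ", string($i))
    case $s as xs:string  return concat("string: ", $s)
    case $e as element()  return concat("element: ", name($e))
    default               return "other"
};

for $v in (42, "abc", <city>Paris</city>)
return local:describe($v)
```

This should return the three strings `integer: 42`, `string: abc` and `element: city`. With an imported schema, the `case` clauses could name schema-defined types instead of built-in ones.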

Using Schemas to Catch Static Errors
    import schema default element namespace "http://datypic.com/prod"
        at "http://datypic.com/prod.xsd";
    for $prod in doc("cat.xml")/produt
    order by $prod/name/number
    return $prod/name + 1
Errors caught:
  – produt: misspelling
  – $prod/name/number: invalid path; name will never have a number child
  – $prod/name + 1: type error; name is declared to be of type xs:string, so it cannot be used in an add operation

Reusable Function Libraries
Portable, reusable, shareable
Can provide a set of standard queries on a standard XML vocabulary
As the vocabulary changes, function libraries can be recompiled and/or versioned
    module namespace dty = "http://datypic.com/order";
    declare function dty:orderStatus($num as xs:string?)
        as element(order)* { ... };
    declare function dty:cancelOrder($num as xs:string?)
        as xs:boolean { ... };
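
The slide elides the function bodies. As a rough, self-contained illustration of how a caller might use such a function, the sketch below declares one function with the same signature in a main module. The body, the `$dty:orders` variable and the order data are all hypothetical, and a real library would live in its own module and be brought in with `import module`.

```xquery
xquery version "1.0";
declare namespace dty = "http://datypic.com/order";

(: Hypothetical order data; a real library would query a database or doc() :)
declare variable $dty:orders :=
  <orders>
    <order num="A100" status="shipped"/>
    <order num="A200" status="open"/>
  </orders>;

(: Hypothetical body: return the order(s) whose number matches $num :)
declare function dty:orderStatus($num as xs:string?) as element(order)* {
  $dty:orders/order[@num = $num]
};

dty:orderStatus("A200")
```

This should return the single `order` element whose `num` is `A200`. Because callers depend only on the function signature, the body can change as the vocabulary evolves.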

Designed with Today's XML in Mind
Intuitive, designed-in support for:
  – namespaces
  – construction of new elements/attributes
  – data types
  – whitespace handling
  – etc.
Much less awkward than, e.g., DOM manipulation

Transforming XML with XSLT

Typical XSLT Use Cases
Transform content into presentation
  – XML to HTML, XML to XSL-FO
General-purpose XML to XML transforms (data manipulation)
  – B2B
  – EAI
Transform XML to other formats (text, CSV, etc.)

XSLT (Look familiar?)
    <xsl:template match="order">
      <xsl:for-each select="item">
        <li>Item number <xsl:value-of select="@num"/></li>
      </xsl:for-each>
    </xsl:template>
[Diagram: XML Input 1 and XML Input 2 are parsed; the XSLT processor analyzes and evaluates the stylesheet; the XML output is serialized (or passed on).]

XSLT 2.0 - What's New?
Grouping
Multiple result documents
Temporary result trees
XPath 2.0 enhancements
  – more powerful syntax
  – more built-in functions
Schema support and type system

Schema Support and Type System
Same typing/schema features as XQuery
Special processing based on type:
    <xsl:template match="element(*,USAddressType)">
      ...
      <xsl:value-of select="city"/>
      <xsl:value-of select="zipCode"/>
    </xsl:template>
    <xsl:template match="element(*,UKAddressType)">
      ...
      <xsl:value-of select="postCode"/>
      <xsl:value-of select="city"/>
    </xsl:template>

XSLT Conveniences (not present in XQuery)
Highly flexible recursive processing
  – allows "Push" approach
Grouping syntax is more explicit and easier
Formatting of dates and numbers
  – format-date, format-number
Advanced string manipulation
  – analyze-string
Ability to customize/override stylesheets
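
For contrast with XSLT 2.0's dedicated grouping syntax, the common grouping idiom in XQuery 1.0 goes through `distinct-values`. The sketch below assumes a hypothetical `items` document; the element and attribute names are invented for illustration.

```xquery
xquery version "1.0";

(: Hypothetical input: items to be grouped by department :)
let $items :=
  <items>
    <item dept="ACC" quant="2"/>
    <item dept="WMN" quant="1"/>
    <item dept="ACC" quant="5"/>
  </items>
(: Grouping is implicit: find the distinct keys, then re-select
   the members of each group with a predicate :)
for $d in distinct-values($items/item/@dept)
order by $d
return <dept name="{$d}"
             total="{sum($items/item[@dept = $d]/@quant)}"/>
```

This should return a `dept` element for ACC with total 7, then one for WMN with total 1. The group membership has to be recomputed by the predicate, which is the kind of indirection an explicit grouping construct avoids.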

Pull vs. Push Approaches
Pull
  – go get element X and do this with it
  – next, go get element Y and do this with it
Push
  – get the root element: if it happens to be X, do this with it; if it happens to be Y, do this with it; if it's anything else, skip it
  – next, go get its children and repeat

Pull Approach
Pulling the information from the input document using hardcoded paths to specific locations
Requires a predictable document structure
    <xsl:template match="order">
      <xsl:for-each select="item">
        <li>Item # <xsl:value-of select="@num"/></li>
      </xsl:for-each>
    </xsl:template>

Push Approach
Traversing a document, taking each element as it comes, then deciding what to do with it
Useful when the structure of the input file is not known, or is highly flexible
Flexible but not optimized
Very difficult to do in XQuery

Sample Stylesheet in "Push" Style
    <xsl:template match="order">
      <xsl:apply-templates select="*"/>
    </xsl:template>
    <xsl:template match="item">
      <xsl:apply-templates select="@*"/>
    </xsl:template>
    <xsl:template match="@num">
      <li>Item # <xsl:value-of select="."/></li>
    </xsl:template>
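
To show why push-style processing is awkward in XQuery, here is one possible emulation of the stylesheet above using a recursive function and `typeswitch`. It is a sketch under assumptions: `local:dispatch` and the sample order are hypothetical, and nodes with no matching case are simply dropped here rather than handled by built-in rules.

```xquery
xquery version "1.0";

(: Hypothetical dispatcher: each case plays the role of one template rule,
   and each recursive call plays the role of xsl:apply-templates :)
declare function local:dispatch($n as node()) as node()* {
  typeswitch ($n)
    case $o as element(order) return
      for $c in $o/* return local:dispatch($c)
    case $i as element(item) return
      for $a in $i/@* return local:dispatch($a)
    case $num as attribute(num) return
      <li>Item # { string($num) }</li>
    default return ()
};

let $order := <order><item num="557"/><item num="563"/></order>
return local:dispatch($order)
```

This should return two `li` elements, `Item # 557` and `Item # 563`. Every rule has to be wired into the one dispatch function by hand, whereas an XSLT processor does the matching itself and lets an importing stylesheet override individual templates.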

XQuery and XSLT: Shared Components

Shared Components
XPath 2.0
Built-in functions
Data model

XPath 2.0
Full compatibility across XQuery and XSLT
  – same syntax
  – same expression will always return the same value
Much more than just path expressions
    for $a in fn:distinct-values(/bib/book/author)
    return ($a, /bib/book[author = $a]/title)

    some $emp in /emps/employee satisfies
        ($emp/bonus > 0.25 * $emp/salary)

Over 100 Built-In Functions: A Sample
String-related: substring, contains, matches, tokenize
Date-related: current-date, month-from-date
Number-related: round, avg, sum, ceiling
Sequence-related: index-of, insert-before, reverse
Document- and URI-related: collection, doc, root, base-uri
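
A few of these functions in use, as a quick sketch. The argument values are arbitrary examples chosen for illustration. The functions that depend on the environment (`current-date`, `doc`, `collection`, `root`, `base-uri`) are left out so the query is self-contained.

```xquery
xquery version "1.0";

(
  substring("XQuery", 2, 3),               (: "Que" :)
  contains("datypic", "typ"),              (: true :)
  matches("Slide 40", "^Slide \d+$"),      (: true :)
  tokenize("a,b,c", ","),                  (: "a", "b", "c" :)
  month-from-date(xs:date("2005-11-16")),  (: 11 :)
  round(2.5),                              (: 3 :)
  avg((1, 2, 3)),                          (: 2 :)
  sum((1, 2, 3)),                          (: 6 :)
  ceiling(2.1),                            (: 3 :)
  index-of(("a", "b", "c"), "b"),          (: 2 :)
  insert-before(("a", "c"), 2, "b"),       (: "a", "b", "c" :)
  reverse((1, 2, 3))                       (: 3, 2, 1 :)
)
```

Because the function library is shared, the same calls work unchanged inside an XSLT 2.0 `select` attribute.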

XQuery/XPath Data Model

XQuery vs. XSLT: Decision Factors

XQuery vs. XSLT: Decision Factors
Use case
Availability of relevant implementations
Performance
Programming style

Use Case
Use XSLT if:
  – your documents are highly variable
  – your transformation is presentation-oriented
  – your processing is heavily recursive
Use XQuery if:
  – you are selecting a small subset of a collection of XML data
  – you are joining data from multiple sources
  – your documents are predictable in structure, or variations are not relevant to your searches

Availability of Relevant Implementations
XQuery
  – XML DBMSs: MarkLogic, Sleepycat Berkeley DB, X-Hive, eXist
  – Relational DBMSs: Oracle, SQL Server, DB2
  – Standalone: Saxon
  – XML Editors: Stylus Studio, XMLSpy, Oxygen
XSLT 2.0
  – Standalone: Saxon
  – XML Editors: Stylus Studio, XMLSpy

Performance
XQuery implementations tend to be optimized for:
  – XML stored in a database
  – predictable document structures that can be indexed
XSLT implementations tend to be optimized for:
  – transforming an entire document that can be loaded into memory
More driven by use cases than by limitations of the languages

Programming Style
XSLT
  – recursive template language is difficult for some developers to grasp
  – verbosity can be irritating
  – however, many users love it
XQuery
  – appealing to SQL users
  – probably easier for newcomers

Conclusions
XQuery and XSLT 2.0 are coming of age
They overlap in capabilities
  – but differ in use cases and sweet spots
Both take XML manipulation to a new level in terms of:
  – power
  – flexibility
  – production-readiness

Resources
Detailed technical comparison of XQuery and XSLT 2.0
  – Michael Kay's paper from XTech 05:
  – ers/02-03-01/
XQuery implementations
  – http://www.w3.org/XML/Query

Learning XQuery
My tutorial on XQuery:
  – http://datypic.com/services/xquery
Definitive XQuery
  – By Priscilla Walmsley
  – Coming in 2006

Thank you for your interest.
For more information please contact me at:
Email: pwalmsley@datypic.com
Website: http://www.datypic.com