Slide 1: Next Generation Query and Transformation Standards
Priscilla Walmsley
Managing Director
Slide 2: Agenda
- The query and transformation landscape
- Querying XML with XQuery
- Transforming XML with XSLT
- Shared components
- Decision points
2005 Datypic http://www.datypic.com
Slide 3: Querying and Transforming XML
- Querying: extracting data of interest
- Transformation: changing the structure of data
- Sometimes it's hard to tell them apart:
  – Query: "Give me just the b elements, but call them x in the results"
  – Transformation: "Change all the b elements to x elements, and ignore the rest"
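To make the "hard to tell them apart" point concrete, here is a minimal sketch in Python's standard library (not XQuery or XSLT). The sample document, the `results` wrapper element, and both function names are assumptions made for illustration; the two functions phrase the task as a query and as a transformation, and produce the same output.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document, invented for this sketch.
SAMPLE = "<root><a>1</a><b>2</b><c>3</c><b>4</b></root>"


def query_style(xml_text):
    """Give me just the b elements, but call them x in the results."""
    root = ET.fromstring(xml_text)
    results = ET.Element("results")
    for b in root.findall("b"):           # select only the b children
        x = ET.SubElement(results, "x")   # construct a new x for each one
        x.text = b.text
    return ET.tostring(results, encoding="unicode")


def transform_style(xml_text):
    """Change all the b elements to x elements, and ignore the rest."""
    root = ET.fromstring(xml_text)
    results = ET.Element("results")
    for child in root:                    # visit every child in document order
        if child.tag == "b":              # rewrite b as x
            x = ET.SubElement(results, "x")
            x.text = child.text
        # any other element is ignored
    return ET.tostring(results, encoding="unicode")
```

Calling `query_style(SAMPLE)` and `transform_style(SAMPLE)` returns the same string, which is why the two activities blur together in practice.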
Slide 4: W3C Standards for Querying/Transformation
[Diagram: nested standards and the features each one contributes]
- XPath 1.0: Path Expressions, Comparison Expressions, Some Built-In Functions
- XPath 2.0 (builds on XPath 1.0): Conditional Expressions, Arithmetic Expressions, Quantified Expressions, Built-In Functions & Operators, Data Model
- XQuery 1.0 (builds on XPath 2.0): FLWOR Expressions, XML Constructors, Query Prolog, User-Defined Functions
- XSLT 2.0 (builds on XPath 2.0): Stylesheets, Templates, XML Constructors, User-Defined Functions
Slide 5: Querying XML with XQuery
Slide 6: XQuery 1.0

    for $i1 in doc("input1.xml")//item/@dept
    for $i2 in doc("input2.xml")//product
    where $i1 = $i2/@dept
    order by $i1
    return <dep name="{$i1}" quant="{sum($i2/@quant)}"/>

[Diagram: XML Input 1 and XML Input 2 -> parse -> XQuery Processor (analyze and evaluate) -> serialize (or pass on) -> XML Output]
Slide 7: XML Input
- Could be data that is:
  – a textual XML document on a file system
  – retrieved from a Web service
  – stored in an XML database
  – stored in a relational database
  – created in memory by program code
- Can take the form of:
  – a single XML document
  – a collection of several documents
  – a fragment of a document (e.g. sequence of elements)
Slides 8-12: XQuery 1.0 Capabilities
One query, shown five times, highlights a different capability on each slide:
- selecting elements/attributes from XML input documents
- joining data from multiple sources
- adding new elements/attributes to results
- performing calculations
- sorting results

    for $i1 in doc("order.xml")//item
    for $i2 in doc("catalog.xml")//product
    where $i1/@num = $i2/prodNum
    order by $i1/@num, $i1/quantity
    return <item number="{$i1/@num}"
                 name="{$i2/prodName}"
                 salePrice="{min($i2/price) - $i2/discount}"/>
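The five capabilities can be traced step by step in a rough Python equivalent of the query. This is a sketch under several assumptions: the two sample documents are invented stand-ins for order.xml and catalog.xml, the `items` wrapper element and the function name are hypothetical, and the sale price is assumed to be the lowest price minus the discount.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for order.xml and catalog.xml.
ORDER_XML = """<order>
  <item num="557"><quantity>1</quantity></item>
  <item num="443"><quantity>2</quantity></item>
</order>"""

CATALOG_XML = """<catalog>
  <product>
    <prodNum>443</prodNum><prodName>Hat</prodName>
    <price>20.00</price><price>18.50</price><discount>3.50</discount>
  </product>
  <product>
    <prodNum>557</prodNum><prodName>Shirt</prodName>
    <price>30.00</price><discount>5.00</discount>
  </product>
</catalog>"""


def sale_items(order_xml, catalog_xml):
    order = ET.fromstring(order_xml)
    catalog = ET.fromstring(catalog_xml)
    rows = []
    for item in order.iter("item"):                  # selecting: //item
        for product in catalog.iter("product"):      # selecting: //product
            # joining: keep only pairs whose numbers match (the where clause)
            if item.get("num") != product.findtext("prodNum"):
                continue
            # calculating: lowest price minus discount (assumed operator)
            lowest = min(float(p.text) for p in product.findall("price"))
            sale = lowest - float(product.findtext("discount"))
            rows.append((item.get("num"), item.findtext("quantity"),
                         product.findtext("prodName"), sale))
    # sorting: by item number, then quantity, compared as strings
    rows.sort(key=lambda row: (row[0], row[1]))
    # constructing: new elements and attributes in the result
    result = ET.Element("items")
    for num, _quantity, name, sale in rows:
        ET.SubElement(result, "item", number=num, name=name,
                      salePrice=f"{sale:.2f}")
    return result
```

The nested loops mirror the two `for` clauses; an XQuery processor backed by a database would normally use indexes rather than comparing every pair.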
Slide 13: XQuery Use Cases
Slide 14: Search and Browse
[Diagram, top to bottom]
- Happy User asks: "What hotels in New York allow pets and have Internet access?"
- Built-In User Interface or Custom User Interface
- Built-In XQuery Processor
- Semi-Structured XML Content (Poetry Manuscripts, Medical Journals, Hotel Reviews)
- "Native" XML DBMS, e.g. MarkLogic, Berkeley DB, eXist
Slide 15: "XML-izing" Data for Web Services
[Diagram, top to bottom]
- Happy User asks: "What is the status of my order?"
- Order Inquiry Web Service
- Built-In XQuery Front-End
- Structured Data (Orders, Product Prices, Customer Information)
- Relational DBMS, e.g. SQL Server, Oracle, DB2
Slide 16: Integrating Disparate Data Sources
[Diagram: DataDirect Technologies]
Slide 17: Anything, really.
- Anywhere in application code you would currently use XPath, or XSLT, or DOM, e.g.:
  – to narrow down results returned from a Web service
  – in a pipeline process to split or subset an XML document
  – to manipulate or create a configuration file stored as XML
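As an illustration of the first item, narrowing down a Web service response, here is a small sketch using the limited path syntax in Python's standard library. The response document, its element and attribute names, and the function name are all invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical Web service response; names and values are invented.
RESPONSE_XML = """<orders>
  <order id="A1" status="shipped"/>
  <order id="A2" status="pending"/>
  <order id="A3" status="shipped"/>
</orders>"""


def shipped_orders(response_xml):
    """Return a new document holding only the orders marked as shipped."""
    root = ET.fromstring(response_xml)
    subset = ET.Element(root.tag)
    # ElementTree supports a small XPath subset, including [@attr='value'].
    for order in root.findall("order[@status='shipped']"):
        subset.append(order)
    return subset
```

In XQuery the same subset would be a one-line path expression with a predicate; the surrounding parsing and rebuilding code is the part the slide suggests a query language can replace.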
Slide 18: XQuery Features
Slide 19: Features of XQuery
- Compact syntax
- Typing and schema support
- Reusable function libraries
- Designed with today's XML in mind
Slide 20: Compact, Intuitive Syntax
- Easy to learn and use
- Less verbose than XSLT
  – but much more powerful than straight XPath
- Does not require hard-core programming background
- Ideal for embedding into programming languages
Slide 21: Embedding in Java
- XQJ: XQuery API for Java
  – proposed Java standard for invoking queries, and processing the results
  – the "JDBC of XML"

    XQExpression expr = conn.createExpression();
    String qy = "for $p in doc('cat.xml')//product return ($p/name)";
    XQResultSequence result = expr.executeQuery(qy);
    while (result.next()) {
        String str = result.getString();
        System.out.println("Product name: " + str);
    }
    result.close(); expr.close(); conn.close();
Slide 22: Typing and Schema Support
- Typing allows for identification of query errors
- Optional schema support
  – can associate a schema with a query or input document
  – the schema defines the rules for the input or output XML:
      names of elements/attributes
      hierarchical structure
      number of occurrences
      data types
Slide 23: Benefits of Using Schemas
- Better identification of static errors
  – allows discovery of errors in the query that were not otherwise apparent
  – especially important when new versions of the input XML vocabulary come along
- Query optimization
- Validity of query inputs and results
  – makes them more predictable
- Special processing based on type
Slide 24: Using Schemas to Catch Static Errors

    import schema default element namespace "http://datypic.com/prod"
        at "http://datypic.com/prod.xsd";
    for $prod in doc("cat.xml")/produt
    order by $prod/name/number
    return $prod/name + 1

Callouts:
- produt: misspelling
- $prod/name/number: invalid path; name will never have a number child
- $prod/name + 1: type error; name is declared to be of type xs:string, so cannot be used in an add operation
Slide 25: Reusable Function Libraries
- Portable, reusable, shareable
- Can provide a set of standard queries on a standard XML vocabulary
- As vocabulary changes, function libraries can be recompiled and/or versioned

    module namespace dty = "http://datypic.com/order";
    declare function dty:orderStatus($num as xs:string?)
        as element(order)* { ... };
    declare function dty:cancelOrder($num as xs:string?)
        as xs:boolean { ... };
Slide 26: Designed with Today's XML in Mind
- Intuitive, designed-in support for:
  – namespaces
  – construction of new elements/attributes
  – data types
  – whitespace handling
  – etc.
- Much less awkward than, e.g., DOM manipulation
Slide 27: Transforming XML with XSLT
Slide 28: Typical XSLT Use Cases
- Transform content into presentation
  – XML to HTML, XML to XSL-FO
- General purpose XML to XML transforms (data manipulation)
  – B2B
  – EAI
- Transform XML to other formats (text, CSV, etc.)
Slide 29: XSLT (Look familiar?)

    <xsl:template match="order">
      <xsl:for-each select="item">
        <li>Item number <xsl:value-of select="@num"/></li>
      </xsl:for-each>
    </xsl:template>

[Diagram: XML Input 1 and XML Input 2 -> parse -> XSLT Processor (analyze and evaluate) -> serialize (or pass on) -> XML Output]
Slide 30: XSLT 2.0 - What's New?
- Grouping
- Multiple result documents
- Temporary result trees
- XPath 2.0 enhancements
  – more powerful syntax
  – more built-in functions
- Schema support and type system
Slide 31: Schema Support and Type System
- Same typing/schema features as XQuery
- Special processing based on type:

    <xsl:template match="element(*,USAddressType)">
      ...
      <xsl:value-of select="city"/>
      <xsl:value-of select="zipCode"/>
    </xsl:template>
    <xsl:template match="element(*,UKAddressType)">
      ...
      <xsl:value-of select="postCode"/>
      <xsl:value-of select="city"/>
    </xsl:template>
Slide 32: XSLT Conveniences (not present in XQuery)
- Highly flexible recursive processing
  – allows "Push" approach
- Grouping syntax is more explicit and easier
- Formatting of dates and numbers
  – format-date, format-number
- Advanced string manipulation
  – analyze-string
- Ability to customize/override stylesheets
Slide 33: Pull vs. Push Approaches
- Pull
  – go get element X and do this with it
  – next, go get element Y and do this with it
- Push
  – get the root element
      if it happens to be X, do this with it
      if it happens to be Y, do this with it
      if it's anything else, skip it
  – next, go get its children and repeat
Slide 34: Pull Approach
- Pulling the information from the input document using hardcoded paths to specific locations
- Requires a predictable document structure

    <xsl:template match="order">
      <xsl:for-each select="item">
        <li>Item # <xsl:value-of select="@num"/></li>
      </xsl:for-each>
    </xsl:template>
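The pull style is not specific to XSLT. A minimal Python sketch of the same idea follows, assuming an invented order document and a hypothetical function name; the code goes straight to `item` and its `num` attribute and ignores everything else.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document, invented for this sketch.
ORDER_XML = '<order><item num="557"/><item num="443"/></order>'


def pull_items(order_xml):
    """Fetch order/item/@num through a hardcoded path and format each one."""
    order = ET.fromstring(order_xml)
    return [f"<li>Item # {item.get('num')}</li>"
            for item in order.findall("item")]
```

If the items were nested one level deeper, the hardcoded path would silently find nothing, which is the sense in which this approach needs a predictable document structure.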
Slide 35: Push Approach
- Traversing a document, taking each element as it comes, then deciding what to do with it
- Useful when the structure of the input file is not known, or is highly flexible
- Flexible but not optimized
- Very difficult to do in XQuery
Slide 36: Sample Stylesheet in "Push" Style

    <xsl:template match="order">
      <xsl:apply-templates select="*"/>
    </xsl:template>
    <xsl:template match="item">
      <xsl:apply-templates select="@*"/>
    </xsl:template>
    <xsl:template match="@num">
      <li>Item # <xsl:value-of select="."/></li>
    </xsl:template>
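The dispatch mechanism behind the push style can be sketched in Python as a table of rules keyed by node name. This is an illustration only: the rule tables, function names, and sample document are hypothetical, and unmatched nodes are simply skipped, following the "if it's anything else, skip it" wording of Slide 33. Real XSLT processors apply built-in template rules to unmatched nodes, which this sketch does not reproduce.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; note and dept have no rule and are skipped.
ORDER_XML = ('<order><note>rush</note>'
             '<item num="557" dept="ACC"/><item num="443"/></order>')


def order_rule(element, out):
    # Like apply-templates select="*": hand every child to the dispatcher.
    for child in element:
        apply_templates(child, out)


def item_rule(element, out):
    # Like apply-templates select="@*": offer every attribute to its rule.
    for name, value in element.attrib.items():
        rule = ATTRIBUTE_RULES.get(name)
        if rule is not None:
            rule(value, out)


def num_rule(value, out):
    out.append(f"<li>Item # {value}</li>")


ELEMENT_RULES = {"order": order_rule, "item": item_rule}
ATTRIBUTE_RULES = {"num": num_rule}


def apply_templates(element, out):
    """Take the element as it comes and look up what to do with it."""
    rule = ELEMENT_RULES.get(element.tag)
    if rule is not None:
        rule(element, out)


def push_items(order_xml):
    out = []
    apply_templates(ET.fromstring(order_xml), out)
    return out
```

Supporting a new element means adding one entry to a rule table rather than editing a hardcoded path, which is where the flexibility of the push style comes from.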
Slide 37: XQuery and XSLT: Shared Components
Slide 38: Shared Components
- XPath 2.0
- Built-in functions
- Data model
Slide 39: XPath 2.0
- Full compatibility across XQuery and XSLT
  – same syntax
  – same expression will always return the same value
- Much more than just path expressions

    for $a in fn:distinct-values(/bib/book/author)
    return ($a, /bib/book[author = $a]/title)

    some $emp in /emps/employee satisfies
      ($emp/bonus > 0.25 * $emp/salary)
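For readers more at home in a general-purpose language, the two expressions above can be read as follows in Python. This is a sketch with invented sample data and hypothetical function names; it keeps distinct authors in first-seen order, which is a choice made here, since XPath 2.0 does not fix the order returned by distinct-values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, invented for this sketch.
BIB_XML = """<bib>
  <book><author>Ada</author><title>T1</title></book>
  <book><author>Ben</author><title>T2</title></book>
  <book><author>Ada</author><title>T3</title></book>
</bib>"""

EMPS_XML = """<emps>
  <employee><salary>1000</salary><bonus>100</bonus></employee>
  <employee><salary>2000</salary><bonus>600</bonus></employee>
</emps>"""


def titles_by_author(bib_xml):
    """Each distinct author followed by the titles of that author's books."""
    bib = ET.fromstring(bib_xml)
    authors = []
    for author in bib.findall("book/author"):
        if author.text not in authors:       # distinct values, first seen
            authors.append(author.text)
    result = []
    for name in authors:
        result.append(name)
        result.extend(
            book.findtext("title")
            for book in bib.findall("book")
            if any(a.text == name for a in book.findall("author"))
        )
    return result


def any_large_bonus(emps_xml):
    """True if some employee's bonus exceeds a quarter of their salary."""
    emps = ET.fromstring(emps_xml)
    return any(
        float(e.findtext("bonus")) > 0.25 * float(e.findtext("salary"))
        for e in emps.findall("employee")
    )
```

The quantified `some ... satisfies` expression maps directly onto `any(...)`, while the `for ... return` expression needs an explicit loop to flatten authors and titles into one sequence.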
Slide 40: Over 100 Built-In Functions: A Sample
- String-related: substring, contains, matches, tokenize
- Date-related: current-date, month-from-date
- Number-related: round, avg, sum, ceiling
- Sequence-related: index-of, insert-before, reverse
- Document- and URI-related: collection, doc, root, base-uri
Slide 41: XQuery/XPath Data Model
Slide 42: XQuery vs. XSLT: Decision Factors
Slide 43: XQuery vs. XSLT: Decision Factors
- Use case
- Availability of relevant implementations
- Performance
- Programming style
Slide 44: Use Case
- Use XSLT if:
  – your documents are highly variable
  – your transformation is presentation-oriented
  – your processing is heavily recursive
- Use XQuery if:
  – you are selecting a small subset of a collection of XML data
  – you are joining data from multiple sources
  – your documents are predictable in structure, or variations are not relevant to your searches
Slide 45: Availability of Relevant Implementations
- XQuery
  – XML DBMSs: MarkLogic, Sleepycat Berkeley DB, X-Hive, eXist
  – Relational DBMSs: Oracle, SQL Server, DB2
  – Standalone: Saxon
  – XML Editors: Stylus Studio, XMLSpy, Oxygen
- XSLT 2.0
  – Standalone: Saxon
  – XML Editors: Stylus Studio, XMLSpy
Slide 46: Performance
- XQuery implementations tend to be optimized for:
  – XML stored in a database
  – predictable document structures that can be indexed
- XSLT implementations tend to be optimized for:
  – transforming an entire document that can be loaded into memory
- More driven by use cases than limitations of languages
Slide 47: Programming Style
- XSLT
  – recursive template language difficult for some developers to grasp
  – verbosity can be irritating
  – however, many users love it
- XQuery
  – appealing to SQL users
  – probably easier for newcomers
Slide 48: Conclusions
- XQuery and XSLT 2.0 are coming of age
- They overlap in capabilities
  – but differ in use cases and sweet spots
- Both take XML manipulation to a new level in terms of:
  – power
  – flexibility
  – production-readiness
Slide 49: Resources
- Detailed technical comparison of XQuery and XSLT 2.0
  – Michael Kay's paper from XTech 05:
  – ers/02-03-01/
- XQuery implementations
  – http://www.w3.org/XML/Query
Slide 50: Learning XQuery
- My tutorial on XQuery:
  – http://datypic.com/services/xquery
- Definitive XQuery
  – By Priscilla Walmsley
  – Coming in 2006
Slide 51: Thank you for your interest.
For more information please contact me at:
Email: pwalmsley@datypic.com
Website: http://www.datypic.com