Introduction To XSLT Concepts - Mulberry Tech


Introduction to XSLT Concepts
Deborah Aleyne Lapeyre and B. Tommie Usdin
Mulberry Technologies, Inc.
17 West Jefferson St., Suite 207
Rockville, MD 20850
Phone: 301/315-9631
Fax: rytech.com
January 2006
© 2006 Mulberry Technologies, Inc.

Administrivia
What is XSLT
What XSLT Does is “Transform”
The Very Basics of XSLT Transforms
Sample XSLT Transforms
Logical Components of an XSLT Application
Component 1: XML Document
Looking at an XML Document as a Tree
Component 2: The XSLT Stylesheet (aka XSLT Transform)
An XSL Stylesheet / Transform
Component 3: An XSLT Engine/Processor
Component 4: The Output File(s)
Watching a Stylesheet in Operation
How Input-Driven Stylesheets Work
Advice: What to Do and Not Do with XSLT
Business Uses XSLT Because XML is Everywhere
For the Right Kind of Problems*
What’s Really Easy in XSLT
XSLT Easily Changes XML into Different XML
XSLT Handles Markup Well
XSLT is Not Good at Everything
XSLT is Weak on Manipulating Text (Strings)
Really Big Files
Making Flat Files into Hierarchies
Where XSLT Fits in Processing
How Organizations Use XSLT
Simple Business Transforms
Making HTML From Semantically Richer XML

Single Source and Reuse Publishing
Construct the Output for Publishing
What You Want in the Order You Want It
There is Not Just One Print Product
Some of the Text is Added by the Transform
Large Structures Can be Built and Inserted as Well
XSLT is Also Useful During Production
XML for Interchange and Archiving
XSLT as the Middle Component in XSL-FO
How XSL-FO Works
Architecture of a Full XSL System (XSLT XSL-FO)
Formatting Objects Describe Page Layout
Applying Styles through XSL FOs
XSL-FO is a Great Report Writer
The Last Bits
What is XPath
XPath Has Two Main Uses
You’ve Seen XPath in match Expressions
XPath Can Be Very Complex
Another Complexity: Push-me Pull-you Stylesheets
What is a Pull Stylesheet?
Why Pull Can Be a Problem
Heads UP: XSLT and XPath 1.0, 1.1, 2.0
What Was “Wrong” with XSLT 1.0
XSLT 2.0: More Power; More Programmer Responsibility
How to Deal with XSLT 1.0 and 2.0 (November 2005)
How to Make XSLT Programmers
XSLT is Also Really Easy But
How to Learn XSLT
Debbie's XSLT Programming Pearls (Optional)

Now Let’s Look at Some Real Stylesheets
End Speech; Start References
For Further Information
XSLT Technical Reference Book
Useful XSLT Reference Website: Zvon
XSLT Concept/Syntax Books
XSLT Syntax for Programmers
Colophon
Appendixes
Appendix 1: Representative XSLT Tools
Appendix 2: Acronyms Used in This Talk

slide 1
Administrivia
- Start, end, break
- Ask questions any time (please!)
- Who we are
- Why this class
- Why more publishing examples
- Anything else?

slide 2
Where We Are Not Going in This Tutorial
- What is XML, why you should care, how XML works (element, attribute, DTD, schema, entity)
- How to solve your particular business problem(s)
- Programmer stuff like how to write stylesheets (although you will see some code)
- Syntax of the XSLT language (templates, functions, location paths)
- Detailed XPath syntax (location paths, functions, data types)
- XSLT tools
- XSL-FO in depth (that’s this afternoon)

slide 3
Where We Are Going Today
The What and Why of XSLT
- What is transformation, what is XSLT
- How it works (logical components of an XSLT system)
- How to think about it (the XSLT processing model)
- How businesses are using XSLT
- What XSLT does not do well
- How should you learn/write XSLT

slide 4
WARNING!
We are going to show code!
You’ll understand the examples even if you ignore the code
We are going to act as if you never heard of XSLT and start from scratch

slide 5
A Quick Poll (Who You Are)
- Where in the process
  - content creators / editors / publishers
  - prepress / composition
  - printers
  - print / web / graphic design
  - fulfillment / distribution
  - System analysts / application programmers
  - Training
- What kind of publishing
  - Books (monographs, reference series, etc.)
  - Journals
  - Magazines and newspapers
  - Product documentation
  - Technical documentation
  - Course materials (CBT, course-packages, tests, textbooks plus, etc.)
- Non-publishing folks

slide 6
What Do You Know Now?
- Know HTML (even a little)
- XML
- SGML
- XSLT
- XSL-FO
- Microsoft Word, WordPerfect
- QuarkXPress, InDesign, other desktop publishing
- High-end composition systems

slide 7
What is XSLT
Extensible Stylesheet Language Transformation
- Name is misleading
- Stylesheet
  - implies it makes things look like something
  - not necessarily or usually true
- Name should have been “The XML Transformation Language”

slide 8
So What is XSLT Really?
- Provides transformation and manipulation functions for XML files
- Designed to make XML into something else
- 1.0 W3C Recommendation 1999
- 2.0 Candidate Recommendation November 2005

slide 9
What XSLT Does is “Transform”
Transform means change
Reads XML documents and writes
- HTML for browsers
- a different XML tag set
- typesetting driver file (InDesign, QuarkXPress, FrameMaker)
- interchange file (RTF, RDF, EDI, etc.)
- a flat ASCII file (plain text, comma separated etc.)

slide 10
The Very Basics of XSLT Transforms
- Transform
  - does not change the input file
  - creates one (or more) new output files
- Transform does not make something else into XML
- Two basic requirements
  - known XML source (tag set, schema, DTD)
  - known target

Sample XSLT Transforms

slide 11
Take in an XML document

    <employee-record type="dog" empno="9">
      <name>
        <first>Sasparilla</first>
        <last>Usdin</last>
      </name>
      <affiliation>
        <title>Deputy in Charge of Chewables</title>
        <company>Mulberry Technologies</company>
        <location>
          <city>Rockville</city>
          <state>MD</state>
          <zip>20850</zip>
        </location>
        <email-name>sassy</email-name>
      </affiliation>
      <height unit="in">36</height>
      <weight unit="lb">70</weight>
    </employee-record>

slide 12
Transform It into HTML
(convert to HTML and display in a browser)

slide 13
Transform It into PDF
(convert to PDF and display with Acrobat)

    Mulberry Technologies, Inc.
    Sasparilla Usdin
    17 West Jefferson Street
    Suite 207
    Rockville, MD 20850
    Phone: 301/315-9631
    Fax: 301/315-9634
    sassy@mulberrytech.com

slide 14
Transform It into QuarkXPress

    New Employee Announcements
    Sasparilla Usdin has recently joined Mulberry Technologies, Inc.’s
    Rockville staff as Deputy in Charge of Chewables.
    Welcome to the team, Sassy!

- XML elements rolled into “form letter”
- Something (perhaps employee-id) linked to photo

slide 15
Transform It into a Database Load File

    Key: 00095AUS
    EMPNO: 009
    001:USDIN
    002:Sasparilla
    008:36
    014:70
    020:Deputy in Charge of Chewables

slide 16
In Other Words: Tagging Changes Large and Small
- Change the following
    <surname>Lapeyre</surname> <firstnames>Deborah A.</firstnames>
  INTO
    <contrib>Deborah A. Lapeyre</contrib>
- Change the following
    <chapter><title>Lawns and Gardens</title>
  INTO
    <h2>Lawns and Gardens</h2>
- <bold>Tall</bold>
  INTO
    :G46Helvetica-ExtraBold;Tall

slide 17
Logical Components of an XSLT Application
(needs XSLT processing software, called an “XSLT Engine”)
- Reads XML document(s) (tags and text)
- Uses an XSLT stylesheet/transform (the program)
- Runs using XSLT processing software (called an XSLT Engine)
- Produces output document(s)
Structure of an XSLT
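The first two tagging changes on slide 16 could be written as template rules roughly like the sketch below. This is an illustration only, not the authors' stylesheet: the `author` wrapper around `surname` and `firstnames` is an assumption (the slide does not show a parent element), and the choice to emit the first names before the surname simply follows the slide's sample output.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the two name parts sit inside a hypothetical <author> wrapper. -->
  <xsl:template match="author">
    <contrib>
      <xsl:value-of select="firstnames"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="surname"/>
    </contrib>
  </xsl:template>

  <!-- A chapter title becomes an HTML second-level heading. -->
  <xsl:template match="chapter/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

</xsl:stylesheet>
```

Given `<author><surname>Lapeyre</surname><firstnames>Deborah A.</firstnames></author>`, the first rule writes `<contrib>Deborah A. Lapeyre</contrib>`.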

slide 18
Component 1: XML Document
- XML documents
  - are sequences of data characters and markup
  - start-tag and end-tag markup delimits elements
- But XSLT does not work directly on XML documents
- Part of the XSLT processing (usually an XML parser) builds a tree
- XSLT works on trees (made from XML documents)

slide 19
Looking at an XML Document as a Tree

slide 20
Component 2: The XSLT Stylesheet
(aka XSLT Transform)
- A computer program
- Transformation instructions
- Called a “stylesheet” (or a “transform”)
- A well-formed XML document!
- Commands in the XSLT language are
  - a tag set (elements and attributes)
  - defined by the W3C XSLT recommendation
  - that look like this (<xsl:sort> and <xsl:number>)

slide 21
An XSLT Stylesheet / Transform Is
- A series of rules (called template rules)
- Each rule is a sequence of XSLT commands
- Each command is an XML element with attributes
- A rule is executed when it
  - matches some condition
  - or is called by name

slide 22
“Matching a Condition” Means
- If you find a ( ) in the source XML, then do this (perform the template)
- Matching can mean finding in the XML
  - an element
  - an element/attribute combination
  - an element in a certain context
  - some special circumstance (words in the content, any element at all, etc.)

slide 23
An XSL Stylesheet / Transform
(close your eyes, this is code)
- Here is a template rule
- This rule matches a paragraph element
- Notice that it is made up of XML elements (two kinds)
- The two kinds of XML elements
  - XSLT language tags (instructions)
  - HTML tags

    <xsl:template match="paragraph">
      <hr/>
      <p>
        <xsl:apply-templates/>
      </p>
    </xsl:template>
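For readers who want to see where such a rule lives, here is a minimal sketch of a complete stylesheet that contains it. The root rule that wraps the output in `html` and `body` is an assumption added for illustration; the slide shows only the `paragraph` rule.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Assumed wrapper rule: start at the document root and build the HTML shell. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- The rule from the slide: a rule line, then the paragraph content in a <p>. -->
  <xsl:template match="paragraph">
    <hr/>
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

The `xsl:` elements are instructions to the processor; `html`, `body`, `hr`, and `p` are copied to the output as written.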

slide 24
Component 3: An XSLT Engine/Processor
- You need special software to run XSLT
- But you don’t have to buy it
- Free open-source, shareware, as well as commercial
- New ones all the time
- Look for more at: http://www.xml.com
  - Saxon (http://users.iclway.co.uk/mhkay/saxon/)
  - Xalan XSLT (http://xml.apache.org/xalan/index.html)
  - Unicorn XSLT Processor (http://www.unicorn-enterprises.com/)
  - XSLT C library for Gnome (http://xmlsoft.org/XSLT/)

slide 25
XSLT Also Built Into/Can be Hooked Into
- XML programmers’ developing environments
- XML-aware editors
- Content aggregation systems
- Other XML processors
In software like this, XSLT comes built in, and you still don’t have to buy it!

slide 26
How an XSLT Processor Works
[Diagram labels: Source Tree, Transformer, Result Tree, <xsl:stylesheet> ... </xsl:stylesheet>; sample input <t>XSL is</t> "fun"; sample output <z>XSL is <e>fun</e></z>]
The big dark rectangle above is the XSLT processor

slide 27
Component 4: The Output File(s)
XSLT can make 3 syntaxes for output
- XML files
- HTML
- Text (untagged files)
  - ASCII email message
  - comma-separated file
  - desktop publishing system format (e.g., XTags for QuarkXPress)

Watching a Stylesheet in Operation

slide 28
How Input-Driven Stylesheets Work

slide 29
Advice: What to Do and Not Do with XSLT

slide 30
Business Uses XSLT Because XML is Everywhere
- XSLT was designed to process XML
  - Takes full advantage of the tree
  - XML constructs are built in (no special programming)
- Solves problems with
  - order of the material
  - document model/processing mismatch
  - interchange (mine different from yours different from ours)
  - personalization/localization
- Part of the XML family, so applications built to support
Makes content fluid, as XML and SGML have always promised

slide 31
For the Right Kind of Problems* XSLT is
- faster
- better
- cheaper
*All three, but note the caveat

slide 32
What’s Really Easy in XSLT
- Extract just some of the input
- Change sequence of elements (rearrange / sort)
- Remove material
- Use the same element / attribute in 5 places
- Add generated text

slide 33
XSLT Easily Changes XML into Different XML
- Rename an element or attribute
- Change element xxx into element yyy
- Make elements into attributes
- Make attributes into elements

slide 34
XSLT Handles Markup Well
XSLT works best when
- What you care about (want to process) is tagged!
- Hierarchy is explicit
- The most important relationships are tree relationships
  - containment (parent / child)
  - siblings
  - attributes
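The XML-to-XML changes listed on slide 33 can be sketched against the employee record from slide 11. The output names `given-name`, `unit`, and `value` are hypothetical, chosen only to show each kind of change; the identity rule at the top is a common XSLT 1.0 idiom for copying everything that no other rule handles.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy any node or attribute not matched below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rename an element: <first> becomes <given-name> (hypothetical name). -->
  <xsl:template match="first">
    <given-name>
      <xsl:apply-templates select="@*|node()"/>
    </given-name>
  </xsl:template>

  <!-- Attribute into element: unit="in" becomes a <unit> child. -->
  <xsl:template match="height | weight">
    <xsl:copy>
      <unit><xsl:value-of select="@unit"/></unit>
      <value><xsl:value-of select="."/></value>
    </xsl:copy>
  </xsl:template>

  <!-- Element into attribute: <email-name> moves onto <affiliation>. -->
  <xsl:template match="affiliation">
    <affiliation email-name="{email-name}">
      <xsl:apply-templates select="@*|node()[not(self::email-name)]"/>
    </affiliation>
  </xsl:template>

</xsl:stylesheet>
```

Each rule touches only the element it matches; everything else in the record passes through unchanged.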

slide 35
XSLT is Not Good at Everything
- Not at all
  - conversion into XML
  - Non-XML data (Word, QuarkXPress, SGML)
- Not as good as most “programming languages”
  - number crunching (arithmetic and higher math)
  - string processing (parsing)
  - really big files
  - making structure where there was none (making flat files into hierarchies)

slide 36
XSLT is Weak on Manipulating Text (Strings)
- An XSLT processor expects to work on
  - a tree of nodes
  - not an XML file of tags and text
- If you have untagged files (comma delimited, space delimited, tab delimited)
  - there is no tree
  - strings must be “parsed” into pieces
  - XSLT does this awkwardly
(XSLT 2.0 has better string manipulation than XSLT 1.0, but )

slide 37
What If You Need String Processing?
- Use a different programming language
- Preprocess to make the data into XML
  - add tags
  - add nesting (make hierarchy explicit)
  - add end tags

slide 38
Real World String Parsing Example
The original data looked like this:

    <title>Large Animals</title>
    <address>Dallas, TX 23071</address>

The Requirement was to put the name of the state before every section title

    <title>Texas Large Animals</title>

One solution
- run a program (Perl, etc.) to make the following

    <title>Large Animals</title>
    <address><city>Dallas</city>, <state>TX</state> 23071</address>

- Now it’s a node in a tree, run XSLT
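Once the state is tagged, the XSLT step might look like the sketch below. Two things here are assumptions, not part of the slide: that `title` and `address` share a hypothetical `section` parent, and that the abbreviation is expanded to a full name by a small lookup inside the stylesheet (only two entries are shown).

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy everything that is not a section title. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumed structure: <section><title/><address><state/></address></section> -->
  <xsl:template match="section/title">
    <xsl:variable name="abbrev" select="../address/state"/>
    <title>
      <!-- Hypothetical lookup; unknown abbreviations are passed through. -->
      <xsl:choose>
        <xsl:when test="$abbrev = 'TX'">Texas</xsl:when>
        <xsl:when test="$abbrev = 'MD'">Maryland</xsl:when>
        <xsl:otherwise><xsl:value-of select="$abbrev"/></xsl:otherwise>
      </xsl:choose>
      <xsl:text> </xsl:text>
      <xsl:apply-templates/>
    </title>
  </xsl:template>

</xsl:stylesheet>
```

The point of the preprocessing step is visible in the `select`: `../address/state` is a one-step tree lookup, whereas pulling "TX" out of the string "Dallas, TX 23071" would need string parsing.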

slide 39
Really Big Files
Files are sequences of characters; XSLT works on trees
- Many XSLT processors
  - make the input document into a tree in memory
  - make the stylesheet into another tree in memory
  - make the results into more trees in memory
- Document may not fit in memory
- Usual solution is “chunking”

slide 40
Making Flat Files into Hierarchies
- XSLT 1.0 was not designed to do this
- Sometimes you can do it anyway
  - using grouping techniques
  - using keys (an advanced technique)
- When it works (maybe 2/3) it is elegant, clever, and tricky
- Success depends on the data
  - information must be there
  - markup must be clean and consistent
(XSLT 2.0 much better at this, but still needs clear distinctions)
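As an illustration of the "keys" approach in XSLT 1.0, the sketch below uses the technique commonly called Muenchian grouping. The input shape is invented for the example: a flat `cities` element holding `city` elements, each with a `state` attribute. The stylesheet nests the cities under one `state` element per distinct value.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every city by its state attribute (hypothetical input names). -->
  <xsl:key name="by-state" match="city" use="@state"/>

  <xsl:template match="/cities">
    <states>
      <!-- Select only the first city the key returns for each state value. -->
      <xsl:for-each
          select="city[generate-id() = generate-id(key('by-state', @state)[1])]">
        <state name="{@state}">
          <!-- Then list every city that shares that state value. -->
          <xsl:for-each select="key('by-state', @state)">
            <city><xsl:value-of select="."/></city>
          </xsl:for-each>
        </state>
      </xsl:for-each>
    </states>
  </xsl:template>

</xsl:stylesheet>
```

This works only because the grouping information (the `state` attribute) is present and consistent on every item, which is the slide's point about success depending on the data.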

slide 41
Where XSLT Fits in Processing
- XML used in any of the three tiers, especially in the middle
- XSL is used for any processing
  - within the middle tier (application to application)
  - between tiers
  - database to database
[Diagram labels: Presentation Layer (Browser, Print, Editing Application, Other Device, Partner system); Processing Layer (XSL format engine, Application, DOM, File system); Storage Layer (Relational DB, Object DB); arrows labeled XML/XSLT]
XSLT is often the mechanism represented by the arrow.

slide 42
How Organizations Use XSLT
- Simple business transforms
- Making HTML from semantically richer XML
- Single Source and Reuse Publishing
- Transforms for editorial QA
- XML to XML transforms
- XSLT as the middle component in XSL-FO

slide 43
Simple Business Transforms
- Data exchange between applications
  - you give me what you think I need
  - I take what I want in the order I want it
- E-Business / E-Commerce: translate between transaction formats faster, easier, better than with EDI
- Portals / Web Services / Data Aggregation
  - grab just the data you want from a repository, database, files
  - rearrange it to suit
  - serve it forth

slide 44
Making HTML From Semantically Richer XML
Read in semantically rich tagging

    <COMPUTER CLASS="Portable">
      <MFR>GCA</MFR>
      <FAMILY>Laptop</FAMILY>
      <LINE>Thinkie</LINE>
      <MODEL>520XL</MODEL>
      <DISK UOM="GB">80</DISK>
      <SPEED UOM="GHz">3.2</SPEED>
    </COMPUTER>

Simplify it to HTML for display in any browser

    <H2>Laptop Computer</H2>
    <UL>
      <LI>GCA Thinkie 520XL</LI>
      <LI>3.2GHz</LI>
      <LI>80GB</LI>
    </UL>

(Use CSS for look-and-feel; serve to even really old browsers)

slide 45
Which Displays As
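One way the transform on slide 44 might be written is sketched below. It is not the authors' stylesheet; it is a single pull-style rule that assumes every `COMPUTER` element carries exactly the children shown on the slide.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="COMPUTER">
    <!-- Heading: the FAMILY value followed by the generated word "Computer". -->
    <H2><xsl:value-of select="FAMILY"/> Computer</H2>
    <UL>
      <!-- Manufacturer, line, and model joined with spaces. -->
      <LI><xsl:value-of select="concat(MFR, ' ', LINE, ' ', MODEL)"/></LI>
      <!-- Each number followed directly by its unit-of-measure attribute. -->
      <LI><xsl:value-of select="concat(SPEED, SPEED/@UOM)"/></LI>
      <LI><xsl:value-of select="concat(DISK, DISK/@UOM)"/></LI>
    </UL>
  </xsl:template>

</xsl:stylesheet>
```

Note that the output order (speed before disk) differs from the input order, and that the `CLASS` attribute is simply not asked for, so it is dropped.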

slide 46
Single Source and Reuse Publishing
(XSLT fulfills the XML promise of multiple use)
- Making the output product
  - preparation for publishing (web and print)
  - Print on Demand and web serving
  - composition drivers
- QA and proofing
- XML to XML transform
- XSLT as the middle component in XSL-FO

slide 47
Construct the Output for Publishing
(transformations build products)
- Out of databases, rearranged for the web
- Customized printing: different users get
  - different order
  - different text or content
  - same content different look-and-feel
- Print on Demand (with data up to this minute)

slide 48
What You Want in the Order You Want It
Select / Extract / List / Omit
- Pull out the metadata to put into the catalog
- Extract titles and abstracts of all articles for the advertising webpage
- Extract the CME material for a special site for nurses
- Get all the environmental impact material
- Publish this report with all the SECRET material removed
- Get me the citations to send to the link matching service
- My car has a sun-roof, manual transmission, and option package #4, make me my owner’s manual
- Get me all the dosage sections that mention pregnancy restrictions

slide 49
There is Not Just One Print Product
- Customization (change, assemble, or adapt based on customer or organization)
  - mix and match text and graphic components
  - target specific markets
- Personalization (tailor a product to an individual person)
  - based on purchase, profile, history
- Internationalization (multiple languages, script, writing directions, currency)
- Localization (adapting a print product to a specific locality/region)

slide 50
Some of the Text is Added by the Transform
(textual additions are called “generated” text)
Text that is not in the data, but is put in by the transform, based on the tagging
For example:
- numbers or bullets that prefix list items (1., 2., 3.) (based on list-item tag)
- mark a footnote reference (²) or a citation reference [Lapeyre, 2006] based on a cross-reference made with an attribute
- Adding words or phrases to titles (Chapter VI Sassy Poodles)
- Turning a cross reference into text
  - <xref redid="A123456"/> into
  - “See Figure 6, Herpetologist Distribution Curve”
Less content maintenance!

slide 51
Large Structures Can be Built and Inserted as Well
- Table of Contents from chapter titles
- Subject index from embedded index terms
- List of Figures, Tables, Equations, Genus-species names
- Title Page from the metadata elements
- Learning Objectives from embedded objectives
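Generated text and a built structure can be shown together in one small sketch. The input shape is an assumption for illustration: a `book` containing `chapter` elements, each with a `title` and some `para` children. The stylesheet generates the word "Chapter" and a Roman numeral for each title, and builds a table of contents from the same titles.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Assumed input: <book><chapter><title/><para/>...</chapter>...</book> -->
  <xsl:template match="/book">
    <html>
      <body>
        <h1>Table of Contents</h1>
        <ul>
          <!-- Built structure: one entry per chapter title. -->
          <xsl:for-each select="chapter">
            <li>
              <xsl:text>Chapter </xsl:text>
              <xsl:number format="I"/>
              <xsl:text> </xsl:text>
              <xsl:value-of select="title"/>
            </li>
          </xsl:for-each>
        </ul>
        <xsl:apply-templates select="chapter"/>
      </body>
    </html>
  </xsl:template>

  <!-- Generated text: "Chapter" and the number are not in the data. -->
  <xsl:template match="chapter">
    <h2>
      <xsl:text>Chapter </xsl:text>
      <xsl:number format="I"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="title"/>
    </h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Because the numbers come from each chapter's position in the tree, inserting or reordering chapters renumbers both the contents list and the headings with no editing of the text.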

slide 52
XSLT is Also Useful During Production
Transformations for Editorial QA and Proofing
- Make checklists for humans to examine
- Make files for automated authority checking
- Run galleys as often as you want
- Make useful displays that will never be printed
  - number things that won’t be numbered on display
  - if the book will say “(See Section 4.3)” put the section title into the reference “(See Section 4.3 My Life with Poodles)”
  - make false color proofs
    - ferrous materials in red and non-ferrous in green
    - all skeletal system paragraphs in blue, circulatory system paragraphs in red
    - a citation with author name in green, journal name in pink, year in blue, paper title in yellow

slide 53
False Color Proof
Water is blue (italic), land is yellow (bold), and “features” are purpley (display font in the print)

slide 54
XML for Interchange and Archiving
XML to XML Transforms
- Corporate tagset into
  - client’s tags
  - business partner’s tags
- Company-specific tags into Industry Standard schema
- 5 Publisher tag sets into one repository / aggregator tag set
- Authoring DTD into publication DTD
- 50 articles to one RSS feed of the summaries

slide 55
XSLT as the Middle Component in XSL-FO
- XSL is a spec with two parts
  - XSLT (the transformation part)
  - XSL-FO (the formatting part)
- XSL provides a tag set into which XML documents may be transformed (using XSLT)
  - describes page geometry
  - says how to put content on the page
XSL-FO used to make PDF (or RTF or MIF) directly from XML

slide 56
How XSL-FO Works
- XSLT
  - transforms the input
  - makes a tree of formatting objects
- An XSL-FO document is
  - an XML document
  - with text and graphic content wrapped in formatting object tags
- XSL-FO (XSL Formatting Objects)
  - get processed by a rendering engine (software)
  - to make an output file
  - a display engine (like a browser or a printer) makes the pretty output

slide 57
Remember How an XSLT System Works
[Diagram labels: Source Tree, Transformer, Result Tree, <xsl:stylesheet> ... </xsl:stylesheet>; sample input <t>XSL is</t> "fun"; sample output <z>XSL is <e>fun</e></z>]

slide 58
Architecture of a Full XSL System
(XSLT XSL-FO)
[Diagram labels: Source Tree, Transformer, Result Tree, Formatter, <xsl:stylesheet> ... </xsl:stylesheet>; result tree sample <z>XSL is <e>fun</e></z>; formatted output “XSL is fun.”]

slide 59
Formatting Objects Describe Page Layout
- Page layout:
  - page size, margins, columns
  - headers, footers, side-bars, etc.
- Different page layout templates (masters) can be sequenced, e.g.
  - first page followed by later pages
  - recto / verso alternating
- XSL-FO properties control hyphenation, widows / orphans, etc.

slide 60
Applying Styles through XSL FOs

slide 61
XSL-FO is a Great Report Writer
(Pagination is not a problem)
- Credit card and bank statements
- Investment portfolios
- Hospital systems reports
- Insurance policies and claims
- Patient medical charts
- Directory products
  - product and editorial indices
  - company personnel listing

slide 62
The Last Bits
- Other things you need to know about how XSLT works
  - XPath for tree-walking
  - Pull-style stylesheets
  - XSLT 2.0 (with XPath 2.0)
- How can you make yourself (or your staff) into XSLT people

slide 63
There’s Another Part of XSLT We Haven’t Talked About
XPath
Really powerful!

slide 64
What is XPath
- The tree-walking part of XSLT
- So named because it uses a path notation with slashes like UNIX directories and URLs
    play/act/scene/speech
    invoice/customer data/customer name
- XPath 1.0 W3C Recommendation in 1999
- XPath 2.0 W3C Recommendation Draft 2005 (more complex, more powerful, harder to learn)

slide 65
XPath Has Two Main Uses
- First use: Addressing
  - addresses (finds) part of an XML document
  - can address any part of the tree from any other
  - ask for something in an XML document (“gimme my footnote!”) and get it back
- Second use in XSLT: for Testing/Matching
  - test whether a node in a tree matches a pattern
  - is this node a paragraph that is inside a footnote that has an attribute called “footnote-type” with the value “legal”?

slide 66
You’ve Seen XPath in match Expressions
- <xsl:template match="title">
  Matches title elements
- <xsl:template match="scene/title">
  Matches title elements that have a scene parent
- <xsl:template match="para[@type='warning']">
  Matches para elements that have a type attribute that has a value of “warning”

slide 67
XPath Can Be Very Complex
(all that power has a price)

    child::slide[attribute::type="overview"]/child::list[count(descendent::item) ::node(),’Business’)]]

Thanks to Jeni Tennison for:

    select="following-sibling::transaction[@type = $type and substring(@date, 4, 2) != $month][1]"

Another Complexity: Push-me Pull-you Stylesheets

slide 68
XSLT is
- By design and default, driven by the XML input file
- That means you tell it what to do, not how or when
- Automatically recursive through the use of templates

slide 69
XSLT can be Written in Two Ways (1)
- Way # 1 Input-driven (called Push)
  - walk the input tree
  - match elements in input tree
  - do something when you find a match
- This is what XSLT was designed to do
- This works the best (when it works for your data)

slide 70
XSLT can be Written in Two Ways (2)
- Way # 2 Stylesheet driven (called Pull)
  - more like a typical computer program
  - walk the stylesheet (which specifies the order of the output document)
  - when it asks for data, go get it from the input tree
- Similar to some fill-in-the-blank programming languages
Pull stylesheets are most useful when you have very regular data
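A minimal sketch of the input-driven (push) style follows. It reuses the element names from the match expressions on slide 66 (`scene`, `title`, `para`); the HTML chosen for each rule is an assumption made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Each rule says what to do with one kind of node, then hands its
       children back to the processor with xsl:apply-templates. -->
  <xsl:template match="scene">
    <div><xsl:apply-templates/></div>
  </xsl:template>

  <xsl:template match="scene/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- The more specific pattern takes precedence for warning paragraphs. -->
  <xsl:template match="para[@type='warning']">
    <p><b>Warning: </b><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Nothing in the stylesheet states the order of the output. The order comes from walking the input tree, which is why this style copes with documents whose paragraphs, titles, and warnings appear in any sequence.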

slide 71
What is a Pull Stylesheet?
Let’s look at some XML for a menu

    <specials-menu>
      <menu-date>Friday, July 28, 2000</menu-date>
      <spec-meat price="24.50">Pork chops with Chard &amp; Apples</spec-meat>
      <spec-appetiz price="3.95">Seckel pears with Gorgonzola and Walnuts</spec-appetiz>
      <spec-soup price="6.95">Red and Yellow Pepper</spec-soup>
      <spec-fish price="18.50">Seared Achoo with Risotto and Spinach</spec-fish>
      <spec-pasta price="12.25">Wagon-wheels Alfredo (with side salad)</spec-pasta>
      <spec-sweet price="12.95">Strawberry and Chocolate Tart</spec-sweet>
    </specials-menu>

slide 72
Now Let’s Look at the Stylesheet
Here’s the XML file for that:

    <html xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          xsl:version="1.0">
    <title>Today’s Menu <xsl:value-of select="//specials-menu/menu-date"/></title>
