Definitive XML Schema



The Charles F. Goldfarb Definitive XML Series

Priscilla Walmsley
  Definitive XML Schema, Second Edition
Charles F. Goldfarb and Paul Prescod
  Charles F. Goldfarb’s XML Handbook, Fifth Edition
Rick Jelliffe
  The XML and SGML Cookbook: Recipes for Structured Information
Charles F. Goldfarb, Steve Pepper, and Chet Ensign
  SGML Buyer’s Guide: Choosing the Right XML and SGML Products and Services
G. Ken Holman
  Definitive XSL-FO
  Definitive XSLT and XPath
Bob DuCharme
  XML: The Annotated Specification
  SGML CD
Truly Donovan
  Industrial-Strength SGML: An Introduction to Enterprise Publishing
Lars Marius Garshol
  Definitive XML Application Development
JP Morgenthal with Bill la Forge
  Enterprise Application Integration with XML and Java
Michael Leventhal, David Lewis, and Matthew Fuchs
  Designing XML Internet Applications
Adam Hocek and David Cuddihy
  Definitive VoiceXML
Dmitry Kirsanov
  XSLT 2.0 Web Development
Yuri Rubinsky and Murray Maloney
  SGML on the Web: Small Steps Beyond HTML
David Megginson
  Structuring XML Documents
Sean McGrath
  XML Processing with Python
  XML by Example: Building E-commerce Applications
  ParseMe.1st: SGML for Software Developers
Chet Ensign
  $GML: The Billion Dollar Secret
Ron Turner, Tim Douglass, and Audrey Turner
  ReadMe.1st: SGML for Writers and Editors
Charles F. Goldfarb and Priscilla Walmsley
  XML in Office 2003: Information Sharing with Desktop XML
Michael Floyd
  Building Web Sites with XML
Fredrick Thomas Martin
  TOP SECRET Intranet: How U.S. Intelligence Built Intelink—The World’s Largest, Most Secure Network
J. Craig Cleaveland
  Program Generators with XML and Java

About the Series Author

Charles F. Goldfarb is the father of XML technology. He invented SGML, the Standard Generalized Markup Language on which both XML and HTML are based. You can find him on the Web at: www.xmlbooks.com.

About the Series Logo

The rebus is an ancient literary tradition, dating from 16th century Picardy, and is especially appropriate to a series involving fine distinctions between markup and text, metadata and data. The logo is a rebus incorporating the series name within a stylized XML comment declaration.

Definitive XML Schema
Second Edition

Priscilla Walmsley

Upper Saddle River, NJ • Boston • Indianapolis • San Francisco • New York • Toronto • Montreal • London • Munich • Paris • Madrid • Cape Town • Sydney • Tokyo • Singapore • Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

Titles in this series are produced using XML, SGML, and/or XSL. XSL-FO documents are rendered into PDF by the XEP Rendering Engine from RenderX: www.renderx.com.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U.S. Corporate and Government Sales
(800) 382–3419
corpsales@pearsontechgroup.com

For sales outside the United States, please contact:

International Sales
international@pearsoned.com

Visit us on the Web: informit.com/ph

Library of Congress Cataloging-in-Publication Data is on file

Copyright 2013 Pearson Education, Inc.

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236–3290.

ISBN-13: 978-0-132-88672-7
ISBN-10: 0-132-88672-3

Text printed in the United States on recycled paper at Edwards Brothers Malloy in Ann Arbor, MI.

First printing: September 2012

Editor-in-Chief: Mark L. Taub
Managing Editor: Kristy Hart
Book Packager: Alina Kirsanova
Cover Designer: Alan Clements

To Doug, my SH


Overview

Chapter 1 Schemas: An introduction
Chapter 2 A quick tour of XML Schema
Chapter 3 Namespaces
Chapter 4 Schema composition
Chapter 5 Instances and schemas
Chapter 6 Element declarations
Chapter 7 Attribute declarations
Chapter 8 Simple types
Chapter 9 Regular expressions
Chapter 10 Union and list types
Chapter 11 Built-in simple types
Chapter 12 Complex types
Chapter 13 Deriving complex types
Chapter 14 Assertions
Chapter 15 Named groups
Chapter 16 Substitution groups
Chapter 17 Identity constraints
Chapter 18 Redefining and overriding schema components
Chapter 19 Topics for DTD users
Chapter 20 XML information modeling
Chapter 21 Schema design and documentation
Chapter 22 Extensibility and reuse
Chapter 23 Versioning
Appendix A XSD keywords
Appendix B Built-in simple types

Contents

Foreword
Acknowledgments
How to use this book

Chapter 1 Schemas: An introduction
  1.1 What is a schema?
  1.2 The purpose of schemas
    1.2.1 Data validation
    1.2.2 A contract with trading partners
    1.2.3 System documentation
    1.2.4 Providing information to processors
    1.2.5 Augmentation of data
    1.2.6 Application information
  1.3 Schema design
    1.3.1 Accuracy and precision
    1.3.2 Clarity
    1.3.3 Broad applicability

  1.4 Schema languages
    1.4.1 Document Type Definition (DTD)
    1.4.2 Schema requirements expand
    1.4.3 W3C XML Schema
    1.4.4 Other schema languages
      1.4.4.1 RELAX NG
      1.4.4.2 Schematron

Chapter 2 A quick tour of XML Schema
  2.1 An example schema
  2.2 The components of XML Schema
    2.2.1 Declarations vs. definitions
    2.2.2 Global vs. local components
  2.3 Elements and attributes
    2.3.1 The tag/type distinction
  2.4 Types
    2.4.1 Simple vs. complex types
    2.4.2 Named vs. anonymous types
    2.4.3 The type definition hierarchy
  2.5 Simple types
    2.5.1 Built-in simple types
    2.5.2 Restricting simple types
    2.5.3 List and union types
  2.6 Complex types
    2.6.1 Content types
    2.6.2 Content models
    2.6.3 Deriving complex types
  2.7 Namespaces and XML Schema

  2.8 Schema composition
  2.9 Instances and schemas
  2.10 Annotations
  2.11 Advanced features
    2.11.1 Named groups
    2.11.2 Identity constraints
    2.11.3 Substitution groups
    2.11.4 Redefinition and overriding
    2.11.5 Assertions

Chapter 3 Namespaces
  3.1 Namespaces in XML
    3.1.1 Namespace names
    3.1.2 Namespace declarations and prefixes
    3.1.3 Default namespace declarations
    3.1.4 Name terminology
    3.1.5 Scope of namespace declarations
    3.1.6 Overriding namespace declarations
    3.1.7 Undeclaring namespaces
    3.1.8 Attributes and namespaces
    3.1.9 A summary example
  3.2 The relationship between namespaces and schemas
  3.3 Using namespaces in schemas
    3.3.1 Target namespaces
    3.3.2 The XML Schema Namespace
    3.3.3 The XML Schema Instance Namespace
    3.3.4 The Version Control Namespace

    3.3.5 Namespace declarations in schema documents
      3.3.5.1 Map a prefix to the XML Schema Namespace
      3.3.5.2 Map a prefix to the target namespace
      3.3.5.3 Map prefixes to all namespaces

Chapter 4 Schema composition
  4.1 Modularizing schema documents
  4.2 Defining schema documents
  4.3 Combining multiple schema documents
    4.3.1 include
      4.3.1.1 The syntax of includes
      4.3.1.2 Chameleon includes
    4.3.2 import
      4.3.2.1 The syntax of imports
      4.3.2.2 Multiple levels of imports
      4.3.2.3 Multiple imports of the same namespace
  4.4 Schema assembly considerations
    4.4.1 Uniqueness of qualified names
    4.4.2 Missing components
    4.4.3 Schema document defaults

Chapter 5 Instances and schemas
  5.1 Using the instance attributes
  5.2 Schema processing
    5.2.1 Validation
    5.2.2 Augmenting the instance
  5.3 Relating instances to schemas
    5.3.1 Using hints in the instance
      5.3.1.1 The xsi:schemaLocation attribute
      5.3.1.2 The xsi:noNamespaceSchemaLocation attribute
  5.4 The root element

Chapter 6 Element declarations
  6.1 Global and local element declarations
    6.1.1 Global element declarations
    6.1.2 Local element declarations
    6.1.3 Design hint: Should I use global or local element declarations?
  6.2 Declaring the types of elements
  6.3 Qualified vs. unqualified forms
    6.3.1 Qualified local names
    6.3.2 Unqualified local names
    6.3.3 Using elementFormDefault
    6.3.4 Using form
    6.3.5 Default namespaces and unqualified names
  6.4 Default and fixed values
    6.4.1 Default values
    6.4.2 Fixed values
  6.5 Nils and nillability
    6.5.1 Using xsi:nil in an instance
    6.5.2 Making elements nillable

Chapter 7 Attribute declarations
  7.1 Attributes vs. elements
  7.2 Global and local attribute declarations
    7.2.1 Global attribute declarations
    7.2.2 Local attribute declarations
    7.2.3 Design hint: Should I use global or local attribute declarations?
  7.3 Declaring the types of attributes

  7.4 Qualified vs. unqualified forms
  7.5 Default and fixed values
    7.5.1 Default values
    7.5.2 Fixed values
  7.6 Inherited attributes

Chapter 8 Simple types
  8.1 Simple type varieties
    8.1.1 Design hint: How much should I break down my data values?
  8.2 Simple type definitions
    8.2.1 Named simple types
    8.2.2 Anonymous simple types
    8.2.3 Design hint: Should I use named or anonymous types?
  8.3 Simple type restrictions
    8.3.1 Defining a restriction
    8.3.2 Overview of the facets
    8.3.3 Inheriting and restricting facets
    8.3.4 Fixed facets
      8.3.4.1 Design hint: When should I fix a facet?
  8.4 Facets
    8.4.1 Bounds facets
    8.4.2 Length facets
      8.4.2.1 Design hint: What if I want to allow empty values?
      8.4.2.2 Design hint: What if I want to restrict the length of an integer?
    8.4.3 totalDigits and [...]
    8.4.6 Assertion

    8.4.7 Explicit Time Zone
    8.4.8 Whitespace
  8.5 Preventing simple type derivation
  8.6 Implementation-defined types and facets
    8.6.1 Implementation-defined types
    8.6.2 Implementation-defined facets

Chapter 9 Regular expressions
  9.1 The structure of a regular expression
  9.2 Atoms
    9.2.1 Normal characters
    9.2.2 The wildcard escape character
    9.2.3 Character class escapes
      9.2.3.1 Single-character escapes
      9.2.3.2 Multicharacter escapes
      9.2.3.3 Category escapes
      9.2.3.4 Block escapes
    9.2.4 Character class expressions
      9.2.4.1 Listing individual characters
      9.2.4.2 Specifying a range
      9.2.4.3 Combining individual characters and ranges
      9.2.4.4 Negating a character class expression
      9.2.4.5 Subtracting from a character class expression
      9.2.4.6 Escaping rules for character class expressions
    9.2.5 Parenthesized regular expressions
  9.3 Quantifiers
  9.4 Branches

Chapter 10 Union and list types
  10.1 Varieties and derivation types
  10.2 Union types

    10.2.1 Defining union types
    10.2.2 Restricting union types
    10.2.3 Unions of unions
    10.2.4 Specifying the member type in the instance
  10.3 List types
    10.3.1 Defining list types
    10.3.2 Design hint: When should I use lists?
    10.3.3 Restricting list types
      10.3.3.1 Length facets
      10.3.3.2 Enumeration facet
      10.3.3.3 Pattern facet
    10.3.4 Lists and strings
    10.3.5 Lists of unions
    10.3.6 Lists of lists
    10.3.7 Restricting the item type

Chapter 11 Built-in simple types
  11.1 The XML Schema type system
    11.1.1 The type hierarchy
    11.1.2 Value spaces and lexical spaces
    11.1.3 Facets and built-in types
  11.2 String-based types
    11.2.1 string, normalizedString, and token
      11.2.1.1 Design hint: Should I use string, normalizedString, or token?
    11.2.2 Name
    11.2.3 NCName
    11.2.4 language
  11.3 Numeric types

    11.3.1 float and double
    11.3.2 decimal
    11.3.3 Integer types
      11.3.3.1 Design hint: Is it an integer or a string?
  11.4 Date and time types
    11.4.1 date
    11.4.2 time
    11.4.3 dateTime
    11.4.4 dateTimeStamp
    11.4.5 gYear
    11.4.6 gYearMonth
    11.4.7 gMonth
    11.4.8 gMonthDay
    11.4.9 gDay
    11.4.10 duration
    11.4.11 yearMonthDuration
    11.4.12 dayTimeDuration
    11.4.13 Representing time zones
    11.4.14 Facets
    11.4.15 Date and time ordering
  11.5 Legacy types
    11.5.1 ID
    11.5.2 IDREF
    11.5.3 IDREFS
    11.5.4 ENTITY
    11.5.5 ENTITIES
    11.5.6 NMTOKEN

    11.5.7 NMTOKENS
    11.5.8 NOTATION
  11.6 Other types
    11.6.1 QName
    11.6.2 boolean
    11.6.3 The binary types
    11.6.4 anyURI
  11.7 Comparing typed values

Chapter 12 Complex types
  12.1 What are complex types?
  12.2 Defining complex types
    12.2.1 Named complex types
    12.2.2 Anonymous complex types
    12.2.3 Complex type alternatives
  12.3 Content types
    12.3.1 Simple content
    12.3.2 Element-only content
    12.3.3 Mixed content
    12.3.4 Empty content
  12.4 Using element declarations
    12.4.1 Local element declarations
    12.4.2 Element references
    12.4.3 Duplication of element names
  12.5 Using model groups
    12.5.1 sequence groups
      12.5.1.1 Design hint: Should I care about the order of elements?
    12.5.2 choice groups

    12.5.3 Nesting of sequence and choice groups
    12.5.4 all groups
    12.5.5 Named model group references
    12.5.6 Deterministic content models
  12.6 Using attribute declarations
    12.6.1 Local attribute declarations
    12.6.2 Attribute references
    12.6.3 Attribute group references
    12.6.4 Default attributes
  12.7 Using [...]
    12.7.1 Element wildcards
      12.7.1.1 Controlling the namespace of replacement elements
      12.7.1.2 Controlling the strictness of validation
      12.7.1.3 Negative wildcards
    12.7.2 Open content models
      12.7.2.1 Open content in a complex type
      12.7.2.2 Default open content
    12.7.3 Attribute wildcards

Chapter 13 Deriving complex types
  13.1 Why derive types?
  13.2 Restriction and extension
  13.3 Simple content and complex content
    13.3.1 simpleContent elements
    13.3.2 complexContent elements
  13.4 Complex type extensions
    13.4.1 Simple content extensions
    13.4.2 Complex content extensions
      13.4.2.1 Extending choice groups
      13.4.2.2 Extending all groups

      13.4.2.3 Extending open content
    13.4.3 Mixed content extensions
    13.4.4 Empty content extensions
    13.4.5 Attribute extensions
    13.4.6 Attribute wildcard extensions
  13.5 Complex type restrictions
    13.5.1 Simple content restrictions
    13.5.2 Complex content restrictions
      13.5.2.1 Eliminating meaningless groups
      13.5.2.2 Restricting element declarations
      13.5.2.3 Restricting wildcards
      13.5.2.4 Restricting groups
      13.5.2.5 Restricting open content
    13.5.3 Mixed content restrictions
    13.5.4 Empty content restrictions
    13.5.5 Attribute restrictions
    13.5.6 Attribute wildcard restrictions
    13.5.7 Restricting types from another namespace
      13.5.7.1 Using targetNamespace on element and attribute declarations
  13.6 Type substitution
  13.7 Controlling type derivation and substitution
    13.7.1 final: Preventing complex type derivation
    13.7.2 block: Blocking substitution of derived types
    13.7.3 Blocking type substitution in element declarations
    13.7.4 abstract: Forcing derivation

Chapter 14 Assertions
  14.1 Assertions
    14.1.1 Assertions for simple types
      14.1.1.1 Using XPath 2.0 operators

      14.1.1.2 Using XPath 2.0 functions
      14.1.1.3 Types and assertions
      14.1.1.4 Inheriting simple type assertions
      14.1.1.5 Assertions on list types
    14.1.2 Assertions for complex types
      14.1.2.1 Path expressions
      14.1.2.2 Conditional expressions
      14.1.2.3 Assertions in derived complex types
    14.1.3 Assertions and namespaces
      14.1.3.1 Using xpathDefaultNamespace
  14.2 Conditional type assignment
    14.2.1 The alternative element
    14.2.2 Specifying conditional type assignment
    14.2.3 Using XPath in the test attribute
    14.2.4 The error type
    14.2.5 Conditional type assignment and namespaces
    14.2.6 Using inherited attributes in conditional type assignment

Chapter 15 Named groups
  15.1 Why named groups?
  15.2 Named model groups
    15.2.1 Defining named model groups
    15.2.2 Referencing named model groups
      15.2.2.1 Group references
      15.2.2.2 Referencing a named model group in a complex type
      15.2.2.3 Using all in named model groups
      15.2.2.4 Named model groups referencing named model groups
  15.3 Attribute groups
    15.3.1 Defining attribute groups
    15.3.2 Referencing attribute groups
      15.3.2.1 Attribute group references

    Referencing attribute groups in complex types
    Duplicate attribute names
    Duplicate attribute wildcard handling
    Attribute groups referencing attribute groups
    The default attribute group
  15.4 Named groups and namespaces
  15.5 Design hint: Named groups or complex type derivations?

Chapter 16 Substitution groups
  16.1 Why substitution groups?
  16.2 The substitution group hierarchy
  16.3 Declaring a substitution group
  16.4 Type constraints for substitution groups
  16.5 Members in multiple groups
  16.6 Alternatives to substitution groups
    16.6.1 Reusable choice groups
    16.6.2 Substituting a derived type in the instance
  16.7 Controlling substitution groups
    16.7.1 final: Preventing substitution group declarations
    16.7.2 block: Blocking substitution in instances
    16.7.3 abstract: Forcing substitution

Chapter 17 Identity constraints
  17.1 Identity constraint categories
  17.2 Design hint: Should I use ID/IDREF or key/keyref?
  17.3 Structure of an identity constraint
  17.4 Uniqueness constraints

  17.5 Key constraints
  17.6 Key references
    17.6.1 Key references and scope
    17.6.2 Key references and type equality
  17.7 Selectors and [...]
  17.8 XPath subset for identity constraints
  17.9 Identity constraints and namespaces
    17.9.1 Using xpathDefaultNamespace
  17.10 Referencing identity constraints

Chapter 18 Redefining and overriding schema components
  18.1 Redefinition
    18.1.1 Redefinition basics
      18.1.1.1 Include plus redefine
      18.1.1.2 Redefine and namespaces
      18.1.1.3 Pervasive impact
    18.1.2 The mechanics of redefinition
    18.1.3 Redefining simple types
    18.1.4 Redefining complex types
    18.1.5 Redefining named model groups
      18.1.5.1 Defining a subset
      18.1.5.2 Defining a superset
    18.1.6 Redefining attribute groups
      18.1.6.1 Defining a subset
      18.1.6.2 Defining a superset
  18.2 Overrides
    18.2.1 Override basics

      18.2.1.1 Include plus override
      18.2.1.2 Override and namespaces
      18.2.1.3 Pervasive impact
    18.2.2 The mechanics of overriding components
    18.2.3 Overriding simple types
    18.2.4 Overriding complex types
    18.2.5 Overriding element and attribute declarations
    18.2.6 Overriding named groups
  18.3 Risks of redefines and overrides
    18.3.1 Risks of redefining or overriding types
    18.3.2 Risks of redefining or overriding named groups

Chapter 19 Topics for DTD users
  19.1 Element declarations
    19.1.1 Simple types
    19.1.2 Complex types with simple content
    19.1.3 Complex types with complex content
    19.1.4 Mixed content
    19.1.5 Empty content
    19.1.6 Any content
  19.2 Attribute declarations
    19.2.1 Attribute types
    19.2.2 Enumerated attribute types
    19.2.3 Notation attributes
    19.2.4 Default values
  19.3 Parameter entities for reuse
    19.3.1 Reusing content models
    19.3.2 Reusing attributes
  19.4 Parameter entities for extensibility

    19.4.1 Extensions for sequence groups
    19.4.2 Extensions for choice groups
    19.4.3 Attribute extensions
  19.5 External parameter entities
  19.6 General entities
    19.6.1 Character and other parsed entities
    19.6.2 Unparsed entities
  19.7 Notations
    19.7.1 Declaring a notation
    19.7.2 Declaring a notation attribute
    19.7.3 Notations and unparsed entities
  19.8 Comments
  19.9 Using DTDs and schemas together

Chapter 20 XML information modeling
  20.1 Data modeling paradigms
  20.2 Relational models
    20.2.1 Entities and attributes
