The Data Warehouse Toolkit - Aatinegar

Transcription

The DataWarehouseToolkit

The DataWarehouseToolkitThe Definitive Guide toDimensional ModelingThird EditionRalph KimballMargy Ross

The Data Warehouse Toolkit: The Defi nitive Guide to Dimensional Modeling, Third EditionPublished byJohn Wiley & Sons, Inc.10475 Crosspoint BoulevardIndianapolis, IN 46256www.wiley.comCopyright 2013 by Ralph Kimball and Margy RossPublished by John Wiley & Sons, Inc., Indianapolis, IndianaPublished simultaneously in CanadaISBN: 978-1-118-53080-1ISBN: 978-1-118-53077-1 (ebk)ISBN: 978-1-118-73228-1 (ebk)ISBN: 978-1-118-73219-9 (ebk)Manufactured in the United States of America10 9 8 7 6 5 4 3 2 1No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form orby any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior writtenpermission of the Publisher, or authorization through payment of the appropriate per-copy fee to theCopyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 6468600. Requests to the Publisher for permission should be addressed to the Permissions Department, JohnWiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or onlineat http://www.wiley.com/go/permissions.Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim allwarranties, including without limitation warranties of fitness for a particular purpose. No warranty may becreated or extended by sales or promotional materials. The advice and strategies contained herein may notbe suitable for every situation. This work is sold with the understanding that the publisher is not engaged inrendering legal, accounting, or other professional services. If professional assistance is required, the servicesof a competent professional person should be sought. Neither the publisher nor the author shall be liable fordamages arising herefrom. The fact that an organization or Web site is referred to in this work as a citationand/or a potential source of further information does not mean that the author or the publisher endorsesthe information the organization or website may provide or recommendations it may make. Further, readersshould be aware that Internet websites listed in this work may have changed or disappeared between whenthis work was written and when it is read.For general information on our other products and services please contact our Customer CareDepartment within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax(317) 572-4002.Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some materialincluded with standard print versions of this book may not be included in e-books or in print-ondemand. If this book refers to media such as a CD or DVD that is not included in the version youpurchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.Library of Congress Control Number: 2013936841Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons,Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is notassociated with any product or vendor mentioned in this book.

About the AuthorsRalph Kimball founded the Kimball Group. Since the mid-1980s, he has been thedata warehouse and business intelligence industry’s thought leader on the dimensional approach. He has educated tens of thousands of IT professionals. The Toolkitbooks written by Ralph and his colleagues have been the industry’s best sellerssince 1996. Prior to working at Metaphor and founding Red Brick Systems, Ralphcoinvented the Star workstation, the first commercial product with windows, icons,and a mouse, at Xerox’s Palo Alto Research Center (PARC). Ralph has a PhD inelectrical engineering from Stanford University.Margy Ross is president of the Kimball Group. She has focused exclusively on datawarehousing and business intelligence since 1982 with an emphasis on businessrequirements and dimensional modeling. Like Ralph, Margy has taught the dimensional best practices to thousands of students; she also coauthored five Toolkit bookswith Ralph. Margy previously worked at Metaphor and cofounded DecisionWorksConsulting. She graduated with a BS in industrial engineering from NorthwesternUniversity.

CreditsExecutive EditorRobert ElliottProject EditorMaureen SpearsSenior Production EditorKathleen WisorCopy EditorApostrophe Editing ServicesEditorial ManagerMary Beth WakefieldFreelancer Editorial ManagerRosemarie GrahamAssociate Director of MarketingDavid MayhewMarketing ManagerAshley ZurcherBusiness ManagerAmy KniesProduction ManagerTim TateVice President and Executive GroupPublisherRichard SwadleyVice President and Executive PublisherNeil EddeAssociate PublisherJim MinatelProject Coordinator, CoverKatie CrockerProofreaderWord One, New YorkIndexerJohnna VanHoose DinseCover ImageiStockphoto.com / teekidCover DesignerRyan Sneed

AcknowledgmentsFirst, thanks to the hundreds of thousands who have read our Toolkit books,attended our courses, and engaged us in consulting projects. We have learned asmuch from you as we have taught. Collectively, you have had a profoundly positiveimpact on the data warehousing and business intelligence industry. Congratulations!Our Kimball Group colleagues, Bob Becker, Joy Mundy, and Warren Thornthwaite,have worked with us to apply the techniques described in this book literally thousands of times, over nearly 30 years of working together. Every technique in thisbook has been thoroughly vetted by practice in the real world. We appreciate theirinput and feedback on this book—and more important, the years we have sharedas business partners, along with Julie Kimball.Bob Elliott, our executive editor at John Wiley & Sons, project editor MaureenSpears, and the rest of the Wiley team have supported this project with skill andenthusiasm. As always, it has been a pleasure to work with them.To our families, thank you for your unconditional support throughout ourcareers. Spouses Julie Kimball and Scott Ross and children Sara Hayden Smith,Brian Kimball, and Katie Ross all contributed in countless ways to this book.

ContentsIntroduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xxvii1Data Warehousing, Business Intelligence, and DimensionalModeling Primer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Different Worlds of Data Capture and Data Analysis . . . . . . . . . . . . . . . . . . .2Goals of Data Warehousing and Business Intelligence . . . . . . . . . . . . . . . . . .3Publishing Metaphor for DW/BI Managers . . . . . . . . . . . . . . . . . . . . . . .5Dimensional Modeling Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7Star Schemas Versus OLAP Cubes . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8Fact Tables for Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Dimension Tables for Descriptive Context . . . . . . . . . . . . . . . . . . . . . . 13Facts and Dimensions Joined in a Star Schema . . . . . . . . . . . . . . . . . . . 16Kimball’s DW/BI Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18Operational Source Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18Extract, Transformation, and Load System . . . . . . . . . . . . . . . . . . . . . . 19Presentation Area to Support Business Intelligence. . . . . . . . . . . . . . . . 21Business Intelligence Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . .22Restaurant Metaphor for the Kimball Architecture . . . . . . . . . . . . . . . . 23Alternative DW/BI Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26Independent Data Mart Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 26Hub-and-Spoke Corporate Information Factory Inmon Architecture . . 28Hybrid Hub-and-Spoke and Kimball Architecture . . . . . . . . . . . . . . . . .29Dimensional Modeling Myths. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30Myth 1: Dimensional Models are Only for Summary Data . . . . . . . . . .30Myth 2: Dimensional Models are Departmental, Not Enterprise . . . . . 31Myth 3: Dimensional Models are Not Scalable . . . . . . . . . . . . . . . . . . . 31Myth 4: Dimensional Models are Only for Predictable Usage . . . . . . . . 31Myth 5: Dimensional Models Can’t Be Integrated . . . . . . . . . . . . . . . . 32More Reasons to Think Dimensionally . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32Agile Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .34Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

xContents2Kimball Dimensional Modeling Techniques Overview. . . . . . . . . 37Fundamental Concepts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37Gather Business Requirements and Data Realities . . . . . . . . . . . . . . . . . 37Collaborative Dimensional Modeling Workshops . . . . . . . . . . . . . . . . . 38Four-Step Dimensional Design Process . . . . . . . . . . . . . . . . . . . . . . . . . 38Business Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Grain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Dimensions for Descriptive Context . . . . . . . . . . . . . . . . . . . . . . . . . . .40Facts for Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .40Star Schemas and OLAP Cubes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .40Graceful Extensions to Dimensional Models . . . . . . . . . . . . . . . . . . . . . 41Basic Fact Table Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41Fact Table Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41Additive, Semi-Additive, Non-Additive Facts . . . . . . . . . . . . . . . . . . . . 42Nulls in Fact Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42Conformed Facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42Transaction Fact Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43Periodic Snapshot Fact Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43Accumulating Snapshot Fact Tables . . . . . . . . . . . . . . . . . . . . . . . . . . .44Factless Fact Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .44Aggregate Fact Tables or OLAP Cubes . . . . . . . . . . . . . . . . . . . . . . . . . 45Consolidated Fact Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45Basic Dimension Table Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .46Dimension Table Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .46Dimension Surrogate Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .46Natural, Durable, and Supernatural Keys . . . . . . . . . . . . . . . . . . . . . . .46Drilling Down . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47Degenerate Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47Denormalized Flattened Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . 47Multiple Hierarchies in Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . .48Flags and Indicators as Textual Attributes . . . . . . . . . . . . . . . . . . . . . . .48Null Attributes in Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .48Calendar Date Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .48Role-Playing Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49Junk Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

ContentsxiSnowflaked Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50Outrigger Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50Integration via Conformed Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . 50Conformed Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51Shrunken Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51Drilling Across . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51Value Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52Enterprise Data Warehouse Bus Architecture . . . . . . . . . . . . . . . . . . . . 52Enterprise Data Warehouse Bus Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 52Detailed Implementation Bus Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 53Opportunity/Stakeholder Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53Dealing with Slowly Changing Dimension Attributes . . . . . . . . . . . . . . . . . 53Type 0: Retain Original . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .54Type 1: Overwrite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .54Type 2: Add New Row . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

The Data Warehouse Toolkit: The Defi nitive Guide to Dimensional Modeling, Third Edition Published by John Wiley & Sons, Inc. 10475 Crosspoint Boulevard