Building A Data Warehouse - Lob.de

Transcription

9314fmfinal.qxd11/15/071:37 PMPage iBuilding aData WarehouseWith Examples inSQL ServerVincent Rainardi

9314fmfinal.qxd11/15/071:37 PMPage iiBuilding a Data Warehouse: With Examples in SQL ServerCopyright 2008 by Vincent RainardiAll rights reserved. No part of this work may be reproduced or transmitted in any form or by any means,electronic or mechanical, including photocopying, recording, or by any information storage or retrievalsystem, without the prior written permission of the copyright owner and the publisher.ISBN-13 (pbk): 978-1-59059-931-0ISBN-10 (pbk): 1-59059-931-4ISBN-13 (electronic): 978-1-4302-0527-2ISBN-10 (electronic): 1-4302-0527-XPrinted and bound in the United States of America 9 8 7 6 5 4 3 2 1Trademarked names may appear in this book. Rather than use a trademark symbol with every occurrenceof a trademarked name, we use the names only in an editorial fashion and to the benefit of the trademarkowner, with no intention of infringement of the trademark.Lead Editor: Jeffrey PepperTechnical Reviewers: Bill Hamilton and Asif SayedEditorial Board: Steve Anglin, Ewan Buckingham, Tony Campbell, Gary Cornell, Jonathan Gennick,Jason Gilmore, Kevin Goff, Jonathan Hassell, Matthew Moodie, Joseph Ottinger, Jeffrey Pepper,Ben Renow-Clarke, Dominic Shakeshaft, Matt Wade, Tom WelshSenior Project Manager: Tracy Brown CollinsCopy Editor: Kim WimpsettAssociate Production Director: Kari Brooks-CoponyProduction Editor: Kelly WinquistCompositor: Linda Weidemann, Wolf Creek PressProofreader: Linda MarousekIndexer: Ron StraussArtist: April MilneCover Designer: Kurt KramesManufacturing Director: Tom DebolskiDistributed to the book trade worldwide by Springer-Verlag New York, Inc., 233 Spring Street, 6th Floor,New York, NY 10013. Phone 1-800-SPRINGER, fax 201-348-4505, e-mail orders-ny@springer-sbm.com,or visit http://www.springeronline.com.For information on translations, please contact Apress directly at 2855 Telegraph Avenue, Suite 600,Berkeley, CA 94705. Phone 510-549-5930, fax 510-549-5939, e-mail info@apress.com, or visit http://www.apress.com.The information in this book is distributed on an “as is” basis, without warranty. Although every precaution has been taken in the preparation of this work, neither the author(s) nor Apress shall have anyliability to any person or entity with respect to any loss or damage caused or alleged to be caused directlyor indirectly by the information contained in this work.The source code for this book is available to readers at http://www.apress.com.

9314fmfinal.qxd11/15/071:37 PMPage iiiFor my lovely wife, Ivana.

9314fmfinal.qxd11/15/071:37 PMPage iv

9314fmfinal.qxd11/15/071:37 PMPage vContents at a GlanceAbout the Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER CHAPTER1234567891011121314Introduction to Data Warehousing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Data Warehouse Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29Data Warehouse Development Methodology . . . . . . . . . . . . . . . . . . . . 49Functional and Nonfunctional Requirements . . . . . . . . . . . . . . . . . . . . 61Data Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71Physical Database Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113Data Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173Populating the Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215Assuring Data Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301Building Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329Multidimensional Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377Using Data Warehouse for Business Intelligence . . . . . . . . . . . . . . . 411Using Data Warehouse for CustomerRelationship Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441 CHAPTER 15 Other Data Warehouse Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 CHAPTER 16 Testing Your Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477 CHAPTER 17 Data Warehouse Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 APPENDIXNormalization Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505 INDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509v

9314fmfinal.qxd11/15/071:37 PMPage vi

9314fmfinal.qxd11/15/071:37 PMPage viiContentsAbout the Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv CHAPTER 1Introduction to Data Warehousing . . . . . . . . . . . . . . . . . . . . . . . . . . 1What Is a Data Warehouse?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Retrieves Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Consolidates Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5Periodically . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6Dimensional Data Store . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7Normalized Data Store . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11Business Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12Other Analytical Activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14Updated in Batches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15Other Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Data Warehousing Today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Business Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Customer Relationship Management . . . . . . . . . . . . . . . . . . . . . . . . . . 18Data Mining. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19Master Data Management (MDM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20Customer Data Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23Future Trends in Data Warehousing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24Unstructured Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25Service-Oriented Architecture (SOA) . . . . . . . . . . . . . . . . . . . . . . . . . . 26Real-Time Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27vii

9314fmfinal.qxdviii11/15/071:37 PMPage viii CONTENTS CHAPTER 2Data Warehouse Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29Data Flow Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29Single DDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33NDS DDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35ODS DDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Federated Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 CHAPTER 3Data Warehouse Development Methodology . . . . . . . . . . . . . . 49Waterfall Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49Iterative Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 CHAPTER 4Functional and Nonfunctional Requirements . . . . . . . . . . . . . 61Identifying Business Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61Understanding Business Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62Defining Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Defining Nonfunctional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65Conducting a Data Feasibility Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 CHAPTER 5Data Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71Designing the Dimensional Data Store . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71Dimension Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76Date Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77Slowly Changing Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80Product, Customer, and Store Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . 83Subscription Sales Data Mart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89Supplier Performance Data Mart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94CRM Data Marts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96Data Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101Source System Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102Designing the Normalized Data Store . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

9314fmfinal.qxd11/15/071:37 PMPage ix CONTENTS CHAPTER 6Physical Database Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113Hardware Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113Storage Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120Configuring Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123Creating DDS Database Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128Creating the Normalized Data Store . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139Using Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157Summary Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 CHAPTER 7Data Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173Introduction to ETL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173ETL Approaches and Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174General Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177Extracting Relational Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180Whole Table Every Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180Incremental Extract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181Fixed Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185Related Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186Testing Data Leaks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187Extracting File Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187Extracting Other Source Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190Extracting Data Using SSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191Memorizing the Last Extraction Timestamp . . . . . . . . . . . . . . . . . . . . . . . . 200Extracting from Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 CHAPTER 8Populating the Data Warehouse. . . . . . . . . . . . . . . . . . . . . . . . . . 215Stage Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216Data Firewall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218Populating NDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219Using SSIS to Populate NDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228Upsert Using SQL and Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242Practical Tips on SSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249ix

9314fmfinal.qxdx11/15/071:37 PMPage x CONTENTSPopulating DDS Dimension Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250Populating DDS Fact Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266Batches, Mini-batches, and Near Real-Time ETL . . . . . . . . . . . . . . . . . . . 269Pushing the Data In . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271 CHAPTER 9Assuring Data Quality. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273Data Quality Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274Data Cleansing and Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277Cross-checking with External Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290Data Quality Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291Action: Reject, Allow, Fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293Logging and Auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296Data Quality Reports and Notifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300 CHAPTER 10 Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301Metadata in Data Warehousing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301Data Definition and Mapping Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303Data Structure Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308Source System Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313ETL Process Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318Data Quality Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320Audit Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323Usage Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324Maintaining Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 CHAPTER 11 Building Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329Data Warehouse Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329When to Use Reports and When Not to Use Them. . . . . . . . . . . . . . . . . . . 332Report Wizard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334Report Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340Report Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342Grouping, Sorting, and Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351Simplicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356Spreadsheets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357Multidimensional Database Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362Deploying Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

9314fmfinal.qxd11/15/071:37 PMPage xi CONTENTSManaging Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370Managing Report Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370Managing Report Subscriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372Managing Report Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375 CHAPTER 12 Multidimensional Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377What a Multidimensional Database Is . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377Online Analytical Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380Creating a Multidimensional Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381Processing a Multidimensional Database . . . . . . . . . . . . . . . . . . . . . . . . . . 388Querying a Multidimensional Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394Administering a Multidimensional Database . . . . . . . . . . . . . . . . . . . . . . . 396Multidimensional Database Security . . . . . . . . . . . . . . . . . . . . . . . . . 397Processing Cubes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399Backup and Restore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409 CHAPTER 13 Using Data Warehouse for Business Intelligence . . . . . . . . 411Business Intelligence Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412Business Intelligence Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413Business Intelligence Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416Business Intelligence Dashboards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432Business Intelligence Alerts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437Business Intelligence Portal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439 CHAPTER 14 Using Data Warehouse for CustomerRelationship Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441Single Customer View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442Campaign Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447Permission Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450Delivery and Response Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454Customer Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460Customer Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463Personalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464Customer Loyalty Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466xi

9314fmfinal.qxdxii11/15/071:37 PMPage xii CONTENTS CHAPTER 15 Other Data Warehouse Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467Customer Data Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467Unstructured Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470Search in Data Warehousing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476 CHAPTER 16 Testing Your Data Warehouse. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477Data Warehouse ETL Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478Functional Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480Performance Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482Security Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485User Acceptance Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486End-to-End Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487Migrating to Production . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489 CHAPTER 17 Data Warehouse Administration . . . . . . . . . . . . . . . . . . . . . . . . . . 491Monitoring Data Warehouse ETL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492Monitoring Data Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495Managing Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498Managing Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499Making Schema Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501Updating Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503 APPENDIXNormalization Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505 INDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509

9314fmfinal.qxd11/15/071:37 PMPage xiiiAbout the Author VINCENT RAINARDI is a data warehouse architect and developer with morethan 12 years of experience in IT. He started working with data warehousing in 1996 when he was working for Accenture. He has been workingwith Microsoft SQL Server since 2000. He worked for Lastminute.com(part of the Travelocity group) until October 2007. He now works as a datawarehousing consultant in London specializing in SQL Server. He is amember of The Data Warehousing Institute (TDWI) and regularly writesdata warehousing articles for SQLServerCentral.com.xiii

9314fmfinal.qxd11/15/071:37 PMPage xiv

9314fmfinal.qxd11/15/071:37 PMPage xvPrefaceFriends and colleagues who want to start learning data warehousing sometimes ask me torecommend a practical book about the subject matter. They are not new to the databaseworld; most of them are either DBAs or developers/consultants, but they have never built adata warehouse. They want a book that is practical and aimed at beginners, one that containsall the basic essentials. There are many data warehousing books on the market, but they usually cover a specialized topic such as clickstream, ETL, dimensional modeling, data mining,OLAP, or project management and therefore a beginner would need to buy five to six books tounderstand the complete spectrum of data warehousing. Other books cover multiple aspects,but they are not as practical as they need to be, targeting executives and project managersinstead of DBAs and developers.Because of that void, I took a pen (well, a laptop really) and spent a whole year writingin order to provide a practical, down-to-earth book containing all the essential subjects ofbuilding a data warehouse, with many examples and illustrations from projects that are easyto understand. The book can be used to build your first data warehouse straightaway; it covers all aspects of data warehousing, including approach, architecture, data modeling, ETL,data quality, and OLAP. I also describe some practical issues that I have encountered in myexperience—issues that you’ll also likely encounter in your first data warehousing project—along with the solutions.It is not possible to show examples, code, and illustrations for all the different databaseplatforms, so I had to choose a specific platform. Oracle and SQL Server provide completeend-to-end solutions including the database, ETL, reporting, and OLAP, and after discussionswith my editor, we decided to base the examples on SQL Server 2005, while also making themapplicable to future versions of SQL Server such as 2008. I apologize in advance that the examples do not run on SQL Server 2000; there is just too big a gap in terms of data warehousingfacilities, such as SSIS, between 2000 and 2005.Throughout this book, together we will be designing and building a data warehouse fora case study called Amadeus Entertainment. A data warehouse consist of many parts, such asthe data model, physical databases, ETL, data quality, metadata, cube, application, and so on.In each chapter, I will cover each part one by one. I will cover the theory related to that part,and then I will show how to build that part for the case study. Specifically, Chapter 1 introduceswhat a data warehouse is and what the benefits are. In Chapters 2–6, we will design the architecture, define the requirements, and create the data model and physical databases, includingthe SQL Server configuration. In Chapters 7–10 we will populate the data stores using SSIS, aswell as discuss data quality and metadata. Chapters 11–12 are about getting the data out byusing Reporting Services and Analysis Services cubes. In Chapters 13–15, I’ll discuss the application of data warehouse for BI and CRM as well as CDI, unstructured data, and search. I closethe book with testing and administering a data warehouse in Chapters 16–17.xv

9314fmfinal.qxdxvi11/15/071:37 PMPage xvi PREFACEThe supplementary material (available on the book’s download page on the Apress website, http://www.apress.com) provides all the necessary material to build the data warehousefor the case study. Specifically, it contains

Data Warehouse With Examples in SQL Server 9314fmfinal.qxd 11/15/07 1:37 PM Page i. Building a Data Warehouse: With Examples in SQL Server . building a data warehouse, with many examples and illustrations from projects that are easy to understand. The book can be used to build your first data warehouse straightaway; it cov-