DATA WAREHOUSING PROFESSIONALS - Baylor University

Transcription

DATA WAREHOUSINGFUNDAMENTALS FOR ITPROFESSIONALSSecond EditionPAULRAJ PONNIAH

DATA WAREHOUSINGFUNDAMENTALS FOR ITPROFESSIONALS

DATA WAREHOUSINGFUNDAMENTALS FOR ITPROFESSIONALSSecond EditionPAULRAJ PONNIAH

Copyright # 2010 by John Wiley & Sons, Inc. All rights reservedPublished by John Wiley & Sons, Inc., Hoboken, New JerseyPublished simultaneously in CanadaNo part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by anymeans, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted underSection 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of thePublisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center,Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department,John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or onlineat http://www.wiley.com/go/permission.Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts inpreparing this book, they make no representations or warranties with respect to the accuracy or completenessof the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for aparticular purpose. No warranty may be created or extended by sales representatives or written sales materials.The advice and strategies contained herein may not be suitable for your situation. You should consult with aprofessional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or anyother commercial damages, including but not limited to special, incidental, consequential, or other damages.For general information on our other products and services or for technical support, please contact our CustomerCare Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 orfax (317) 572-4002.Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not beavailable in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.Library of Congress Cataloging-in-Publication Data:Ponniah, Paulraj.Data warehousing fundamentals for IT professionals / Paulraj Ponniah.—2nd ed.p. cm.Previous ed. published under title: Data warehousing fundamentals.Includes bibliographical references and index.ISBN 978-0-470-46207-2 (cloth)1. Data warehousing. I. Ponniah, Paulraj. Data warehousing fundamentals. II. Title.QA76.9.D37P66 2010005.740 5—dc222009041789Printed in the United States of America10 9 87 65 4 32 1

ToVimala, my loving wifeand toJoseph, David, and Shobi,my dear children

CONTENTSPREFACEPART 11xxvOVERVIEW AND CONCEPTSTHE COMPELLING NEED FOR DATA WAREHOUSING13CHAPTER OBJECTIVES / 3ESCALATING NEED FOR STRATEGIC INFORMATION / 4The Information Crisis / 6Technology Trends / 6Opportunities and Risks / 8FAILURES OF PAST DECISION-SUPPORT SYSTEMS / 9History of Decision-Support Systems / 10Inability to Provide Information / 10OPERATIONAL VERSUS DECISION-SUPPORT SYSTEMS / 11Making the Wheels of Business Turn / 12Watching the Wheels of Business Turn / 12Different Scope, Different Purposes / 12DATA WAREHOUSING—THE ONLY VIABLE SOLUTION / 13A New Type of System Environment / 13Processing Requirements in the New Environment / 14Strategic Information from the Data Warehouse / 14vii

viiiCONTENTSDATA WAREHOUSE DEFINED / 15A Simple Concept for Information Delivery / 15An Environment, Not a Product / 15A Blend of Many Technologies / 16THE DATA WAREHOUSING MOVEMENT / 17Data Warehousing Milestones / 17Initial Challenges / 18EVOLUTION OF BUSINESS INTELLIGENCE / 18BI: Two Environments / 19BI: Data Warehousing and Analytics / 19CHAPTER SUMMARY / 20REVIEW QUESTIONS / 20EXERCISES / 212DATA WAREHOUSE: THE BUILDING BLOCKSCHAPTER OBJECTIVES / 23DEFINING FEATURES / 24Subject-Oriented Data / 24Integrated Data / 25Time-Variant Data / 26Nonvolatile Data / 27Data Granularity / 28DATA WAREHOUSES AND DATA MARTS / 29How Are They Different? / 29Top-Down Versus Bottom-Up Approach / 29A Practical Approach / 31ARCHITECTURAL TYPES / 32Centralized Data Warehouse / 32Independent Data Marts / 32Federated / 33Hub-and-Spoke / 33Data-Mart Bus / 34OVERVIEW OF THE COMPONENTS / 34Source Data Component / 34Data Staging Component / 37Data Storage Component / 39Information Delivery Component / 40Metadata Component / 41Management and Control Component / 4123

CONTENTSixMETADATA IN THE DATA WAREHOUSE / 41Types of Metadata / 42Special Significance / 42CHAPTER SUMMARY / 42REVIEW QUESTIONS / 43EXERCISES / 433TRENDS IN DATA WAREHOUSINGCHAPTER OBJECTIVES / 45CONTINUED GROWTH IN DATA WAREHOUSING / 46Data Warehousing has Become Mainstream / 46Data Warehouse Expansion / 47Vendor Solutions and Products / 48SIGNIFICANT TRENDS / 50Real-Time Data Warehousing / 50Multiple Data Types / 50Data Visualization / 52Parallel Processing / 54Data Warehouse Appliances / 56Query Tools / 56Browser Tools / 57Data Fusion / 57Data Integration / 58Analytics / 59Agent Technology / 59Syndicated Data / 60Data Warehousing and ERP / 60Data Warehousing and KM / 61Data Warehousing and CRM / 63Agile Development / 63Active Data Warehousing / 64EMERGENCE OF STANDARDS / 64Metadata / 65OLAP / 65WEB-ENABLED DATA WAREHOUSE / 66The Warehouse to the Web / 67The Web to the Warehouse / 67The Web-Enabled Configuration / 69CHAPTER SUMMARY / 6945

xCONTENTSREVIEW QUESTIONS / 69EXERCISES / 70PART 24PLANNING AND REQUIREMENTSPLANNING AND PROJECT MANAGEMENT7173CHAPTER OBJECTIVES / 73PLANNING YOUR DATA WAREHOUSE / 74Key Issues / 74Business Requirements, Not Technology / 76Top Management Support / 77Justifying Your Data Warehouse / 77The Overall Plan / 78THE DATA WAREHOUSE PROJECT / 79How is it Different? / 79Assessment of Readiness / 81The Life-Cycle Approach / 81THE DEVELOPMENT PHASES / 83Adopting Agile Development / 84THE PROJECT TEAM / 85Organizing the Project Team / 85Roles and Responsibilities / 86Skills and Experience Levels / 87User Participation / 88PROJECT MANAGEMENT CONSIDERATIONS / 90Guiding Principles / 91Warning Signs / 92Success Factors / 92Anatomy of a Successful Project / 93Adopt a Practical Approach / 94CHAPTER SUMMARY / 96REVIEW QUESTIONS / 96EXERCISES / 975DEFINING THE BUSINESS REQUIREMENTSCHAPTER OBJECTIVES / 99DIMENSIONAL ANALYSIS / 100Usage of Information Unpredictable / 100Dimensional Nature of Business Data / 101Examples of Business Dimensions / 10299

CONTENTSxiINFORMATION PACKAGES—A USEFUL CONCEPT / 103Requirements Not Fully Determinate / 104Business Dimensions / 105Dimension Hierarchies and Categories / 106Key Business Metrics or Facts / 107REQUIREMENTS GATHERING METHODS / 109Types of Questions / 110Arrangement of Questions / 111Interview Techniques / 111Adapting the JAD Methodology / 113Using Questionnaires / 115Review of Existing Documentation / 115REQUIREMENTS DEFINITION: SCOPE AND CONTENT / 116Data Sources / 117Data Transformation / 117Data Storage / 117Information Delivery / 118Information Package Diagrams / 118Requirements Definition Document Outline / 118CHAPTER SUMMARY / 119REVIEW QUESTIONS / 119EXERCISES / 1206REQUIREMENTS AS THE DRIVING FORCE FORDATA WAREHOUSINGCHAPTER OBJECTIVES / 121DATA DESIGN / 122Structure for Business Dimensions / 123Structure for Key Measurements / 124Levels of Detail / 125THE ARCHITECTURAL PLAN / 125Composition of the Components / 126Special Considerations / 127Tools and Products / 129DATA STORAGE SPECIFICATIONS / 131DBMS Selection / 132Storage Sizing / 132INFORMATION DELIVERY STRATEGY / 133Queries and Reports / 134Types of Analysis / 134Information Distribution / 135121

xiiCONTENTSReal Time Information Delivery / 135Decision Support Applications / 135Growth and Expansion / 136CHAPTER SUMMARY / 136REVIEW QUESTIONS / 136EXERCISES / 137PART 37ARCHITECTURE AND INFRASTRUCTUREARCHITECTURAL COMPONENTS139141CHAPTER OBJECTIVES / 141UNDERSTANDING DATA WAREHOUSE ARCHITECTURE / 141Architecture: Definitions / 142Architecture in Three Major Areas / 142DISTINGUISHING CHARACTERISTICS / 143Different Objectives and Scope / 144Data Content / 144Complex Analysis and Quick Response / 145Flexible and Dynamic / 145Metadata-Driven / 146ARCHITECTURAL FRAMEWORK / 146Architecture Supporting Flow of Data / 146The Management and Control Module / 147TECHNICAL ARCHITECTURE / 148Data Acquisition / 149Data Storage / 152Information Delivery / 154ARCHITECTURAL TYPES / 156Centralized Corporate Data Warehouse / 156Independent Data Marts / 156Federated / 159Hub-and-Spoke / 159Data-Mart Bus / 160CHAPTER SUMMARY / 160REVIEW QUESTIONS / 160EXERCISES / 1618INFRASTRUCTURE AS THE FOUNDATION FORDATA WAREHOUSINGCHAPTER OBJECTIVES / 163163

CONTENTSxiiiINFRASTRUCTURE SUPPORTING ARCHITECTURE / 164Operational Infrastructure / 165Physical Infrastructure / 165HARDWARE AND OPERATING SYSTEMS / 166Mainframes / 167Open System Servers / 168NT Servers / 168Platform Options / 168Server Hardware / 177DATABASE SOFTWARE / 181Parallel Processing Options / 182Selection of the DBMS / 184COLLECTION OF TOOLS / 184Architecture First, Then Tools / 186Data Modeling / 186Data Extraction / 187Data Transformation / 187Data Loading / 187Data Quality / 187Queries and Reports / 187Dashboards / 187Scorecards / 187Online Analytical Processing (OLAP) / 188Alert Systems / 188Middleware and Connectivity / 188Data Warehouse Administration / 188DATA WAREHOUSE APPLIANCES / 188Evolution of DW Appliances / 189Benefits of DW Appliances / 190CHAPTER SUMMARY / 191REVIEW QUESTIONS / 191EXERCISES / 1929THE SIGNIFICANT ROLE OF METADATACHAPTER OBJECTIVES / 193WHY METADATA IS IMPORTANT / 193A Critical Need in the Data Warehouse / 195Why Metadata Is Vital for End-Users / 198Why Metadata Is Essential for IT / 199Automation of Warehousing Tasks / 200Establishing the Context of Information / 202193

xivCONTENTSMETADATA TYPES BY FUNCTIONAL AREAS / 203Data Acquisition / 204Data Storage / 205Information Delivery / 206BUSINESS METADATA / 207Content Overview / 207Examples of Business Metadata / 208Content Highlights / 209Who Benefits? / 209TECHNICAL METADATA / 209Content Overview / 210Examples of Technical Metadata / 210Content Highlights / 211Who Benefits? / 211HOW TO PROVIDE METADATA / 212Metadata Requirements / 212Sources of Metadata / 214Challenges for Metadata Management / 215Metadata Repository / 215Metadata Integration and Standards / 217Implementation Options / 218CHAPTER SUMMARY / 219REVIEW QUESTIONS / 220EXERCISES / 220PART 4DATA DESIGN AND DATA PREPARATION10 PRINCIPLES OF DIMENSIONAL MODELINGCHAPTER OBJECTIVES / 225FROM REQUIREMENTS TO DATA DESIGN / 225Design Decisions / 226Dimensional Modeling Basics / 226E-R Modeling Versus Dimensional Modeling / 230Use of CASE Tools / 232THE STAR SCHEMA / 232Review of a Simple STAR Schema / 232Inside a Dimension Table / 234Inside the Fact Table / 236The Factless Fact Table / 238Data Granularity / 238223225

CONTENTSxvSTAR SCHEMA KEYS / 239Primary Keys / 239Surrogate Keys / 240Foreign Keys / 240ADVANTAGES OF THE STAR SCHEMA / 241Easy for Users to Understand / 241Optimizes Navigation / 242Most Suitable for Query Processing / 243STARjoin and STARindex / 244STAR SCHEMA: EXAMPLES / 244Video Rental / 244Supermarket / 244Wireless Phone Service / 244Auction Company / 244CHAPTER SUMMARY / 246REVIEW QUESTIONS / 247EXERCISES / 24711 DIMENSIONAL MODELING: ADVANCED TOPICSCHAPTER OBJECTIVES / 249UPDATES TO THE DIMENSION TABLES / 250Slowly Changing Dimensions / 250Type 1 Changes: Correction of Errors / 251Type 2 Changes: Preservation of History / 252Type 3 Changes: Tentative Soft Revisions / 253MISCELLANEOUS DIMENSIONS / 255Large Dimensions / 255Rapidly Changing Dimensions / 256Junk Dimensions / 258THE SNOWFLAKE SCHEMA / 259Options to Normalize / 259Advantages and Disadvantages / 260When to Snowflake / 262AGGREGATE FACT TABLES / 262Fact Table Sizes / 264Need for Aggregates / 266Aggregating Fact Tables / 266Aggregation Options / 271FAMILIES OF STARS / 272Snapshot and Transaction Tables / 273Core and Custom Tables / 274249

xviCONTENTSSupporting Enterprise Value Chain or Value Circle / 274Conforming Dimensions / 275Standardizing Facts / 276Summary of Family of STARS / 277CHAPTER SUMMARY / 277REVIEW QUESTIONS / 278EXERCISES / 27812 DATA EXTRACTION, TRANSFORMATION, AND LOADINGCHAPTER OBJECTIVES / 281ETL OVERVIEW / 282Most Important and Most Challenging / 282Time Consuming and Arduous / 283ETL REQUIREMENTS AND STEPS / 284Key Factors / 285DATA EXTRACTION / 286Source Identification / 287Data Extraction Techniques / 287Evaluation of the Techniques / 294DATA TRANSFORMATION / 295Data Transformation: Basic Tasks / 296Major Transformation Types / 297Data Integration and Consolidation / 299Transformation for Dimension Attributes / 301How to Implement Transformation / 301DATA LOADING / 302Applying Data: Techniques and Processes / 303Data Refresh Versus Update / 306Procedure for Dimension Tables / 306Fact Tables: History and Incremental Loads / 307ETL SUMMARY / 308ETL Tool Options / 308Reemphasizing ETL Metadata / 309ETL Summary and Approach / 310OTHER INTEGRATION APPROACHES / 311Enterprise Information Integration (EII) / 311Enterprise Application Integration (EAI) / 312CHAPTER SUMMARY / 313REVIEW QUESTIONS / 313EXERCISES / 314281

CONTENTS13 DATA QUALITY: A KEY TO SUCCESSxvii315CHAPTER OBJECTIVES / 315WHY IS DATA QUALITY CRITICAL? / 316What Is Data Quality? / 316Benefits of Improved Data Quality / 319Types of Data Quality Problems / 320DATA QUALITY CHALLENGES / 323Sources of Data Pollution / 323Validation of Names and Addresses / 325Costs of Poor Data Quality / 325DATA QUALITY TOOLS / 326Categories of Data Cleansing Tools / 327Error Discovery Features / 327Data Correction Features / 327The DBMS for Quality

Data Warehouse Administration / 188 DATAWAREHOUSE APPLIANCES / 188 Evolution of DWAppliances / 189 Benefits of DWAppliances / 190 CHAPTER SUMMARY / 191 REVIEW QUESTIONS / 191 EXERCISES / 192 9 THE SIGNIFICANT ROLE OF METADATA 193 CHAPTER OBJECTIVES / 193 WHY METADATA IS IMPORTANT / 193 A Critical Need in the Data Warehouse /