The Data Warehouse Toolkit : The Definitive Guide To .

Transcription

The Data Warehouse Toolkit
The Definitive Guide to Dimensional Modeling
Third Edition
Ralph Kimball
Margy Ross
Wiley

Contents

1 Data Warehousing, Business Intelligence, and Dimensional Modeling Primer
  Different Worlds of Data Capture and Data Analysis
  Goals of Data Warehousing and Business Intelligence
  Publishing Metaphor for DW/BI Managers
  Dimensional Modeling Introduction
  Star Schemas Versus OLAP Cubes
  Fact Tables for Measurements
  Dimension Tables for Descriptive Context
  Facts and Dimensions Joined in a Star Schema
  Kimball's DW/BI Architecture
  Operational Source Systems
  Extract, Transformation, and Load System
  Presentation Area to Support Business Intelligence
  Business Intelligence Applications
  Restaurant Metaphor for the Kimball Architecture
  Alternative DW/BI Architectures
  Independent Data Mart Architecture
  Hub-and-Spoke Corporate Information Factory Inmon Architecture
  Hybrid Hub-and-Spoke and Kimball Architecture
  Dimensional Modeling Myths
  Myth 1: Dimensional Models are Only for Summary Data
  Myth 2: Dimensional Models are Departmental, Not Enterprise
  Myth 3: Dimensional Models are Not Scalable
  Myth 4: Dimensional Models are Only for Predictable Usage
  Myth 5: Dimensional Models Can't Be Integrated
  More Reasons to Think Dimensionally

2 Kimball Dimensional Modeling Techniques Overview
  Fundamental Concepts
  Gather Business Requirements and Data Realities
  Collaborative Dimensional Modeling Workshops
  Four-Step Dimensional Design Process
  Business Processes
  Grain
  Dimensions for Descriptive Context
  Facts for Measurements
  Star Schemas and OLAP Cubes
  Graceful Extensions to Dimensional Models
  Basic Fact Table Techniques
  Fact Table Structure
  Additive, Semi-Additive, Non-Additive Facts
  Nulls in Fact Tables
  Conformed Facts
  Transaction Fact Tables
  Periodic Snapshot Fact Tables
  Accumulating Snapshot Fact Tables
  Factless Fact Tables
  Aggregate Fact Tables or OLAP Cubes
  Consolidated Fact Tables
  Basic Dimension Table Techniques
  Dimension Table Structure
  Dimension Surrogate Keys
  Natural, Durable, and Supernatural Keys
  Drilling Down
  Degenerate Dimensions
  Denormalized Flattened Dimensions
  Multiple Hierarchies in Dimensions
  Flags and Indicators as Textual Attributes
  Null Attributes in Dimensions
  Calendar Date Dimensions
  Role-Playing Dimensions
  Junk Dimensions

  Snowflaked Dimensions
  Outrigger Dimensions
  Integration via Conformed Dimensions
  Conformed Dimensions
  Shrunken Dimensions
  Drilling Across
  Value Chain
  Enterprise Data Warehouse Bus Architecture
  Enterprise Data Warehouse Bus Matrix
  Detailed Implementation Bus Matrix
  Opportunity/Stakeholder Matrix
  Dealing with Slowly Changing Dimension Attributes
  Type 0: Retain Original
  Type 1: Overwrite
  Type 2: Add New Row
  Type 3: Add New Attribute
  Type 4: Add Mini-Dimension
  Type 5: Add Mini-Dimension and Type 1 Outrigger
  Type 6: Add Type 1 Attributes to Type 2 Dimension
  Type 7: Dual Type 1 and Type 2 Dimensions
  Dealing with Dimension Hierarchies
  Fixed Depth Positional Hierarchies
  Slightly Ragged/Variable Depth Hierarchies
  Ragged/Variable Depth Hierarchies with Hierarchy Bridge Tables
  Ragged/Variable Depth Hierarchies with Pathstring Attributes
  Advanced Fact Table Techniques
  Fact Table Surrogate Keys
  Centipede Fact Tables
  Numeric Values as Attributes or Facts
  Lag/Duration Facts
  Header/Line Fact Tables
  Allocated Facts
  Profit and Loss Fact Tables Using Allocations
  Multiple Currency Facts
  Multiple Units of Measure Facts

  Year-to-Date Facts
  Multipass SQL to Avoid Fact-to-Fact Table Joins
  Timespan Tracking in Fact Tables
  Late Arriving Facts
  Advanced Dimension Techniques
  Dimension-to-Dimension Table Joins
  Multivalued Dimensions and Bridge Tables
  Time Varying Multivalued Bridge Tables
  Behavior Tag Time Series
  Behavior Study Groups
  Aggregated Facts as Dimension Attributes
  Dynamic Value Bands
  Text Comments Dimension
  Multiple Time Zones
  Step Dimensions
  Hot Swappable Dimensions
  Abstract Generic Dimensions
  Audit Dimensions
  Late Arriving Dimensions
  Special Purpose Schemas
  Supertype and Subtype Schemas for Heterogeneous Products
  Real-Time Fact Tables
  Error Event Schemas

3 Retail Sales
  Four-Step Dimensional Design Process
  Step 1: Select the Business Process
  Step 2: Declare the Grain
  Step 3: Identify the Dimensions
  Step 4: Identify the Facts
  Retail Case Study
  Step 1: Select the Business Process
  Step 2: Declare the Grain
  Step 3: Identify the Dimensions

  Step 4: Identify the Facts
  Dimension Table Details
  Date Dimension
  Product Dimension
  Store Dimension
  Promotion Dimension
  Other Retail Sales Dimensions
  Degenerate Dimensions for Transaction Numbers
  Retail Schema in Action
  Retail Schema Extensibility
  Factless Fact Tables
  Dimension and Fact Table Keys
  Dimension Table Surrogate Keys
  Dimension Natural and Durable Supernatural Keys
  Degenerate Dimension Surrogate Keys
  Date Dimension Smart Keys
  Fact Table Surrogate Keys
  Resisting Normalization Urges
  Snowflake Schemas with Normalized Dimensions
  Outriggers
  Centipede Fact Tables with Too Many Dimensions
  Summary

4 Inventory
  Value Chain Introduction
  Inventory Models
  Inventory Periodic Snapshot
  Inventory Transactions
  Inventory Accumulating Snapshot
  Fact Table Types
  Transaction Fact Tables
  Periodic Snapshot Fact Tables
  Accumulating Snapshot Fact Tables
  Complementary Fact Table Types

  Value Chain Integration
  Enterprise Data Warehouse Bus Architecture
  Understanding the Bus Architecture
  Enterprise Data Warehouse Bus Matrix
  Conformed Dimensions
  Drilling Across Fact Tables
  Identical Conformed Dimensions
  Shrunken Rollup Conformed Dimension with Attribute Subset
  Shrunken Conformed Dimension with Row Subset
  Shrunken Conformed Dimensions on the Bus Matrix
  Limited Conformity
  Importance of Data Governance and Stewardship
  Conformed Dimensions and the Agile Movement
  Conformed Facts
  Summary

5 Procurement
  Procurement Case Study
  Procurement Transactions and Bus Matrix
  Single Versus Multiple Transaction Fact Tables
  Complementary Procurement Snapshot
  Slowly Changing Dimension Basics
  Type 0: Retain Original
  Type 1: Overwrite
  Type 2: Add New Row
  Type 3: Add New Attribute
  Type 4: Add Mini-Dimension
  Hybrid Slowly Changing Dimension Techniques
  Type 5: Mini-Dimension and Type 1 Outrigger
  Type 6: Add Type 1 Attributes to Type 2 Dimension
  Type 7: Dual Type 1 and Type 2 Dimensions
  Slowly Changing Dimension Recap
  Summary

6 Order Management
  Order Management Bus Matrix
  Order Transactions
  Fact Normalization
  Dimension Role Playing
  Product Dimension Revisited
  Customer Dimension
  Deal Dimension
  Degenerate Dimension for Order Number
  Junk Dimensions
  Header/Line Pattern to Avoid
  Multiple Currencies
  Transaction Facts at Different Granularity
  Another Header/Line Pattern to Avoid
  Invoice Transactions
  Service Level Performance as Facts, Dimensions, or Both
  Profit and Loss Facts
  Audit Dimension
  Accumulating Snapshot for Order Fulfillment Pipeline
  Lag Calculations
  Multiple Units of Measure
  Beyond the Rearview Mirror
  Summary

7 Accounting
  Accounting Case Study and Bus Matrix
  General Ledger Data
  General Ledger Periodic Snapshot
  Chart of Accounts
  Period Close
  Year-to-Date Facts
  Multiple Currencies Revisited
  General Ledger Journal Transactions

  Multiple Fiscal Accounting Calendars
  Drilling Down Through a Multilevel Hierarchy
  Financial Statements
  Budgeting Process
  Dimension Attribute Hierarchies
  Fixed Depth Positional Hierarchies
  Slightly Ragged Variable Depth Hierarchies
  Ragged Variable Depth Hierarchies
  Shared Ownership in a Ragged Hierarchy
  Time Varying Ragged Hierarchies
  Modifying Ragged Hierarchies
  Alternative Ragged Hierarchy Modeling Approaches
  Advantages of the Bridge Table Approach for Ragged Hierarchies
  Consolidated Fact Tables
  Role of OLAP and Packaged Analytic Solutions
  Summary

8 Customer Relationship Management
  CRM Overview
  Operational and Analytic CRM
  Customer Dimension Attributes
  Name and Address Parsing
  International Name and Address Considerations
  Customer-Centric Dates
  Aggregated Facts as Dimension Attributes
  Segmentation Attributes and Scores
  Counts with Type 2 Dimension Changes
  Outrigger for Low Cardinality Attribute Set
  Bridge Tables for Multivalued Dimensions
  Bridge Table for Sparse Attributes
  Bridge Table for Multiple Customer Contacts
  Complex Customer Behavior
  Behavior Study Groups for Cohorts

  Step Dimension for Sequential Behavior
  Timespan Fact Tables
  Tagging Fact Tables with Satisfaction Indicators
  Tagging Fact Tables with Abnormal Scenario Indicators
  Customer Data Integration Approaches
  Master Data Management Creating a Single Customer Dimension
  Partial Conformity of Multiple Customer Dimensions
  Avoiding Fact-to-Fact Table Joins
  Low Latency Reality Check
  Summary

9 Human Resources Management
  Employee Profile Tracking
  Precise Effective and Expiration Timespans
  Dimension Change Reason Tracking
  Profile Changes as Type 2 Attributes or Fact Events
  Headcount Periodic Snapshot
  Bus Matrix for HR Processes
  Packaged Analytic Solutions and Data Models
  Recursive Employee Hierarchies
  Change Tracking on Embedded Manager Key
  Drilling Up and Down Management Hierarchies
  Multivalued Skill Keyword Attributes
  Skill Keyword Bridge
  Skill Keyword Text String
  Survey Questionnaire Data
  Text Comments
  Summary

10 Financial Services
  Banking Case Study and Bus Matrix
  Dimension Triage to Avoid Too Few Dimensions
  Household Dimension
  Multivalued Dimensions and Weighting Factors

  Mini-Dimensions Revisited
  Adding a Mini-Dimension to a Bridge Table
  Dynamic Value Banding of Facts
  Supertype and Subtype Schemas for Heterogeneous Products
  Supertype and Subtype Products with Common Facts
  Hot Swappable Dimensions

11 Telecommunications
  Telecommunications Case Study and Bus Matrix
  General Design Review Considerations
  Balance Business Requirements and Source Realities
  Focus on Business Processes
  Granularity
  Single Granularity for Facts
  Dimension Granularity and Hierarchies
  Date Dimension
  Degenerate Dimensions
  Surrogate Keys
  Dimension Decodes and Descriptions
  Conformity Commitment
  Design Review Guidelines
  Draft Design Exercise Discussion
  Remodeling Existing Data Structures
  Geographic Location Dimension
  Summary

12 Transportation
  Airline Case Study and Bus Matrix
  Multiple Fact Table Granularities
  Linking Segments into Trips
  Related Fact Tables
  Extensions to Other Industries
  Cargo Shipper
  Travel Services

  Combining Correlated Dimensions
  Class of Service
  Origin and Destination
  More Date and Time Considerations
  Country-Specific Calendars as Outriggers
  Date and Time in Multiple Time Zones
  Localization Recap
  Summary

13 Education
  University Case Study and Bus Matrix
  Accumulating Snapshot Fact Tables
  Applicant Pipeline
  Research Grant Proposal Pipeline
  Factless Fact Tables
  Admissions Events
  Course Registrations
  Facility Utilization
  Student Attendance
  More Educational Analytic Opportunities
  Summary

14 Healthcare
  Healthcare Case Study and Bus Matrix
  Claims Billing and Payments
  Date Dimension Role Playing
  Multivalued Diagnoses
  Supertypes and Subtypes for Charges
  Electronic Medical Records
  Measure Type Dimension for Sparse Facts
  Freeform Text Comments
  Images
  Facility/Equipment Inventory Utilization
  Dealing with Retroactive Changes
  Summary

15 Electronic Commerce
  Clickstream Source Data
  Clickstream Data Challenges
  Clickstream Dimensional Models
  Page Dimension
  Event Dimension
  Session Dimension
  Referral Dimension
  Clickstream Session Fact Table
  Clickstream Page Event Fact Table
  Step Dimension
  Aggregate Clickstream Fact Tables
  Google Analytics
  Integrating Clickstream into Web Retailer's Bus Matrix
  Profitability Across Channels Including Web
  Summary

16 Insurance
  Insurance Case Study
  Insurance Value Chain
  Draft Bus Matrix
  Policy Transactions
  Dimension Role Playing
  Slowly Changing Dimensions
  Mini-Dimensions for Large or Rapidly Changing Dimensions
  Multivalued Dimension Attributes
  Numeric Attributes as Facts or Dimensions
  Degenerate Dimension
  Low Cardinality Dimension Tables
  Audit Dimension
  Policy Transaction Fact Table
  Heterogeneous Supertype and Subtype Products
  Complementary Policy Accumulating Snapshot
  Premium Periodic Snapshot
  Conformed Dimensions
  Conformed Facts

  Pay-in-Advance Facts
  Heterogeneous Supertypes and Subtypes Revisited
  Multivalued Dimensions Revisited
  More Insurance Case Study Background
  Updated Insurance Bus Matrix
  Detailed Implementation Bus Matrix
  Claim Transactions
  Transaction Versus Profile Junk Dimensions
  Claim Accumulating Snapshot
  Accumulating Snapshot for Complex Workflows
  Timespan Accumulating Snapshot
  Periodic Instead of Accumulating Snapshot
  Policy/Claim Consolidated Periodic Snapshot
  Factless Accident Events
  Common Dimensional Modeling Mistakes to Avoid
  Mistake 10: Place Text Attributes in a Fact Table
  Mistake 9: Limit Verbose Descriptors to Save Space
  Mistake 8: Split Hierarchies into Multiple Dimensions
  Mistake 7: Ignore the Need to Track Dimension Changes
  Mistake 6: Solve All Performance Problems with More Hardware
  Mistake 5: Use Operational Keys to Join Dimensions and Facts
  Mistake 4: Neglect to Declare and Comply with the Fact Grain
  Mistake 3: Use a Report to Design the Dimensional Model
  Mistake 2: Expect Users to Query Normalized Atomic Data
  Mistake 1: Fail to Conform Facts and Dimensions
  Summary

17 Kimball DW/BI Lifecycle Overview
  Lifecycle Roadmap
  Roadmap Mile Markers
  Lifecycle Launch Activities
  Program/Project Planning and Management
  Business Requirements Definition
  Lifecycle Technology Track
  Technical Architecture Design
  Product Selection and Installation

  Lifecycle Data Track
  Dimensional Modeling
  Physical Design
  Design and Dev
