Data Warehouse: The Choice Of Inmon Versus Kimball

Transcription

IAS IncData Warehouse:The Choice ofInmon versus KimballIan AbramsonIAS Inc.

IAS IncAgenda The 2 Approaches BillInmon – Enterprise Warehouse (CIF) Ralph Kimball – Dimensional DesignSimilarities Differences Choices

IAS IncDW History 1990 1996 Inmon publishes “Building the Data Warehouse”Kimball publishes “The Data Warehouse Toolkit”2002 Inmon updates book and defines architecture for collection ofdisparate sources into detailed, time variant data store. The top down approachKimball updates book and defines multiple databases called datamarts that are organized by business processes, but useenterprise standard data bus The bottom-up approach

IAS IncThe Data Warehouse Is: Bill Inmon, an early and influential practitioner, has formally defined adata warehouse in the following terms; Subject-oriented Time-variant The database contains data from most or all of an organization's operationalapplications, and that this data is made consistentRalph Kimball, a leading proponent of the dimensional approach tobuilding data warehouses, provides a succinct definition for a datawarehouse: “A copy of transaction data specifically structured for query and analysis.“Ref: wikipedia4Data in the database is never over-written or deleted - once committed, the datais static, read-only, but retained for future reporting; andIntegrated The changes to the data in the database are tracked and recorded so that reportscan be produced showing changes over time;Non-volatile The data in the database is organized so that all the data elements relating to thesame real-world event or object are linked together;

IAS IncWhat are they saying? These two influential data warehousing experts represent thecurrent prevailing views on data warehousing. Kimball, in 1997, stated that Inmon responded in 1998 by saying, 5".the data warehouse is nothing more than the union of all the datamarts",Kimball indicates a bottom-up data warehousing methodology in whichindividual data marts providing thin views into the organizational datacould be created and later combined into a larger all-encompassingdata warehouse."You can catch all the minnows in the ocean and stack them togetherand they still do not make a whale,"This indicates the opposing view that the data warehouse should bedesigned from the top-down to include all corporate data. In thismethodology, data marts are created only after the complete datawarehouse has been created.

IAS IncWhat is a Data Warehouse: The single organizational repository ofenterprise wide data across many or alllines of business and subject areas. Containsmassive and integrated data Represents the complete organizational viewof information needed to run and understandthe business

IAS IncWhat is a Data Mart? The specific, subject oriented, or departmentalview of information from the organization.Generally these are built to satisfy userrequirements for information Multipledata marts for one organization A data mart is built using dimensional modeling More focused Generally smaller, selected facts and dimensions Integrated

IAS IncData Warehouses vs.Data Marts DataWarehouses: Scopeindependent Centralized or Enterprise Planned Data Specificapplication Decentralized by group Organic but may be planned Data Historical,detailed, summary Some denormalization Subjects Somehistory, detailed, summary High denormalization Subjects Multiplesubjects Source Singlecentral subject area Sourceinternal and external sources Other Fewinternal and external sources Other Flexible Restrictive Data Projectoriented Long life Single complex structure8Marts: Scope Application Many Dataoriented Short life Multiple simple structures that mayform a complex structure

IAS IncThe Inmon Model Consists of all databases and informationsystems in an organization . The CIF (Corporate Information Factory)Defines overall database environment as: Operational Atomicdata warehouse Departmental Individual The Warehouse is part of the bigger whole (CIF)

IAS IncThe Data WarehouseOperational(Day-to-Day Operations)* Transactions *Atomic Data Warehouse(Data manipulated & moved)* Transactions *Departmental(Focused)* Source is ADW *Individual(Ad hoc)* Source is ADW *Customer Credit RatingCustomer Credit HistoryCustomer by Postal CodeDelinquent Customers

IAS IncInmon Modeling Three levels of data modeling ERD (Entity Relationship Diagram) Refines entities, attributes and relationshipsMid-Level model (*DIS*) Data Item SetsData sets by departmentFour constructs: Primary data groupingsSecondary data groupingsConnectors“Type of” dataPhysical data model Optimize for performance (de-normalize)

IAS IncRelationship between Levels One and Twoof Inmon's Data model (Inmon,2002)

IAS IncThe Warehouse Architecture

IAS IncThe Inmon WarehouseData SourcesStagingThe Data WarehouseData AccessSourceDB 1SourceDB 2File orExternalData14LandingStagingAreaData MartsCubes

IAS IncThe Kimball Approach The Dimensional Data Model Starts with tablesFactsDimensions Factscontain metrics Dimensions contain attributes May contain repeating groups Doesnot adhere to normalization theory User accessible

IAS IncThe Kimball Data LifecycleData SourcesStagingThe Data WarehouseData AccessSourceDB 1Workstation GroupSourceDB 2File orExternalData16LandingStagingAreaEnd UsersCubes

IAS IncThe Dimensional Model

IAS IncThe Kimball Data Bus Data is moved to staging area Data is scrubbed and made consistentFrom Staging Data Marts are createdData Marts are based on a single processSum of the data marts can constitute anEnterprise Data WarehouseConformed dimensions are the key to success

IAS IncThe Kimball Design ApproachSelect business process Declare the grain Choose dimensions Identify facts (metrics)

IAS IncKimball’s PhilosophyMake data easily accessible Present the organization’s informationconsistently Be adaptive and resilient to change Protect information Service as the foundation for improveddecision making.

IAS IncGetting Started with Choices Kimball Willstart with data marts Focused on quick delivery to users Inmon Willfocus on the enterprise Organizational focus

IAS IncKimball vs. Inmon Inmon: Kimball ntTop-DownIntegration Achieved via an Assumed Enterprise Data ModelCharacterizes Data marts as AggregatesBusiness-Process-OrientedBottom-Up and EvolutionaryStresses Dimensional Model, Not E-RIntegration Achieved via Conformed DimensionsStar Schemas Enforce Query Semantics

IAS IncThe Comparison(Methodology and Architecture)InmonKimballOverall approachTop-downBottom-upArchitectural structureEnterprise-wide DWfeeds departmental DBsData marts model abusiness process;enterprise is achievedwith conformed dimsComplexity of methodQuite complexFairly simpleReference: http://www.bi-bestpractices.com/view-articles/4768

IAS IncThe Comparison(Data Modeling)InmonKimballData orientationSubject or data drivenProcess orientedToolsTraditional (ERDs andDIS)Dimensional modeling;departs from traditionalrelational modelingEnd user accessibilityLowHighReference: http://www.bi-bestpractices.com/view-articles/4768

IAS IncThe ous & DiscreteSlowly ChangingMethodsTimestampsDimension keysReference: http://www.bi-bestpractices.com/view-articles/4768

IAS IncInmon Continuous & DiscreteDimension Management Define data management via dates in yourdata ContinuoustimeWhen is a record active Start and end dates DiscretetimeA point in time Snapshot

IAS IncKimball Slowly ChangingDimension Management Define data management via versioning Type Change record as requiredNo History Type IIManage all changesHistory is recorded Type IIIISome history is parallelLimit to defined history

IAS IncThe Comparison(Philosophy)InmonKimballPrimary AudienceITEnd UsersPlace in theOrganizationIntegral part of theCorporate InformationFactory (CIF)Transformer and retainerof operational dataObjectiveDeliver a sound technicalsolution based on provenmethodsDeliver a solution thatmakes it easy for endusers to directly querydata and still havereasonable response rateReference: http://www.bi-bestpractices.com/view-articles/4768

IAS IncHow to Choose?Characteristic Favors Kimball Favors InmonNature of theorganization'sdecision supportrequirementsTacticalStrategicData integrationrequirementsIndividual businessareasEnterprise-wide integrationStructure of dataBusiness metrics,performancemeasures, andscorecardsNon-metric data and for data thatwill be applied to meet multipleand varied information needsScalabilityNeed to adapt toGrowing scope and changinghighly volatile needs requirements are criticalwithin a limited scope

IAS IncHow to Choose?Characteristic Favors Kimball Favors InmonPersistency of dataSource systems arerelatively stableHigh rate of change from sourcesystemsStaffing and skillsrequirementsSmall teams ofgeneralistsLarger team(s) of specialistsTime to deliveryNeed for the firstdata warehouseapplication is urgentOrganization's requirements allowfor longer start-up timeCost to deployLower start-up costs,with eachsubsequent projectcosting about thesameHigher start-up costs, with lowersubsequent project developmentcosts

IAS IncReferences Data Warehousing Battle of the Giants: Comparing the Basics of theKimball and Inmon Models: by Mary Breslin http://www.bi-bestpractices.com/view-articles/4768 Inmon CIF glossary: http://www.inmoncif.com/library/glossary/#D The Data Warehouse Toolkit, Kimball, 2002Inmon, W.H. Building the Data Warehouse (Third Edition), NewYork: John Wiley & Sons, (2002).Kimball, R. and M. Ross. The Data Warehouse Toolkit: TheComplete Guide to Dimensional Modeling (Second Edition), NewYork: John Wiley & Sons, 2000.

IAS IncThanks and Questions? Ian AbramsonIAS Inc416-407-2448ian@abramson.ca

The Data Warehouse Toolkit, Kimball, 2002 Inmon, W.H. Building the Data Warehouse (Third Edition), New York: John Wiley & Sons, (2002). Kimball, R. and M. Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition), New York: John Wiley & Sons, 2000. IAS Inc Thanks and Questions? Ian Abramson IAS Inc 416-407-2448 ian@abramson.ca. Title: