Concepts: Data Warehouse

Transcription

Concepts: Data WarehouseEric TremblayOracle warehouse.ca

ObjectiveDescribes the main steps inthe design of a datawarehouse. Presentstechniques for it's use andchallenges in it'sdevelopment.

Definition

Definition« A warehouse is asubject-oriented, integrated,time-variant and non-volatilecollection of data in supportof management's decisionmaking process »--- Bill Inmon

The Goals of aData Warehouse

The Goals of aData WarehouseAccess to company informationCoherent company informationA consistent, total and unifiedsight the company dataThe data published is stored forfast consultation

The Goals of aData WarehouseQuality of the information in the datawarehouseThe presentation tools to display theinformation is part of the data warehouseSupport business intelligence applications

Business IntelligenceConceptBusiness nowledgeInformationData

Business reDesignProjectPlanning enance& GrowthData Specification

How to build aData WarehouseIdentify the problems andthe business processesIdentify the grainIdentify the dimensionsIdentify the facts

Data WarehouseLifecycle onTechnicalArchitectureDesignProductSelection yticApplicationSpecificationData roject ManagementDeploymentMaintenanceand Growth

Elements of aData WarehouseDATASOURCESDATA WAREHOUSEDATA QUERYTOOLSQuery ToolsReportingAnalysisData Mining

Elements of aData AreaLoadDataWarehouseAccessDataAccessTools

Elements of aData AreaLoadDataWarehouseAccessDataAccessToolsFlow of information in aData Warehouse

ExtractionSourceStaging

TransformationStagingTransformationStaging

LoadingStaging

Dimension & Fact Table

Dimension Table

Dimension TableSimple primary keyTextual attributes rich andadapted to the userHierarchical reportsFew codes; Few codes; codesshould be decoded according totheir descriptionsRelatively small

resCode Postal6-ChiffresCode PostalClef d'entrepôtAdresseGrain de dimension

eekWarehouse KeyDayDimension Grain

MultidimensionalModelsStar & Snowflake Schema

Fact TableComposed Primary keyThe date/time is almost always akeyThe facts are usually numericalThe facts are in general additive

FactTableStar SchemaPromotion Dimensionpromotion key (PK)promotion nameCalendar Dimensioncalendar key (PK)day of weekmonthquarteryearStore Dimensionstore key (PK)store IDstore nameaddressProduct DimensionSales Fact Tablepromotion keycalendar key (FK)product key (FK)store key (FK)customer key (FK)dollars solddollars costunits soldproduct key (PK)SKUdescriptionbrandcategoryCustomer Dimensioncustomer key (PK)customer IDcustomer nameaddress

FactTableStar SchemaPromotion Dimensionpromotion key (PK)promotion nameCalendar Dimensioncalendar key (PK)day of weekmonthquarteryearStore Dimensionstore key (PK)store IDstore nameaddressProduct DimensionSales Fact Tablepromotion keycalendar key (FK)product key (FK)store key (FK)customer key (FK)dollars solddollars costunits soldproduct key (PK)SKUdescriptionbrandcategoryCustomer Dimensioncustomer key (PK)customer IDcustomer nameaddress

Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthCourse Dimensioncourse key (PK)namedepartmentlevelcourse numberlaboratory flagFacility Dimensionfacility key (PK)typelocationdepartmentseatingsizeFactTableStar SchemaStudent AttendanceFact TableStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classtime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Teacher Dimensionteacher key (PK)employee IDnameaddressdepartmenttitledegree

Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthFactTableStar SchemaStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classWhich classes were the most heavilyattended?Course Dimensioncourse key (PK)namedepartmentlevelcourse numberlaboratory flagFacility Dimensionfacility key (PK)typelocationdepartmentseatingsizeStudent AttendanceFact Tabletime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Teacher Dimensionteacher key (PK)employee IDnameaddressdepartmenttitledegree

Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthFactTableStar SchemaStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classWhich classes were the most heavilyattended?Course DimensionStudent AttendanceFact TableWhichclasses were the most lightly used?course key (PK)namedepartmentlevelcourse numberlaboratory flagFacility Dimensionfacility key (PK)typelocationdepartmentseatingsizetime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Teacher Dimensionteacher key (PK)employee IDnameaddressdepartmenttitledegree

Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthFactTableStar SchemaStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classWhich classes were the most heavilyattended?Course DimensionStudent AttendanceFact TableWhichclasses were the most lightly used?course key (PK)namedepartmentlevelcourse numberlaboratory flagtime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Which teachers taught the most students?Facility Dimensionfacility key (PK)typelocationdepartmentseatingsizeTeacher Dimensionteacher key (PK)employee IDnameaddressdepartmenttitledegree

Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthFactTableStar SchemaStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classWhich classes were the most heavilyattended?Course DimensionStudent AttendanceFact TableWhichclasses were the most lightly used?course key (PK)namedepartmentlevelcourse numberlaboratory flagtime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Which teachers taught the most students?Teacher DimensionWhichteachers taught classes in facilitiesFacility Dimensionteacher key (PK)facility key (PK) to other departments?belongingemployee rtmenttitledegree

Calendar Dimensioncalendar key (PK)month key (FK)yearmonth key (PK)yearyear key (PK)Store Dimensionstore key (PK)store IDstore nomaddressFactTableSnowflake SchemaProduct Dimensionproduct key (PK)SKUdescriptionbrand key (FK)categorySales Fact Tablecalendar key (FK)product key (FK)store key (FK)customer key (FK)dollars solddollars costunits soldbrand key (PK)brand descriptionCustomer Dimensioncustomer key (PK)customer IDcustomer nameaddress

Slowly Changing Dimensions

Slowly Changing DimensionsType 1: Overwrite changed attribute.A fact is associated with only the current value of adimension column.Type 2: Add new dimension record.A fact is associated with only the original value of adimension column.Type 3: Use field for ʻoldʼ value.A fact is associated with both the original value andwith the current value of a dimension column.

Slowly Changing DimensionsType 1: Overwrite changed attribute.A fact is associated with only the current value of adimension column.Type 2: Add new dimension record.A fact is associated with only the original value of adimension column.Type 3: Use field for ʻoldʼ value.A fact is associated with both the original value andwith the current value of a dimension column.

Slowly Changing DimensionsType 1: Overwrite changed attribute.A fact is associated with only the current value of adimension column.Type 2: Add new dimension record.A fact is associated with only the original value of adimension column.Type 3: Use field for ʻoldʼ value.A fact is associated with both the original value andwith the current value of a dimension column.

Slowly Changing DimensionsScenario: Customer last name changes fromPharand to Smith: Update Cust.Lname

Slowly Changing Dimensions

Slowly Changing Dimensions

Normalisation

Normalisation3NF

Normalisation3NF

Normalisation

NormalisationSimplicityQuery PerformanceMinor disk space savingSlows down the users’ ability tobrowse

NormalisationSimplicityQuery PerformanceMinor disk space savingSlows down the users’ ability tobrowse

NormalisationSimplicityQuery PerformanceMinor disk space savingSlows down the users’ ability tobrowse

NormalisationSimplicityQuery PerformanceMinor disk space savingSlows down the users’ ability tobrowse

The DifferenceERDStar Schema(Star Schema)

The DifferenceData BaseData WarehouseActualHistoricalInternalInternal and alisedDimensionalDirtyClean and ConsistentDetailedDetailed andSummary

AdvancedanalysisAd Hoc Queryand AnalysisEnterprisereporting

Elements of aData ATIONExtractDATA ANALYSISTOOLSPortalERP DepartmentDataReportDisplayFilterCleanseDATA ACCESSTOOLSOLAPDistribute

Ralph KimballData WarehouseGuruwww.rkimball.com

Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca www.data-warehouse.ca. Objective Describes the main steps in the design of a data warehouse. Presents techniques for it's use and challenges in it's development. DeÞnition. DeÞnition « A warehouse is a