Transcription
Concepts: Data WarehouseEric TremblayOracle warehouse.ca
ObjectiveDescribes the main steps inthe design of a datawarehouse. Presentstechniques for it's use andchallenges in it'sdevelopment.
Definition
Definition« A warehouse is asubject-oriented, integrated,time-variant and non-volatilecollection of data in supportof management's decisionmaking process »--- Bill Inmon
The Goals of aData Warehouse
The Goals of aData WarehouseAccess to company informationCoherent company informationA consistent, total and unifiedsight the company dataThe data published is stored forfast consultation
The Goals of aData WarehouseQuality of the information in the datawarehouseThe presentation tools to display theinformation is part of the data warehouseSupport business intelligence applications
Business IntelligenceConceptBusiness nowledgeInformationData
Business reDesignProjectPlanning enance& GrowthData Specification
How to build aData WarehouseIdentify the problems andthe business processesIdentify the grainIdentify the dimensionsIdentify the facts
Data WarehouseLifecycle onTechnicalArchitectureDesignProductSelection yticApplicationSpecificationData roject ManagementDeploymentMaintenanceand Growth
Elements of aData WarehouseDATASOURCESDATA WAREHOUSEDATA QUERYTOOLSQuery ToolsReportingAnalysisData Mining
Elements of aData AreaLoadDataWarehouseAccessDataAccessTools
Elements of aData AreaLoadDataWarehouseAccessDataAccessToolsFlow of information in aData Warehouse
ExtractionSourceStaging
TransformationStagingTransformationStaging
LoadingStaging
Dimension & Fact Table
Dimension Table
Dimension TableSimple primary keyTextual attributes rich andadapted to the userHierarchical reportsFew codes; Few codes; codesshould be decoded according totheir descriptionsRelatively small
resCode Postal6-ChiffresCode PostalClef d'entrepôtAdresseGrain de dimension
eekWarehouse KeyDayDimension Grain
MultidimensionalModelsStar & Snowflake Schema
Fact TableComposed Primary keyThe date/time is almost always akeyThe facts are usually numericalThe facts are in general additive
FactTableStar SchemaPromotion Dimensionpromotion key (PK)promotion nameCalendar Dimensioncalendar key (PK)day of weekmonthquarteryearStore Dimensionstore key (PK)store IDstore nameaddressProduct DimensionSales Fact Tablepromotion keycalendar key (FK)product key (FK)store key (FK)customer key (FK)dollars solddollars costunits soldproduct key (PK)SKUdescriptionbrandcategoryCustomer Dimensioncustomer key (PK)customer IDcustomer nameaddress
FactTableStar SchemaPromotion Dimensionpromotion key (PK)promotion nameCalendar Dimensioncalendar key (PK)day of weekmonthquarteryearStore Dimensionstore key (PK)store IDstore nameaddressProduct DimensionSales Fact Tablepromotion keycalendar key (FK)product key (FK)store key (FK)customer key (FK)dollars solddollars costunits soldproduct key (PK)SKUdescriptionbrandcategoryCustomer Dimensioncustomer key (PK)customer IDcustomer nameaddress
Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthCourse Dimensioncourse key (PK)namedepartmentlevelcourse numberlaboratory flagFacility Dimensionfacility key (PK)typelocationdepartmentseatingsizeFactTableStar SchemaStudent AttendanceFact TableStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classtime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Teacher Dimensionteacher key (PK)employee IDnameaddressdepartmenttitledegree
Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthFactTableStar SchemaStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classWhich classes were the most heavilyattended?Course Dimensioncourse key (PK)namedepartmentlevelcourse numberlaboratory flagFacility Dimensionfacility key (PK)typelocationdepartmentseatingsizeStudent AttendanceFact Tabletime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Teacher Dimensionteacher key (PK)employee IDnameaddressdepartmenttitledegree
Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthFactTableStar SchemaStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classWhich classes were the most heavilyattended?Course DimensionStudent AttendanceFact TableWhichclasses were the most lightly used?course key (PK)namedepartmentlevelcourse numberlaboratory flagFacility Dimensionfacility key (PK)typelocationdepartmentseatingsizetime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Teacher Dimensionteacher key (PK)employee IDnameaddressdepartmenttitledegree
Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthFactTableStar SchemaStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classWhich classes were the most heavilyattended?Course DimensionStudent AttendanceFact TableWhichclasses were the most lightly used?course key (PK)namedepartmentlevelcourse numberlaboratory flagtime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Which teachers taught the most students?Facility Dimensionfacility key (PK)typelocationdepartmentseatingsizeTeacher Dimensionteacher key (PK)employee IDnameaddressdepartmenttitledegree
Time Dimensiontime key (PK)SQL dateday of weekweek numbermonthFactTableStar SchemaStudent Dimensionstudent key (PK)student IDnameaddressmajorminorfirst enrolledgraduation classWhich classes were the most heavilyattended?Course DimensionStudent AttendanceFact TableWhichclasses were the most lightly used?course key (PK)namedepartmentlevelcourse numberlaboratory flagtime key (FK)student key (FK)course key (FK)teacher key (FK)facility key (FK)attendance 1Which teachers taught the most students?Teacher DimensionWhichteachers taught classes in facilitiesFacility Dimensionteacher key (PK)facility key (PK) to other departments?belongingemployee rtmenttitledegree
Calendar Dimensioncalendar key (PK)month key (FK)yearmonth key (PK)yearyear key (PK)Store Dimensionstore key (PK)store IDstore nomaddressFactTableSnowflake SchemaProduct Dimensionproduct key (PK)SKUdescriptionbrand key (FK)categorySales Fact Tablecalendar key (FK)product key (FK)store key (FK)customer key (FK)dollars solddollars costunits soldbrand key (PK)brand descriptionCustomer Dimensioncustomer key (PK)customer IDcustomer nameaddress
Slowly Changing Dimensions
Slowly Changing DimensionsType 1: Overwrite changed attribute.A fact is associated with only the current value of adimension column.Type 2: Add new dimension record.A fact is associated with only the original value of adimension column.Type 3: Use field for ʻoldʼ value.A fact is associated with both the original value andwith the current value of a dimension column.
Slowly Changing DimensionsType 1: Overwrite changed attribute.A fact is associated with only the current value of adimension column.Type 2: Add new dimension record.A fact is associated with only the original value of adimension column.Type 3: Use field for ʻoldʼ value.A fact is associated with both the original value andwith the current value of a dimension column.
Slowly Changing DimensionsType 1: Overwrite changed attribute.A fact is associated with only the current value of adimension column.Type 2: Add new dimension record.A fact is associated with only the original value of adimension column.Type 3: Use field for ʻoldʼ value.A fact is associated with both the original value andwith the current value of a dimension column.
Slowly Changing DimensionsScenario: Customer last name changes fromPharand to Smith: Update Cust.Lname
Slowly Changing Dimensions
Slowly Changing Dimensions
Normalisation
Normalisation3NF
Normalisation3NF
Normalisation
NormalisationSimplicityQuery PerformanceMinor disk space savingSlows down the users’ ability tobrowse
NormalisationSimplicityQuery PerformanceMinor disk space savingSlows down the users’ ability tobrowse
NormalisationSimplicityQuery PerformanceMinor disk space savingSlows down the users’ ability tobrowse
NormalisationSimplicityQuery PerformanceMinor disk space savingSlows down the users’ ability tobrowse
The DifferenceERDStar Schema(Star Schema)
The DifferenceData BaseData WarehouseActualHistoricalInternalInternal and alisedDimensionalDirtyClean and ConsistentDetailedDetailed andSummary
AdvancedanalysisAd Hoc Queryand AnalysisEnterprisereporting
Elements of aData ATIONExtractDATA ANALYSISTOOLSPortalERP DepartmentDataReportDisplayFilterCleanseDATA ACCESSTOOLSOLAPDistribute
Ralph KimballData WarehouseGuruwww.rkimball.com
Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca www.data-warehouse.ca. Objective Describes the main steps in the design of a data warehouse. Presents techniques for it's use and challenges in it's development. DeÞnition. DeÞnition « A warehouse is a