Lecture @Dhbw: Data Warehouse Part Xii: Dimensional Modeling

Transcription

A company of Daimler AGLECTURE @DHBW: DATA WAREHOUSEPART XII: DIMENSIONAL MODELINGANDREAS BUCKENHOFER, DAIMLER TSS

ABOUT MEAndreas enior DB hemen/datenbank/in-memory/Since 2009 at Daimler TSSDepartment: Big DataBusiness Unit: Analyticshttp://wwwlehre.dhbw-stuttgart.de/ buckenhofer/https://www.xing.com/profile/Andreas Buckenhofer2

ANDREAS BUCKENHOFER, DAIMLER TSS GMBH“Forming good abstractions and avoiding complexityis an essential part of a successful data architecture”Data has always been my main focus during my long-time occupation in the area ofdata integration. I work for Daimler TSS as Database Professional and Data Architectwith over 20 years of experience in Data Warehouse projects. I am working withHadoop and NoSQL since 2013. I keep my knowledge up-to-date - and I learn newthings, experiment, and program every day.I share my knowledge in internal presentations or as a speaker at internationalconferences. I'm regularly giving a full lecture on Data Warehousing and a seminar onmodern data architectures at Baden-Wuerttemberg Cooperative State UniversityDHBW. I also gained international experience through a two-year project in GreaterLondon and several business trips to Asia.I’m responsible for In-Memory DB Computing at the independent German OracleUser Group (DOAG) and was honored by Oracle as ACE Associate. I hold currentcertifications such as "Certified Data Vault 2.0 Practitioner (CDVP2)", "Big DataArchitect“, „Oracle Database 12c Administrator Certified Professional“, “IBMInfoSphere Change Data Capture Technical Professional”, etc.Daimler TSSContact/ConnectxingDOAGDHBWData Warehouse / DHBW3

NOT JUST AVERAGE: OUTSTANDING.As a 100% Daimler subsidiary, we give100 percent, always and never less.We love IT and pull out all the stops toaid Daimler's development with ourexpertise on its journey into the future.Our objective: We make Daimler themost innovative and digital mobilitycompany.Daimler TSS

INTERNAL IT PARTNER FOR DAIMLER Holistic solutions according to the Daimler guidelines IT strategy Security Architecture Developing and securing know-how TSS is a partner who can be trusted with sensitive dataAs subsidiary: maximum added value for Daimler Market closeness Independence Flexibility (short decision making process,ability to react quickly)Daimler TSS5

LOCATIONSDaimler TSS Germany7 locations1000 employees*Ulm (Headquarters)Daimler TSS ChinaHub Beijing10 employeesStuttgartBerlinKarlsruhe* as of August 2017Daimler TSSDaimler TSS IndiaHub Bangalore22 employeesDaimler TSS MalaysiaHub Kuala Lumpur42 employeesData Warehouse / DHBW6

WHAT YOU WILL LEARN TODAYAfter the end of this lecture you will be able toUnderstand differences in data modeling between OLTP and OLAPUnderstand why data modeling is importantUnderstand data modeling in the Core Warehouse Layer and Data MartLayer Data VaultDimensional Model / Star schemaUnderstand dimensions and factsUnderstand ROLAP & MOLAPDaimler TSSData Warehouse / DHBW7

LOGICAL STANDARD DATA WAREHOUSE ARCHITECTUREInternal data sourcesData torageLayer)FrontendAggregationLayerMart Layer(OutputLayer)(ReportingLayer)External data sourcesMetadata ManagementSecurityDWH Manager incl. MonitorDaimler TSSData Warehouse / DHBW8

DATA MODELS IN THE DWHLayerCharacteristicsStaging Layer Temporary storage Ingest of source data Core WarehouseLayerData Mart LayerDaimler TSSData Model Normally 1:1 copy of source table structure –usually without constraints and indexesHistorization / bitemporal data 3NF with historization Integration Head and Version modelling Tool-independent Data Vault Non-redundant data storage Anchor modeling Historization Dimensional model with historization (possible) Performance for end user queriesrequired, Tool-dependent Flat structures, esp. Dimensional model(ROLAP / MOLAP / HOLAP) Lots of joins necessary to answercomplex questionsData Warehouse / DHBW9

DIMENSIONAL MODELING Design technique to present data in a standard, intuitive framework Easily understandable for end usersHigh performance end user accessLogical data modelPhysical data model: Not necessarily relational, can also be stored in specialicedmulti-dimensional tools (“OLAP Cubes”) Analysis / Reporting of numerical measures (metrics) by differentattributes (context)Daimler TSSData Warehouse / DHBW10

DIMENSIONAL MODEL – IMPLEMENTATION TYPESImplementation types of dimensional modelsStar Schema Relational model (ROLAP) consists of Fact Tables Dimension TablesCube Multidimensional model (MOLAP) consists of Edges Attributes Cells Measures (facts)Daimler TSSData Warehouse / DHBW11

DIMENSIONAL MODELDimensions Are entities that contain descriptive textual attributes for analysis E.g. Car (model, manufacturer, etc), Time period (day, week, month, year)Facts Contain key numerical figures – “Measures” – “Metrics” E.g. Sales amount(for dimensions: product X in region y and time period z)Daimler TSSData Warehouse / DHBW12

DIMENSIONAL MODEL – LOGICAL VIEWMeasureCustomerDimensionFact table / CubePrice#ItemsSalesStockInventoryDaimler Data Warehouse / DHBW13

SAMPLE PRODUCT HIERARCHYDimensions can be organized in hierarchies i.e. product hierarchyDaimler TSSData Warehouse / DHBW14

HIERARCHIESOther hierarchies: Date Month/Year Quarter/Year Year Customer Company Industry City County/Landkreis State Country ContinentArbitrary number of hierarchy levelsPurpose: group and structure data enable view on data at different levels of granularity Hierarchies define aggregations on measuresDaimler TSSData Warehouse / DHBW15

ROLAP

ROLAPPhysical data structure: relational tables Advantage: can use well-engineered, reliable and high-performance database systemsand query languagesSpecial table structure Star / Snowflake Schema Dimension tables with textual attributes Fact table with measures consisting of foreign keys to dimension tablesDaimler TSSData Warehouse / DHBW17

ROLAPSpecial table structure (continued) Memory amount depends mainly on the number of facts One row per fact Size of a row approx. (#dimensions #measures) * column size Aggregated totals are computed dynamically in general Longer response timesDaimler TSSData Warehouse / DHBW18

RELATIONAL DATA MODELDimensions Relational table for each dimension like product, region, time period Primary key (surrogates) identifies each dimension element Additional fields contain descriptive information like product name E.g. Dimensions: Product, Region, Time period (day, week, month, year)Facts Relational table containing key figures – “Measures” Stores foreign keys to dimension tables The other fields contain the values of the key figures/measures E.g. Sales amount (for product X in region y and time period z)Daimler TSSData Warehouse / DHBW19

RELATIONAL MODEL: STAR SCHEMATime DimensionTime key (PK)DateDayMonthQuarterYearnBranch DimensionBranch key (PK)Branch nameDaimler TSSnProduct DimensionProduct key (PK)n Product nameSupplier NameSales FactTime key (FK)Product key (FK)Location key (FK)Branch key (FK)Sales amoutDiscountLocation DimensionLocation key (PK)Streetn CityCountryData Warehouse / DHBW20

DATA MODELS FOR HIERARCHIESDenormalized Dimensions 1 Table with all hierarchy levels Advantage: Efficient aggregations Performance Disadvantage: Complex updates if hierarchies changeDaimler TSSData Warehouse / DHBW21

DATA MODELS FOR HIERARCHIESNormalized Dimensions 1 table for each hierarchy level Advantage: Minimal updates for changesin the hierarchies Disadvantage: More complex querieswhen computing aggregations Multiple joinsDaimler TSSData Warehouse / DHBW22

RELATIONAL MODEL: SNOWFLAKE SCHEMA WITHNORMALIZED DIMENSIONSTime DimensionTime key (PK)DateDayMonthQuarterYearProduct DimensionProduct key (PK)n Product nameSupplier Key (FK)Sales FactTime key (FK)Product key (FK)Location key (FK)Branch key (FK)Sales amoutDiscountnnnSupplier DimensionSupplier key (PK)Supplier NameBranch DimensionBranch key (PK)Branch nameCountry DimensionCountry key (PK)CountryDaimler TSSnCity DimensionCity key (PK)CityCountry Key (FK)Location Dimensionn Location key (PK)nStreetCity key (FK)Data Warehouse / DHBW23

ONE OR TWO FACT TABLES?Product DimensionSales FactQuantity orderedQuantity shippedCustomer DimensionTime DimensionProduct DimensionSales FactQuantity orderedShipment FactCustomer DimensionQuantity shippedTime DimensionDaimler TSSData Warehouse / DHBW24

ONE OR TWO FACT hipped1AX100NULL1BY50NULL2AXNULL100 Reports get much more complicated to filter NULL Avg(quantity): 100 50/2 but avg(shipped): 100/1 There may be even more columns like quantity delivered orDelivery company 2 fact tablesDaimler TSSData Warehouse / DHBW25

ONE OR TWO FACT TABLES?Different processes must result into different fact tables E.g. measures at different time E.g. facts with different grainDaimler TSSData Warehouse / DHBW26

EXERCISE STAR SCHEMAThe following data model showsvehicle sales with entities Person (sales person and owner) Vehicle Production plantArchitect a Star Schema for theData Mart LayerDaimler TSSData Warehouse / DHBW27

SAMPLE SOLUTION STAR SCHEMADaimler TSSData Warehouse / DHBW28

ROLAP ENHANCEMENTSUsed for accelerating data warehouse queries in general Precomputation of aggregated values Materialized views / query tables store data physically Relational Columnar (in-memory) databasesDaimler TSSData Warehouse / DHBW29

PRECOMPUTATION OF AGGREGATED TOTALSQuery processing in the Mart Layer SQL statements can become complex, e.g. many joins SQL statements can become slow if many rows are aggregated E.g. sum of sales amount for city X AND product Y AND year 2016 compared to city XAND product Y AND year 2015 If aggregated values are stored in Fact tables, new data from the Core Warehouselayer have to be integrated into such aggregated fact tablesDaimler TSSData Warehouse / DHBW30

MATERIALIZED VIEWS/QUERY TABLESThe DBMS takes care of solving these problems The user defines views containing aggregated values for certain hierarchy levels These views are materialized as tables Update options immediate deferred When performing a query against a fact table the DB optimizer takes advantage ofthese materialized views, i.e., no special queries have to be written for this by a useror application program The user has not to rewrite the original query to use the materialized viewsDaimler TSSData Warehouse / DHBW31

MATERIALIZED VIEWS / MATERIALIZED QUERY TABLESExample statement Oracle to precompute values (similar DB2 and other RDBMS)CREATE MATERIALIZED VIEW sales aggBUILD IMMEDIATEREFRESH FASTON DEMANDASSELECT p.productname, s.city, EXTRACT(MONTH FROM s.date), sum(s.sales amount), sum(no items)FROMproduct pJOINsales s ON p.productid s.productidGROUP by p.productname, s.city, EXTRACT(MONTH FROM s.date);Daimler TSSData Warehouse / DHBW32

RELATIONAL COLUMNAR DATABASESRow-oriented storage Data of a relational table is stored row wise: values of Row 1 values of Row 2 values of Row N Column-oriented storage The values of each column are stored separately: values of Column 1 values of Column 2 values of Column M Daimler TSSData Warehouse / DHBW33

ROW AND COLUMN ORIENTED DB BLOCK iented storage1, Bush,1967, 2Schmitt,1980, 3Bush, 1993,4, Berger,1980, 5Miller,1967, 6,Bush, 1970, Column-oriented storage1, 2, 3, 4, 5,6, 7, Bush,Schmitt,Bush,Berger,34Miller,Bush,Miller, DB-Page/BlockDaimler TSSData Warehouse / DHBW

ROW VS COLUMN ORIENTED STORAGERow-oriented storage Data of one row is grouped on disk and can be retrieved through one read operationSingle values can be retrieved through efficient index and off-set computationsGood Insert, update and delete operations performance Suited for OLTP systemsDaimler TSSData Warehouse / DHBW35

ROW VS COLUMN ORIENTED STORAGEColumn-oriented storage Data-of one column is grouped on disk and can be retrieved with far less readoperations than for row-oriented storage This makes computation of aggregations much faster in particular for tables with a lotof columns In general better suited for queries involving partial table scans Bad Insert, update and delete operations performance Normally excellent compression as identical data types are stored in same blocks Products: SAP HANA, HP Vertica, Exasol, IBM DB2 BLU, Oracle In-Memory Option, SQLServer (Columnar Indexes), etc Suited for OLAP systemsDaimler TSSData Warehouse / DHBW36

HOW TO COVER DATA CHANGES IN THE MART?Data changes, e.g. new employeesemployees change departmentsemployees leavewhole department reorganisations, etcHow are the changes handled? Insert-only approach in the Core WarehouseLayer, but choices in the Mart Layer (reduce data amount to what end userneeds) What does the business want to see? (Reporting Scenarios) How is data inserted / updated in dimensions? (Slowly Changing Dimensions)Daimler TSSData Warehouse / DHBW37

REPORTING SCENARIOS As-is scenario As-of scenario As-posted scenario As-posted with comparable data scenarioDaimler TSSData Warehouse / DHBW38

Employee Dimension atabasePowellDatabaseFactsEmployee Dimension 2015DATA MART – EXAMPLE ushDatabaseOther departmentNew employeeAssumption: current year: 2016Daimler TSSData Warehouse / DHBW39

AS-IS SCENARIOFactsEmployee Dimension 2016Reporting uses current 1610Organisation#Projects 15#Projects 16DWH3030Database1020Daimler TSSEmployeeData Warehouse / DHBW40

AS-OF SCENARIOReporting uses structure as demandedEmployee Dimension 2015Factse.g. requested for tion#Projects 15#Projects 16DWH2020Database2020Daimler TSSEmployeeData Warehouse / DHBW41

AS-POSTED SCENARIOReporting uses „historical truth“Organisation#Projects 15#Projects 16DWH2030Database2020Daimler TSSData Warehouse / DHBW42

AS-POSTED WITH COMPARABLE DATA SCENARIOReporting uses „historical truth“ foridentical dimension dataOrganisation#Projects 15#Projects 16DWH2020Database1010Daimler TSSData Warehouse / DHBW43

SLOWLY CHANGING DIMENSIONSDimensions must absorb changesSlowly changing dimensions according to Kimball / Ross (2002): SCD Type 0 no changes, new data is ignored SCD Type 1 - 3 See next slides And some more SCD types Rarely relevantDaimler TSSData Warehouse / DHBW44

EmployeeDimensionSLOWLY CHANGING DIMENSIONS – EXAMPLE abaseChanges: New data added: Albert, DWH Powell marries and has new name ParkerDaimler TSSData Warehouse / DHBW45

EmployeeDimensionEmployeeDimensionSLOWLY CHANGING DIMENSION TYPE 1Daimler TSSIDEmployeeOrganisation No History1MillerDWH2PowellDatabase Dimension attributesalways contain Database3AlbertDWHChanges: New data added: Albert, DWH Powell marries and has newname ParkerData Warehouse / DHBW46

EmployeeDimensionEmployeeDimensionSLOWLY CHANGING DIMENSION TYPE 2Daimler TSSIDEmployeeOrganisation1MillerDWH2PowellDatabase Full HistorizationI EmploD yeeOrganisationValid FromValid To1 MillerDWH01.01.2015NULL2 PowellDatabase21.12.201415.10.20163 AlbertDWH05.03.2014NULL2 ParkerDatabase15.10.2016NULL Dimension containstimestamps with NULLs orfuture date like31.12.2999Changes: New data added: Albert, DWH Powell marries and has newname ParkerData Warehouse / DHBW47

EmployeeDimensionEmployeeDimensionSLOWLY CHANGING DIMENSION TYPE 3Daimler TSSIDEmployeeOrganisation1MillerDWH2PowellDatabase Historization of latestchange only And storage of currentvalueI EmployeeD NamePreviousNameOrganisationPreviousOrganisation1 MillerNULLDWHNULL2 ParkerPowellDatabaseNULL3 AlbertNULLDWHNULLChanges: New data added: Albert, DWH Powell marries and has newname ParkerData Warehouse / DHBW48

DIMENSION AND FACT TABLE TYPES Conformed dimension Junk dimension Role-Playing dimension Degenerated dimension Transactional fact Periodic fact Accumulating factDaimler TSSData Warehouse / DHBW49

DIMENSION TYPES: CONFORMED DIMENSION Dimension that is used in several fact tables Fact tables can be connected by using conformed dimensionsProduct DimensionInventoryFactSalesFactLocation DimensionDaimler TSSData Warehouse / DHBW50

DIMENSION TYPES: CONFORMED DIMENSIONKimball: Enterprise DWH Bus Matrix is a “design tool” to document theorganization’s processesDateProductLocationCustomerPromotionSales FactXXXXXInventory FactXXXCustomer Returns FactXXXSales Forecast FactXXXDaimler TSSXData Warehouse / DHBW51

DIMENSION TYPES: JUNK DIMENSIONCollection of lookup data / codes that could also form it’s own male3MarriedMale4MarriedFemaleDaimler TSSData Warehouse / DHBW52

DIMENSION TYPES: ROLE-PLAYING DIMENSIONA single dimension is referenced several times by the same fact table E.g. several dates in fact table reference Date 3.4.Daimler TSSData Warehouse / DHBW53

DIMENSION TYPES: DEGENERATED DIMENSION A dimension without own dimension table. Data are stored in the facttable only. Used e.g. for drill-through in reports E.g. OrderNumber in sales fact Daimler TSSData Warehouse / DHBW54

TYPES OF FACT TABLES - TRANSACTIONALTransactional Most commonUsually one row per line/event in a transactionMost detailed levelThe grain must (should) be the same for all rowsMeasures can usually be aggregated: “additive measure” (e.g. sum over sales amount)E.g. fact table for sales dataDaimler TSSData Warehouse / DHBW55

TYPES OF FACT TABLES – PERIODIC SNAPSHOTPeriodic snapshots Picture of the time Often computed from transactional fact table, e.g. aggregated by month Measures can usually not be aggregated (e.g. sum over inventory does not makesense as inventory is already snapshot / sum for a day) The grain must (should) be the same for all rows E.g. fact table for inventory data (summed up for each day)Daimler TSSData Warehouse / DHBW56

EXERCISE: QUERIES 1How many cabriolets (D Model.model) have beenBuilt in January and February 2016?Count01/201602/2016Assume SCD1 and no history in fact tablesDaimler TSSData Warehouse / DHBW57

EXERCISE: QUERIES 1How many cabriolets (D Model.model) have beenBuilt in January and February 2016?SELECT d.month, d.year, sum(f.count)FROMf vehicle built fJOINd model m on m.modelid f.modelidJOINd production date d on d.prod date f.prod dateWHERE m.model ‘Cabriolet‘ANDd.month IN (1, 2) AND d.year 2016GROUP BY d.month, d.yearDaimler TSSData Warehouse / DHBW58

EXERCISE: QUERIES 2How many different models (D Model.model) haveCurrently a performance of 105PS (D ENGINE.performance)?ModelCountCabrioletSUV Assume SCD1 and no history in fact tablesDaimler TSSData Warehouse / DHBW59

EXERCISE: QUERIES 2How many different models (D Model.model) haveCurrently a performance of 105PS (D ENGINE.performance)?Select m.model, sum(f.count)FROMf vehicle built fJOINd model m on m.modelid f.modelidJOINd engine e on e.engineid engineidWHERE e.performance 105GROUP BY m.modelDaimler TSSData Warehouse / DHBW60

EXERCISE: QUERIES 3How many different models (D Model.model) haveCurrently a performance of 105PS (D ENGINE.performance)?ModelCountCabrioletSUV Daimler TSSData Warehouse / DHBW61

EXERCISE: QUERIES 3How many different models (D Model.model) haveCurrently a performance of 105PS (D ENGINE.performance)?CREATE VIEW v vehicle sat asSELECT h vehicle key, max(loaddate), modelFROMs vehicle baseGROUP BY h vehicle key;CREATE VIEW v engine sat asSELECT h engine key, max(loaddate), performanceFROMs engineGROUP BY h engine key;Daimler TSSData Warehouse / DHBW62

EXERCISE: QUERIES 3How many different models (D Model.model) haveCurrently a performance of 105PS (D UPDaimler TSSmodel, count(*)v vehicle sat vl plugged into l ON l.h vehicle key v.h vehicle keyv engine sat e ON l.h engine key e.h engine keys engine s ON s.h engine key e.h engine keyAND s.loaddate e.loaddateMany other solutions possible, e.g. using with clauses.performance 105instead of views or using window functions – allby model;depending from DB vendor/versionData Warehouse / DHBW63

MOLAP

MULTIDIMENSIONAL DATA MODELEdges of a cube (“Dimension”) Attributes like Product, Region, Time period (day, week, month, year)Cells of a cube (“Measures”) Key Figures (i.e. sales amount, profit) – “measures” For every combination of attribute values one value of each key figure, e.g. Salesamount for product X in region y and time period z Can be NULL and is stored as empty cellDaimler TSSData Warehouse / DHBW65

MOLAP - MULTIDIMENSIONAL DATABASESA database specially designed to handle the organization of data in multipledimensions Good for DWH requirements only but not generally suited like a relational DBMS E.g. IBM Cognos TM1, Oracle Essbase, Microsoft Analysis Services, Oracle OLAPOption, IBM Cognos PowerplayHolds data cells in blocks that constitute a virtual cubeOptimized to handle numeric data Aggregated totals often precalculated Not intended for textual dataDaimler TSSData Warehouse / DHBW66

MULTIDIMENSIONAL STORAGELinearization of the cells in a cube into a one-dimensional arrayMemory amount: #(dim1) x #(dim2) x . x #(dimN) Depends on the number of dimensions and their cardinality, not on thenumber of factsExample: Cube with 2 dimensions with 3 and 1 dimension with 2 elements Memory amount size 3*3*2 18 cells The numbers in the cube cells indicate the position inthe arrayDaimler TSSData Warehouse / DHBW67

EXAMPLECube with 3 dimensions Product – 4 values – p1, p2, p3, p4 Store – 3 values – s1, s2, s3 Time (year) – 2 values - y1, y2Number of cells in the cube: 4 x 3 x 2 24Daimler TSSData Warehouse / DHBW68

EXAMPLESales in year y2Daimler TSSData Warehouse / DHBW69

EXAMPLESales of store s1 in year y2Daimler TSSData Warehouse / DHBW70

EXAMPLESales of product p2 in year y1Daimler TSSData Warehouse / DHBW71

ROLL-UP & DRILL-DOWNDaimler TSSData Warehouse / DHBW72

MDX - OLAP QUERY LANGUAGEROLAP SQL is standard languageMOLAP MDX - Multidimensional Expressions De-facto industry standard developed by MicrosoftVery complexSQL like syntaxLanguage elements Scalar – data type „string“ or „number“ Dimension, Hierarchy, Level, Member Daimler TSSData Warehouse / DHBW73

MDX SAMPLE QUERYStore SalesSELECT{ [Measures].[Store Sales] } ON COLUMNS,{ [Date].[2002], [Date].[2003] } ON ROWS200295863,66200399764,01FROM SalesWHERE ( [Store].[USA].[CA] )This query defines the following result set information: The SELECT clause sets the query axes as the Store Sales (amount) member and the2002 and 2003 members of the Date dimension The FROM clause indicates that the data source is the Sales cube The WHERE clause defines the "slicer axis" for member California of Store dimensionDaimler TSSData Warehouse / DHBW74

MOLAP - ROLAPMOLAPROLAPDatabase typeMultidimensionalRelationalData storageSpecial storage engines for cube dataStar schema – special relational data modelSize100s of Gigabytes10s of TerabytesQuery languageMDXSQLDaimler TSSData Warehouse / DHBW75

MOLAP - ROLAPAdvantagesMOLAPROLAP special database products optimizedfor multidimensional analysisshort response times, e.g. no joinssuitable storage schema and queryprocessing for multidimensional data can use existing, well established DBMSeasy data import, updateuser access, backup, securitymechanisms from DBMS can be usedproblems with sparsity (ratio occupied/ not occupied cells): "null" is stored ina field with same length as any valuelimited data volume: 5-6 dimensionscube data read-only accessible only forend usersexpensive update operation Complex SQL queries for processing OLAPrequests longer response times(solution: Materialized Views and Inmemory columnar databases) Disadvantages Daimler TSSData Warehouse / DHBW76

HOLAP – HYBRID OLAPCombines the advantages or ROLAP and MOLAPRelational DBMS for storage of sparse, historic data Data of highest granularity levelMultidimensional DBMS for efficient storage of dense data cubes Multidimensional cache for aggregated totalsComplex architecture and maintenance processesNo uniform OLAP query processingDaimler TSSData Warehouse / DHBW77

EXERCISE: OLAPThe following is a data model used by a supermarket chain to analyze theirbusiness:Daimler TSSData Warehouse / DHBW78

EXERCISE: OLAPWith each transaction, an average of 20 different articles are bought.The data warehouse collects sales transactions data over 2 years.There are 1000 stores with 2000 transactions per store and day.Questions: 1. What are the columns of the ROLAP fact table? 2. How many records are stored in the fact table? 3. What is the size of the cube (number of cells) that stores the aggregated values atthe most detailed level? 4. Compute the respective cube sizes for the other 3 (higher) hierarchy levels.Daimler TSSData Warehouse / DHBW79

EXERCISE: OLAP1. What are the columns of the ROLAP fact table? Trans. No. (FK to dimension)Date (FK to dimension)Location (FK to dimension)Article (FK to dimension)No of articles (measure) and Article Price (measure)2. How many records are stored in the fact table? One record per transaction and article (with quantity and price) 2 years * 365 days/year * 1000 stores * 2000 transactions/(store*day)* 20articles/transaction 29.200.000.000 articles/recordsDaimler TSSData Warehouse / DHBW80

EXERCISE: OLAP3. What is the size of the cube (number of cells) that stores the aggregatedvalues at the most detailed level? 2 years * 365 [days]/year * 2000 [transactions] * 1000 [stores] * 50000 [articles] 73.000.000.000.000 cells4. Compute the respective cube sizes for the other 3 hierarchy levels. Level 2: 2 years * 12 [months]/year * 500 [cities] * 2000 [product groups] 24.000.000 cells Level 3: 2 years * 4 [quarters]/year * 20 [regions] * 200 [product categories] 32.000 cells Level 4: 2 [years] * 5 [regions] * 10 [product departments] 100 cellsDaimler TSSData Warehouse / DHBW81

SUMMARY Data modeling in the Core Warehouse Layer Choices like Data Vault Data modeling in the Mart Layer Dimensional Modeling ROLAP (Star Schema with fact and dimension tables) MOLAP (Cubes)Daimler TSSData Warehouse / DHBW82

EXERCISE - RECAPTURE DATA MODELING Recapture data modeling topics Which topics do you remember or do you find important? Write down 1-2 topics on stick-it cards.Daimler TSSData Warehouse / DHBW83

THANK YOUDaimler TSS GmbHWilhelm-Runge-Straße 11, 89081 Ulm / Telefon 49 731 505-06 / Fax 49 731 505-65 99tss@daimler.com / Internet: www.daimler-tss.com/ Intranet-Portal-Code: @TSSDomicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen BäuerleDaimler TSSData Warehouse / DHBW84

OLAP – 12 CRITERIA BY CODDOnLine Analytical Processing Term introduced by E. Codd in 1993 in a white paper for Arbor Essbase 12 criteria for OLAP systems like Multi-dimensionality Transparency Constant response-times Multi-user support Flexible definition of reports No limits on dimensions and hierarchy levelsDaimler TSSData Warehouse / DHBW85

OLAP – FASMI CRITERIAFASMI – Fast Analysis of Shared Multidimensional InformationCriteria by Pendse/Creeth (1995) Fast maximum response time for regular queries 5 seconds and complex queries notmore 20 seconds Analysis intuitive analysis, easy/no programming flexible: queries may contain arbitrary computationsDaimler TSSData Warehouse / DHBW86

OLAP – FASMI CRITERIA Shared Multi user capable: Shared usage and access control Multidimensional Multidimenional view on the data regardless of the underlying data model Full support of hierarchies Information User must be able to get all data without restrictions by the used OLAP system, norestriction in regards to scalabilityDaimler TSSData Warehouse / DHBW87

ELEMENTS OF SCALE: COMPOSING AND SCALING DATAPLATFORMS (BEN STOPFORD) Sequential operations are best Sequential operations can be predicted Random operations are the main challenge Append-only journal leads to sequential IO But what about updates (in place)? Indexes speed up read random IO read performance but not random IO writeperformanceSource: scale-composing-and-scaling-data-platforms/Daimler TSSData Warehouse / DHBW88

ELEMENTS OF SCALE: COMPOSING AND SCALING DATAPLATFORMS (BEN STOPFORD)Source: scale-composing-and-scaling-data-platforms/Daimler TSSData Warehouse / DHBW89

MULTIDIMENSIONAL OPERATIONS - SELECTIONSelectionDef

Understand differences in data modeling between OLTP and OLAP Understand why data modeling is important Understand data modeling in the Core Warehouse Layer and Data Mart Layer Data Vault Dimensional Model / Star schema Understand dimensions and facts Understand ROLAP & MOLAP WHAT YOU WILL LEARN TODAY Daimler TSS Data Warehouse / DHBW 7