Lecture @Dhbw: Data Warehouse Part I: Introduction To Dwh And Dwh .

Transcription

A company of Daimler AGLECTURE @DHBW: DATA WAREHOUSEPART I: INTRODUCTION TO DWH ANDDWH ARCHITECTUREANDREAS BUCKENHOFER, DAIMLER TSS

ABOUT MEAndreas enior DB hemen/datenbank/in-memory/Since 2009 at Daimler TSSDepartment: Big DataBusiness Unit: Analyticshttp://wwwlehre.dhbw-stuttgart.de/ buckenhofer/https://www.xing.com/profile/Andreas Buckenhofer2

NOT JUST AVERAGE: OUTSTANDING.As a 100% Daimler subsidiary, we give100 percent, always and never less.We love IT and pull out all the stops toaid Daimler's development with ourexpertise on its journey into the future.Our objective: We make Daimler themost innovative and digital mobilitycompany.Daimler TSS

INTERNAL IT PARTNER FOR DAIMLER Holistic solutions according to the Daimler guidelines IT strategy Security Architecture Developing and securing know-how TSS is a partner who can be trusted with sensitive dataAs subsidiary: maximum added value for Daimler Market closeness Independence Flexibility (short decision making process,ability to react quickly)Daimler TSS4

LOCATIONSDaimler TSS Germany7 locations1000 employees*Ulm (Headquarters)Daimler TSS ChinaHub Beijing10 employeesStuttgartBerlinKarlsruhe* as of August 2017Daimler TSSDaimler TSS IndiaHub Bangalore22 employeesDaimler TSS MalaysiaHub Kuala Lumpur42 employeesData Warehouse / DHBW5

DWH, BIG DATA, DATA MININGThis lecture is about the classical DWH 8 sessionsMr. Bollinger’s/Roth's lecture is about Data MiningDaimler TSSData Warehouse / DHBW6

DWH LECTURE - LEARNING TARGETS Describe different DWH architectures Explain DWH data modeling methods and design logical models Name DB techniques that are well-suited for DWHs Explain ETL processes Specify reporting & project management & meta data requirements Name current DWH trendsDaimler TSSData Warehouse / DHBW7

OVERVIEW OF THE LECTUREIntroduction to Data Warehouse 15.02.2018DWH Architectures, Data Modeling 01.03.2018Data Modeling, OLAP 08.03.2018OLAP, ETL 15.03.2018ETL, Metadata, DWH Projects 22.03.2018DWH Projects, Advanced Topics 29.03.2018/05.04.2018/12.04.2018Daimler TSSData Warehouse / DHBW8

ABOUT THE LECTUREStructure of the lecture Review of the preceding lecturePresentation of contentGroup tasks, exercisesComprehensive case study15:45 – 18:00 1x15min breakDaimler TSSData Warehouse / DHBW9

COURSE MATERIAL Download slides from (theory)http://wwwlehre.dhbw-stuttgart.de/ buckenhofer/ Additional material for case study (practice)https://github.com/abuckenhofer/dwh courseDaimler TSSData Warehouse / DHBW10

HOW TO CONTACT THE COURSE?Who is the class representative? Please send me an email so that I have your contact dataDo you have a class email address?Daimler TSSData Warehouse / DHBW11

WHAT YOU WILL LEARN TODAYData Warehousing is a major topic of computer scienceAfter the end of this lecture you will be able to Understand the basic business and technology drivers for data warehousingDescribe the characteristics of a data warehouseDescribe the differences between production and data warehouse systems Understand logical standard DWH architecture Describe different layers and their meaning Describe advantages and disadvantages of further DWH architecturesDaimler TSSData Warehouse / DHBW12

MANY EMPLOYMENT OPPORTUNITIESDWH department in every (bigger) end user company, also in many mediumsized or small-sized companiesDWH department in every (bigger) consulting companyDWH-only specialized consulting companiesDWH tool vendorsDaimler TSSData Warehouse / DHBW13

MANY EMPLOYMENT OPPORTUNITIES – CHALLENGING JOBREQUIREMENTSDWHs are complex, much more complex compared to most OLTP systemsChallenging job profiles with comprehensive requirements Data ArchitectureData Integration / ETLData Modeling (not only 3NF)Data VisualizationData QualityData SecurityRequirements EngineeringProject ManagementDaimler TSSData Warehouse / DHBW14

JOB DESCRIPTION EXAMPLESDaimler TSSData Warehouse / DHBW15

JOB DESCRIPTION EXAMPLESDaimler TSSData Warehouse / DHBW16

JOB DESCRIPTION EXAMPLESDaimler TSSData Warehouse / DHBW17

DATA WAREHOUSE (DWH) ORBUSINESS INTELLIGENCE (BI)?Often used as synonymDWH more technical focusBI more business / process focus “Business intelligence is a set of methodologies, processes, architectures, andtechnologies that transform raw data into meaningful and useful information used toenable more effective strategic, tactical, and operational insights and decisionmaking.” (Boris Evelson, Forrester Research, 2008)Daimler TSSData Warehouse / DHBW18

INFORMATION TECHNOLOGY (1960’IES – 80‘IES)Many systems throughout the enterprises for dedicated purposes Support daily transactions / day-to-day business Target: replace manual and time consuming activitiesData embedded in process-specific application Process-orientation dedicated purposeCustomer data, order data, etc. spread over many systems in manyvariations and with contradictionsDaimler TSSData Warehouse / DHBW19

SAMPLE APPLICATIONS FOR AN AIRLINESeatsFlight Reservation SystemCustomer dataPlanesInternal Human Ressources SystemCrewsCRM System, e.g. campaignsCustomer dataInventory Purchasing SystemsPlanesSeatsFood / DrinksOperational PlanningPlanesCrewsSeatsMaintenance TrackingPlanesAirline Frequent Flyer SystemCustomer dataDaimler TSSBilling SystemCustomer dataData Warehouse / DHBW20

NEED FOR DECISION SUPPORT SYSTEM / MANAGEMENTINFORMATION SYSTEMSeatsFlight Reservation SystemCustomer dataPlanesInternal Human Ressources SystemCrewsCRM System, e.g. campaignsCustomer dataInventory Purchasing SystemsPlanesSeatsMaintenance TrackingPlanesAirline Frequent Flyer SystemCustomer dataDaimler TSSDCS / MISSeatsFood / DrinksOperational PlanningPlanesCrewsBilling SystemCustomer dataData Warehouse / DHBW21

EARLY DECISION SUPPORT SYSTEMS (1960’IES – 80‘IES)Can be characterized as “Unplanned decision support” or “UnplannedManagement Information Systems (MIS)” Management needs reports / combined data from different systems to make decisionsfor company Reports are manually written by IT people Extract, combine, accumulate data Can take several days to write report and to get the data Error prone and labour-intensive Relevant information may be forgotten or combined in a wrong wayDid not really workDaimler TSSData Warehouse / DHBW22

INFORMATION TECHNOLOGY TODAYFURTHER REQUIREMENTSData still spread across many applications, but additional requirementsData as Asset, getting more and more important also in productionindustries Not only classical data-intensive companies like Google or FacebookIncreasing interest e.g. in insurance, health care, automotive, Connected cars, Smart Home, Tailor-made insurances, etc.Hype technologies New databases technologies like NoSQL and Big DataDWH still booming with additional stimuli coming from Big Data, Digitization,Internet Of Things IOT, Industry 4.0, Real Time, Time To Market, etc.Daimler TSSData Warehouse / DHBW23

EXERCISE – OLTP SYSTEMSOLTP: ONLINE TRANSACTIONAL PROCESSINGOutline at least 5 operational systems for a vehicle manufacturer which data is stored by these systems characterize which operations are performed by them which questions can be answered by these systems (and which questionscan not be answered major problems for decision support)Daimler TSSData Warehouse / DHBW24

SAMPLE OLTP SYSTEMSVehicleWorkerVehicle Financial ServicesCustomerRepair dataVehicleCar rentalsBookingPartWorkshopRouteSupplierDaimler TSSCustomerRouteDiagnosticdataVehicleVehicle SalesParts LogisticsPlantPartsSellerProductiondateData Warehouse / DHBW25

SAMPLE OLTP SYSTEMSTruckDriverTruck fleet managementEngineerVehicleEngineering, Research and developmentDaimler TSSInteriorWebsite and Car configuratorCRM LeadRoutePrototypeVehicleTests etc Data Warehouse / DHBW26

CHALLENGEHow to get an overall viewacross OLTP applications / functions that works?Daimler TSSData Warehouse / DHBW27

MAJOR PROBLEMS FOR EFFECTIVE DECISION SUPPORTDistributed dataDifferent data structuresHistoric dataSystem workloadInadequate technologyDaimler TSSData Warehouse / DHBW28

DISTRIBUTED DATAProblem: Data resides on different systems / storages different applications different technologiesSolution: Data has to be accumulated on one system for further analysis Data is inhomogeneous, e.g. each system has their own customer number or ordernumber, etc. How to combine the data? Data must be ingested regularly, e.g. daily and not ad-hocDaimler TSSData Warehouse / DHBW29

DIFFERENT DATA STRUCTURESProblem: Systems developed independently from each other Different data types E.g.: zip-code as integer or character string Different encodings E.g.: kilometer or miles Different data modeling E.g.: last name / first name in different fields vs last name / first name (badlymodelled) in one single fieldSolution: Dedicated system required that harmonizes / standardizes the dataDaimler TSSData Warehouse / DHBW30

ISSUES WITH HISTORIC DATAProblem: Data is updated and deleted or archived after max. 3 months daily transactions produce lots of data limited size of storage high amounts of data fill up systemsHistoric data is required for decision support e.g. how did sales figures develop compared to last month / last year /etc.Solution: All data (changes) have to be stored in a system capable of dealingwith huge amounts of dataDaimler TSSData Warehouse / DHBW31

ISSUES WITH SYSTEM WORKLOADProblem: Performance not optimized for new workloads Systems stressed by additional load (due to reports)Not optimized for this kind of workloadPerformance of daily transaction business jeopardizedMay possibly lead to system failure!Imagine what happens if a system like Amazon gets slowSolution: Dedicated system that handles complex (arithmetic) queries onhuge amounts of data. A system that is optimized for that kind of workloads.Daimler TSSData Warehouse / DHBW32

INADEQUATE TECHNOLOGYProblem: Tooling and technology different from OLTP Inadequate tools for data integration and analysis Infrastructure configured for OLTP transactions and not for DWH load Storage systems and processors to weak to fulfill the requirementsSolution: Standard Tools and technology that help to increase productivityand solve such problems, e.g. Reporting Tools for Data Analysis or ETL toolsfor Data ingestion/loadDaimler TSSData Warehouse / DHBW33

TEXTUAL AND VISUAL REPORTSDaimler TSSData Warehouse / DHBW34

VISUAL DATA INTEGRATION TOOLDaimler TSSData Warehouse / DHBW35

CHALLENGEHow to get an overall viewacross OLTP applications / functions that works?Daimler TSSData Warehouse / DHBW36

CONCLUSIONOperative systems not suitable for analytical evaluationsNeed for a new, separated system fast answers, ad-hoc questions possible no interference with daily transaction business Data WarehouseDaimler TSSData Warehouse / DHBW37

EXERCISEList possible (functional and non-functional) requirements for a datawarehouse end-user. Think of deficiencies of transactional systems like Distributed data Different data structures Problem with historic data Problem with system workload Inadequate technologyWhat are requirements from a Data Warehouse user perspective? (List atleast 5 requirements)Daimler TSSData Warehouse / DHBW38

DATA WAREHOUSE USER Wants to trust the data: quality assured data Wants to access and analyze all data in a single database Wants to get a complete analysis including history, e.g. where did thecustomer live 5 years ago or how did bookings develop the last 10 days? Wants fast data access for his queries Wants to understand the data model one single and easy data modeland not many different applications Wants to browse through combined data sets to identify correlations ornew insightsDaimler TSSData Warehouse / DHBW39

DATA WAREHOUSEContains data from different systemsImports data from different systems on a regular basis detailed data and summarized data provide historic data generate metadataOLTP applications remain, DWH is a completely new systemOvercomes difficulties when using existing transaction systems for thosetasksDaimler TSSData Warehouse / DHBW40

DATA WAREHOUSENot a product, but a overall concept / architectureApplications come, applications go. The data, however, lives forever. It is notabout building applications; it really is about the data underneath theseapplication (Tom Kyte)Daimler TSSData Warehouse / DHBW41

FIRST DWH ARCHITECTUREBY DEVLIN/MURPHY (1988)Source: http://www.9sight.com/pdfs/EBIS Devlin & Murphy is/Daimler TSSData Warehouse / DHBW42

HIGH-LEVEL DATA WAREHOUSE ARCHITECTUREData er TSSData Warehouse / DHBW43

DATA WAREHOUSE DEFINITIONSBY TWO “FATHERS” OF THE DWHRalph KimballWilliam Harvey „Bill“ Inmon„A data warehouse is a copy oftransaction data specificallystructured for querying andreporting““A data warehouse is a subjectoriented, integrated, timevariant, nonvolatile collection ofdata in support ofmanagement’sdecision-makingprocess”Daimler TSSData Warehouse / DHBW44

SUBJECT-ORIENTEDA data warehouse is organized around the major subjects (business entities)of the enterprise like CustomerVendorCarTransaction or activityIn contrast to the application/process/functional orientation such as Booking application Delivery handlingDaimler TSSData Warehouse / DHBW45

SUBJECT-ORIENTED - EXAMPLEDWHOLTPPassengersPlanesFlight Reservation SystemBookingsCrewsFlight Operation SystemPlanesCustomerAirline Frequent Flyer SystemCustomerPlanesMarketing:Planning:Which areHow many flightpopularkilometers anddestinations,flight times doe.g. Paris andplanes have.make theWhen does acustomer anplane needexclusive offer.maintenance?Capacity planning:What is a forecastedpassenger demand for flightsto London? Is a larger planerequired on the route?PointsDaimler TSSData Warehouse / DHBW46

INTEGRATEDData contained in the warehouse are integrated.Aspects of integration consistent naming conventionsconsistent measurement of variablesconsistent encoding structuresconsistent physical attributes of dataDaimler TSSData Warehouse / DHBW47

INTEGRATED - EXAMPLEDWHOLTPSystem1: m,wSystem2: male, femaleSystem3: 1, 0System1: John BrownSystem2: Brown, J.System3: Brown, JoSystem1: Varchar(5)System2: Number(8)System3: Char(12)Daimler TSSm,wJohnBrownVarchar(12)Data Warehouse / DHBW48

NONVOLATILEOperations in operational environment InsertDeleteUpdateSelectOperations in a data warehouse Insert: the initial and additional loading of data by (batch) processes Select: the access of data (almost) no updates and deletes (technical updates / deletes only)Daimler TSSData Warehouse / DHBW49

NONVOLATILE - EXAMPLEOLTPDWHFlight Reservation SystemPassenger John flies fromStuttgart to London on 15.02at 06:00Insert into DB:Passenger John, From Stuttgart to London,15.02. 06:00Insert into DB:Passenger John, From Stuttgart to London,15.02. 06:00Passenger John changes hismind and flies at 10:00Update in DB:Passenger John, 15.02. 10:00Daimler TSSInsert into DB:Passenger John, From Stuttgart to London,15.02. 10:00Data Warehouse / DHBW50

NONVOLATILE - EXAMPLEWhat happens in the OLTP system if the customer cancels his booking? Delete operation in OLTP Seat gets available again and can be sold to another passengerWhat happens in the DWH? Insert operation in DWH with e.g. a flag indicating that the customercancelled/deleted his booking Business can make analysis about cancelled booking: why might thecustomer have cancelled? How to prevent the customer or other customersto cancel next time?Daimler TSSData Warehouse / DHBW51

TIME-VARIANTAll data in the data warehouse is accurate as of some moment in time Has to be associated with a time stampOnce data is correctly recorded in the data warehouse, it cannot be updatedor deleted Data warehouse data is, for all practical purposes, a long series of snapshotsIn the operational environment data is accurate as of the moment of accessOperational data, being accurate as of the moment of access, can beupdated as the need arisesDaimler TSSData Warehouse / DHBW52

TIME-VARIANT - EXAMPLEDWHInsert into DB:Passenger John, From Stuttgart to London,15.02. 06:00DB insert timestamp: 02.02. 15:03:21Insert into DB:Passenger Jim, From Hamburg to Munich,18.02. 15:00DB insert timestamp: 02.02. 15:04:29Insert into DB:Passenger John, From Stuttgart to London,15.02. 10:00DB insert timestamp: 05.02. 12:15:03Insert into DB:Passenger Mike, From Hamburg to Munich,15.02. 10:00DB insert timestamp: 05.02. 12:15:11Insert into DB:Passenger John, From Stuttgart to London,15.02. 10:00, Cancel FlagDB insert timestamp: 08.02. 09:52:33Daimler TSSData Warehouse / DHBW53

EXERCISE - DWHYou outlined OLTP systems for a vehicle manufacturer in an earlier exercise.Now start designing a Data Warehouse: Describe what data can be stored in it. Define at least 5 subject-areas! Which questions can/should be answered with this informationDaimler TSSData Warehouse / DHBW54

DWH – SUBJECT AREASDriverCRM lePrototypeGPS dataCar RentalRental starttimeFormula-1carBillRental endtimeLocationPlantCars builtDaimler TSSData Warehouse / DHBW55

EXERCISE – SAMPLE QUESTIONSWhich customers own a car and use car rental regularly?Which parts have the most defects? Can diagnostic data be used to predictpotential defects and warn customers?Which areas and times are popular for car rentals? Does it make sense torelocate cars to these areas? (e.g. cinema in the evening/night)Daimler TSSData Warehouse / DHBW56

OLTP VS OLAPOPERATIONAL SYSTEM VS DWHOnline Transaction ProcessingOnline Analytical ProcessingTransaction-oriented systemQuery-oriented systemOptimized for insert and update consistencyOptimized for complex queries with shortresponse times; ad-hoc queriesMany users change dataOnly ETL process writes dataSelective queries on the dataEvaluations of all data including history(complex queries)Avoid redundancyRedundant data storageNormalized data management 3NFDe-normalized data managementRelational Data ModelingSeveral layers with different data models, onemodel usually Dimensional Data ModelingDaimler TSSData Warehouse / DHBW57

OPERATIVE VS INTEGRATED DATAOperative dataIntegrated dataHandlingStructured, parallel processes with Information for management (decisionshort and isolated ("atomic")support)transactionsModelingProcess- and function oriented,individual for each applicationDifferent data models in one DWH;historic, stable and summarized, data# usersManyFew(er) but increasing user baseSystem return timeMillisecondsSeconds to minutes (even hours)Daimler TSSData Warehouse / DHBW58

OPERATIVE VS ANALYTICAL DATABASESOperative DBsAnalytical DBsPurposeProcessing of daily businesstransactionsInformation for management (decisionsupport)ContentDetailed, complete, most recentdataHistoric, stable and summarized dataData amountSmall amount of data pertransaction. Nested Loop JoinsLarge amount of data for load, andoften per query. Hash Joins commonData structureSuitable for operationaltransactionsSeveral models; suitable for long termstorage and business analysesTransactionsACID; very short read/writetransactionsLong load operations, longer readtransactionsDaimler TSSData Warehouse / DHBW59

SUMMARYWhich challenges could not be solved by OLTP? Why is a DWH necessary? Integrated view, distributed data, historic data, technological challenges,system workload, different data structuresName two “fathers” of the DWH Bill Inmon and Ralph KimballWhich characteristics does a DWH have according to Bill Inmon? Subject-oriented, integrated, non-volatile, time-variantDaimler TSSData Warehouse / DHBW60

DWH ARCHITECTURE

PURPOSE: WHY ARE DWH ARCHITECTURES USEFUL? Specific implementation can follow an architecture Architecture describes an ideal type. Therefore an implementation may not useall components or can combine components Better understanding, overview and complexity reduction bydecomposing a DWH into its components Can be used in many projects: repeatable, standardizable Map DWH tools into the different components and compare functionality Functional oriented as it describes data and control flowDaimler TSSData Warehouse / DHBW62

EXAMPLES OF DATA WAREHOUSES IN THE INDUSTRYApple: multiple Petabytes Customer insights: who’s who and what are the customers up toWalmart: 300TB (2003), several PB today It tells suppliers, “You have three feet of shelf space. Optimize it.”eBay: 10PB, 100s of production DBs fed in Get better understanding of customersMost DWHs are much smaller though. For huge and small DWHs: Highchallenges to architect develop maintain run such complex uses-youve-ever-seen/ and -data-warehouses/Daimler TSSData Warehouse / DHBW63

LOGICAL STANDARD DATA WAREHOUSE ARCHITECTUREInternal data sourcesData torageLayer)FrontendAggregationLayerMart Layer(OutputLayer)(ReportingLayer)External data sourcesMetadata ManagementSecurityDWH Manager incl. MonitorDaimler TSSData Warehouse / DHBW64

DATA SOURCES Providing internal and external data out of the source systems Enabling data through Push (source is generating extracts) or Pull (BI DataBackend is requesting or directly accessing data) Example for Push practice (deliver csv or text data through file interface; ChangeData Capture (CDC)) Example for Pull practice (direct access to the source system via ODBC, JDBC, APIand so on)Daimler TSSData Warehouse / DHBW65

STAGING LAYER “Landing Zone” for data coming into a DWH Purpose is to increase speed into DWH and decouple source and targetsystem (repeating extraction run, additional delivery) Granular data (no pre-aggregation or filtering in the Data Source Layer, i.e.the source system) Usually not persistent, therefore regular housekeeping is necessary (forinstance delete data in this layer that is few days/weeks old or – morecommon - if a correct upload to Core Warehouse Layer is ensured) Tables have no referential integrity constraints, columns often varcharDaimler TSSData Warehouse / DHBW66

INTEGRATION LAYER Business Rules, harmonization and standardization of data Classical Layer for transformations: ETL Extract – TRANSFORM – Load Fixing data quality issues Usually not persistent, therefore regular housekeeping is necessary (forinstance after a few days or weeks or at the latest once a correct uploadto Core Warehouse Layer is ensured) The component is often not required or often not a physical part of a DBDaimler TSSData Warehouse / DHBW67

CORE WAREHOUSE LAYER Data storage in an integrated, consolidated, consistent and nonredundant (normalized) data model Contains enterprise-wide data organized around multiple subject-areas Application / Reporting neutral data storage on the most detailed level ofgranularity (incl. historic data) Size of database can be several TB and can grow rapidly due to datahistorization Write-optimised layerDaimler TSSData Warehouse / DHBW68

AGGREGATION LAYER Preparing data for the Data Mart Layer to the required granularity E.g. Aggregating daily data to monthly summaries E.g. Filtering data (just last 2 years or just data for a specific region) Harmonize computation of key performance indicators (measures) andadditional Business Rules The component is often not required or often not a physical part of a DBDaimler TSSData Warehouse / DHBW69

DATA MART LAYER Read-optimised layer: Data is stored in a denormalized data model forperformance reasons and better end user usability/understanding The Data Mart Layer is providing typically aggregated data or data withless history (e.g. latest years only) in a denormalized data model Created through filtering or aggregating the Core Warehouse Layer One Mart ideally represents one subject area Technically the Data Mart Layer can also be a part of an AnalyticalFrontend product (such as Qlik, Tableau, or IBM Cognos TM1) and neednot to be stored in a relational databaseDaimler TSSData Warehouse / DHBW70

METADATA MANAGEMENT, SECURITY, MONITORMetadata Management “Data about Data”, separate lectureSecurity Not all users are allowed to see all data Data security classification (e.g. restricted, confidential, secret)DWH Manager incl. Monitor DWH Manager initiates, controls, and checks job execution Monitor identifies changes/new data from source systems, separate lectureDaimler TSSData Warehouse / DHBW71

EXERCISE: CLASSICAL DWH ARCHITECTURESThe nces-of-opinion/compares THE two classic DWH architectures.Read the paper and complete the table / questions on the next slide.(Caution: The paper is biased / favors one approach; you may want to readother/more papers for a neutral view.)Daimler TSSData Warehouse / DHBW72

EXERCISE: CLASSICAL DWH ARCHITECTURESHow are the approaches called?Who “invented” the approach?How many layers are used and howare the layers called?Which data modeling approachesare used in which layer?In which layer are atomic detail datastored?In which layer are aggregated /summary data stored?List at least 2 advantagesList at least 2 disadvantagesDaimler TSSData Warehouse / DHBW73

EXERCISE: CLASSICAL DWH ARCHITECTURESHow are the approachescalled?Kimball Bus ArchitectureCorporate Information FactoryWho “invented” theapproach? Ralph Kimball Bill InmonHow many layers are usedand how are the layerscalled? Data StagingDimensional Data Warehouse Data AcquisitionNormalized Data WarehouseData Delivery / Dimensional MartWhich data modelingapproaches are used in whichlayer? Data Staging: variable,corresponds to source systemDimensional Data Warehouse:Dimensional Model Data Acquisition: variable, correspondsto source systemNormalized Data Warehouse: 3NFData Delivery: Dimensional ModelIn which layer are atomicdetail data stored? Dimensional Data Warehouse Normalized Data WarehouseIn which layer are aggregated/ summary data stored? Dimensional Data Warehouse Data Delivery / Dimensional MartDaimler TSS Data Warehouse / DHBW74

EXERCISE: CLASSICAL DWH ARCHITECTURESAdvantagesKimball Bus ArchitectureCorporate Information Factory Two layers only mean faster developmentand less workRather simple approach to make data fastand easily accessibleLower startup costs (but highersubsequent development costs) If table structures change (instable sourcesystems), high effort to implement thechanges and reload data, especiallyconformed dimensions (“Dimensionitis”desease)Non-metric data not optimal fordimensional modelDimensional model (esp. Star Schema)contains data redundancy Disadvantages Daimler TSS Separation of concerns: long-term enterprisedata storage separated from data presentationChanges in requirements and scope are easierto manageLower subsequent development costs (buthigher startup costs)Data model transformations from 3NF toDimensional model requiredMore complex as two different data modelsare requiredLarger team(s) of specialists requiredData Warehouse / DHBW75

OTHER ARCHITECTURES Kimball Bus Architecture (Central data warehouse based on data marts) Inmon Corporate Information Factory Data Vault 2.0 Architecture (Dan Linstedt) DW 2.0: The Architecture for the Next Generation of Data Warehousing Virtual Data Warehouse Operational Data Store (ODS)Daimler TSSData Warehouse / DHBW76

KIMBALL BUS ARCHITECTURE (CENTRAL DATA WAREHOUSEBASED ON DATA MARTS)Source: -opinion/Daimler TSSData Warehouse / DHBW77

KIMBALL BUS ARCHITECTURE (CENTRAL DATA WAREHOUSEBASED ON DATA MARTS)Data WarehouseInternal data sourcesBackendCore Warehouse Layer Mart LayerOLTPOLTPFrontendStaging Layer(Input Layer)Data Mart 1Data Mart 2More Businessprocess orientedthan leData Mart 3External data sourcesMetadata ManagementSecurityDWH Manager incl. MonitorDaimler TSSData Warehouse / DHBW78

KIMBALL BUS ARCHITECTURE (CENTRAL DATA WAREHOUSEBASED ON DATA MARTS) Bottom-up approach Dimensional model with denormalized data Sum of the data marts constitute the Enterprise DWH Enterprise Service Bus / conformed dimensions for integration purposes (don’t confuse with ESB as middleware/communication system between applications) Kimball describes that agreeing on conformed dimensions is a hard joband it’s expected that the team will get stuck from time to time trying toalign the incompatible original vocabularies of different groups Daimler TSSData marts need to be redesigned if incompatibilities existData Warehouse / DHBW79

Core WarehouseLayerDATA INTEGRATION WITH AND WITHOUT COREWAREHOUSE LAYERDaimler TSSData Warehouse / DHBW80

INMON CORPORATE INFORMATION FACTORYSource: -opinion/Daimler TSSData Warehouse / DHBW81

INMON CORPORATE INFORMATION FACTORYInternal data sourcesData WarehouseBackendOLTPOLTPStaging Layer(Input Layer)subjectoriented,Core integrated,timeWarehousevariant,Layer nonvolatile(StorageLayer)FrontendMart Layer(Output Layer)(Reporting Layer)External data sourcesMetadata ManagementSecurityDWH Manager incl. MonitorDaimler TSSData Warehouse / DHBW82

INMON CORPORATE INFORMATION FACTORY Top-down approach (Normalized) Core Warehouse is essen

Often used as synonym DWH more technical focus BI more business / process focus usiness intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision