Learning Objectives Chapter 2: Data Warehousing

Transcription

Learning ObjectivesChapter 2:Data Warehousing Understand the basic definitions andconcepts of data warehouses Learn different types of data warehousingarchitectures; their comparative advantagesand disadvantages Describe the processes used in developingand managing data warehouses Explain data warehousing operationsJason C. H. Chen, Ph.D.Professor of MISSchool of Business AdministrationGonzaga UniversitySpokane, WA 99258chen@jepson.gonzaga.edu(Continued )Dr. Chen, Data Base ManagementOpening Vignette Learning ObjectivesIsle of Capri Casinos Is Winning with EnterpriseData Warehouse Company background Problem description Proposed solution Results Answer & discuss the case questions. Explain the role of data warehouses indecision support Explain data integration and the extraction,transformation, and load (ETL) processes Describe real-time (a.k.a. right-time and/oractive) data warehousing Understand data warehouse administrationand security issuesDr. Chen, Business Intelligence3Questions for the OpeningVignetteDr. Chen, Business Intelligence4 1. Why is it important for Isle to have an EDW? In the gaming industry, companies distinguishthemselves based on their customer relationships.An enterprise data warehouse (EDW) gathers andprovides the data needed to tell Isle of Capri whatcustomers respond to, so the casinos can adapttheir offerings. The information provided by the EDW lets Isledeepen its understanding of customers, so it canefficiently give them more of the kinds ofentertainment they are looking for.1. Why is it important for Isle to have an EDW?2. What were the business challenges or opportunities thatIsle was facing?3. What was the process Isle followed to realize EDW?Comment on the potential challenges Isle might have hadgoing through the process of EDW development.4. What were the benefits of implementing an EDW at Isle?Can you think of other potential benefits that were notlisted in the case?5. Why do you think large enterprises like Isle in the gamingindustry can succeed without having a capable datawarehouse/business intelligence infrastructure?Dr. Chen, Business Intelligence2Dr. Chen, Business Intelligence5Dr. Chen, Business Intelligence61

2. What were the business challenges or opportunities thatIsle was facing? Isle of Capri Casinos is one of the largest publicly tradedgaming companies in the United States, but the gamingbusiness is competitive. Other casinos offer essentially thesame games, so Isle must find ways to make its entertainmentand hospitality atmosphere one that exceeds customerexpectations. Before Isle implemented the EDW, casinomanagers had to wait to review monthly data until the secondweek of the following month. The time lag made it difficult for casinos to identify whatactions were appealing to customers in time to respond.Adding to these business challenges, Hurricane Katrina setback initial efforts to set up a data warehouse at thesoutheastern company.Dr. Chen, Business Intelligence 3. What was the process Isle followed to realize EDW?Comment on the potential challenges Isle might have hadgoing through the process of EDW development. Isle of Capri brought in two expert suppliers. Teradataprovided the core solution; IBM Cognos provided expertisein business intelligence. Isle hired a management team thatunderstood how EDW could support decision making atIsle. That team would be able to help Isle’s managers with thechallenge of understanding how they can frame queries andfollow-up questions to figure out ways to improve thebusiness. Most likely, the potential of using detailed, up-tothe-minute data would be unfamiliar to many of thesemanagers.7 4. What were the benefits of implementing an EDW at Isle? Can youthink of other potential benefits that were not listed in the case? The implementation of EDW brought several benefits related to thetimeliness and detail of the data that became available. Instead of fiveweek-old monthly reports, managers can now study a variety of daily,weekly, and monthly reports. The reports segment data by particularproperties and customer groups, so managers can zero in on particularproblems and successes, easily making comparisons among properties. Managers can submit queries about data sets and receive informationwithin minutes. In this way, managers can find out how particularpromotions are affecting customer behavior at particular casinos. TheEDW also connects data about casino activity with data aboutcustomers’ use of hotels and efforts by Isle’s hosts. This, too, helps thecompany target promotions and offer customers incentives they value.Even decisions as detailed as where to locate slot machines can beadjusted to boost profits based on data from the EDW.Dr. Chen, Business Intelligence 5. Why do you think large enterprises like Isle in thegaming industry can succeed without having a capabledata warehouse/business intelligence infrastructure? In the past, businesses in the gaming industry could succeedwithout a capable data warehouse/business intelligenceinfrastructure because their managers knew as much aboutcustomers as their competitors’ management knew. They were all testing ideas for promotion or entertainmentand responding to customer behavior at about the samepace. However, when one company such as Isle begins torespond to daily, property-by-property data, choosing themarketing and entertainment options that deliver exactlywhat most profitably lures customers, competitors withoutthat capability will soon begin to suffer.9Key Fields Keys are special fields that serve two main purposes:DW definitionCharacteristics of DWData MartsODS, EDW, MetadataDW FrameworkDW Architecture & ETL ProcessDW DevelopmentDW IssuesDr. Chen, Business Intelligence10Dr. Chen, Business IntelligenceMain Data Warehousing Topics 8Dr. Chen, Business Intelligence Primary keys are unique identifiers of the relation.Examples include employee numbers, social securitynumbers, etc. This guarantees that all rows are unique.Foreign keys are identifiers that enable a dependent relation(on the many side of a relationship) to refer to its parentrelation (on the one side of the relationship). Keys can be simple (a single field) or composite(more than one field). Keys usually are used as indexes to speed up theresponse to user queries.11Dr. Chen, Business Intelligence122

Referential IntegrityNormalized vs. De-normalizedCan we “delete” customers#5 if orders#1 is still in thedatabase? Why? We will study the concept and technique of“normalization and de-normalization”(especially for those do not havebackground on database management) aswell as OLTP and OLAP.customerspkCustomers#5pkcustomer#1001 10051020ordersOrder#orders#11000 10031012LastNameMORALES GIRARDFALAHReferredNULLNULLNULLRegionSE NWNEfkcustomer#100510011007 ShipZipShipCost98114 32328490022.00 4.006.00Can we “create” orders#1 if customers#5 is notcreated? Why?Dr. Chen, Business Intelligence13More on OLTP vs. OLAPpkfkpkpk: primary keyfk: foreign keyFig. Extra-a: A simpledatabase with a relationbetween two tables.For those have databasebackground.pk The figure depicts a relationaldatabase environment with twotables. The first table contains informationabout pet owners; the second,information about pets. The tablesare related by the single columnthey have in common: Owner ID. By relating tables to one another,redundancy ofwe can reducedata and improve databaseperformance. The process of breaking tablesapart and thereby reducing dataredundancy is callednormalization.15Dr. Chen, Business Intelligence14Dr. Chen, Business IntelligencepkOLTP vs. OLAP (cont.)fk Most relational databases which are designed to handle a high number of readsOLTPand writes (updates and retrievals of information) are referred to as(OnLine Transaction Processing) systems. OLTP systems are very efficient for high volume activities such as cashiering,where many items are being recorded via bar code scanners in a very shortperiod of time. However, using OLTP databases for analysis is generally not very efficient,because in order to retrieve data from multiple tables at the same time, a queryjoins must be used.containingDr. Chen, Business Intelligence16OLTP vs. OLAP (cont.) In order to keep our transactional databases running quickly and smoothly, wemay wish to create a data warehouse. A data warehouse is a type of largedatabase (including both current and historical data) that has beendenormalizedand archived. Denormalization is the process of intentionally combining some tables into asingle table in spite of the fact that this may introduce duplicate data in somecolumns.Why need Data Warehouses? A solution to A solution to bridging the informationdata warehouses whichgap is theconsolidate and integrate informationfrom many different sources andarrange it in a meaningful format formaking accurate business decisions.Fig. Extra-b: A combination of the tables into a single dataset. The figure depicts what our simple example data might look like if it were in adata warehouse. When we design databases in this way, we reduce the numberof joins necessary to query related data, thereby speeding up the process ofanalyzing our data.OLAP (OnLine Analytical Databases designed in this manner are calledProcessing) systems.Dr. Chen, Business Intelligence17Dr. Chen, Business Intelligence183

A Historical Perspective toData WarehousingWhat is a Data Warehouse? üüüüüA physical repository where relational data ormultidimensional structure (including both currentand historical data) are specially organized toprovide enterprise-wide, cleansed data in astandardized format for analytical processingactivities (i.e., OLAP, data mining, querying,reporting and other decision support apps).“The data warehouse is a collection of integrated,subject-oriented databases designed to support DSSfunctions, where each unit of data is non-volatileand relevant to some moment in time”1970s e.g. customers, patients, students, productsüüüüüüüMini/personal computers (PCs)Business applications for PCsDistributer DBMSRelational DBMSTeradata ships commercial DBs2010sExponentially growing data Web dataConsolidation of DW/BI industryData warehouse appliances emergedBusiness intelligence popularizedData mining and predictive modelingOpen source softwareSaaS, PaaS, Cloud ComputingFig. 2.1 A List of Events That Led to Data Warehousing Development20ü Dependent data mart– A subset that is created directly from a data warehouseü Independent data mart– A small data warehouse designed for a strategic business unit or adepartmentCan study trends and changesOperational data stores 2. (ODS) Read-only, periodically refreshedSummarizedNot normalizedMetadataWeb based, relational/multi-dimensionalClient/serverReal-time and/or right-time (active)Dr. Chen, Business IntelligenceA type of database often used as an interim area for a data warehousefor short-term decisions involving mission-critical apps.data warehouse 3. Enterprise(EDW)A data warehouse for the enterprise for medium- and long-termdecisions.Operational marts: An operational data mart and it is created whenoperational data need to be analyzed multi-dimensionally. Its data comefrom an ODS. 21Types of Data WarehousesDr. Chen, Business Intelligence22An Example of CUSTOMERS Data File (Part of Database)Customer#NUMBER(4) Metadata 2000sA departmental data warehouse that stores only relevant dataConsistent naming conventions, formats, encoding structures; frommultiple data sourcesvolatileNon- Big Data analyticsSocial media analyticsText and Web AnalyticsHadoop, MapReduce, NoSQLIn-memory, in-database(Three) Types of Data WarehousesTime-variant (time series) Dr. Chen, Business IntelligenceIntegrated 1990süüüüüData Mart 1.Subject -oriented Centralized data storageData warehousing was bornInmon, Building the Data WarehouseKimball, The Data Warehouse ToolkitEDW architecture designü Business Data Warehouse coinedCharacteristics of DW üüüüü1980süüüüü19Dr. Chen, Business IntelligenceMainframe computersSimple data entryRoutine reportingPrimitive database structuresTeradata (2)VARCHAR2(30)Data about data. In a data warehouse, metadata describe the structure ofand some meaning about data and the manner of its acquisition and theireffective or ineffective use.D.M.D.W.ODSOper.martDSS:EDWDr. Chen, Business IntelligenceCRM,SCM,BPM,BAM,PLM,RM,KM23Which one is the “Metadata” for CUSTOMERS data file?Dr. Chen, Business Intelligence244

Application Case 2.1A Better Data Plan: Well-Established TELCOsLeverage Data Warehousing and Analytics to Stayon Top in a Competitive IndustryQuestions for Discussion1. What are the main challenges for TELCOs?2. How can data warehousing and data analyticshelp TELCOs in overcoming their challenges?3. Why do you think TELCOs are well suited totake full advantage of data analytics?Dr. Chen, Business Intelligence 2. How can data warehousing and data analytics helpTELCOs in overcoming their challenges? Highly targeted data analytics play an ever more critical rolein helping carriers secure or improve their standing in anincreasingly competitive marketplace. Argentina’s Telefónica de Argentina used analytics for its“traceability project,” which tracked the factors involved incustomer churn, a big problem among phone service carriers. France’s Bouygues Telecom used BI technologies to facilitatecost reduction through automation via its Teradata-basedmarketing operations management system, which automatesmarketing/communications collateral production. Pakistan’s Mobilink uses BI to help acquire customers andgrow their subscriber network, largely aided by socialnetworking.Dr. Chen, Business Intelligence25 1. What are the main challenges for TELCOs? To stay competitive, TELCOs must continuouslyrefine everything from

ü Kimball, The Data Warehouse Toolkit ü EDW architecture design ü Exponentially growing data Web data ü Consolidation of DW/BI industry ü Data warehouse appliances emerged ü Business intelligence popularized ü Data mining and predictive modeling ü Open source software ü SaaS, PaaS, Cloud Computing ü Big Data analytics ü Social media analytics ü Text and Web Analytics ü Hadoop .