UNIVERSITY OF JYVÄSKYLÄ
DEPARTMENT OF MATHEMATICAL INFORMATION TECHNOLOGY

TIES443: Introduction to DM
Lecture 2: Introduction to Business Intelligence

Mykola Pechenizkiy
Course webpage: http://www.cs.jyu.fi/ mpechen/TIES443
November 2, 2006
Department of Mathematical Information Technology
University of Jyväskylä

Topics for today

- Decision Making Process
  - as motivation for Business Intelligence (BI)
- Introduction to BI
  - Basic definitions: BI, DW, OLTP, OLAP etc.
  - BI processes
    - Increasing potential to support business decisions
  - Decision Support System (DSS) from BI perspective
    - 3-layered architecture
  - OLTP vs. OLAP
    - Operational applications vs. analytical applications
  - OLAP vs. DM
  - Placing DM in BI context
    - DM myths; interests of academia and business in different aspects of DM

Decision Making Process

- Decision making at different levels
  - Operational
    - Related to daily activities with short-term effect
    - Structured decisions taken by lower management
  - Tactical
    - Semi-structured decisions taken by middle management
  - Strategic
    - Long-term effect
    - Unstructured decisions taken by top management
- Decision making steps include
  - Problem identification,
  - Finding alternative solutions,
  - Making a choice
- Information and knowledge form the backbone of the decision making process

Technology is needed

"to push information closer to the point of service to enhance decision-making, and to make the data actionable" (SAS vision of their customers' needs)

Types of Knowledge Available

- Expert knowledge
  - Common/contextual, possessed by/distributed among a few experts
  - Extensive training and/or experience
- Organizational knowledge
  - Represents intricate relationships between components of an organization
  - Embodies all the human knowledge embedded within the organization
  - Captures other implicit knowledge as well
- Organizational knowledge is embedded in the transactional data
- Knowledge Acquisition
  - Knowledge elicitation (experts) vs. knowledge discovery (data)
  - Interviewing/observing a human expert vs. data mining for identifying basic rules
    - IF temperature -35 AND time 9.00 THEN don't go to lecture

Pros and Cons of Knowledge Discovery

- Advantages
  - Not dependent on one expert
  - Based on actual performance
    - If the expert made wrong decisions, those failures are pruned out
  - Potentially, can capture all relevant knowledge
    - Not just in-human knowledge
  - Objective, not subjective
  - Well understood in theory and practice
- Disadvantages
  - Depends heavily on the data set used
    - Noise in the data set can throw one off (GIGO: garbage in, garbage out)
  - Based on historical data
    - If the future context changes, then performance can drop
    - The underlying basic rule (theory) may never be discovered

Motivation: Enabling Decision Support

- Decision Support: IT to help the knowledge worker (executive, manager, analyst) make faster & better decisions
- Organizations need various kinds of information to support decisions
  - Two types of applications:
    - Operational applications
    - Analytical applications
- Decision-making speed is an important success factor in the information economy
- The problem is to find the right information and analyze it

Basic Definitions

- Business Intelligence
- Data Warehouse
- OLTP
- OLAP
- Data Mart
- Data Cube
- More buzzwords in the following lecture on data warehousing

What Is Business Intelligence?

- Business Intelligence (BI) is
  - the new technology for understanding the past & predicting the future
  - a broad category of technologies that allows for
    - gathering, storing, accessing & analyzing data to help business users make better decisions
    - analyzing business performance through data-driven insight
  - a broad category of applications, which include the activities of
    - decision support systems
    - query and reporting
    - online analytical processing (OLAP)
    - statistical analysis, forecasting, and data mining
- BI applications can be:
  - mission-critical and integral to an enterprise's operations, or occasional to meet a special requirement
  - enterprise-wide or local to one division, department, or project
  - centrally initiated or driven by user demand

One simple BI example

[Figure; source: http://exonous.typepad.com/mis/business intelligence.jpg]

What is a Data Warehouse?

- Defined in many different ways, but not rigorously.
  - A decision support database that is maintained separately from the organization's operational database
  - A consistent database source that brings together information from multiple sources for decision support queries
  - Supports information processing by providing a solid platform of consolidated, historical data for analysis
- Data warehousing:
  - The process of constructing and using data warehouses
- A data warehouse is based on a multidimensional data model, which views data in the form of a data cube
- We will consider different aspects of data warehousing in tomorrow's lecture

Data Warehouse vs. Operational DBMS

- OLTP (on-line transaction processing)
  - Major task of traditional relational DBMS
  - Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
  - Aims at reliable and efficient processing of a large number of transactions and ensuring data consistency
- OLAP (on-line analytical processing)
  - Major task of data warehouse system
  - Data analysis and decision making
  - Aims at efficient multidimensional processing of large data volumes
    - Fast, interactive answers to large aggregate queries
- Distinct features (OLTP vs. OLAP):
  - User and system orientation: customer vs. market
  - Data contents: current, detailed vs. historical, consolidated
  - Database design: ER + application vs. star + subject
  - View: current, local vs. evolutionary, integrated
  - Access patterns: update vs. read-only but complex queries
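The difference in access patterns listed above (short updates vs. read-only aggregate queries) can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module only for convenience; the table `sales`, its columns and all values are hypothetical and are not taken from the lecture.

```python
import sqlite3

# Hypothetical sales table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [
        ("north", 2005, 100.0),
        ("north", 2006, 150.0),
        ("south", 2005, 80.0),
        ("south", 2006, 120.0),
    ],
)

# OLTP-style unit of work: a short transaction that touches one row,
# located through the primary key.
with conn:
    conn.execute("UPDATE sales SET amount = amount + 10 WHERE id = ?", (1,))

# OLAP-style unit of work: a read-only aggregate query that scans
# the whole table and consolidates it along one dimension.
totals_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
print(totals_by_region)
```

In a real deployment the two workloads would normally run on separate systems, which is the point of the comparison above; a single in-memory database is used here only to keep the sketch self-contained.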

OLTP vs. OLAP

                      OLTP                         OLAP
User                  Clerk, IT professional       Knowledge worker
Function              Day-to-day operations        Decision support
DB                    [...]ent, Isolated           Historical, Consolidated
View                  Detailed, Flat relational    Summarized, Multidimensional
Usage                 Structured, Repetitive       Ad hoc
Unit of work          Short, simple transaction    Complex query
Access                Read/write                   Read mostly
Operations            Index/hash on primary key    Lots of scans
# Records accessed    Tens                         Millions
# Users               Thousands                    Hundreds
DB size               100 MB-GB                    100 GB-TB
Metric                Transaction throughput       Query throughput, response

Need of Data Warehousing (for OLAP)

- High performance for both systems
  - DBMS: tuned for OLTP
    - access methods, indexing, concurrency control, recovery
  - Warehouse: tuned for OLAP
    - complex OLAP queries, multidimensional view, consolidation
- Different functions and different data
  - Missing data: decision support requires historical data, which operational DBs do not typically maintain
  - Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
  - Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled

SQL, OLAP, and Data Mining

Task
  SQL: Extraction of detailed and summary data
  OLAP: Summaries, trends and forecasts
  Data Mining: Knowledge discovery

Type of result
  SQL: Information
  OLAP: Analysis
  Data Mining: Insight and prediction

Method
  SQL: Deduction (ask the question, verify with data)
  OLAP: Multidimensional data modeling, aggregation, statistics
  Data Mining: Induction (build the model, apply it to new data, get the result)

Example question
  SQL: Who purchased mutual funds in the last 3 years?
  OLAP: What is the average income of mutual fund buyers by region by year?
  Data Mining: Who will buy a mutual fund in the next 6 months and why?

Note: OLAP helps in discovering the patterns in data and can also be useful for knowledge organization; the better we understand the data, the more effective DM/KDD will be.

Example of SQL, OLAP & DM: Weather Data

[Table: weather data]

Example of SQL, OLAP & DM: Weather Data

- By querying a DBMS containing the above table we may answer questions like:
  - What was the temperature on the sunny days? {85, 80, 72, 69, 75}
  - On which days was the humidity less than 75? {6, 7, 9, 11}
  - On which days was the temperature greater than 70 and the humidity less than 75? The intersection of the above two: {11}
- Using OLAP we can create a Multidimensional Model of our data (Data Cube).
  - E.g. using the dimensions time, outlook and play we can create the following model.

    9/5       sunny   rainy   overcast
    Week 1    0/2     2/1     2/0
    Week 2    2/1     1/1     2/0

- Using a DM algorithm (e.g. ID3) we can produce the following decision tree:

    outlook = sunny
      humidity = high: no
      humidity = normal: yes
    outlook = overcast: yes
    outlook = rainy
      windy = true: no
      windy = false: yes

BI/DM processes: Input-Output View

[Diagram labels: Data (internal & external); Objective(s); Business Knowledge; Reports; Decision Models; New Knowledge]

Data Mining is a business-driven process, supported by adequate tools, aimed at the discovery and consistent use of meaningful, profitable knowledge from corporate data.
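The weather example above can be replayed as a small sketch. The weather table itself did not survive in this transcription, so the rows below are an assumption: they are the widely used 14-day weather data set, chosen because it reproduces every answer quoted on the slide. The week boundary (days 1-7 and 8-14) and the humidity cut-off of 80 for "high" are also assumptions; the slide only says "Week 1/Week 2" and "high/normal". The function `predict` simply encodes the tree shown on the slide; it is not an implementation of ID3.

```python
import sqlite3

# Assumed rows: (day, outlook, temperature, humidity, windy, play).
WEATHER = [
    (1, "sunny", 85, 85, 0, "no"),
    (2, "sunny", 80, 90, 1, "no"),
    (3, "overcast", 83, 86, 0, "yes"),
    (4, "rainy", 70, 96, 0, "yes"),
    (5, "rainy", 68, 80, 0, "yes"),
    (6, "rainy", 65, 70, 1, "no"),
    (7, "overcast", 64, 65, 1, "yes"),
    (8, "sunny", 72, 95, 0, "no"),
    (9, "sunny", 69, 70, 0, "yes"),
    (10, "rainy", 75, 80, 0, "yes"),
    (11, "sunny", 75, 70, 1, "yes"),
    (12, "overcast", 72, 90, 1, "yes"),
    (13, "overcast", 81, 75, 0, "yes"),
    (14, "rainy", 71, 91, 1, "no"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather (day INTEGER, outlook TEXT, temperature INTEGER, "
    "humidity INTEGER, windy INTEGER, play TEXT)"
)
conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?, ?, ?)", WEATHER)


def column(sql):
    """Run a one-column query and return its values as a list."""
    return [value for (value,) in conn.execute(sql)]


# SQL: detailed questions answered directly from the table.
sunny_temps = column(
    "SELECT temperature FROM weather WHERE outlook = 'sunny' ORDER BY day"
)
dry_days = column("SELECT day FROM weather WHERE humidity < 75 ORDER BY day")
warm_dry_days = column(
    "SELECT day FROM weather WHERE temperature > 70 AND humidity < 75"
)

# OLAP: aggregate into cube cells (week, outlook) -> (play=yes, play=no).
cube = {}
for week, outlook, yes, no in conn.execute(
    "SELECT (day - 1) / 7 + 1 AS week, outlook, "
    "SUM(play = 'yes'), SUM(play = 'no') "
    "FROM weather GROUP BY week, outlook"
):
    cube[(week, outlook)] = (yes, no)


# DM: the decision tree from the slide, written as a function.
def predict(outlook, humidity, windy):
    if outlook == "sunny":
        return "no" if humidity > 80 else "yes"  # assumed cut-off for "high"
    if outlook == "overcast":
        return "yes"
    return "no" if windy else "yes"  # rainy branch


tree_agrees = all(predict(o, h, w) == p for _, o, _, h, w, p in WEATHER)
print(sunny_temps, dry_days, warm_dry_days, cube, tree_agrees)
```

The three parts correspond to the three columns of the SQL/OLAP/DM comparison: the SQL queries return stored facts, the cube summarizes them along chosen dimensions, and the tree is a model that could be applied to days not in the table.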

Data Mining in the BI Context

[Diagram labels: Data Extraction (Collecting / Transforming); Data Storage (Storing / Aggregating / Historising); Business Intelligence; Visualization (Reporting / EIS / MIS); Exploration (OLAP); Data Analysis; Discovery (Data Mining)]

Business Intelligence Processes

[Diagram: layered pyramid; the potential to support business decisions increases towards the top]

- Making Decisions
- Data Presentation: Visualization Techniques
- Data Mining: Information Discovery
- Data Exploration: Statistical Analysis, Querying and Reporting
- Data Warehouses / Data Marts: OLAP, MDA
- Data Sources: Paper, Files, Information Providers, Database Systems, OLTP

Roles shown alongside the layers: End User, Business Analyst, Data Analyst, DBA

The Complete DSS from BI Perspective

[Diagram labels: Data Sources; Extract, Transform, Load, Refresh; Data Warehouse; Data Marts; Data Storage; Serve; ROLAP Server; OLAP Engine; Data Mining; Front-End Tools; Multi-Tiered Architecture]

Three-Tier Decision Support Systems

- Warehouse database server
  - Almost always a relational DBMS, rarely flat files
- OLAP servers
  - Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operators
  - Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations
- Clients
  - Query and reporting tools
  - Analysis tools
  - Data mining tools
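To make the ROLAP idea above more concrete, here is a minimal sketch of how a multidimensional operation (rolling sales up from month to year) can be mapped to standard relational operators (a join plus GROUP BY). The star-like schema, the table names `dim_date` and `fact_sales`, and all values are hypothetical; sqlite3 stands in for the warehouse DBMS only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical dimension table: one row per month.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    -- Hypothetical fact table referencing the date dimension.
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        product TEXT,
        amount REAL
    );
    INSERT INTO dim_date VALUES (1, '2006-01', 2006), (2, '2006-02', 2006),
                                (3, '2005-12', 2005);
    INSERT INTO fact_sales VALUES (1, 'tea', 10), (1, 'coffee', 20),
                                  (2, 'tea', 5), (3, 'tea', 7);
    """
)


def sales_by(level):
    """Aggregate sales at one level of the date hierarchy ('month' or 'year')."""
    if level not in ("month", "year"):
        raise ValueError("unknown hierarchy level: " + level)
    sql = (
        "SELECT d.{0}, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_date d ON d.date_id = f.date_id GROUP BY d.{0}"
    ).format(level)
    return dict(conn.execute(sql))


monthly = sales_by("month")  # lower level of the hierarchy
yearly = sales_by("year")    # roll-up: climb the hierarchy from month to year
print(monthly, yearly)
```

A MOLAP server would instead keep the cells in a dedicated multidimensional structure and answer the same question without going through relational operators.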

Data Warehouse vs. Data Marts

- Enterprise warehouse: collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization
  - Requires extensive business modeling (may take years to design and build)
- Data Marts: departmental subsets that focus on selected subjects
  - Marketing data mart: customer, product, sales
  - Faster roll-out, but complex integration in the long run
- Virtual warehouse: views over operational DBs
  - Materialize selective summary views for efficient query processing
  - Easy to build but requires excess capacity on operational DB servers

Metadata Repository

- Metadata is the data defining warehouse objects. It has the following kinds:
  - Description of the structure of the warehouse
    - schema, view, dimensions, hierarchies, derived data definitions, data mart locations and contents
  - Operational metadata
    - data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails)
  - The algorithms used for summarization
  - The mapping from operational environment to the data warehouse
  - Data related to system performance
    - warehouse schema, view and derived data definitions
  - Business data
    - business terms and definitions, ownership of data, charging policies

Business Intelligence: An old Definition

[Figure]

Business Intelligence: SAS vision

[Figure]

Business Intelligence: SAS vision

[Figure]

Business Intelligence Layers

[Figure]

Forms of Business Intelligence

[Figure]

BI system and BI-related processes: Sun's vision

[Figure]

Business Intelligence Cycle

[Figure; source: www.isa.co.uk/bi portal.htm]

Business Intelligence

[Figure]

Summary

- BI, DW, OLAP, and DM concepts
- Decision making and BI
- BI processes
- DM in BI context
- DSS from BI perspective: 3 layers
- SQL vs. OLAP vs. DM
- OLTP vs. OLAP

What else did you get from this lecture?

If we still have time

- DM in BI context
  - Some DM myths; success factors; current state of the art in DM: what is emphasized in the research community and what is much more important for business/industry
- Concrete DM myths extracted from:
  - "Debunking Data Mining Myths: Don't let contradictory claims about data mining keep you from improving your business" by Robert D. Small, Information Week, January 20, 1997. Copyright 1997 CMP Media, Inc.

A Few Quotes

- "Data mining is quickly becoming a necessity, and those who do not do it will soon be left in the dust. Data mining is one of the few software activities with measurable return on investment associated with it."
- "People who can't see the value in data mining as a concept either don't have the data or don't have data with integrity."

Some DM Myths (1 of 2)

- DM produces surprising results that will utterly transform your business.
- DM techniques are so sophisticated that they can substitute for domain knowledge or for experience in analysis and model building.
- DM tools automatically find the patterns you're looking for, without being told what to do.
- DM is useful only in certain areas, such as marketing, sales, and fraud detection.

Some DM Myths (2 of 2)

- The methods used in DM are fundamentally different from the older quantitative model-building techniques.
- DM is an extremely complex process.
- Only massive databases are worth mining.
- DM is more effective with more data, so all existing data should be brought into any data-mining effort.
- Building a DM model on a sample of a database is ineffective, because sampling loses the information in the unused data.
- DM is another fad that will soon fade, allowing us to return to standard business practice.

The Right Expectation

- Data Mining is unlikely to produce surprising results that will utterly transform a business. Rather:
  - Early results: scientific confirmation of human intuition
  - Beyond: steady improvement to an already successful organization
  - Occasionally: discovery of one rare "breakthrough" fact

The Right Organization

- Data Mining is not sophisticated enough to be substituted for domain knowledge or for experience in analysis and model building. Rather:
  - Data Mining is a joint venture
  - "put teams together that have a variety of skills (e.g., statistics, business and IT skills), are creative and are close to the business thinking."

Key Success Factors

- Have a clearly articulated business problem that needs to be solved and for which Data Mining is the adequate technology
- Ensure that the problem being pursued is supported by the right type of data of sufficient quality and in sufficient quantity
- Recognize that Data Mining is a process with many components and dependencies
- Plan to learn from the Data Mining process whatever the outcome

DM: state of the art

- DM is still a technology of which much is expected: it should enable organizations to benefit more from their huge databases.
- There are some success stories in which organizations have gained a competitive advantage from DM.
- Still, the strong focus of most DM researchers on technology-oriented topics does not support expanding the scope to less rigorous but practically very relevant sub-areas.
- Research in the IS discipline has a strong tradition of taking into account the human and organizational aspects of systems besides the technical ones.

DM: state of the art (cont.)

- DM-supporting processes that take human and organizational aspects into account are still in their infancy.
- The DM community might benefit, at least from a practical point of view, from looking at older sub-areas of IT that have a tradition of considering solution-driven concepts with a focus also on human and organizational aspects.
- By becoming more amenable to the research results of the IS community, the DM community might be able to increase its collective understanding of
  - how DM artifacts are developed (conceived, constructed, and implemented),
  - how DM artifacts are used, supported and evolved,
  - how DM artifacts impact and are impacted by the contexts in which they are embedded.

So, where are we?

- A new successful industry (such as DM) can go through consecutive phases:
  1. discovering a new idea,
  2. ensuring its applicability,
  3. producing small-scale systems to test the market,
  4. better understanding of the new technology, and
  5. producing a fully scaled system.
- At present there are several dozen DM systems, none of which can be compared in scale to a DBMS.
  - This fact indicates that we are still in the 3rd phase in the DM area!

DM: Academy vs. Industry

[Diagram labels: DM; (Un-)Successful Applications in the appropriate environment; Refine; Applicable Knowledge; Environment; Knowledge Base; Foundations; Design knowledge; Contribution to Knowledge Base]

Where is the focus?

- Still on speeding up, scaling up, and increasing the accuracy of DM techniques.
- Piatetsky-Shapiro: "we see many papers proposing incremental refinements in association rules algorithms, but very few papers describing how the discovered association rules are used"
- The goals of research and of development in DM are quite different:
  - research is knowledge-oriented, while development is profit-oriented.
  - Thus, DM research concentrates on the development of new algorithms or their enhancements,
  - but DM developers in domain areas are aware of cost considerations: investment in research, product development, marketing, and product support.
- The study of the DM development and DM use processes is as important as the technological aspects, and therefore such research activities are likely to emerge within the DM field.

Additional Slides

The following topics will be covered in more detail in the following lecture. These slides are for answering your questions, if any.

Efficient Processing of OLAP Queries

- Determine which operations should be performed on the available cuboids:
  - transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice = selection + projection
- Determine to which materialized cuboid(s) the relevant operations should be applied.
- Explore indexing structures and compressed vs. dense array structures in MOLAP

Data Warehouse Back-End Tools and Utilities

- Data extraction:
  - get data from multiple, heterogeneous, and external sources
- Data cleaning:
  - detect errors in the data and rectify them when possible
- Data transformation:
  - convert data from legacy or host format to warehouse format
- Load:
  - sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
- Refresh:
  - propagate the updates from the data sources to the warehouse
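The step "determine to which materialized cuboid(s) the relevant operations should be applied" can be sketched as follows. This is one simple heuristic, not necessarily the one intended in the lecture: among the materialized cuboids whose dimensions cover the query, pick the one with the fewest rows. The function `pick_cuboid`, the cuboid list and the row counts are all hypothetical.

```python
def pick_cuboid(query_dims, materialized):
    """Return the smallest materialized cuboid that covers the query.

    query_dims   -- dimensions the query groups or filters by
    materialized -- dict mapping a tuple of dimensions to a row count
    """
    needed = set(query_dims)
    covering = [dims for dims in materialized if needed <= set(dims)]
    if not covering:
        return None  # the query cannot be answered from these cuboids
    return min(covering, key=lambda dims: materialized[dims])


# Hypothetical materialized cuboids with made-up sizes (number of rows).
MATERIALIZED = {
    ("product", "date", "country"): 1_000_000,  # base cuboid
    ("product", "date"): 50_000,
    ("date", "country"): 8_000,
    ("date",): 365,
}

print(pick_cuboid(("date",), MATERIALIZED))
print(pick_cuboid(("country",), MATERIALIZED))
print(pick_cuboid(("product", "country"), MATERIALIZED))
```

A query on `country` alone is answered from the (date, country) cuboid rather than from the base cuboid, because fewer rows have to be aggregated; a real optimizer would also weigh available indexes and the selectivity of the query.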

Cuboids Corresponding to the Cube

[Diagram: lattice of cuboids, from "all" (0-D, apex cuboid) through the 1-D cuboids and the 2-D cuboids (e.g. date, country) down to the 3-D (base) cuboid: product, date, country]

OLAP Mining: An Integration of DM and DW

- Data mining systems, DBMS, data warehouse systems coupling
  - No coupling, loose coupling, semi-tight coupling, tight coupling
- On-line analytical mining of data
  - integration of mining and OLAP technologies
- Interactive mining of multi-level knowledge
  - Necessity of mining knowledge and patterns at different levels of abstraction by drilling/rolling, pivoting, slicing/dicing, etc.
- Integration of multiple mining functions
  - Characterized classification, first clustering and then association
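The cuboid lattice above can be enumerated with a few lines of code: a cube with n dimensions has one cuboid per subset of its dimensions, that is 2^n cuboids when no concept hierarchies are considered. The sketch below uses the three dimensions named on the slide; the function name `cuboid_lattice` is made up for illustration.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Group all cuboids of a cube by the number of dimensions they keep."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


lattice = cuboid_lattice(("product", "date", "country"))
for k, cuboids in lattice.items():
    print(k, cuboids)

total = sum(len(cuboids) for cuboids in lattice.values())
print(total)  # 2 ** 3 cuboids in total
```

Level 0 holds the single apex cuboid (the empty tuple, i.e. "all"), and level 3 holds the base cuboid (product, date, country).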

Browsing a Data Cube

[Figure]

- Visualization
- OLAP capabilities
- Interactive manipulation

Typical OLAP Operations

- Roll up (drill-up): summarize data
  - by climbing up a hierarchy or by dimension reduction
- Drill down (roll down): reverse of roll-up
  - from higher-level summary to lower-level summary or detailed data, or introducing new dimensions
- Slice and dice
  - project and select
- Pivot (rotate)
  - reorient the cube, visualization, 3D to series of 2D planes
- Other operations
  - drill across: involving (across) more than one fact table
  - drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
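A minimal sketch of three of the operations above (roll-up by dimension reduction, slice, dice) on a tiny in-memory cube may help. The cube is represented as a plain dictionary from coordinates to a measure; the dimension names, the cells and the helper functions `roll_up`, `slice_cube` and `dice` are all hypothetical and are not tied to any OLAP product.

```python
from collections import defaultdict

DIMS = ("product", "quarter", "country")

# Hypothetical cube cells: (product, quarter, country) -> sales.
CUBE = {
    ("tea", "Q1", "FI"): 10,
    ("tea", "Q2", "FI"): 12,
    ("tea", "Q1", "SE"): 7,
    ("coffee", "Q1", "FI"): 20,
    ("coffee", "Q2", "SE"): 15,
}


def roll_up(cube, dims, keep):
    """Dimension reduction: sum away every dimension not listed in keep."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_cube(cube, dims, dim, value):
    """Slice: fix one dimension to a single value and drop that dimension."""
    i = dims.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice(cube, dims, allowed):
    """Dice: keep the sub-cube whose coordinates lie in the allowed value sets."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[dims.index(d)] in values for d, values in allowed.items())
    }


by_product = roll_up(CUBE, DIMS, ("product",))
q1_only = slice_cube(CUBE, DIMS, "quarter", "Q1")
tea_in_fi = dice(CUBE, DIMS, {"product": {"tea"}, "country": {"FI"}})
print(by_product, q1_only, tea_in_fi)
```

Drill-down is the reverse of `roll_up` and therefore needs the more detailed cells to be available; it cannot be computed from the rolled-up result alone.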
