Big Data Analytics With Oracle Advanced Analytics In-Database Option

Transcription

Big Data Analytics with Oracle Advanced Analytics In-Database Option
Charlie Berger, Sr. Director Product Management, Data Mining and Advanced Analytics
CharlieDataMine
Copyright 2012, Oracle and/or its affiliates. All rights reserved.

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

“Big Data” and “Big Data Analytics”
“There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” (Google CEO Eric Schmidt)
- 1.8 trillion gigabytes of data was created in 2011
- More than 90% is unstructured data
- Approx. 500 quadrillion files
- Quantity doubles every 2 years
Requires capability to rapidly:
- Collect and integrate data
- Understand data & their relationships
- Respond and take action
[Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data. Source: IDC 2011. Content provided by Cloudera.]

Oracle Big Data Platform
[Diagram: Acquire, Organize, Discover & Analyze]
- Oracle Big Data Appliance: optimized for Hadoop, R, and NoSQL processing (Hadoop, open source R, Oracle NoSQL Database, Oracle Big Data Connectors)
- Oracle Big Data Connectors
- Oracle Database: “System of Record”, optimized for DW/OLTP, with In-Database Analytics
- Oracle Exalytics: optimized for analytics & in-memory (Oracle Enterprise Performance Management, Oracle Business Intelligence Applications, Oracle Business Intelligence Tools, Oracle Endeca Information Discovery)

“Without proper analysis, it's just data; not useful, actionable information: something that you can exploit today, something that your competitor may not have yet discovered.”

What is Data Mining?
Automatically sifting through large amounts of data to find previously hidden patterns, discover valuable new insights and make predictions
- Identify the most important factors (Attribute Importance)
- Predict customer behavior (Classification)
- Predict or estimate a value (Regression)
- Find profiles of targeted people or items (Decision Trees)
- Segment a population (Clustering)
- Find fraudulent or “rare events” (Anomaly Detection)
- Determine co-occurring items in “baskets” (Associations)

Data Mining Provides Better Information, Valuable Insights and Predictions
Cell Phone Churners vs. Loyal Customers
Insight & Prediction:
- Segment #1: IF CUST MO 14 AND INCOME 90K, THEN Prediction Cell Phone Churner, Confidence 100%, Support 8/39
- Segment #3: IF CUST MO 7 AND INCOME 175K, THEN Prediction Cell Phone Churner, Confidence 83%, Support 6/39
[Chart: customers plotted against Customer Months; the comparison operators in the segment rules did not survive extraction.]
Source: Inspired from Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management by Michael J. A. Berry, Gordon S. Linoff
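The confidence and support figures attached to each segment rule can be reproduced with a few lines of Python. This is only a sketch: it assumes "Support 6/39" means 6 of the 39 records fall in the segment and "Confidence 83%" means 5 of those 6 are churners. The thresholds, field names and records below are invented for illustration and are not the slide's data.

```python
def rule_stats(records, condition, target):
    """Return (support, confidence) of the rule 'IF condition THEN target'."""
    matched = [r for r in records if condition(r)]
    hits = [r for r in matched if r["label"] == target]
    support = len(matched) / len(records)          # share of records in the segment
    confidence = len(hits) / len(matched) if matched else 0.0
    return support, confidence

# Synthetic data set of 39 customers (hypothetical values).
records = (
    [{"cust_mo": 10, "income": 100, "label": "churner"}] * 5
    + [{"cust_mo": 10, "income": 100, "label": "loyal"}] * 1
    + [{"cust_mo": 3, "income": 50, "label": "loyal"}] * 33
)

# Hypothetical rule: the real operators on the slide were lost in extraction.
in_segment = lambda r: r["cust_mo"] > 7 and r["income"] < 175

support, confidence = rule_stats(records, in_segment, "churner")
print(f"support={support:.3f} confidence={confidence:.0%}")
```

Under that reading, a rule with high confidence but very low support describes only a handful of customers, so both numbers matter when judging a segment.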

Data Mining Provides Better Information, Valuable Insights and Predictions
Cell Phone Fraud vs. Loyal Customers?
[Chart: customers plotted against Customer Months]
Source: Inspired from Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management by Michael J. A. Berry, Gordon S. Linoff

Finding Needles in Haystacks
- Haystacks are usually BIG
- Needles are typically small and rare

Challenge: Finding Anomalies
- Look for what is “different”
- Single observed value, taken alone, may seem “normal”
- Consider multiple attributes simultaneously
- Taken collectively, a record may appear to be anomalous
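The point that a record can look normal attribute by attribute yet be anomalous as a whole can be illustrated with a small Python sketch. It uses the Mahalanobis distance on two correlated attributes, a swapped-in textbook technique chosen for brevity, not the one-class SVM that Oracle Data Mining uses. All data are synthetic and the names are hypothetical.

```python
import math
import random

def mahalanobis_2d(points, query):
    """Mahalanobis distance of `query` from the centre of 2-D `points`."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = query[0] - mx, query[1] - my
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for the 2x2 case.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

random.seed(0)
# Synthetic "normal" records: the second attribute tracks the first closely.
normal = []
for _ in range(500):
    x1 = random.gauss(50, 10)
    normal.append((x1, x1 + random.gauss(0, 2)))

typical = (60.0, 61.0)  # follows the usual pattern
odd = (60.0, 40.0)      # each value alone lies about one standard deviation from its mean

d_typical = mahalanobis_2d(normal, typical)
d_odd = mahalanobis_2d(normal, odd)
print(round(d_typical, 1), round(d_odd, 1))
```

Neither 60 nor 40 would be flagged by a per-column range check, but the combination breaks the relationship between the two attributes, so its distance comes out several times larger than that of the typical record.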

Data Mining & Predictive Analytics
Example Use Cases for Advanced Analytics
- Targeting the right customer with the right offer
- Discovering hidden customer segments
- Finding most profitable selling opportunities
- Anticipating and preventing customer churn
- Exploiting the full 360 degree customer opportunity
- Security and suspicious activity detection
- Understanding sentiments in customer conversations
- Reducing medical errors & improving quality of health
- Understanding influencers in social networks

Oracle Advanced Analytics
Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
Key Features
- In-database data mining algorithms and open source R algorithms
- SQL, PL/SQL, R languages
- Scalable, parallel in-database execution
- Workflow GUI and IDEs
- Integrated component of Database
- Enables enterprise analytical applications

Why Oracle Advanced Analytics?
Differentiating Features
- Performance and Scalability: leverages power and scalability of Oracle Database
- Fastest Way to Deliver Enterprise Predictive Analytics Applications: integrated with OBIEE and any application that uses SQL queries
- Lowest Total Cost of Ownership: no need for separate analytical servers

Oracle Advanced Analytics Value Proposition
Value Proposition
- Fastest path from data to insights
- Fastest analytical development
- Fastest in-database scoring engine on the planet
- Flexible deployment options for analytics
- Lowest TCO by eliminating data duplication
- Secure, Scalable and Manageable
[Diagram: Traditional Analytics vs. Oracle Advanced Analytics. Traditional: Data Import, Data Preparation and Transformation, Data Mining Model Building, Data Mining Model “Scoring”, Data Extraction, moving through Source Data, Datasets/Work Area, Analytical Processing, Process Output and Target in hours, days or weeks. Oracle Advanced Analytics: Data Prep & Transformation, Model Building and Model “Scoring” in secs, mins or hours; the difference is labeled Savings.]
- Data remains in the Database
- Data preparation for analytics is automated (Embedded Data Prep)
- Scalable distributed-parallel implementation of machine learning techniques in the database
- Scalable implementation of R programming language in-database
- Flexible interface options: SQL, R, IDE, GUI
- Fastest and most flexible analytic deployment options
- Can import 3rd party models

Turkcell İletişim Hizmetleri A.Ş.
Combating Communications Fraud
Company/Background
- Industry: Communications
- Employees: 3,583
- Annual Revenue: Over 5 Billion
- First Turkish company listed on the NYSE
Key Products
- Oracle Exadata Database Machine X2-2 HC Full Rack
- Oracle Advanced Analytics Option
Why Oracle
- Extremely fast sifting through huge data volumes
- With fraud, time is money
Challenges/Opportunities
- Communications fraud is a major issue: anonymous prepaid cards can be used as cash vehicles, for example, to withdraw cash at ATMs
- Prepaid card fraud can result in millions of dollars lost every year
- Monitor numerous parameters for up to 10 billion daily call-data records
Solution
- Leveraged SQL for the preparation and transformation of one petabyte of uncompressed raw communications data
- Deployed Oracle Data Mining models on Oracle Exadata to identify actionable information in less time than traditional methods
- Achieved extreme data analysis speed with in-database analytics performed inside Oracle Exadata, which enabled analysts to detect fraud patterns almost immediately
Future Plans
- Develop more targeted customer campaigns
- Understand call center interactions for better service
“Turkcell manages 100 terabytes of compressed data (or one petabyte of uncompressed raw data) on Oracle Exadata. With Oracle Data Mining, a component of the Oracle Advanced Analytics Option, we can analyze large volumes of customer data and call-data records easier and faster than with any other tool and rapidly detect and combat fraudulent phone use.” (Hasan Tonguç Yılmaz, Manager, Turkcell İletişim Hizmetleri A.Ş.)

Oracle Data Miner 11g Release 2 GUI
Anomaly Detection: Simple Conceptual Workflow
- Train on “normal” records
- Apply model and sort on likelihood to be “different”

Fraud Prediction Demo

    drop table CLAIMS_SET;
    exec dbms_data_mining.drop_model('CLAIMSMODEL');
    create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
    insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
    insert into CLAIMS_SET values ('PREP_AUTO','ON');
    commit;

    begin
      dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
        'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
    end;
    /

    -- Top 5 most suspicious fraud policy holder claims
    select * from
      (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
              rank() over (order by prob_fraud desc) rnk
       from
         (select POLICYNUMBER,
                 prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
          from CLAIMS
          where PASTNUMBEROFCLAIMS in ('2to4', 'morethan4')))
    where rnk <= 5
    order by percent_fraud desc;

[Query output: five rows ranked 1 to 5; the values did not survive extraction.]

Automated Monthly “Application”! Just add:

    create view CLAIMS2_30 as
    select * from CLAIMS2
    where mydate > SYSDATE - 30
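One detail of the top-5 query is worth spelling out: SQL RANK() gives tied rows the same rank and then skips ahead, so a filter on the rank column can return more rows than the cutoff suggests. The Python sketch below mimics that behaviour on made-up scores. The policy numbers and probabilities are hypothetical, and a cutoff of 3 is used only to keep the example short.

```python
def sql_rank(scores):
    """Mimic RANK() OVER (ORDER BY score DESC): ties share a rank, then ranks skip."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranked = []
    prev_score, prev_rank = None, 0
    for position, (key, score) in enumerate(ordered, start=1):
        # A tie reuses the previous rank; otherwise the rank is the row position.
        rank = prev_rank if score == prev_score else position
        ranked.append((key, score, rank))
        prev_score, prev_rank = score, rank
    return ranked

# Hypothetical fraud probabilities keyed by policy number.
prob_fraud = {"P1": 0.64, "P2": 0.91, "P3": 0.64, "P4": 0.12,
              "P5": 0.77, "P6": 0.64, "P7": 0.05}

# Equivalent of "where rnk <= 3", with the probability shown as a percentage.
top = [(policy, round(p * 100, 2), rnk)
       for policy, p, rnk in sql_rank(prob_fraud) if rnk <= 3]
for row in top:
    print(row)
```

Here three policies tie for rank 3, so the filter yields five rows. If exactly N rows are required, ROW_NUMBER() is the usual alternative in SQL.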

Example
Better Information for OBI EE Reports and Dashboards
OAA’s probabilities available in the Database for reporting using Oracle BI EE and other tools

Financial Sector/Accounting/Expenses
Anomaly Detection
- Simple Fraud Detection Methodology: 1-Class SVM
- More Sophisticated Fraud Detection Methodology: Clustering and 1-Class SVM

Oracle Advanced Analytics
More Details
On-the-fly, single record apply with new data (e.g. from call center):

    select prediction_probability(CLAS_DT_1_1, 'Yes'
             using 7800 as bank_funds, 125 as checking_amount, 20 as credit_balance,
                   55 as age, 'Married' as marital_status,
                   250 as MONEY_MONTLY_OVERDRAWN, 1 as house_ownership)
    from dual;

[Diagram: “Likelihood to respond” scored across channels: Social Media, Call Center, Get Advice, Branch Office, Mobile, Web, Email.]

Enabling Predictive Applications
Example Applications Using Oracle Advanced Analytics
- Human Capital Management: Predictive Workforce (employee turnover and performance prediction and “What if?” analysis)
- CRM: Sales Prediction Engine (prediction of sales opportunities, what to sell, amount, timing, etc.)
- Supply Chain Management: Spend Classification (real-time flagging of noncompliance and anomalies in expense submissions)
- Identity Management: Oracle Adaptive Access Manager (real-time security and fraud analytics)
- Retail Analytics: Oracle Retail Customer Analytics (“shopping cart analysis” and next best offers)
- Customer Support: Predictive Incident Monitoring (PIM) Customer Service offering for Database customers
- Manufacturing: Response surface modeling in chip design
- Predictive capabilities in Oracle Industry Data Models:
  - Communications Data Model implements churn prediction, segmentation, profiling, etc.
  - Retail Data Model implements loyalty and market basket analysis
  - Airline Data Model implements analysis of frequent flyers, loyalty, etc.

Oracle Communications Industry Data Model
Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
OAA’s clustering and predictions available in-DB for OBIEE

Integrated Business Intelligence
Integrate a range of in-DB SQL & R Predictive Analytics & Graphics
- In-database construction of predictive models that predict customer behavior
- OBIEE’s integrated spatial mapping shows where
[Map: Customer “most likely” to be HIGH and VERY HIGH value customer in the future]

Integration with Oracle BI EE
- Oracle Data Mining results available to Oracle BI EE administrators
- Oracle BI EE defines results for end user presentation

Fusion HCM Predictive Analytics
Built-in Predictive Analytics
Oracle Advanced Analytics factory-installed predictive analytics show employees likely to leave, top reasons, expected performance and real-time "What if?" analysis

Factors associated with Employee’s predicted departure

Oracle Data Miner GUI
SQL Developer 3.2 Extension (Free OTN Download)
- Easy to Use
  - Oracle Data Miner GUI for data analysts
  - Explore data, discover new insights
  - “Work flow” paradigm for analytical methodologies
- Powerful
  - Multiple algorithms & data transformations
  - Runs 100% in-DB
  - Build, evaluate and apply data mining models
- Automate and Deploy
  - Generate and deploy SQL scripts for automation
  - Share analytical workflows

Oracle Data Miner GUI
Oracle Data Miner Nodes (Partial List)
- Tables and Views
- Transformations
- Explore Data
- Modeling
- Text

Insurance
Identify “Likely Insurance Buyers” and their Profiles
OAA work flows capture the analytical process and generate SQL code for deployment

Oracle Advanced Analytics
Data Mining Unstructured Data
- Mines unstructured, i.e. “text”, data
- Include text and comments in models
- Cluster and classify documents
- Oracle Text used to preprocess unstructured text

Exadata Data Mining 11g Release 2
Data Mining Model “Scoring” Pushed to Storage
Faster: SQL predicates and OAA models are pushed to storage level for execution
For example, find the US customers likely to churn:

    select cust_id
    from customers
    where region = 'US'
    and prediction_probability(churnmod, 'Y' using *) > 0.8;

Oracle Advanced Analytics
SQL Data Mining
Classification
- Logistic Regression (GLM): classical statistical technique
- Decision Trees: popular / rules / transparency
- Naïve Bayes: embedded app
- Support Vector Machine: wide / narrow data / text
Regression
- Multiple Regression (GLM): classical statistical technique
- Support Vector Machine: wide / narrow data / text
Anomaly Detection
- One Class SVM: lack examples of target field
Attribute Importance
- Minimum Description Length (MDL): attribute reduction, identify useful data, reduce data noise
Association Rules
- Apriori: market basket analysis, link analysis
Clustering
- Hierarchical K-Means, Hierarchical O-Cluster: product grouping, text mining, gene and protein analysis
Feature Extraction
- Nonnegative Matrix Factorization: text analysis, feature reduction

Oracle Advanced Analytics
In-DB SQL Statistics
SQL Statistics and SQL Analytics (free)
- Ranking functions: rank, dense_rank, cume_dist, percent_rank, ntile
- Window Aggregate functions (moving & cumulative): avg, sum, min, max, count, variance, stddev, first_value, last_value
- LAG/LEAD functions: direct inter-row reference using offsets
- Reporting Aggregate functions: sum, avg, min, max, variance, stddev, count, ratio_to_report
- Statistical Aggregates: correlation, linear regression family, covariance
- Linear regression: fitting of an ordinary-least-squares regression line to a set of number pairs; frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
- Descriptive Statistics: DBMS_STAT_FUNCS summarizes numerical columns of a table and returns count, min, max, range, mean, median, stats_mode, variance, standard deviation, quantile values, +/- n sigma values, top/bottom 5 values
- Correlations: Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric)
- Cross Tabs: enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
- Hypothesis Testing: Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
- Distribution Fitting: Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition and Enterprise Edition

Independent Samples T-Test (Pooled Variances)
The query compares the mean of AMOUNT_SOLD between MEN and WOMEN within CUST_INCOME_LEVEL ranges. It returns the observed t value and its related two-sided significance.

    SELECT substr(cust_income_level,1,22) income_level,
           avg(decode(cust_gender,'M',amount_sold,null)) sold_to_men,
           avg(decode(cust_gender,'F',amount_sold,null)) sold_to_women,
           stats_t_test_indep(cust_gender, amount_sold, 'STATISTIC','F') t_observed,
           stats_t_test_indep(cust_gender, amount_sold) two_sided_p_value
    FROM sh.customers c, sh.sales s
    WHERE c.cust_id = s.cust_id
    GROUP BY rollup(cust_income_level)
    ORDER BY 1;

[Screenshot: query output in SQL Plus]
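For readers who want to see what the pooled-variance t statistic computes, here is a minimal Python sketch of the textbook formula. It is not Oracle's implementation, it computes only the statistic (not the two-sided significance), and the two samples are made-up numbers rather than rows from the SH schema.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic assuming both groups share a common variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Hypothetical amounts sold to two groups of customers.
sold_to_men = [2.0, 4.0, 6.0]
sold_to_women = [1.0, 2.0, 3.0]

t_observed = pooled_t_statistic(sold_to_men, sold_to_women)
degrees_of_freedom = len(sold_to_men) + len(sold_to_women) - 2
print(round(t_observed, 4), degrees_of_freedom)  # prints 1.5492 4
```

The p value would come from a Student t distribution with the printed degrees of freedom; the sketch leaves that step out to stay within the standard library.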

Oracle Advanced Analytics
R Graphics Direct Access to Database Data

    R> boxplot(split(CARSTATS$mpg, CARSTATS$model.year), col = "green")

[Plot: boxplots of MPG by model year; MPG increases over time]

How Oracle R Enterprise Works
ORE Computation Engines
- Oracle R Enterprise tightly integrates R with the database and fully manages the data operated upon by R code.
  - The database is always involved in serving up data to the R code.
  - Oracle R Enterprise runs in the Oracle Database.
- Oracle R Enterprise eliminates data movement and duplication, maintains security and minimizes latency time from raw data to new information.
- Three ORE Computation Engines: Oracle R Enterprise provides three different interfaces between the open-source R engine and the Oracle database:
  1. Oracle R Enterprise (ORE) Transparency Layer
  2. Oracle Statistics Engine
  3. Embedded R

Oracle Advanced Analytics
R Enterprise Compute Engines
[Diagram: (1) R engine on the user's desktop with open source R, other R packages and Oracle R Enterprise packages, sending SQL to and receiving results from (2) Oracle Database with user tables, which in turn spawns (3) R engines with other R packages and Oracle R Enterprise packages.]
1. User R Engine on desktop
- R-SQL Transparency Framework intercepts R functions for scalable in-database execution
- Function intercept for data transforms, statistical functions and advanced analytics
- Interactive display of graphical results and flow control as in standard R
- Submit entire R scripts for execution by database
2. Database Compute Engine
- Scale to large datasets
- Access tables, views, and external tables, as well as data through DB LINKS
- Leverage database SQL parallelism
- Leverage new and existing in-database statistical and data mining capabilities
3. R Engine(s) spawned by Oracle DB
- Database can spawn multiple R engines for database-managed parallelism
- Efficient data transfer to spawned R engines
- Emulate map-reduce style algorithms and applications
- Enables “lights-out” execution of R scripts

Oracle Advanced Analytics Example
Use of All 3 ORE Engines Within 1 R Script

You Can Think of OAA Like This
Traditional SQL
- “Human-driven” queries
- Domain expertise
- Any “rules” must be defined and managed
- SQL Queries: SELECT, DISTINCT, AGGREGATE, WHERE, AND OR, GROUP BY, ORDER BY, RANK
Oracle Advanced Analytics (SQL & R)
- Automated knowledge discovery, model building and deployment
- Domain expertise to assemble the “right” data to mine/analyze
- Analytical “Verbs”: PREDICT, DETECT, CLUSTER, CLASSIFY, REGRESS, PROFILE, IDENTIFY FACTORS, ASSOCIATE

Learn More
Send an email to Charlie.berger@oracle.com and I’ll send you my “fav links”:
1. Link to my latest OOW presentation: Digging for Gold in your DW with Oracle Advanced Analytics Option
2. Take a Free Test Drive of Oracle Advanced Analytics (Oracle Data Miner GUI) on the Amazon Cloud
3. Link to ODM Blog entry with YouTube-like recording of OAA/ODM presentation and several "live" demos
4. Link to Getting Started w/ ODM blog entry
5. Link to New OAA/Oracle Data Mining 2-Day Instructor Led Oracle University course
6. Link to OAA/Oracle Data Mining Oracle by Examples (free) Tutorials on OTN
7. Link to OAA/Oracle R Enterprise (free) Tutorial Series on OTN
8. Link to SQL Developer Days Virtual Event w/ downloadable Virtual Machine (VM) images of Oracle Database ODM/ODMr and e-training for Hands on Labs
9. Main OAA/Oracle Data Mining on OTN page
10. Main Oracle Advanced Analytics Option on OTN page
11. Main OAA/Oracle R Enterprise page on OTN page & ORE Blog
