Concepts Oracle Machine Learning For SQL


Oracle Machine Learning for SQL
Concepts
20c
F16384-02
May 2020

Oracle Machine Learning for SQL Concepts, 20c
F16384-02
Copyright 2005, 2020, Oracle and/or its affiliates.

Primary Author: Sarika Surampudi
Contributors: David McDermid, Boriana Milanova

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, reproduction, duplication, release, display, disclosure, modification, preparation of derivative works, and/or adaptation of i) Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the license contained in the applicable contract. The terms governing the U.S. Government's use of Oracle cloud services are defined by the applicable contract for such services. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc, and the AMD logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.

Contents

Preface
  Technology Rebrand
  Audience
  Documentation Accessibility
  Related Documentation
  Conventions

Changes in This Release for Oracle Machine Learning for SQL Concepts
  Changes in Oracle Machine Learning for SQL 20c

Part I Introductions

1 What Is Machine Learning?
  1.1 What Is Machine Learning?
  1.2 What Can Machine Learning Do and Not Do?
  1.3 The Oracle Machine Learning Process

2 Introduction to Oracle Machine Learning for SQL
  2.1 About Oracle Machine Learning for SQL
  2.2 Oracle Machine Learning for SQL in the Database Kernel
  2.3 Oracle Machine Learning for SQL in Oracle Exadata
  2.4 About Partitioned Models
  2.5 Interfaces to Oracle Machine Learning for SQL
  2.6 Overview of Database Analytics

3 Oracle Machine Learning Basics
  3.1 Machine Learning Functions
  3.2 Algorithms
  3.3 Data Preparation
  3.4 In-Database Scoring

Part II Machine Learning Functions

4 Regression
  4.1 About Regression
  4.2 Testing a Regression Model
  4.3 Regression Algorithms

5 Classification
  5.1 About Classification
  5.2 Testing a Classification Model
  5.3 Biasing a Classification Model
  5.4 Classification Algorithms

6 Anomaly Detection
  6.1 About Anomaly Detection
  6.2 Anomaly Detection Algorithms

7 Ranking
  7.1 About Ranking
  7.2 Ranking Methods
  7.3 Ranking Algorithms

8 Clustering
  8.1 About Clustering
  8.2 Evaluating a Clustering Model
  8.3 Clustering Algorithms

9 Association
  9.1 About Association
  9.2 Transactional Data
  9.3 Association Algorithm

10 Feature Selection
  10.1 Finding the Best Attributes
  10.2 About Feature Selection and Attribute Importance
  10.3 Algorithms for Attribute Importance

11 Feature Extraction
  11.1 About Feature Extraction
  11.2 Algorithms for Feature Extraction

12 Row Importance
  12.1 About Row Importance
  12.2 Row Importance Algorithms

13 Time Series
  13.1 About Time Series
  13.2 Choosing a Time Series Model
  13.3 Time Series Statistics
  13.4 Time Series Algorithm

Part III Algorithms

14 Apriori
  14.1 About Apriori
  14.2 Association Rules and Frequent Itemsets
  14.3 Data Preparation for Apriori
  14.4 Calculating Association Rules
  14.5 Evaluating Association Rules

15 CUR Matrix Decomposition
  15.1 About CUR Matrix Decomposition
  15.2 Singular Vectors
  15.3 Statistical Leverage Score
  15.4 Column (Attribute) Selection and Row Selection
  15.5 CUR Matrix Decomposition Algorithm Configuration

16 Decision Tree
  16.1 About Decision Tree
  16.2 Growing a Decision Tree
  16.3 Tuning the Decision Tree Algorithm
  16.4 Data Preparation for Decision Tree

17 Expectation Maximization
  17.1 About Expectation Maximization
  17.2 Algorithm Enhancements
  17.3 Configuring the Algorithm
  17.4 Data Preparation for Expectation Maximization

18 Explicit Semantic Analysis
  18.1 About Explicit Semantic Analysis
  18.2 Data Preparation for ESA
  18.3 Scoring with ESA
  18.4 Terminologies in Explicit Semantic Analysis

19 Exponential Smoothing
  19.1 About Exponential Smoothing
  19.2 Data Preparation for Exponential Smoothing Models

20 Generalized Linear Model
  20.1 About Generalized Linear Model
  20.2 GLM in Oracle Machine Learning for SQL
  20.3 Scalable Feature Selection
  20.4 Tuning and Diagnostics for GLM
  20.5 GLM Solvers
  20.6 Data Preparation for GLM
  20.7 Linear Regression
  20.8 Logistic Regression

21 k-Means
  21.1 About k-Means
  21.2 k-Means Algorithm Configuration
  21.3 Data Preparation for k-Means

22 Minimum Description Length
  22.1 About MDL
  22.2 Data Preparation for MDL

23 Multivariate State Estimation Technique - Sequential Probability Ratio Test
  23.1 About Multivariate State Estimation Technique - Sequential Probability Ratio Test
  23.2 Score an MSET-SPRT Model

24 Naive Bayes
  24.1 About Naive Bayes
  24.2 Tuning a Naive Bayes Model
  24.3 Data Preparation for Naive Bayes

25 Neural Network
  25.1 About Neural Network
  25.2 Data Preparation for Neural Network
  25.3 Neural Network Algorithm Configuration
  25.4 Scoring with Neural Network

26 Non-Negative Matrix Factorization
  26.1 About NMF
  26.2 Tuning the NMF Algorithm
  26.3 Data Preparation for NMF

27 O-Cluster
  27.1 About O-Cluster
  27.2 Tuning the O-Cluster Algorithm
  27.3 Data Preparation for O-Cluster

28 R Extensibility
  28.1 Oracle Machine Learning for SQL with R Extensibility
  28.2 About Algorithm Metadata Registration
  28.3 Scoring with R

29 Random Forest
  29.1 About Random Forest
  29.2 Building a Random Forest
  29.3 Scoring with Random Forest

30 Singular Value Decomposition
  30.1 About Singular Value Decomposition
  30.2 ...ing the Algorithm
  30.3 Data Preparation for SVD

31 Support Vector Machine
  31.1 About Support Vector Machine
  31.2 Tuning an SVM Model
  31.3 Data Preparation for SVM
  31.4 SVM Classification
  31.5 One-Class SVM
  31.6 SVM Regression

32 XGBoost
  32.1 About XGBoost
  32.2 Scoring with XGBoost

Glossary

Index

Preface

This manual describes Oracle Machine Learning for SQL (OML4SQL), a comprehensive, state-of-the-art machine learning capability within Oracle Database, previously known as Oracle Data Mining. This manual presents the concepts that underlie the procedural information that is presented in Oracle Machine Learning for SQL User's Guide.

The preface contains these topics:

- Technology Rebrand
- Audience
- Documentation Accessibility
- Related Documentation
- Conventions

Technology Rebrand

Oracle is rebranding the suite of products and components that support machine learning with Oracle Database and Big Data. This technology is now known as Oracle Machine Learning (OML).

The OML application programming interfaces (APIs) for SQL include PL/SQL packages, SQL functions, and data dictionary views. Using these APIs is described in publications, previously under the name Oracle Data Mining, that are now named Oracle Machine Learning for SQL (OML4SQL).

Audience

Oracle Machine Learning for SQL Concepts is intended for anyone who wants to learn about Oracle Machine Learning for SQL.

Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

Related Documentation

Related documentation for Oracle Machine Learning for SQL (OML4SQL) includes publications and web pages.

The following publications document OML4SQL:

- Oracle Machine Learning for SQL Concepts (this publication)
- Oracle Machine Learning for SQL User's Guide
- Oracle Machine Learning for SQL API Guide

  Note: Oracle Machine Learning for SQL API Guide combines key passages from Oracle Machine Learning for SQL Concepts and Oracle Machine Learning for SQL User's Guide with related reference documentation from Oracle Database PL/SQL Packages and Types Reference, Oracle Database Reference, and Oracle Database SQL Language Reference.

- Oracle Database PL/SQL Packages and Types Reference
- Oracle Database Reference
- Oracle Database SQL Language Reference

For other information and resources about OML4SQL, see the Oracle Machine Learning for SQL web page.

Oracle Machine Learning for SQL Resources on the Oracle Technology Network

The Oracle Machine Learning for SQL page on the Oracle Technology Network (OTN) provides a wealth of information, including white papers, demonstrations, blogs, discussion forums, and Oracle By Example tutorials.

You can download Oracle Data Miner, the graphical user interface to Oracle Machine Learning for SQL, from this site: Oracle Data Miner

Application Development and Database Administration Documentation

Refer to the documentation to assist you in developing database applications and in administering Oracle Database.

- Oracle Database Concepts
- Oracle Database Administrator's Guide
- Oracle Database Development Guide
- Oracle Database Performance Tuning Guide
- Oracle Database VLDB and Partitioning Guide

Conventions

The following text conventions are used in this document:

Convention   Meaning
boldface     Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.
italic       Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.
monospace    Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.

Changes in This Release for Oracle Machine Learning for SQL Concepts

Changes in this release for Oracle Machine Learning for SQL Concepts.

Changes in Oracle Machine Learning for SQL 20c

Changes in Oracle Machine Learning for SQL Concepts for Oracle Database 20c.

New Features in 20c

Oracle Machine Learning for SQL features new in Oracle Database 20c.

New Algorithms

- MSET-SPRT

  The Multivariate State Estimation Technique - Sequential Probability Ratio Test (MSET-SPRT) algorithm is a nonlinear, nonparametric anomaly detection machine learning technique designed for monitoring critical processes. It detects subtle anomalies while also producing minimal false alarms.

  The algorithm calibrates an expected behavior from available historical data from the normal operational sequence of monitored signals. The learned behavior of the system is then incorporated into a persistent Oracle Machine Learning for SQL MSET-SPRT model that captures expected normal behavior and can be applied to new records to detect anomalous behaviors.

- XGBoost

  XGBoost is a machine learning algorithm for regression and classification that makes available the XGBoost open source package. Oracle Machine Learning for SQL XGBoost prepares training data, invokes XGBoost, builds and persists a model, and applies the model for prediction.

New Algorithm Setting

Adam Optimization Solver

Adam is an extension to stochastic gradient descent that uses mini-batch optimization. The Adam solver can make progress faster by seeing less data than the L-BFGS solver. Adam is computationally efficient, with low memory requirements, and is well suited for problems that are large in terms of data or parameters or both.

Enhancements

Neural Network Algorithm Settings

The Neural Network algorithm setting NNET_ACTIVATIONS now accepts the value NNET_ACTIVATIONS_RELU. The rectified linear unit (ReLU) is a commonly used activation function for deep learning models that addresses the vanishing gradient problem in large neural networks.

The algorithm has a new setting, DMSSET_NN_SOLVER, that specifies the method of optimization, either L-BFGS or Adam.

For the NNET_NODES_PER_LAYER and NNET_ACTIVATIONS settings, you can now specify a single value that is then applied to each hidden layer.

The NNET_ITERATIONS setting has a default value for the L-BFGS solver and now has a default value for the Adam solver. The default values are different for each solver.
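The description of Adam as a mini-batch extension of stochastic gradient descent can be illustrated with a small sketch. This is the generic Adam update rule as commonly published, not Oracle's in-database implementation; the function names (`adam_minimize`, `squared_error_grad`), the hyperparameter defaults, and the toy data set are all assumptions made for illustration.

```python
import random


def adam_minimize(grad_fn, data, w0, lr=0.05, beta1=0.9, beta2=0.999,
                  eps=1e-8, batch_size=8, epochs=200, seed=0):
    """Minimize a one-parameter loss with Adam over mini-batches (illustrative)."""
    rng = random.Random(seed)
    data = list(data)
    w, m, v, t = w0, 0.0, 0.0, 0
    for _ in range(epochs):
        rng.shuffle(data)
        # Each update sees only one mini-batch, not the full data set.
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g = sum(grad_fn(w, x, y) for x, y in batch) / len(batch)
            t += 1
            m = beta1 * m + (1 - beta1) * g        # running mean of gradients
            v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
            m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
            v_hat = v / (1 - beta2 ** t)
            w -= lr * m_hat / (v_hat ** 0.5 + eps)
    return w


def squared_error_grad(w, x, y):
    """Gradient of (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x


# Hypothetical noise-free data generated from y = 3x.
points = [(x / 10.0, 3.0 * (x / 10.0)) for x in range(1, 41)]
w_fit = adam_minimize(squared_error_grad, points, w0=0.0)
print(round(w_fit, 2))
```

The only state carried between steps is the two running averages `m` and `v` per parameter, which is consistent with the low memory requirements noted above.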

Part I Introductions

Introduces Oracle Machine Learning for SQL.

Provides a high-level overview for those who are new to OML4SQL technology.

- What Is Machine Learning?
- Introduction to Oracle Machine Learning for SQL
- Oracle Machine Learning Basics

1 What Is Machine Learning?

Orientation to machine learning technology.

- What Is Machine Learning?
- What Can Machine Learning Do and Not Do?
- The Oracle Machine Learning Process

Note: Information about machine learning is widely available. No matter what your level of expertise, you can find helpful books and articles on machine learning.

Related Topics

- https://en.wikipedia.org/wiki/Machine_learning

1.1 What Is Machine Learning?

Learn about machine learning.

Machine learning is the subset of artificial intelligence (AI) that focuses on building systems that learn or improve performance based on the data they consume. Machine learning and AI are often discussed together. An important distinction is that although all machine learning is AI, not all AI is machine learning. Machine learning is a technique that discovers previously unknown relationships in data. Machine learning automatically searches large stores of data to discover patterns and trends that go beyond simple analysis. Machine learning uses sophisticated mathematical algorithms to segment the data and to predict the likelihood of future events based on past events.

The key properties of machine learning are:

- Automatic discovery of patterns
- Prediction of likely outcomes
- Creation of actionable information
- Focus on large data sets and databases

Machine learning can answer questions that cannot be addressed through simple query and reporting techniques.
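As a minimal sketch of predicting the likelihood of future events from past events, the example below learns a response rate per customer segment from historical records and then applies it to new cases. It is a deliberately simple frequency model, not an OML4SQL algorithm; the names `build_model` and `score`, the segment labels, and the data are hypothetical.

```python
from collections import defaultdict


def build_model(history):
    """Learn the observed response rate for each segment from past events."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, total cases]
    for segment, responded in history:
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    return {seg: pos / total for seg, (pos, total) in counts.items()}


def score(model, segment, default=0.0):
    """Apply the model to a new case; unseen segments get the default."""
    return model.get(segment, default)


# Hypothetical past events: (segment, responded to an offer?)
history = [
    ("renter", True), ("renter", False), ("renter", False), ("renter", False),
    ("owner", True), ("owner", True), ("owner", False), ("owner", True),
]
model = build_model(history)
print(score(model, "owner"), score(model, "renter"))
```

A plain query could report these rates, but the model object is what allows the same learned pattern to be applied to records that were not part of the original data.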

1.1.1 Automatic Discovery

Machine learning is performed by a model that uses an algorithm to act on a set of data.

Machine learning models can be used to mine the data on which they are built, but most types of models are generalizable to new data. The process of applying a model to new data is known as scoring.

1.1.2 Prediction

Many forms of machine learning are predictive. For example, a model can predict income based on education and other demographic factors. Predictions have an associated probability (How likely is this prediction to be true?). Prediction probabilities are also known as confidence (How confident can I be of this prediction?).

Some forms of predictive machine learning generate rules, which are conditions that imply a given outcome. For example, a rule can specify that a person who has a bachelor's degree and lives in a certain neighborhood is likely to have an income greater than the regional average. Rules have an associated support (What percentage of the population satisfies the rule?).

1.1.3 Grouping

Other forms of machine learning identify natural groupings in the data. For example, a model might identify the segment of the population that has an income within a specified range, that has a good driving record, and that leases a new car on a yearly basis.

1.1.4 Actionable Information

Machine learning can derive actionable information from large volumes of data. For example, a town planner might use a model that predicts income based on demographics to develop a plan for low-income housing. A car leasing agency might use a model that identifies customer segments to design a promotion targeting high-value customers.

Machine Learning and Statistics

There is a great deal of overlap between machine learning and statistics. In fact, most of the techniques used in machine learning can be placed in a statistical framework. However, machine learning techniques are not the same as traditional statistical techniques.

Statistical models usually make strong assumptions about the data and, based on those assumptions, they make strong statements about the results. However, if the assumptions are flawed, the validity of the model becomes questionable. By contrast, machine learning methods typically make weak assumptions about the data. As a result, machine learning cannot generally make such strong statements about the results. Yet machine learning can produce very good results regardless of the data.

Traditional statistical methods, in general, require a great deal of user interaction in order to validate the correctness of a model. As a result, statistical methods can be difficult to automate. Statistical methods rely on testing hypotheses or finding correlations based on smaller, representative samples of a larger population.

Less user interaction and less knowledge of the data is required for machine learning. The user does not need to massage the data to guarantee that a method is valid for a given data set. Oracle Machine Learning techniques are easier to automate than traditional statistical techniques.

Oracle Machine Learning and OLAP

On-Line Analytical Processing (OLAP) can be defined as fast analysis of multidimensional data. OLAP and Oracle Machine Learning are different but complementary activities.

OLAP supports activities such as data summarization, cost allocation, time series analysis, and what-if analysis. However, most OLAP systems do not have inductive inference capabilities beyond the support for time-series forecast. Inductive inference, the process of reaching a general conclusion from specific examples, is a characteristic of machine learning. Inductive inference is also known as computational learning.

OLAP systems provide a multidimensional view of the data, including full support for hierarchies. This view of the data is a natural way to analyze businesses and organizations.

Oracle Machine Learning and OLAP can be integrated in a number of ways. OLAP can be used to analyze machine learning results at different levels of granularity. Machine learning can help you construct more interesting and useful cubes. For example, the results of predictive machine learning can be added as custom measures to a cube. Such measures can provide information such as "likely to default" or "likely to buy" for each customer. OLAP processing can then aggregate and summarize the probabilities.

Oracle Machine Learning and Data Warehousing

Data can be mined whether it is stored in flat files, spreadsheets, database tables, or some other storage format. The important criterion for the data is not the storage format, but its applicability to the problem to be solved.

Proper data cleansing and preparation are very important for machine learning, and a data warehouse can facilitate these activities. However, a data warehouse is of no use if it does not contain the data you need to solve your problem.

1.2 What Can Machine Learning Do and Not Do?

Machine learning is a powerful tool that can help you find patterns and relationships within your data. But machine learning does not work by itself. It does not eliminate the need to know your business, to understand your data, or to understand analytical methods. Machine learning discovers hidden information in your data, but it cannot tell you the value of the information to your organization.

You might already be aware of important patterns as a result of working with your data over time. Machine learning can confirm or qualify such empirical observations in addition to finding new patterns that are not immediately discernible through simple observation.
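Section 1.1.2 describes rules in terms of support and confidence. As a minimal sketch of how an empirical observation of that kind might be checked against data, the example below evaluates the bachelor's-degree rule on a small, made-up population. The function name `rule_support_confidence`, the record fields, and the interpretation of support as the share of records that satisfy the rule's conditions are assumptions made for illustration, not definitions taken from OML4SQL.

```python
def rule_support_confidence(records, condition, outcome):
    """Return (support, confidence) for the rule 'condition implies outcome'.

    support    = share of all records that satisfy the rule's conditions
    confidence = share of those matching records where the outcome holds
    """
    matching = [r for r in records if condition(r)]
    if not records or not matching:
        return 0.0, 0.0
    support = len(matching) / len(records)
    confidence = sum(1 for r in matching if outcome(r)) / len(matching)
    return support, confidence


# Hypothetical population of eight people.
people = [
    {"degree": "bachelor", "area": "north", "income": 82000},
    {"degree": "bachelor", "area": "north", "income": 76000},
    {"degree": "bachelor", "area": "north", "income": 91000},
    {"degree": "bachelor", "area": "north", "income": 48000},
    {"degree": "bachelor", "area": "south", "income": 55000},
    {"degree": "none", "area": "north", "income": 41000},
    {"degree": "none", "area": "south", "income": 39000},
    {"degree": "none", "area": "south", "income": 60000},
]
regional_average = sum(p["income"] for p in people) / len(people)

support, confidence = rule_support_confidence(
    people,
    condition=lambda p: p["degree"] == "bachelor" and p["area"] == "north",
    outcome=lambda p: p["income"] > regional_average,
)
print(support, confidence)
```

In this made-up data the rule's conditions cover half the population and the outcome holds for three of the four matching people, which is the kind of figure that can confirm or qualify a pattern an analyst already suspects.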

It is important to remember that the predictive relationships discovered through machine learning are not causal relationships. For example, machine learning might determine that males with incomes between 50,000 and 65,000 who subscribe to certain magazines are likely to buy a given product. You can use this information to help you develop a marketing strategy. However, you must not assume that the population identified through machine learning buys the product because they belong to this population.

Machine learning yields probabilities, not exact answers. It is important to keep in mind that rare events can happen; they just do not happen very often.

1.2.1 Asking the Right Questions

Machine learning does not automatically discover information without guidance. The patterns you find through machine learning are very different depending on how you formulate the problem.

To obtain meaningful results, you must learn how to ask the right questions. For example, rather than trying to learn how to "improve the response to a direct mail solicitation," you might try to find the characteristics of people who have responded to your solicitations in the past.

1.2.2 Understanding Your Data

To ensure meaningful machine learning results, you must understand your data. Machine learning algorithms are often sensitive to specific characteristics of the data: outliers (data values that are very different from the typical values in your database), irrelevant co
