IBM SPSS Modeler 15 Applications Guide


Note: Before using this information and the product it supports, read the general information under Notices on p. 385.

This edition applies to IBM SPSS Modeler 15 and to all subsequent releases and modifications until otherwise indicated in new editions.

Adobe product screenshot(s) reprinted with permission from Adobe Systems Incorporated.

Microsoft product screenshot(s) reprinted with permission from Microsoft Corporation.

Licensed Materials - Property of IBM

Copyright IBM Corporation 1994, 2012.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Preface

IBM SPSS Modeler is the IBM Corp. enterprise-strength data mining workbench. SPSS Modeler helps organizations to improve customer and citizen relationships through an in-depth understanding of data. Organizations use the insight gained from SPSS Modeler to retain profitable customers, identify cross-selling opportunities, attract new customers, detect fraud, reduce risk, and improve government service delivery.

SPSS Modeler’s visual interface invites users to apply their specific business expertise, which leads to more powerful predictive models and shortens time-to-solution. SPSS Modeler offers many modeling techniques, such as prediction, classification, segmentation, and association detection algorithms. Once models are created, IBM SPSS Modeler Solution Publisher enables their delivery enterprise-wide to decision makers or to a database.

About IBM Business Analytics

IBM Business Analytics software delivers complete, consistent and accurate information that decision-makers trust to improve business performance. A comprehensive portfolio of business intelligence, predictive analytics, financial performance and strategy management, and analytic applications provides clear, immediate and actionable insights into current performance and the ability to predict future outcomes. Combined with rich industry solutions, proven practices and professional services, organizations of every size can drive the highest productivity, confidently automate decisions and deliver better results.

As part of this portfolio, IBM SPSS Predictive Analytics software helps organizations predict future events and proactively act upon that insight to drive better business outcomes. Commercial, government and academic customers worldwide rely on IBM SPSS technology as a competitive advantage in attracting, retaining and growing customers, while reducing fraud and mitigating risk. By incorporating IBM SPSS software into their daily operations, organizations become predictive enterprises – able to direct and automate decisions to meet business goals and achieve measurable competitive advantage. For further information or to reach a representative, visit http://www.ibm.com/spss.

Technical support

Technical support is available to maintenance customers. Customers may contact Technical Support for assistance in using IBM Corp. products or for installation help for one of the supported hardware environments. To reach Technical Support, see the IBM Corp. web site at http://www.ibm.com/support. Be prepared to identify yourself, your organization, and your support agreement when requesting assistance.

Contents

1 About IBM SPSS Modeler
  IBM SPSS Modeler Products
    IBM SPSS Modeler
    IBM SPSS Modeler Server
    IBM SPSS Modeler Administration Console
    IBM SPSS Modeler Batch
    IBM SPSS Modeler Solution Publisher
    IBM SPSS Modeler Server Adapters for IBM SPSS Collaboration and Deployment Services
  IBM SPSS Modeler Editions
  IBM SPSS Modeler Documentation
    SPSS Modeler Professional Documentation
    SPSS Modeler Premium Documentation
  Application Examples
  Demos Folder

Part I: Introduction and Getting Started

2 IBM SPSS Modeler Overview
  Getting Started
  Starting IBM SPSS Modeler
    Launching from the Command Line
    Connecting to IBM SPSS Modeler Server
    Changing the Temp Directory
    Starting Multiple IBM SPSS Modeler Sessions
  IBM SPSS Modeler Interface at a Glance
    IBM SPSS Modeler Stream Canvas
    Nodes Palette
    IBM SPSS Modeler Managers
    IBM SPSS Modeler Projects
    IBM SPSS Modeler Toolbar
    Customizing the Toolbar
    Customizing the IBM SPSS Modeler Window
    Changing the icon size for a stream
    Using the Mouse in IBM SPSS Modeler
    Using Shortcut Keys
  Printing
  Automating IBM SPSS Modeler

3 Introduction to Modeling
  Building the Stream
  Browsing the Model
  Evaluating the Model
  Scoring Records
  Summary

4 Automated Modeling for a Flag Target
  Modeling Customer Response (Auto Classifier)
  Historical Data
  Building the Stream
  Generating and Comparing Models
  Summary

5 Automated Modeling for a Continuous Target
  Property Values (Auto Numeric)
  Training Data
  Building the Stream
  Comparing the Models
  Summary

Part II: Data Preparation Examples

6 Automated Data Preparation (ADP)
  Building the Stream
  Comparing Model Accuracy

7 Preparing Data for Analysis (Data Audit)
  Building the Stream
  Browsing Statistics and Charts
  Handling Outliers and Missing Values

8 Drug Treatments (Exploratory Graphs/C5.0)
  Reading in Text Data
  Adding a Table
  Creating a Distribution Graph
  Creating a Scatterplot
  Creating a Web Graph
  Deriving a New Field
  Building a Model
  Browsing the Model
  Using an Analysis Node

9 Screening Predictors (Feature Selection)
  Building the Stream
  Building the Models
  Comparing the Results
  Summary

10 Reducing Input Data String Length (Reclassify Node)
  Reducing Input Data String Length (Reclassify)
    Reclassifying the Data

Part III: Modeling Examples

11 Modeling Customer Response (Decision List)
  Historical Data
  Building the Stream
  Creating the Model
  Calculating Custom Measures Using Excel
    Modifying the Excel template
  Saving the Results

12 Classifying Telecommunications Customers (Multinomial Logistic Regression)
  Building the Stream
  Browsing the Model

13 Telecommunications Churn (Binomial Logistic Regression)
  Building the Stream
  Browsing the Model

14 Forecasting Bandwidth Utilization (Time Series)
  Forecasting with the Time Series Node
    Creating the Stream
    Examining the Data
    Defining the Dates
    Defining the Targets
    Setting the Time Intervals
    Creating the Model
    Examining the Model
    Summary
  Reapplying a Time Series Model
    Retrieving the Stream
    Retrieving the Saved Model
    Generating a Modeling Node
    Generating a New Model
    Examining the New Model
    Summary

15 Forecasting Catalog Sales (Time Series)
  Creating the Stream
  Examining the Data
  Exponential Smoothing
  ARIMA
  Summary

16 Making Offers to Customers (Self-Learning)
  Building the Stream
  Browsing the Model

17 Predicting Loan Defaulters (Bayesian Network)
  Building the Stream
  Browsing the Model

18 Retraining a Model on a Monthly Basis (Bayesian Network)
  Building the Stream
  Evaluating the Model

19 Retail Sales Promotion (Neural Net/C&RT)
  Examining the Data
  Learning and Testing

20 Condition Monitoring (Neural Net/C5.0)
  Examining the Data
  Data Preparation
  Learning
  Testing

21 Classifying Telecommunications Customers (Discriminant Analysis)
  Creating the Stream
  Examining the Model
    Stepwise Discriminant Analysis
    A Note of Caution Concerning Stepwise Methods
    Checking Model Fit
    Structure Matrix
    Territorial Map
    Classification Results
  Summary

22 Analyzing Interval-Censored Survival Data (Generalized Linear Models)
  Creating the Stream
  Tests of Model Effects
  Fitting the Treatment-Only Model
  Parameter Estimates
  Predicted Recurrence and Survival Probabilities
  Modeling the Recurrence Probability by Period
  Tests of Model Effects
  Fitting the Reduced Model
  Parameter Estimates
  Predicted Recurrence and Survival Probabilities
  Summary

23 Using Poisson Regression to Analyze Ship Damage Rates (Generalized Linear Models)
  Fitting an “Overdispersed” Poisson Regression
  Goodness-of-Fit Statistics
  Omnibus Test
  Tests of Model Effects
  Parameter Estimates
  Fitting Alternative Models
  Goodness-of-Fit Statistics
  Summary

24 Fitting a Gamma Regression to Car Insurance Claims (Generalized Linear Models)
  Creating the Stream
  Parameter Estimates
  Summary

25 Classifying Cell Samples (SVM)
  Creating the Stream
  Examining the Data
  Trying a Different Function
  Comparing the Results
  Summary

26 Using Cox Regression to Model Customer Time to Churn
  Building a Suitable Model
    Censored Cases
    Categorical Variable Codings
    Variable Selection
    Covariate Means
    Survival Curve
    Hazard Curve
    Evaluation
  Tracking the Expected Number of Customers Retained
  Scoring
  Summary

27 Market Basket Analysis (Rule Induction/C5.0)
  Accessing the Data
  Discovering Affinities in Basket Contents
  Profiling the Customer Groups
  Summary

28 Assessing New Vehicle Offerings (KNN)
  Creating the Stream
  Examining the Output
    Predictor Space
    Peers Chart
    Neighbor and Distance Table
  Summary

Appendix

A Notices

Bibliography

Index

Chapter 1
About IBM SPSS Modeler

IBM SPSS Modeler is a set of data mining tools that enable you to quickly develop predictive models using business expertise and deploy them into business operations to improve decision making. Designed around the industry-standard CRISP-DM model, SPSS Modeler supports the entire data mining process, from data to better business results.

SPSS Modeler offers a variety of modeling methods taken from machine learning, artificial intelligence, and statistics. The methods available on the Modeling palette allow you to derive new information from your data and to develop predictive models. Each method has certain strengths and is best suited for particular types of problems.

SPSS Modeler can be purchased as a standalone product, or used as a client in combination with SPSS Modeler Server. A number of additional options are also available, as summarized in the following sections. For more information, ucts/modeler/.

IBM SPSS Modeler Products

The IBM SPSS Modeler family of products and associated software comprises the following.

- IBM SPSS Modeler
- IBM SPSS Modeler Server
- IBM SPSS Modeler Administration Console
- IBM SPSS Modeler Batch
- IBM SPSS Modeler Solution Publisher
- IBM SPSS Modeler Server adapters for IBM SPSS Collaboration and Deployment Services

IBM SPSS Modeler

SPSS Modeler is a functionally complete version of the product that you install and run on your personal computer. You can run SPSS Modeler in local mode as a standalone product, or use it in distributed mode along with IBM SPSS Modeler Server for improved performance on large data sets.

With SPSS Modeler, you can build accurate predictive models quickly and intuitively, without programming. Using the unique visual interface, you can easily visualize the data mining process. With the support of the advanced analytics embedded in the product, you can discover previously hidden patterns and trends in your data. You can model outcomes and understand the factors that influence them, enabling you to take advantage of business opportunities and mitigate risks.

SPSS Modeler is available in two editions: SPSS Modeler Professional and SPSS Modeler Premium. For more information, see the topic IBM SPSS Modeler Editions on p. 3.

IBM SPSS Modeler Server

SPSS Modeler uses a client/server architecture to distribute requests for resource-intensive operations to powerful server software, resulting in faster performance on larger data sets.

SPSS Modeler Server is a separately-licensed product that runs continually in distributed analysis mode on a server host in conjunction with one or more IBM SPSS Modeler installations. In this way, SPSS Modeler Server provides superior performance on large data sets because memory-intensive operations can be done on the server without downloading data to the client computer. IBM SPSS Modeler Server also provides support for SQL optimization and in-database modeling capabilities, delivering further benefits in performance and automation.

IBM SPSS Modeler Administration Console

The Modeler Administration Console is a graphical application for managing many of the SPSS Modeler Server configuration options, which are also configurable by means of an options file. The application provides a console user interface to monitor and configure your SPSS Modeler Server installations, and is available free-of-charge to current SPSS Modeler Server customers. The application can be installed only on Windows computers; however, it can administer a server installed on any supported platform.

IBM SPSS Modeler Batch

While data mining is usually an interactive process, it is also possible to run SPSS Modeler from a command line, without the need for the graphical user interface. For example, you might have long-running or repetitive tasks that you want to perform with no user intervention. SPSS Modeler Batch is a special version of the product that provides support for the complete analytical capabilities of SPSS Modeler without access to the regular user interface.
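
As a rough illustration of the unattended, repetitive runs described above, the following Python sketch assembles one command line per stream file so that a scheduler could launch them without user intervention. The `clemb` executable name and the `-stream` and `-execute` arguments are assumptions here and should be checked against the SPSS Modeler Batch documentation for the installed version; the stream file names and the `build_batch_commands` helper are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def build_batch_commands(
    stream_files: Iterable[str],
    executable: str = "clemb",  # assumed name of the batch launcher
) -> List[List[str]]:
    """Return one argument list per stream file, suitable for subprocess.run()."""
    commands = []
    for stream in stream_files:
        path = Path(stream)
        # Streams are assumed to be saved with the .str extension.
        if path.suffix != ".str":
            raise ValueError(f"not a stream file: {stream}")
        # Assumed arguments: -stream names the stream, -execute runs it.
        commands.append([executable, "-stream", str(path), "-execute"])
    return commands


if __name__ == "__main__":
    # Hypothetical nightly jobs; the commands are printed, not executed.
    for cmd in build_batch_commands(["churn_scoring.str", "weekly_report.str"]):
        print(" ".join(cmd))
```

To run the jobs rather than print them, each argument list could be passed to `subprocess.run(cmd, check=True)` from a scheduled script.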
An SPSS Modeler Server license is required to use SPSS Modeler Batch.

IBM SPSS Modeler Solution Publisher

SPSS Modeler Solution Publisher is a tool that enables you to create a packaged version of an SPSS Modeler stream that can be run by an external runtime engine or embedded in an external application. In this way, you can publish and deploy complete SPSS Modeler streams for use in environments that do not have SPSS Modeler installed. SPSS Modeler Solution Publishe
