SAS Enterprise Miner

FACT SHEET

SAS Enterprise Miner

Create highly accurate analytical models that enable you to predict with confidence

What does SAS Enterprise Miner do?
It streamlines the data mining process so you can create accurate predictive and descriptive analytical models using vast amounts of data. Our customers use this software to detect fraud, minimize risk, anticipate resource demands, reduce asset downtime, increase response rates for marketing campaigns, curb customer attrition and more.

Why is SAS Enterprise Miner important?
It offers state-of-the-art predictive analytics and data mining capabilities that enable organizations to analyze complex data, find useful insights and act confidently to make fact-based decisions.

For whom is SAS Enterprise Miner designed?
It’s designed for advanced analytical professionals who need to analyze increasing volumes of data to identify and solve critical business or research issues – and help others make well-informed decisions.

Turning increasing volumes of data into useful information is a challenge for most organizations. Relationships and answers that identify key opportunities lie buried somewhere in all of that data.
• Which customers will purchase what products and when?
• Which customers are leaving and what can be done to retain them?
• How should insurance rates be set to ensure profitability?
• How can you predict failures, reduce unnecessary maintenance and increase uptime to optimize asset performance?

To get answers to complex questions and gain an edge in today’s competitive market, you need powerful advanced analytical solutions. Discovering previously unknown patterns can help decision makers across your enterprise create effective strategies. Those who choose to implement SAS data mining into their business processes will be able to stay competitive and achieve success in today’s fast-moving markets.

Benefits
• Understand key relationships and develop models intuitively and quickly. The GUI makes it easy to interact with information at any point in the modeling cycle. Both advanced analytical professionals and business analysts use a common, easy-to-interpret visual view of the data mining process and can collaborate to solve tough challenges.
• Build better models more efficiently with a versatile data mining workbench. An interactive self-documenting process flow diagram environment shortens model development time. It efficiently maps the data mining process to produce the best possible results.
• Easily derive insights in a self-sufficient and automated manner. The SAS Rapid Predictive Modeler enables business analysts and subject-matter experts with limited statistical skills to automatically generate models and act on them quickly. Analytical results are provided in easy-to-understand charts for improved decision making.
• Ensure the right decisions are made and best actions are taken. Better-performing models enhance the stability and accuracy of predictions, which can be verified easily by visual model assessment and validation metrics. Model profiling helps you understand how predictor variables contribute to outcomes.
• Ease model deployment and scoring processes for faster results. SAS Enterprise Miner automates the tedious process of scoring new data and provides complete scoring code for all stages of model development. Deploy scoring code in real-time or batch environments, saving time and helping you achieve accurate results.
• Reap the benefits of a unified SAS Platform. Combining the power of SAS 9 and SAS Viya (our latest update to the SAS Platform) adds value to any analytical endeavor. Submit and execute SAS Viya code from within SAS Enterprise Miner for extremely fast responses to large, complex problems. Then, compare SAS 9 and SAS Viya algorithms to determine the best method and create the most accurate predictions.

Product Overview

Any organization can benefit from using analytics in a secure and scalable manner. But doing so requires collaboration, and it calls for a powerful, multipurpose data mining solution that can be tailored to meet a variety of needs.

An analytical approach that works on one data collection may not perform well with other data sources, or it may not be suited to answer new business questions. That’s why having a wide selection of high-performance analysis tools is crucial. Different tools produce different models, and only when you compare models side by side can you see which data mining approach produces the best “fit.” If you start with a limited analytical workbench (e.g., only regression models or only decision trees), the end result could likely be a model with limited predictive value.

SAS Enterprise Miner is delivered as a distributed client/server system. This provides an optimized architecture so data miners and business analysts can work more quickly to create accurate predictive and descriptive models, and produce results that can be shared and incorporated into business processes.

To enhance the data mining process, this software is designed to work seamlessly with other SAS technologies, such as data integration, forecasting and reporting.

An integrated, complete view of your data
Data mining is most effective when it is part of an integrated information delivery strategy – one that includes data gathered from hugely diverse enterprise sources. Call center logs, survey results, customer feedback forms, web data, time series data and transactional point-of-sale data can all be combined and analyzed with the industry’s most sophisticated data mining package. Adding SAS Text Miner lets you analyze structured and unstructured data together for more accurate and complete results.

Easy-to-use GUI
An easy-to-use, drag-and-drop interface is designed to appeal to analytical professionals. The advanced analytical algorithms are organized under core tasks that are performed in any successful data mining endeavor. The SAS data mining process encompasses five primary steps: sampling, exploration, modification, modeling and assessment (SEMMA). In each step, you perform an array of actions as the data mining project develops. By deploying nodes from the SEMMA toolbar, you can apply advanced statistics, identify the most significant variables, transform data elements with expression builders, develop models to predict outcomes, validate accuracy and generate a scored data set with predicted values to deploy into your operational applications.

A quick, easy and self-sufficient way to generate models
SAS Rapid Predictive Modeler automatically steps nontechnical users through a workflow of data mining tasks (e.g., transforming data, selecting variables, fitting a variety of algorithms and assessing models) to quickly generate predictive models for a wide range of business problems. SAS Rapid Predictive Modeler is a SAS Enterprise Guide or SAS Add-In for Microsoft Office (Microsoft Excel only) task and uses prebuilt SAS Enterprise Miner modeling steps. A collaborative approach allows models developed using SAS Rapid Predictive Modeler to be customized by advanced analytical professionals using SAS Enterprise Miner.

Both classic and modern modeling techniques
SAS Enterprise Miner provides superior analytical depth with a suite of statistical, data mining and machine-learning algorithms. Decision trees, bagging and boosting, time series data mining, neural networks, memory-based reasoning, hierarchical clustering, linear and logistic regression, associations, sequence and web path analysis are all included. And more.
The breadth of analytical algorithms extends to industry-specific algorithms such as credit scoring, and state-of-the-art methods such as gradient boosting and least angular regression splines.

Sophisticated data preparation, summarization and exploration
Preparing data is a time-consuming aspect of all data mining endeavors. A powerful set of interactive data preparation tools is available for addressing missing values, filtering outliers and developing segmentation rules. Core data preparation tools include file importing and appending, and merging and dropping variables. Extensive descriptive summarization features and interactive exploration tools enable even novice users to examine large amounts of data in dynamically linked, multidimensional plots. This produces quality data mining results tailored and optimally suited to specific business problems.

Figure 1: Perform principal component analysis for dimension reduction, a frequent intermediate step in the data mining process.

Business-based model comparisons, reporting and management
Assessment features let you compare models to identify the ones that produce the best lift and overall ROI. Models generated with different algorithms can be evaluated consistently using a highly visual assessment interface. Data miners can discuss results with business domain experts for improved collaboration and better results. An innovative Cutoff node examines posterior probability distributions to define the optimal actions for solving the business problem at hand.

Extensible design provides flexibility
The customizable environment of SAS Enterprise Miner provides the ability to add tools and include personalized SAS code. Existing SAS models developed outside of the SAS Enterprise Miner environment can be integrated easily into the process flow environment while maintaining full control of each syntax statement. The Extension node includes interactive editor features for training and score codes. Users can edit and submit code interactively while viewing the log and output listings. Default selection lists can be extended with custom-developed tools written with SAS code or XML logic, which opens the entire world of SAS to data miners.

Open source integration
You can now easily integrate R and Python code inside of a SAS Enterprise Miner process flow diagram. This enables you to perform data transformation and exploration, as well as train and score supervised and unsupervised models in other programming languages. You can then integrate the results, assess your R or Python models and compare them to models generated by SAS Enterprise Miner. R is invoked in the Open Source Integration node.
Python is called in the SAS Code node.

Take advantage of SAS Viya for faster results
Complement your SAS Enterprise Miner environment with SAS Viya, which brings high availability, faster in-memory processing, image data types and native cloud support to the SAS Platform. The SAS Viya Code node enables you to submit and execute SAS Viya code directly in a SAS Enterprise Miner process flow. With a look and feel that is similar to the traditional SAS Code node, SAS Enterprise Miner users should find the SAS Viya Code node easy to use.

In-database and in-Hadoop scoring delivers faster results
Scoring is the process of regularly applying a model to new data for implementation into an operational environment. This can be tedious, especially when it entails manually rewriting or converting code, which delays model implementation and can introduce potentially costly errors. SAS Enterprise Miner automatically generates score code in SAS, C, Java and PMML. The scoring code can be deployed in a variety of real-time or batch environments within SAS, on the web, or directly in relational databases or Hadoop.

Combined with a SAS Scoring Accelerator (available for Hadoop, Pivotal Greenplum, DB2, IBM Netezza, Oracle, Teradata and SAS Scalable Performance Data Server), SAS Enterprise Miner models can be published as database-specific scoring functions for execution directly in the database. Results can be passed to other SAS solutions for deployment of data mining results into real-time operational environments.

Score data on complex algorithms such as forest and support vector machine models more quickly using the SAS Scoring Accelerator. Score code generated within process flows that include forest or support vector machine models will be included in the Score node and Score Export node using the SAS ASTORE format for faster in-memory scoring.

Parallelized grid-enabled workbench
Scale from a single-user system to very large enterprise solutions with the Java client and SAS server architecture. Powerful servers can be dedicated to computing, while users move from office to home to remote sites without losing access to mining projects or services. Many process-intensive tasks, such as data sorting, summarization, variable selection and regression modeling, are multithreaded, and processes can be run in parallel for distribution and workload balancing across a grid of servers or scheduled for batch processing.

Distributable data mining system suited for enterprises
SAS Enterprise Miner is deployable via a thin-client web portal for distribution to multiple users with minimal maintenance of the clients. Alternatively, the complete system can be configured on a standalone PC. SAS Enterprise Miner supports Windows servers and UNIX platforms, making it the software of choice for organizations with large-scale data mining projects. Model result packages can be created and registered to the SAS Metadata Server for promotion to SAS Model Manager, SAS Data Integration Studio (a component of SAS Data Integration) and SAS Enterprise Guide.

High-performance data mining
A select set of high-performance data mining nodes is included in SAS Enterprise Miner. Multithreaded procedures execute concurrently and take advantage of all available cores on your existing symmetric multiprocessing (SMP) server to speed up processing. Depending on the data and complexity of analysis, users may find performance gains in a single-machine SMP mode.
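
The overview above mentions comparing models by the lift they produce. As a rough illustration of that idea outside of SAS Enterprise Miner, the following Python sketch computes cumulative lift: the response rate among the top-scored fraction of cases divided by the overall response rate. It is not SAS code and does not reproduce the product's assessment interface; the function name `cumulative_lift`, the toy scores and the 30 percent depth are all assumptions made for this example.

```python
from typing import Sequence


def cumulative_lift(scores: Sequence[float], labels: Sequence[int], depth: float) -> float:
    """Response rate in the top `depth` fraction of cases divided by the overall rate.

    Hypothetical helper for illustration; `labels` holds 1 for responders, 0 otherwise.
    """
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equally long")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no responders")
    # Rank cases from highest to lowest model score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    return top_rate / base_rate


# Toy data (assumed): 10 customers, 3 of whom responded.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
model_a = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.35, 0.15, 0.05]
model_b = [0.9, 0.8, 0.3, 0.7, 0.1, 0.6, 0.4, 0.35, 0.15, 0.05]

lift_a = cumulative_lift(model_a, labels, depth=0.3)
lift_b = cumulative_lift(model_b, labels, depth=0.3)
print(f"Model A lift at 30% depth: {lift_a:.2f}")  # 3.33
print(f"Model B lift at 30% depth: {lift_b:.2f}")  # 1.11
```

With this toy data, model A places all three responders in its top 30 percent, so it shows the higher lift. A side-by-side comparison of this kind is what a lift chart summarizes across many depths.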

Key Features

Intuitive interfaces
• Easy-to-use GUI for building process flow diagrams:
  • Build more, better models faster.
  • Deliverable via the web.
  • Access the SAS programming environment.
  • Provides XML diagram exchange.
  • Reuse diagrams as templates for other projects or users.
  • Directly load a specific data mining project or diagram, or choose from a Project Navigator tree that contains the most recent projects or diagrams.
• Batch processing (program development interface):
  • Encapsulates all features of the GUI.
  • SAS macro based.
  • Embed training and scoring processes into customized applications.

Scalable processing
• Server-based processing.
• Grid computing, in-database and distributed in-memory processing options.
• Asynchronous model training.
• Ability to stop processing cleanly.
• Parallel processing – run multiple tools and diagrams concurrently.
• Multithreaded predictive algorithms.
• All storage located on servers.

Accessing and managing data
• Access and integrate structured and unstructured data sources, including time series data, market baskets, web paths and survey data as candidate predictors.
• File Import node for easy access to Microsoft Excel, comma-delimited files, SAS and other common file formats.
• Support for variables with special characters.
• SAS Library Explorer and Library Assignment wizard.
• Enhanced Explorer window to quickly locate and view table listings or develop a plot using interactive graph components.
• Drop Variables node.
• Merge Data node.
• Append node.
• Filter outliers:
  • Apply various distributional thresholds to eliminate extreme interval values.
  • Combine class values with fewer than n occurrences.
  • Interactively filter class and numeric values.
• Metadata node for modifying columns metadata such as role, measurement level and order.
• Integrated with SAS Data Integration Studio, SAS Enterprise Guide, SAS Model Manager and SAS Add-In for Microsoft Office through SAS Metadata Server:
  • Build training tables for mining.
  • Deploy scoring code.

Figure 2: Within the SAS Enterprise Miner GUI, the process flow diagram is a self-documenting template that can be easily updated or applied to new problems and shared with modelers or other analysts.

Sampling
• Simple random.
• Stratified.
• Weighted.
• Cluster.
• Systematic.
• First N.
• Rare event sampling.
• Stratified and event-level sampling in Teradata 13.

Data partitioning
• Create training, validation and test data sets.
• Ensure good generalization of your models through use of holdout data.
• Default stratification by the class target.
• Balanced partitioning by any class variable.
• Output SAS tables or views.

Transformations
• Simple: log, log 10, square root, inverse, square, exponential and standardized.
• Binning: bucketed, quantile and optimal binning for relationship to target.
• Best power: maximize normality, maximize correlation with target and equalize spread with target levels.
• Interactions editor: define polynomial and nth degree interaction effects.

• Interactively define transformations:
  • Define customized transformations using the Expression Builder or SAS code editor.
  • Compare the distribution of the new variable with the original variable.
• Predefine global transformation code for reuse.

Interactive variable binning
• Quantile or bucket.
• Gini variable selection.
• Handle missing values as separate group.
• Fine and coarse classing detail.
• Profile bins by target.
• Modify groups interactively.
• Save binning definitions.
• Distribution plots.
• Statistics breakdown for each level of the class target.
• Bivariate statistics and plots:
  • Ordered Pearson and Spearman correlation plot.
  • Ordered chi-square plot with option for binning continuous inputs into n bins.
  • Coefficient of variation plot.
• Variable selection by logworth.
• Other interactive plots:
  • Variable worth plot ranking inputs based on their worth with the target.
  • Class variable distributions across the target and/or the segment variable.
  • Scaled mean deviation plots.

Graphs/visualization
• Batch and interactive plots: scatter, matrix, box, constellation, contour, needle, lattice, density and multidimensional plots; 3-D, pie and area bar charts; and histograms.
• Segment profile plots:
  • Interactively profile segments of data created by clustering and modeling tools.
  • Easily identify variables that determine the profiles and the differences between groups.
• Easy-to-use Graphics Explorer wizard and Graph Explore node:
  • Create titles and footnotes.
  • Apply a WHERE clause.
  • Choose from color schemes.
  • Easily rescale axes.

Rules Builder node
• Create ad hoc data-driven rules and policies.
• Interactively define the value of the outcome variable and paths to the outcome.

Data replacement
• Measures of centrality.
• Distribution-based.
• Tree imputation with surrogates.
• Mid-medium spacing.
• Robust M-estimators.
• Default constant.
• Replacement Editor:
  • Specify new values for class variables.
  • Assign replacement values for unknown values.
  • Interactively cap extreme interval values to a replacement threshold.

Descriptive statistics
• Univariate statistics and plots:
  • Interval variables: n, mean, median, min, max, standard deviation, scaled deviation and percent missing.
  • Class variables: number of categories, counts, mode, percent mode and percent missing.

Figure 3: Develop customized transformations using the interactive Transform Variables node Expression Builder.

• Surface underlying data from standard SAS Enterprise Miner results to develop customized graphics.
• Plots and tables are interactively linked, supporting tasks such as brushing and banding.
• Data and plots can be easily copied and pasted into other applications or saved as BMP files.
• Interactive graphs are automatically saved in the Results window of the node.

Clustering and self-organizing maps
• Clustering:
  • User defined or automatically chooses the best clusters.
  • Several strategies for encoding class variables into the analysis.
  • Handles missing values.
  • Variable segment profile plots show the distribution of the inputs and other factors within each cluster.
  • Decision tree profile uses the inputs to predict cluster membership.
  • PMML score code.
• Self-organizing maps:
  • Batch SOMs with Nadaraya-Watson or local-linear smoothing.
  • Kohonen networks.
  • Overlay the distribution of other variables onto the map.
  • Handles missing values.

Market basket analysis
• Associations and sequence discovery:
  • Grid plot of the rules ordered by confidence.
  • Expected confidence versus confidence scatter plot.
  • Statistics line plot of the lift, confidence, expected confidence and support for the rules.
  • Statistics histogram of the frequency counts for given ranges of support and confidence.
  • Rules description table.
  • Network plot of the rules.
  • Interactively subset rules based on lift, confidence, support, chain length, etc.
  • Seamless integration of rules with other inputs for enriched predictive modeling.
• Hierarchical associations:
  • Derive rules at multiple levels.
  • Specify parent and child mappings for the dimensional input table.

Figure 4: Use link analysis to evaluate relationships between nodes to visually discover new patterns.

Web path analysis
• Scalable and efficient mining of the most frequently navigated paths from clickstream data.
• Mine frequent consecutive subsequences from any type of sequence data.

Link analysis
• Converts data into a set of interconnected linked objects that can be visualized as a network of effects.
• Provides a visual model of how two variables’ levels in relational data or between two items’ co-occurrence in transactional data are linked.
• Provides centrality measures and community information to understand linkage graphs.
• Provides weighted confidence statistics to provide next-best offer information.
• Generates cluster scores for data reduction and segmentation.

Dimension reduction
• Variable selection:
  • Remove variables unrelated to target based on a chi-square or R-square selection criterion.
  • Remove variables in hierarchies.
  • Remove variables with many missing values.
  • Reduce class variables with a large number of levels.
  • Bin continuous inputs to identify nonlinear relationships.
  • Detect interactions.
• Least Angle Regression (LARS) variable selection:
  • AIC, SBC, Mallows C(p), cross-validation and other selection criteria.
  • Plots include: parameter estimates, coefficient paths, iteration plot, score rankings and more.
  • Generalizes to support LASSO (least absolute shrinkage and selection operator).
  • Supports class inputs and targets as well as continuous variables.
  • Score code generation.

• Principal components:
  • Calculate eigenvalues and eigenvectors from correlation and covariance matrices.
  • Plots include: principal components coefficients, principal components matrix, Eigenvalue, Log Eigenvalue and Cumulative Proportional Eigenvalue.
  • Interactively choose the number of components to be retained.
  • Mine selected principal components using predictive modeling techniques.
• Variable clustering:
  • Divide variables into disjoint or hierarchical clusters.
  • Eigenvalue or principal components learning.
  • Includes class variable support.
  • Dendrogram tree of the clusters.
  • Selected variables table with cluster and correlation statistics.
  • Cluster network and R-square plot.
  • Interactive user override of selected variables.
• Time series mining:
  • Reduce transactional data into a time series using several accumulation methods and transformations.
  • Analysis methods include seasonal, trend, time domain, and seasonal decomposition.
  • Mine the reduced time series using clustering and predictive modeling techniques.

• Batch code now integrates project start code that you can use to define libraries and options.
• Easy-to-use program development interface:
  • Macro variables to reference data sources, variables, etc.
  • Interactive code editor and submit.
  • Separately manage training, scoring and reporting code.
  • SAS Output and SAS LOG.
  • Create graphics.
• Write SAS Viya code in code editor window within a SAS Enterprise Miner process flow.

Consistent modeling features
• Select models based on either the training, validation (default) or test data using several criteria such as profit or loss, AIC, SBC, average square error, misclassification rate, ROC, Gini, or KS (Kolmogorov-Smirnov).
• Incorporate prior probabilities into the model development process.
• Supports binary, nominal, ordinal and interval inputs and targets.
• Easy access to score code and all partitioned data sources.
• Display multiple results in one window to help better evaluate model performance.
• Decisions node for setting target event and defining priors and profit/loss matrices.

Regression
• Linear and logistic.
• Stepwise, forward and backward selection.
• Equation terms builder: polynomials, general interactions, and effect hierarchy support.
• Cross-validation.
• Effect hierarchy rules.
• Optimization techniques include: Conjugate Gradient, Double Dogleg, Newton-Raphson with Line Search or Ridging, Quasi-Newton and Trust Region.
• Dmine Regression node:
  • Fast forward stepwise least squares regression.
  • Optional variable binning to detect nonlinear relationships.
  • Optional class variable reduction.
  • Include interaction terms.
  • In-database modeling for Teradata 13.
• PMML score code.

SAS Code node and SAS Viya Code node
• Write SAS code for easy-to-complex data preparation and transformation tasks.
• Incorporate procedures from other SAS products.
• Develop custom models.
• Create SAS Enterprise Miner extension nodes.
• Augment score code logic.
• Support for SAS procedures.
• Batch code uses input tables of different names and locations.

Figure 5: Integrate customized SAS code to create variable transformations, incorporate SAS procedures, develop new nodes, augment scoring logic, tailor reports and more.
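
The KS (Kolmogorov-Smirnov) criterion named under "Consistent modeling features" above can be sketched in a few lines. The Python example below is an illustration only, not the computation SAS Enterprise Miner performs internally; the function name `ks_statistic` and the toy data are assumptions. It takes the KS statistic to be the largest gap between the cumulative score distributions of events and non-events.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the empirical score CDFs of events (1) and non-events (0).

    Hypothetical helper for illustration only.
    """
    events = sorted(s for s, y in zip(scores, labels) if y == 1)
    non_events = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    largest_gap = 0.0
    # The gap can only change at observed score values, so check each one.
    for threshold in sorted(set(scores)):
        cdf_events = bisect_right(events, threshold) / len(events)
        cdf_non_events = bisect_right(non_events, threshold) / len(non_events)
        largest_gap = max(largest_gap, abs(cdf_events - cdf_non_events))
    return largest_gap


# Toy data (assumed): 10 cases, 3 of which are events.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
model_a = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.35, 0.15, 0.05]
model_b = [0.9, 0.8, 0.3, 0.7, 0.1, 0.6, 0.4, 0.35, 0.15, 0.05]

print(f"Model A KS: {ks_statistic(model_a, labels):.2f}")  # 1.00
print(f"Model B KS: {ks_statistic(model_b, labels):.2f}")  # 0.43
```

A KS value of 1 means the scores separate events from non-events perfectly, as model A does on this toy data; values near 0 mean the two score distributions are nearly indistinguishable.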

Decision trees
• Methodologies:
  • CHAID, classification and regression trees, bagging and boosting, gradient boosting, and bootstrap forest.
  • Tree selection based on profit or lift objectives and ability to prune accordingly.
  • K-fold cross-validation.
• Splitting criterion: Prob Chi-square test, Prob F-test, Gini, Entropy and variance reduction.
• Switch targets for designing multiobjective segmentation strategies.
• Automatically output leaf IDs as inputs for modeling and group processing.
• Displays English rules.
• Calculates variable importance for preliminary variable selection and model interpretation.
• Display variable precision values in the split branches and nodes.
• Unique consolidated tree map representation of the tree diagram.
• Interactive tree capabilities:
  • Interactive growing/pruning of trees; expand/collapse tree nodes.
  • Incorporates validation data to evaluate tree stability.
  • Define customized split points, including binary or multiway splits.
  • Split on any candidate variable.
  • Copy split.
  • Tables and plots are dynamically linked to better evaluate the tree performance.
  • Easy-to-print tree diagrams on a single page or across multiple pages.
  • Interactive subtree selection.
  • User-specified display of text and statistics in the Tree node.
  • User-controlled sample size within interactive trees.
• Based on the fast ARBORETUM procedure.
• PMML score code.

Figure 6: Fit highly complex nonlinear relationships using the Neural Network node.

Neural networks
• Neural Network node:
  • Flexible network architectures with combination and activation functions.
  • 10 training techniques.
  • Preliminary optimization.
  • Automatic standardization of inputs.
  • Supports direct connections.
• AutoNeural node:
  • Automated multilayer perceptron building searches for optimal configuration.
  • Type and activation function selected from four different types of architectures.
• PMML score code.
• DM Neural node:
  • Model building with dimension reduction and function selection.
  • Fast training; linear and nonlinear estimation.

Partial Least Squares node
• Especially useful for extracting factors from a large number of potentially correlated variables.
• Performs principal components regression and reduced rank regression.
• User or automated selection of the number of factors.
• Choose from five cross-validation strategies.
• Supports variable selection.

Rule induction
• Recursive predictive modeling technique.
• Especially useful for modeling rare events.

Two-stage modeling
• Sequential and concurrent modeling for both the class and interval target.
• Choose a decision tree, regression or neural network model for each stage.
• Control how the class prediction is applied to the interval prediction.
• Accurately estimate customer value.

Memory-based reasoning
• k-nearest neighbor technique to categorize or predict observations.
• Patented Reduced Dimensionality Tree and Scan.

Model ensembles
• Combine model predictions to form a potentially better solution.

• Methods include: Averaging, Voting and Maximum.

Open Source Integration node
• Write code in the R language inside of SAS Enterprise Miner.
• SAS Enterprise Miner data and metadata are available to your R code with R results returned to SAS Enterprise Miner.
• Train and score supervised and unsupervised R models. The node allows for data transformation and exploration.
• Generate model comparisons and SAS score code for supported models.

Incremental response/net lift models
• Net treatment vs. control models.
• Binary and interval targets.
• Stepwise selection.
• Fixed or variable revenue calculations.
• Net information value variable selection.
• User can specify the treatment level of the treatment variable.
• User can specify a cost variable in addition to a constant cost.
• Penalized Net Information Value (PNIV) for variable selection.
• Separate model selection options available for an incremental sales model.

• Computes similarity measures between the target and input series, or among input time series.
• Similarity matrix for all combinations of the series.
• Hierarchical clustering using the similarity matrix with dendrogram results.
• Constellation plot for evaluating the clusters.
• Exponential smoothing:
  • Control weights decay using one or more smoothing parameters.
  • Best-fitting smoothing method (simple, double, linear, damped trend, seasonal or Winters’ method) is selected automatically.
• Dimension reduction:
  • Supports five time series dimension reduction techniques: Discrete Wavelet Transform, Discrete Fourier Transform, Singular Value Decomposition, Line Segment Approximation with the Mean, and Line Segment Approximation with the Sum.
• Cross-correlation:
  • Provides autocorrelation and cross-correlation analysis for timestamped data.
  • The Time Series Correlation node outputs time-domain statistics based on whether autocorrelation or cross-correlation is performed.
• Seasonal decomposition.

Survival analysis
• Discrete time to event regression with additive logistic regression.
• Event probability for time effect is modeled using cubic splines.
• Users can now enter the cubic spline basis functions as part of the stepwise variable selection pr
