Statistics and Machine Learning Toolbox Release Notes


How to Contact MathWorks

Latest news: www.mathworks.com
Sales and services: www.mathworks.com/sales_and_services
User community: www.mathworks.com/matlabcentral
Technical support: www.mathworks.com/support/contact_us
Phone: 508-647-7000

The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098

Statistics and Machine Learning Toolbox Release Notes
COPYRIGHT 2005–2017 by The MathWorks, Inc.

The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.

FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.

Trademarks

MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.

Patents

MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information.

Contents

R2017a
  Regression Learner App: Train regression models using supervised machine learning
  Big Data Algorithms: Perform support vector machine (SVM) and Naive Bayes classification, create bags of decision trees, and fit lasso regression on out-of-memory data
  Code Generation: Generate C code for prediction by using linear models, generalized linear models, decision trees, and ensembles of classification trees (requires MATLAB Coder)
  Bayesian Statistics: Perform gradient-based sampling using Hamiltonian Monte Carlo (HMC) sampler
  Feature Extraction: Perform unsupervised feature learning by using sparse filtering and reconstruction independent component analysis (RICA)
  t-SNE: Visualize high-dimensional data
  Survival Analysis: Fit Cox proportional hazards models with time-dependent covariates
  Distribution Fitting App: dfittool Renamed to distributionFitter
  lasso and lassoglm Functions: Specify maximum number of iterations allowed
  Functionality Being Changed

R2016b
  Big Data Algorithms: Perform dimension reduction, descriptive statistics, k-means clustering, linear regression, logistic regression, and discriminant analysis on out-of-memory data
  Bayesian Optimization: Tune machine learning algorithms by searching for optimal hyperparameters
  Feature Selection: Use neighborhood component analysis (NCA) to choose features for machine learning models
  Code Generation: Generate C code for prediction by using SVM and logistic regression models (requires MATLAB Coder)
  Classification Learner: Train classifiers in parallel (requires Parallel Computing Toolbox)
  Machine Learning Performance: Speed up Gaussian mixture modeling, SVM with duplicate observations, and distance calculations for sparse data
  Survival Analysis: Fit Cox proportional hazards models with new options for residuals and handling ties
  Ensemble Methods Usability: Use simpler functions to train classification or regression ensembles
  Quantile Regression: Use bagged regression trees (TreeBagger) to implement quantile regression
  GPU support: pdist, pdist2, and knnsearch accept gpuArray
  Gaussian Processes: Use additional popular kernel functions
  coxphfit Function: Specify coefficient initial values and observation weights

  fitgmdist Function: Set initial values using kmeans++ algorithm by default
  fitgmdist Function: Specify tolerance for posterior probabilities
  fitctree, fitrtree, and templateTree Functions: Unbiased feature selection for decision trees

R2016a
  Machine Learning for High-Dimensional Data: Perform fast fitting of linear classification and regression models with techniques such as stochastic gradient descent and (L)BFGS using fitclinear and fitrlinear functions
  Classification Learner: Train multiple models automatically, visualize results by class labels, and perform logistic regression classification
  Performance: Perform clustering using kmeans, kmedoids, and Gaussian mixture models faster when data has a large number of clusters
  Probability Distributions: Fit kernel smoothing density to multivariate data using the ksdensity and mvksdensity functions
  Stable Distributions: Model financial and other data that requires heavy-tailed distributions
  Half-Normal Distributions: Model truncated data and create half-normal probability plots
  Linear Regression: CompactLinearModel object reduces memory footprint of linear regression model
  Robust covariance estimation for multivariate sample data using robustcov

  Squared Euclidean distance measure for pdist and pdist2 functions
  Performance enhancements for nearest neighbor search using kd-tree
  GPU support for extreme value distribution functions and kmeans
  Changes to default online update phase for kmeans function
  Name change in ksdensity
  Name change in paretotails
  Functionality Being Changed

R2015b
  Classification Learner: Train discriminant analysis to classify data, train models using categorical predictors, and perform dimensionality reduction using PCA
  Nonparametric Regression: Fit models using support vector regression (SVR) or Gaussian processes (Kriging)
  Tables and Categorical Data for Machine Learning: Use table and categorical predictors in classification and nonparametric regression functions and in Classification Learner
  Code Generation: Automatically generate C and C++ code for kmeans and randsample functions (using MATLAB Coder)
  GPU Acceleration: Speed up computation for over 65 functions including probability distributions, descriptive statistics, and hypothesis testing (using Parallel Computing Toolbox)

  Option to turn off clipping of Alpha coefficients in fitcsvm
  Name changes in TreeBagger

R2015a
  Classification app to train models and classify data using supervised machine learning
  Statistical tests for comparing accuracies of two classification models using compareHoldout, testcholdout, and testckfold functions
  Speedup of kmedoids, fitcknn, and other functions when using cosine, correlation, or spearman distance calculations
  Performance enhancements for decision trees and performance curves
  Additional option to control decision tree depth using 'MaxNumSplits' argument in fitctree, fitrtree, and templateTree functions
  Code generation for pca and probability distribution functions (using MATLAB Coder)
  Power and sample size for two-sample t-test using sampsizepwr function
  Discard support vectors of SVM and ECOC models
  Minimum leaf size for boosted regression trees
  Additional option to plot grouped histograms using the scatterhist and gplotmatrix functions

  Confidence interval computation for residuals using the function regress
  Functionality Being Changed

R2014b
  Multiclass learning for support vector machines and other classifiers using the fitcecoc function
  Generalized linear mixed-effects models using the fitglme function
  Clustering that is robust to outliers using the kmedoids function
  Speedup of the kmeans and gmdistribution clustering using the kmeans++ algorithm
  Fisher’s exact test for 2-by-2 contingency tables
  templateEnsemble function for creating ensemble learning template
  templateSVM function for creating SVM learning template
  Standardizing training data in k-nearest neighbor classification
  fitcnb function for naive Bayes classification

R2014a
  Repeated measures modeling for data with multiple measurements per subject

  fitcsvm function for enhanced performance of support vector machines (SVMs) for binary classification
  evalclusters methods to expand the number of clusters and number of gap criterion simulations
  p-value output from the multcompare function
  mnrfit, lassoglm, and fitglm functions accept categorical variables as responses
  Functions accept table inputs as an alternative to dataset array inputs
  Functions and model properties return a table rather than a dataset array
  Default value of 'EmptyAction' on kmeans is now 'singleton'
  Functions for classification methods and clustering
  Functionality being changed

R2013b
  Linear mixed-effects models
  Code generation for probability distribution and descriptive statistics functions (using MATLAB Coder)
  evalclusters function for estimating the optimal number of clusters in data
  mvregress function that now accepts a design matrix even if Y has multiple columns
  Upper tail probability calculations for cumulative distribution functions

  partialcorri function for partial correlation with asymmetric treatment of inputs and outputs
  Fitting functions for linear, generalized linear, and nonlinear models
  Functionality being changed

R2013a
  Support vector machines (SVMs) for binary classification (formerly in Bioinformatics Toolbox)
  Probabilistic PCA and alternating least-squares algorithms for principal component analysis with missing data
  Anderson-Darling goodness-of-fit test
  Decision-tree performance improvements and categorical predictors with many levels
  Grouping and kernel density options in scatterhist function
  Nonlinear model enhancements
  Syntax changes in parametric hypothesis test functions
  Probability distribution enhancements

R2012b
  Boosting algorithms for imbalanced data, sparse ensembles, and multiclass boosting, with self termination

  Burr distribution for expressing a wide range of distribution shapes while preserving a single functional form for the density
  Data import to a dataset array with the MATLAB Import Tool
  Principal component analysis enhancements for handling NaN as missing data, weighted PCA, and choosing between EIG or SVD as the underlying algorithm
  Speedup of k-means clustering using Parallel Computing Toolbox
  One-sided nonparametric hypothesis tests
  Reorder nodes in dendrogram plots
  Nonlinear model enhancements
  Changes to LinearModel diagnostics
  Functionality being changed

R2012a
  Linear, Generalized Linear, and Nonlinear Models for Regression
  Variable Editor for Dataset Arrays
  Lasso for Generalized Linear Regression
  K-Nearest Neighbor Classification
  Random Subspace Ensembles
  Regularized Discriminant Analysis with Variable Selection

  stepwisefit Coefficient History
  RobustWgtFun Replaces WgtFun
  ClassificationTree Now Predicts Class with Minimal Misclassification Cost
  fpdf Improvements

R2011b
  Lasso Regularization for Linear Regression
  Discriminant Analysis Classification Object
  Nearest Neighbor Searching for Points Within a Fixed Distance
  datasample Function for Random Sampling
  Fractional Factorial Design Improvements
  nlmefit Returns the Covariance Matrix of Estimated Coefficients
  signrank Change
  Conversion of Error and Warning Message Identifiers

R2011a
  Boosted Decision Trees for Classification and Regression
  Memory and Performance Improvements in Linkage Methods

  Conditional Weighted Residuals and Derivative Step Control in nlmefit and nlmefitsa
  Detecting Ties in k-Nearest Neighbor Search
  Distribution Fitting Tool Uses fitdist Function
  Speed and Accuracy Improvements in Noncentral Chi-Square CDF
  Perfect Separation in Binomial Regression
  Sign Convention in mdscale
  Demo of Credit Rating Classification Via Bagged Decision Trees

R2010b
  Parallel Computing Support for More Functions
  Algorithm to Rank Features in Classification and Regression
  nlmefit Support for Error Models, and nlmefitsa changes
  Surrogate Splits for Decision Trees
  New Bagged Decision Tree Properties
  Enhanced Cluster Analysis Performance
  Export Probability Objects with dfittool
  Compute Partial Correlation of Two Variables Correcting for All Other Variables
  Specify Number of Evenly Spaced Quantiles

  Control Location and Orientation of Marginal Histograms with scatterhist
  Return Bootstrapped Statistics with bootci

R2010a
  Stochastic Algorithm Functionality in NLME Models
  k-Nearest Neighbor Searching
  Confidence Intervals Option in perfcurve
  Observation Weights Options in Resampling Functions

R2009b
  New Parallel Computing Support for Certain Functions
  New Stack and Unstack Methods for Dataset Arrays
  New Support for SAS Transport (.xpt) Files
  New Output Function in nlmefit for Monitoring or Canceling Calculations

R2009a
  Enhanced Dataset Functionality
  New Naïve Bayes Classification

  New Ensemble Methods for Classification and Regression Trees
  New Performance Curve Function
  New Probability Distribution Objects

R2008b
  Classification
  Data Organization
  Model Assessment
  Multivariate Methods
  Probability Distributions
  Regression Analysis
  Statistical Visualization
  Utility Functions

R2008a
  Descriptive Statistics
  Model Assessment
  Multivariate Methods
  Probability Distributions

  Regression Analysis
  Statistical Visualization
  Utility Functions

R2007b
  Cluster Analysis
  Design of Experiments
  Hypothesis Tests
  Probability Distributions
  Regression Analysis
  Statistical Visualization

R2007a
  Data Organization
  Hypothesis Testing
  Multivariate Statistics
  Probability Distributions
  Regression Analysis
  Statistical Visualization

  Other Improvements

R2006b
  Demos
  Design of Experiments
  Hypothesis Tests
  Multinomial Distribution
  Regression Analysis
    Multinomial Regression
    Multivariate Regression
    Survival Analysis
  Statistical Process Control

R2006a
  Analysis of Variance
  Bootstrapping
  Demos
  Design of Experiments
  Hypothesis Tests
  Multivariate Distributions

  Random Number Generation
    Copulas
    Markov Chain Monte Carlo Methods
    Pearson and Johnson Systems of Distributions
  Robust Regression
  Statistical Process Control

R14SP3
  Demos
  Descriptive Statistics
  Hypothesis Tests
    Chi-Square Goodness-of-Fit Test
    Variance Tests
    Ansari-Bradley Test
    Tests of Randomness
  Probability Distributions
    Generalized Extreme Value Distribution
    Generalized Pareto Distribution
  Regression Analysis
  Statistical Visualization

R14SP2
  Multivariate Statistics

R2017a

Version: 11.1

New Features

Bug Fixes

Compatibility Considerations

Regression Learner App: Train regression models using supervised machine learning

Regression Learner is a new app that you can use to train regression models to predict data. Using this app, you can explore your data, select features, specify validation schemes, train models, and assess results. You can perform automated training to search for the best regression model type, including linear regression models, regression trees, Gaussian process models, support vector machines, and ensembles of regression trees. To use the model with new data, or to learn about programmatic regression, you can export the model to the workspace or generate MATLAB code to recreate the trained model.

For more information, see “Regression Learner App”.
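When you export a model from the app, it arrives in the workspace as a structure whose predictFcn field predicts responses for new data. A minimal sketch follows; the table name Tnew is hypothetical, and trainedModel is the default name of the exported structure:

    % Predict with a model exported from Regression Learner.
    % trainedModel is the exported structure; Tnew is a hypothetical table
    % whose variables match the predictors used during training.
    yfit = trainedModel.predictFcn(Tnew);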

Big Data Algorithms: Perform support vector machine (SVM) and Naive Bayes classification, create bags of decision trees, and fit lasso regression on out-of-memory data

Several classification and regression functions add support for tall arrays:

- fitclinear for support vector machine classification
- fitcnb for Naive Bayes classification
- TreeBagger for creating bags of decision trees
- lasso for fitting lasso regression using the ADMM algorithm
- The loss and predict methods of these regression classes:
    CompactRegressionSVM
    CompactRegressionGP
    CompactRegressionTree
    CompactRegressionEnsemble
    RegressionLinear
- The predict, loss, margin, and edge methods of these classification classes:
    CompactClassificationEnsemble
    CompactClassificationTree
    CompactClassificationDiscriminant
    CompactClassificationNaiveBayes
    CompactClassificationSVM
    CompactClassificationECOC
    ClassificationKNN
    ClassificationLinear

For a complete list of supported functions, see “Tall Array Support, Usage Notes, and Limitations”.
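As a minimal sketch of the tall-array workflow for one of these functions, the following trains a linear SVM with fitclinear on out-of-memory data. It uses the airlinesmall.csv sample file shipped with MATLAB; the predictor and label choices are illustrative:

    % Back a tall table with a datastore so the data stays out of memory.
    ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
    tt = tall(ds);

    % Assemble tall predictors and labels, then train a linear SVM.
    X = [tt.Distance, tt.DepDelay];   % tall numeric predictor matrix
    Y = tt.ArrDelay > 10;             % tall logical labels: >10 min late
    mdl = fitclinear(X, Y);           % SVM classification on tall arrays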

Code Generation: Generate C code for prediction by using linear models, generalized linear models, decision trees, and ensembles of classification trees (requires MATLAB Coder)

You can generate C code that predicts responses by using trained linear models, generalized linear models (GLMs), decision trees, or ensembles of classification trees. The following prediction functions support code generation:

- predict — Predict responses or estimate confidence intervals on predictions by applying a linear model to new predictor data.
- predict or glmval — Predict responses or estimate confidence intervals on predictions by applying a GLM to new predictor data.
- predict — Classify observations or estimate classification scores by applying a classification tree or an ensemble of classification trees to new data.
- predict — Predict responses by applying a regression tree to new data.

You can also generate C code to simulate responses from a linear model or a generalized linear model by using the corresponding random function. A hedged sketch of the code generation workflow appears after the Bayesian Statistics section below.

Bayesian Statistics: Perform gradient-based sampling using Hamiltonian Monte Carlo (HMC) sampler

You can now perform Hamiltonian Monte Carlo (HMC) sampling from a probability density function. Use the hmcSampler function to create a HamiltonianSampler object for the log probability density that you specify. The object samples from this density by generating a Markov chain with the corresponding equilibrium distribution using HMC. After creating a HamiltonianSampler object, you can use:

- tuneSampler to tune the HMC sampler prior to drawing samples
- drawSamples to draw samples from the density
- estimateMAP to estimate the maximum of the log probability density
- diagnostics to assess convergence

For a workflow example, see “Bayesian Linear Regression Using Hamiltonian Monte Carlo”.
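The following is a hedged sketch of the code generation workflow for a classification tree, not a verbatim excerpt from the documentation. It assumes MATLAB Coder is installed and that the saveCompactModel/loadCompactModel serialization pair applies to trees in this release; the file and function names (TreeMdl, predictTree) are illustrative:

    % Train a compact classification tree on the Fisher iris data and
    % serialize it for code generation (names below are illustrative).
    load fisheriris                            % provides meas and species
    Mdl = compact(fitctree(meas, species));    % compact classification tree
    saveCompactModel(Mdl, 'TreeMdl');          % writes TreeMdl.mat

    % Entry-point function for code generation, saved as predictTree.m:
    %   function label = predictTree(X) %#codegen
    %   Mdl = loadCompactModel('TreeMdl');
    %   label = predict(Mdl, X);
    %   end

    % Generate C code for one 1-by-4 double observation:
    %   codegen predictTree -args {zeros(1,4)}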

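As a minimal sketch of the HMC sampler workflow described above, the following samples from a two-dimensional standard normal distribution; the target density is chosen purely for illustration:

    % Sample from a 2-D standard normal density with the HMC sampler.
    % logpdf must return both the log density and its gradient.
    logpdf = @(x) deal(-0.5*(x'*x), -x);       % log N(0,I) up to a constant
    smp = hmcSampler(logpdf, randn(2,1));      % sampler with a random start
    smp = tuneSampler(smp);                    % tune step size and path length
    chain = drawSamples(smp, 'NumSamples', 1000);  % 1000-by-2 matrix of draws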