Abhishek Gupta Sr. Application Engineer - MathWorks

Transcription

Machine Learning with MATLAB
Abhishek Gupta, Sr. Application Engineer
© 2014 The MathWorks, Inc.

Goals
- Overview of machine learning
- Machine learning models & techniques available in MATLAB
- Streamlining the machine learning workflow with MATLAB

Machine Learning: Characteristics and Examples
- Characteristics
  - Lots of data (many variables)
  - System too complex to know the governing equation (e.g., black-box modeling)
- Examples
  - Pattern recognition (speech, images)
  - Financial algorithms (credit scoring, algo trading)
  - Energy forecasting (load, price)
  - Biology (tumor detection, drug discovery)

[Figure: credit rating table; column labels reconstructed from the row order]
       AAA      AA       A        BBB      BB       B        CCC      D
AAA    93.68%   5.55%    0.59%    0.18%    0.00%    0.00%    0.00%    0.00%
AA     2.44%    92.60%   4.03%    0.73%    0.15%    0.00%    0.00%    0.06%
A      0.14%    4.18%    91.02%   3.90%    0.60%    0.08%    0.00%    0.08%
BBB    0.03%    0.23%    7.49%    87.86%   3.78%    0.39%    0.06%    0.16%
BB     0.03%    0.12%    0.73%    8.27%    86.74%   3.28%    0.18%    0.64%
B      0.00%    0.00%    0.11%    0.82%    9.64%    85.37%   2.41%    1.64%
CCC    0.00%    0.00%    0.00%    0.37%    1.84%    6.24%    81.88%   9.67%

Challenges - Machine Learning
- Significant technical expertise required
- No "one size fits all" solution
- Locked into black-box solutions
- Time required to conduct the analysis

Overview - Machine Learning
Machine learning, by type of learning and category of algorithm:
- Unsupervised Learning: group and interpret data based only on input data
  - Clustering
- Supervised Learning: develop predictive model based on both input and output data
  - Classification
  - Regression

Unsupervised Learning
- k-Means, Fuzzy ...
- Gaussian Mixture
- Hidden Markov Model

Supervised Learning
- Regression
  - Neural Networks
  - Decision Trees
  - Ensemble Methods
  - Non-linear Reg. (GLM, Logistic)
  - Linear Regression
- Classification
  - Support Vector Machines
  - Discriminant Analysis
  - Naive Bayes
  - Nearest Neighbor

Supervised Learning - Workflow
- Select Model
- Data: Import Data, Explore Data, Prepare Data
- Train the Model: known data + known responses -> model
- Use for Prediction: new data -> model -> predicted responses
- Measure Accuracy
- Speed up Computations
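
The workflow on this slide (prepare data, train on known data and responses, predict on new data, measure accuracy) can be sketched in a few lines. This is a minimal illustration in plain Python rather than MATLAB, and it does not use any MathWorks function; the synthetic data, the 70/30 split, and the choice of a 1-nearest-neighbor model are all assumptions made for the example.

```python
import random


def make_toy_data(n_per_class, rng):
    """Synthetic two-class data: class 0 near (0, 0), class 1 near (3, 3)."""
    points, labels = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            labels.append(label)
    return points, labels


def predict_nearest_neighbor(train_x, train_y, query):
    """Return the known response of the training point closest to `query`."""
    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))
    return train_y[min(range(len(train_x)), key=squared_distance)]


rng = random.Random(0)

# Import and prepare data: synthetic points here, shuffled and split 70/30.
x, y = make_toy_data(50, rng)
order = list(range(len(x)))
rng.shuffle(order)
split = (7 * len(order)) // 10
train_x = [x[i] for i in order[:split]]
train_y = [y[i] for i in order[:split]]
test_x = [x[i] for i in order[split:]]
test_y = [y[i] for i in order[split:]]

# Train the model: 1-nearest-neighbor simply stores known data and responses.
# Use for prediction: label each held-out point by its nearest training point.
predicted = [predict_nearest_neighbor(train_x, train_y, q) for q in test_x]

# Measure accuracy: fraction of held-out responses predicted correctly.
accuracy = sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Holding out part of the known data is what makes the accuracy figure meaningful: the model is scored on responses it never saw during training.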

Example - Bank Marketing Campaign
- Goal:
  - Predict if a customer would subscribe to a bank term deposit based on different attributes
- Train a classifier using different models
- Measure accuracy and compare models
- Reduce model complexity
- Use classifier for prediction

[Figure: "Bank Marketing Campaign: Misclassification Rate", bar chart of percentage misclassified per model: Logistic Regression, Discriminant Analysis, k-nearest Neighbors, Naive Bayes, Support VM, Decision Trees, TreeBagger, Reduced TB]

Data set downloaded from the UCI Machine Learning Repository (Bank Marketing)
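
The comparison behind a misclassification-rate chart comes down to one number per model: the percentage of held-out responses the model gets wrong. The sketch below shows that calculation in plain Python. The responses and the two sets of predictions are made-up toy values, not results from the bank marketing data set, and the model names are hypothetical.

```python
def misclassification_rate(known, predicted):
    """Percentage of predictions that differ from the known responses."""
    if len(known) != len(predicted):
        raise ValueError("known and predicted must have the same length")
    wrong = sum(k != p for k, p in zip(known, predicted))
    return 100.0 * wrong / len(known)


# Toy held-out responses ("yes" = customer subscribed); illustrative only.
known = ["no", "no", "yes", "no", "yes", "no", "no", "yes", "no", "no"]

# Made-up predictions from two hypothetical models.
predictions = {
    "always_no_baseline": ["no"] * len(known),
    "hypothetical_model": ["no", "no", "yes", "no", "no",
                           "no", "yes", "yes", "no", "no"],
}

rates = {name: misclassification_rate(known, pred)
         for name, pred in predictions.items()}
best_model = min(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% misclassified")
print("lowest misclassification rate:", best_model)
```

Including a trivial baseline such as "always predict no" is a useful habit when classes are imbalanced, since a model is only worth keeping if it beats that baseline.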

Example - Bank Marketing Campaign
- Numerous predictive models with rich documentation
- Interactive visualizations and apps to aid discovery
- Built-in parallel computing support
- Quick prototyping; focus on modeling, not programming

[Figure: "Bank Marketing Campaign: Misclassification Rate", bar chart of percentage misclassified per model, including Neural Net]

Clustering Overview
- What is clustering?
  - Segment data into groups, based on data similarity
- Why use clustering?
  - Identify outliers
  - Resulting groups may be the matter of interest
- How is clustering done?
  - Can be achieved by various algorithms
  - It is an iterative process (involving trial and error)
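
To make "segment data into groups" and "iterative process" concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration only, not MATLAB's implementation; the two synthetic blobs, the seed, and the choice of k = 2 are assumptions for the example.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, rng, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


rng = random.Random(1)
blob_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(20)]
blob_b = [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(20)]
points = blob_a + blob_b

centroids, assignments = k_means(points, 2, rng)
print("centroids:", centroids)
```

The trial-and-error part of clustering sits outside this loop: the number of clusters, the distance measure, and the starting centroids are all choices that usually need several attempts.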

Example - Clustering Corporate Bonds
- Goal:
  - Cluster similar corporate bonds together
- Approach:
  - Cluster the bonds data using distance-based and probability-based techniques
  - Evaluate clusters for validity

[Figure: two panels, "Hierarchical Clustering" and "k-Means Clustering", plotted against Data Point #; distance metrics shown: spearman, cosine]
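
One way to read "evaluate clusters for validity" is to score how well each point sits inside its assigned cluster. The sketch below, in plain Python, pairs a cosine distance (one of the metrics named in the figure) with the mean silhouette value. The silhouette is an assumption here, chosen as a common validity measure; the slide does not say which measure was used, and the four toy vectors are invented for illustration.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v); 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_silhouette(points, labels, distance):
    """Average silhouette value; values near 1 suggest well-separated clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to every cluster, leaving p itself out.
        mean_to = {}
        for c in clusters:
            d = [distance(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            if d:
                mean_to[c] = sum(d) / len(d)
        own = labels[i]
        others = [m for c, m in mean_to.items() if c != own]
        if own not in mean_to or not others:
            scores.append(0.0)  # singleton cluster, or only one cluster
            continue
        a, b = mean_to[own], min(others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy vectors: two point mostly along the first axis, two along the second.
points = [(1.0, 0.1), (1.0, 0.2), (0.1, 1.0), (0.2, 1.0)]
sensible_labels = [0, 0, 1, 1]
scrambled_labels = [0, 1, 0, 1]

good = mean_silhouette(points, sensible_labels, cosine_distance)
bad = mean_silhouette(points, scrambled_labels, cosine_distance)
print(f"sensible grouping: {good:.3f}, scrambled grouping: {bad:.3f}")
```

Passing the distance function as an argument keeps the validity check independent of the metric, so the same score can be recomputed when a different metric is tried.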

Example - Clustering Corporate Bonds
- Numerous clustering functions with rich documentation
- Interactive visualizations to aid discovery
- Viewable source; not a black box
- Rapid exploration & development

[Figure: two panels, "Hierarchical Clustering" and "k-Means Clustering", plotted against Data Point #; distance metrics shown: spearman, cosine]

Short-term Load Forecaster
- Goal:
  - Develop a tool for Excel users to generate next-day electricity demand predictions
- Requirements:
  - Easy-to-use interface
  - Accurate predictive model

Deploying MATLAB Applications to Excel
1. MATLAB Desktop, with Toolboxes
2. MATLAB Compiler and MATLAB Builder EX (.dll, .bas)
3. End-User Machine

Deployment Highlights
- Deployment targets shown: Desktop Applications (.exe), Excel Spreadsheets, HADOOP, Database Servers, Application Servers, Web Applications, Client Front End Applications, Batch/Cron Jobs
- Interfaces and formats shown: .NET, C, Java, CTF
- Royalty-free deployment
- Point-and-click workflow
- Unified process for desktop and server apps

MATLAB for Machine Learning
- Challenge: Time (loss of productivity)
  - MATLAB solution: Rapid analysis and application development
  - High productivity from data preparation, interactive exploration, visualizations
- Challenge: Extract value from data
  - MATLAB solution: Machine learning, Video, Image, and Financial
  - Depth and breadth of algorithms in classification, clustering, and regression
- Challenge: Computation speed
  - MATLAB solution: Fast training and computation
  - Parallel computation, optimized libraries
- Challenge: Time to deploy & integrate
  - MATLAB solution: Ease of deployment and leveraging enterprise
  - Push-button deployment into production
- Challenge: Technology risk
  - MATLAB solution: High-quality libraries and support
  - Industry-standard algorithms in use in production
  - Access to support, training and advisory services when needed

Learn More: Machine Learning with MATLAB
mathworks.com/machine-learning

Training Services
Exploit the full potential of MathWorks products
- Flexible delivery options:
  - Public training available worldwide
  - Onsite training with standard or customized courses
  - Web-based training with live, interactive instructor-led courses
  - Self-paced interactive online training
- More than 30 course offerings:
  - Introductory and intermediate training on MATLAB, Simulink, Stateflow, code generation, and Polyspace products
  - Specialized courses in control design, signal processing, parallel computing, code generation, communications, financial analysis, and other areas

Consulting Services
Accelerating return on investment
A global team of experts supporting every stage of tool and process integration
- Continuous Improvement
- Process and Technology Automation
- Process and Technology Standardization
- Full Application Deployment
- Process Assessment
- Component Deployment
- Advisory Services
- Jumpstart
- Migration Planning
- Research
- Advanced Engineering
- Product Engineering Teams
- Supplier Involvement

Technical Support
- Resources
  - Over 100 support engineers
    - All with MS degrees (EE, ME, CS)
    - Local support in North America, Europe, and Asia
  - Comprehensive, product-specific Web support resources
- High customer satisfaction
  - 95% of calls answered within three minutes
  - 70% of issues resolved within 24 hours
  - 80% of customers surveyed rate satisfaction at 80-100%

MATLAB Central
- Community for MATLAB and Simulink users
- Over 1 million visits per month
- File Exchange
  - Upload/download access to free files including MATLAB code, Simulink models, and documents
  - Ability to rate files, comment, and ask questions
  - More than 12,500 contributed files, 300 submissions per month, 50,000 downloads per month
- Newsgroup
  - Web forum for technical discussions about MathWorks products
  - More than 300 posts per day
- Blogs
  - Commentary from engineers who design, build, and support MathWorks products
  - Open conversation at blogs.mathworks.com
Based on February 2011 data

Questions?
