Operational Machine Learning

Operational Machine Learning
Using Microsoft Technologies for Applied Data Science
Khalid M. Salama, Ph.D.
Business Insights & Analytics
Hitachi Consulting UK

Outline
- Introduction to Data Science
- From Experimental Data Science to Operational Machine Learning
- MS Technologies for Data Science & Advanced Analytics
- Demos & Screenshots
- Concluding Remarks

Introduction to Data Science and Machine Learning

Data Science and Machine Learning: What?
“Data mining, an interdisciplinary subfield of computer science, is the computational process of automatically discovering interesting and useful patterns in large data sets”
Other Related Technologies:
- Artificial Intelligence
- Visualization
- Big Data
- Statistics
- Databases
- High Performance Computing
- Cloud Computing
- Others
[Diagram: Machine Learning, Data Science, Other Technologies]

Data Science and Machine Learning: Why?
The objective of data science is to provide you with actionable insights to support decision making.
- Churn analysis
- Social tracking and services
- Vision Analytics
- Weather forecasting for business planning
- Legal discovery
- ...ng analysis
- Pricing analysis
- Fraud detection
- Personalized Insurance

Data Science and Machine Learning: How?
- Classification Learning: Build a model that can predict the target class of an input case
- Regression Modeling: Build a model that can estimate the response value given an input case
- Time Series Analysis: Analysis of temporal data to forecast future values
- Probabilistic Modeling: Compute the probability of an event occurring given a set of conditions
- Cluster Analysis: Discover natural groupings within the data points
- Similarity Analysis: Identify similar cases to a given input case based on the input features
- Association Rule Discovery: Extract frequent patterns present in the data
- Collaborative Filtering: Filter information using techniques that involve collaboration among multiple viewpoints
[Illustration: IF ... AND ... AND ... THEN A, ELSE IF ... AND ... THEN C, ELSE IF ... AND ... THEN B, ELSE C]
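To make two of the tasks listed above concrete, here is a minimal sketch of similarity analysis feeding a classifier: the k most similar cases are found by Euclidean distance and the majority class wins. The data, the function names, and the choice of a nearest-neighbour method are all assumptions made for illustration; the slides do not prescribe any particular algorithm.

```python
import math
from collections import Counter

# Hypothetical labelled cases: (feature vector, target class).
CASES = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((4.0, 4.2), "B"),
    ((4.1, 3.9), "B"),
]


def most_similar(query, cases, k=3):
    """Similarity analysis: return the k cases closest to the query."""
    return sorted(cases, key=lambda case: math.dist(query, case[0]))[:k]


def classify(query, cases, k=3):
    """Classification: majority class among the k most similar cases."""
    votes = Counter(label for _, label in most_similar(query, cases, k))
    return votes.most_common(1)[0][0]


print(classify((1.1, 1.0), CASES))
print(classify((3.8, 4.0), CASES))
```

The same distance function serves both tasks, which is why similarity analysis and classification often share a feature-preparation step.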

From Experimental Data Science to Operational Machine Learning

Data Science Activities: Experimentation vs. Operationalization
Data Analysis & Experimentation
- Interactive
- Easy to perform
- Rich Visualizations
[Diagram: Exploratory Data Analysis (Collect Data, Blend, Prepare, Visualize) produces a Learning Dataset and a Report of Visuals & Findings; ML Experiment (Algorithm Selection, Parameter Tuning, Training & Testing) produces a Model; both lead to a Decision]

Data Science Activities: Experimentation vs. Operationalization
Operational ML Pipelines
- Automated
- Pipelined (ETL Integration)
- Scalable
[Diagram: Batch ML Pipeline with Data Ingestion, Data Processing, Model Training, and Scoring; Train and Predict steps; Export and Deploy of the Model; Apps Integration through Web APIs for real-time Online Apps]
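The pipeline stages named on this slide can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the data is made up, the "model" is a least-squares line, and JSON stands in for whatever export format a real deployment would use. All function names are hypothetical.

```python
import json
import statistics


def ingest():
    """Data ingestion: hypothetical (x, y) records, one with a missing value."""
    return [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.1), (5.0, 9.9)]


def process(rows):
    """Data processing: drop incomplete records."""
    return [(x, y) for x, y in rows if y is not None]


def train(rows):
    """Model training: fit y = slope * x + intercept by least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def export(model):
    """Export: serialize the trained model so a scoring job or API can load it."""
    return json.dumps(model)


def score(serialized_model, x):
    """Scoring: load the exported model and predict for one input."""
    model = json.loads(serialized_model)
    return model["slope"] * x + model["intercept"]


exported = export(train(process(ingest())))
print(score(exported, 3.0))
```

Keeping training and scoring on opposite sides of a serialized artifact is what lets the two run on different schedules, for example batch retraining with real-time scoring.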

Microsoft Advanced Analytics Technologies

Microsoft Advanced Analytics: Cortana Intelligence Suite
https://gallery.cortanaintelligence.com/

Microsoft Advanced Analytics: Data Science, Machine Learning, & Intelligence
- Azure Machine Learning
- Microsoft R Server – SQL Server R Services
- Data Mining – SQL Server Analysis Services
- Spark ML – Azure HDInsight
- Cognitive Features – Azure Data Lake Analytics
- Azure Cognitive Services
- Microsoft Bot Framework

Microsoft Azure Machine Learning

Azure Machine Learning: MS Cloud-native Data Science
- Cloud-based Machine Learning Services
- Interactive Data Science Studio
- Rich built-in functionality
- Imports data from everywhere
- Easy to develop and productionize – Web Services
- Extensible via R and Python scripts
Limitations
- Only Cloud-based (Data Regulations)
- Scalability – Maximum dataset size 10GB
- Microsoft R Open is not supported, yet
- No Source Control
[Diagram: Build and deploy models in the cloud. Labels: Import Data, Input, Azure Machine Learning, Publish, Web Services, Batch Scoring, Result, Retrain Model]

Azure Machine Learning: Real-time Predictions
[Diagram components: App, Event Hub, Stream Analytics, Azure ML Web Service, Power BI]
[Diagram flow labels: Send data points, Consume messages, Send Input, Receive Output, Send Results (Input, Output)]
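The "Send Input / Receive Output" exchange with a scoring web service can be sketched as an HTTP POST carrying a JSON body. The sketch below only builds the request and does not send it. The body layout ("Inputs", "ColumnNames", "Values", "GlobalParameters") and the bearer-token header are assumptions based on the request-response samples that Azure ML Studio generated at the time; the URL, key, and column names are hypothetical placeholders. The API help page of the published service is the authority on the exact schema.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, columns, rows):
    """Build (but do not send) a POST request for a scoring web service."""
    body = {
        # Assumed request-response layout; confirm against the service's help page.
        "Inputs": {"input1": {"ColumnNames": columns, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Hypothetical endpoint, key, and feature columns.
request = build_scoring_request(
    "https://example.invalid/score",
    "HYPOTHETICAL_KEY",
    ["age", "income"],
    [[42, 55000]],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a real endpoint, `urllib.request.urlopen(request)` would send the input and the JSON-decoded response would carry the scored output.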

Azure Machine Learning: Built-in Features

Azure Machine Learning: Algorithms Cheat Sheet

Azure Machine Learning: ML Studio

Azure Machine Learning: Web Service

Azure Machine Learning: Stream Analytics Integration

Azure Machine Learning: AzureML R Library

Microsoft R Server

Microsoft R Server: R in Microsoft World
Microsoft R Open (MRO)
- Based on latest Open Source R (3.2.2) - Built, tested, and distributed by Microsoft
- More efficient and multi-threaded computation
- Enhanced by Intel Math Kernel Library (MKL) to speed up linear algebra functions
- Compatible with all R-related software

Microsoft R Server: Comparison
Aspect | CRAN | MRO | MRS
Data size | In-memory | In-memory | In-memory & disk
Efficiency | Single threaded | Multi-threaded | Multi-threaded, parallel processing 1:N servers
Support | Community | Community | Community + Commercial
Functionality | 7500 innovative analytic packages | 7500 innovative analytic packages | 7500 innovative packages + commercial parallel high-speed functions
Licence | Open Source | Open Source | Commercial license

Microsoft R Server: Components and Compute Contexts
- Installed on Windows or Linux
- ScaleR – Optimized for parallel execution on Big Data, to eliminate memory limitations.
- ConnectR – Provides access to local file systems, HDFS, Hive, SQL Server, Teradata, etc.
- DistributeR – Adaptable parallel execution framework to enable running on different (distributed) compute contexts.
- Operationalization (mrsdeploy) – Deploy the model as a Web API.
[Diagram labels: MS R Client, RStudio, RTVS, CRAN & MS R, Microsoft R Server, Scale & Deploy, Operationalization (mrsdeploy), Different Compute Contexts]

Microsoft R Server: ScaleR Example
- Check Environment
- Load XDF
- Prepare Data – Process XDF
- Build Predictive Model
- Perform Prediction

Microsoft R Server: ScaleR Functionality

SQL Server (in-database) R Services

SQL Server R Services: In-database Analytics
- R Services (in-database) – Keep your analytics close to the data
- T-SQL Script – Can be encapsulated in Stored Procedures
- Models are built, trained, saved as part of the ETL process (SSIS)
- Used for batch prediction (as part of the ETL process)
- Visual Studio SQL Database Project, Source Controlled, etc.
- Uses Microsoft ScaleR libraries
Limitations
- Not supported in Azure SQL DB/DW, yet
- Not suitable for Interactive Data Science
- Only R, no Python, yet.
[Diagram: ETL Using SSIS, with both pipelines run through EXECUTE sp_execute_external_script]
Training Pipeline: Data Sources, Process Data, Train R Model, Serialize, Store Models, Maintain Models
Prediction Pipeline: Process Data, Load Model, Perform Prediction, Store Results
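The training and prediction pipelines above follow a "train, serialize, store in a table, load, predict, store results" pattern. The sketch below imitates that pattern with stand-ins: an in-memory SQLite database instead of SQL Server, Python's pickle instead of R's serialization, and a trivial per-category average instead of a ScaleR model. Table names, column names, and data are hypothetical. In SQL Server R Services the equivalent steps would run as R scripts through sp_execute_external_script.

```python
import pickle
import sqlite3


def train(rows):
    """Train a trivial stand-in model: average target value per category."""
    totals = {}
    for category, value in rows:
        running_sum, count = totals.get(category, (0.0, 0))
        totals[category] = (running_sum + value, count + 1)
    return {category: s / n for category, (s, n) in totals.items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, payload BLOB)")
conn.execute("CREATE TABLE predictions (category TEXT, predicted REAL)")

# Training pipeline: process data, train, serialize, store the model.
training_rows = [("gold", 120.0), ("gold", 80.0), ("silver", 40.0), ("silver", 60.0)]
model = train(training_rows)
conn.execute("INSERT INTO models VALUES (?, ?)", ("spend_model", pickle.dumps(model)))

# Prediction pipeline: load the stored model, predict, store the results.
(payload,) = conn.execute(
    "SELECT payload FROM models WHERE name = ?", ("spend_model",)
).fetchone()
loaded_model = pickle.loads(payload)
new_cases = ["gold", "silver"]
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?)",
    [(category, loaded_model[category]) for category in new_cases],
)

results = conn.execute("SELECT * FROM predictions ORDER BY category").fetchall()
print(results)  # prints [('gold', 100.0), ('silver', 50.0)]
```

Storing the serialized model as a row keeps model versions under the same backup, security, and ETL scheduling as the data they were trained on.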

SQL Server R Services: T-SQL Script
- Configure
- Build and Save Model
- Model Summary
- Prediction
- Prediction Output

Microsoft Analysis Services Data Mining

SQL Server Analysis Services: Data Mining
- Process data from many OLEDB and ODBC data sources
- Easy to build, interpret, deploy, and productionize
- SSIS Support – Tasks to Train & Predict
- Interactive Visuals for model interpretation
- Excel Integration – Data Mining Add-in
Limitations
- Limited Extensibility
- Limited Algorithms & Functionalities
- No Azure PaaS Service
[Diagram labels: Azure SQL DW/DB, SQL Server Analysis Services, Build Model, Retrain Model, Explore/Interpret Model, Batch Scoring, Online Apps, DMX Query, Result]

SQL Server Analysis Services: Overview
Data Source View, Mining Structure, Mining Algorithm, Mining Model
Mining algorithms:
- Decision Trees
- Naïve-Bayes
- Linear Regression
- Neural Networks
- Association Rules
- Clustering
- Sequence Clustering
- Time Series

SQL Server Analysis Services: Visualizing Models

SQL Server Analysis Services: Excel Data Mining Add-in

Azure Cognitive Services

Azure Cognitive Services: Ready-to-use Intelligence

Azure Cognitive Services: Set up a Cognitive Services API
https://www.microsoft.com/cognitive-services/

Cognitive Features in Azure Data Lake Analytics

Azure Data Lake Analytics: Cognitive Features
- Pre-built intelligence – Text & Image Analysis
- Integrated with your data processing pipelines (DLA)
- Used for batch recognition (not singleton real-time)
- Scheduled & Automated using Azure Data Factory
- R & Python Extensions!
- Scalable – Suitable for Big Data
Limitations
- Limited Features
- Not suitable for real-time scoring
[Diagram labels: Source Data (Text, Images, etc.), Ingest, Data Lake Store (Input), Data Lake Analytics Jobs (Data Processing & Pattern Recognition), Data Lake Store (Output), Polybase, Azure SQL DW (Enterprise Data Warehouse), Azure Data Factory]

Azure Data Lake Analytics: First-time Installation

Azure Data Lake Analytics: U-SQL Script

Azure Data Lake Analytics: Execution & Output

Spark ML on HDInsight

Spark ML on HDInsight: Scalable ML for Big Data
- Rich Spark ML Libraries
- Scalable, distributed, in-memory
- Extensible – Python, R, Java, Scala
- Suitable for Big Data - Batch Model Training and Scoring
- Spark Streaming for Real-time predictions
- Scheduled & Automated Using Azure Data Factory
Limitations
- Expensive to keep it up & running
- Slow to spin-up
[Diagram labels: Source Data, Ingest, HDInsight (Load, Process Data, Build Model, Save Model, Load Model, Perform Predictions, Save Results), Polybase, Azure SQL DW (Enterprise Data Warehouse), Azure Data Factory]

Spark ML on HDInsight: Spark ML Pipelines
Spark ML standardizes APIs for machine learning algorithms to make it easier to combine multiple tasks into a single pipeline, or workflow.
- Transformers – used for data pre-processing. Input: DataFrame - Output: DataFrame
- Estimators – ML algorithm used to build a predictive model. Input: DataFrame - Output: Model.
- Parameters – Configurations for Transformers and Estimators
- Pipeline – Chains Transformers and Estimators
[Diagram: ML Pipeline. Dataset (DataFrame), Transformer A (pre-processing) through Transformer Z (pre-processing), Estimator (ML learning algorithm), Parameters, Model, Evaluation]
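The Transformer / Estimator / Pipeline contract described above can be imitated in plain Python to show how the pieces chain. This is a sketch of the pattern only, not Spark code: lists of dictionaries stand in for DataFrames, and every class, column name, and data value here is hypothetical. As in Spark ML, fitting the pipeline returns a fitted pipeline model whose stages are all transformers.

```python
import statistics


class Tokenizer:
    """Transformer: adds a 'words' column derived from the 'text' column."""

    def transform(self, rows):
        return [{**row, "words": row["text"].lower().split()} for row in rows]


class WordCounter:
    """Transformer: adds a 'length' feature (number of words)."""

    def transform(self, rows):
        return [{**row, "length": len(row["words"])} for row in rows]


class ThresholdModel:
    """Fitted model, itself a transformer: adds a 'prediction' column."""

    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [{**row, "prediction": int(row["length"] > self.threshold)} for row in rows]


class ThresholdEstimator:
    """Estimator: learns a threshold halfway between the two class means."""

    def fit(self, rows):
        positives = [row["length"] for row in rows if row["label"] == 1]
        negatives = [row["length"] for row in rows if row["label"] == 0]
        return ThresholdModel((statistics.fmean(positives) + statistics.fmean(negatives)) / 2)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains transformers and estimators; fit() returns a PipelineModel."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # an estimator becomes a model
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


training = [
    {"text": "ok", "label": 0},
    {"text": "fine thanks", "label": 0},
    {"text": "this is a much longer message", "label": 1},
    {"text": "another fairly long piece of text", "label": 1},
]
pipeline_model = Pipeline([Tokenizer(), WordCounter(), ThresholdEstimator()]).fit(training)
scored = pipeline_model.transform(
    [{"text": "short one"}, {"text": "a considerably longer input sentence here"}]
)
print([row["prediction"] for row in scored])
```

Because the fitted pipeline bundles pre-processing with the model, the same object can be saved after training and reused unchanged for scoring.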

Spark ML on HDInsight: Spark ML Functionality
Transformers
- Text Feature Extraction: TF-IDF (HashingTF and IDF), Word2Vec, CountVectorizer, Tokenizer, StopWordsRemover, n-gram
- Feature Selection: VectorSlicer, RFormula, ChiSqSelector
- Dimensionality Reduction: PCA
- Features Vector Preparation: VectorAssembler, VectorIndexer, StringIndexer, IndexToString
- Feature Type Conversion: Binarizer, Discrete Cosine Transform (DCT), OneHotEncoder, Bucketizer, QuantileDiscretizer
- Feature Scaling: Normalizer, StandardScaler, MinMaxScaler
- Feature Construction: SQLTransformer, ElementwiseProduct, PolynomialExpansion
Estimators (supervised)
- Classification: Decision Trees – Ensembles, Naïve-Bayes, SVM
- Regression: Linear Regression, SVM
Other (Unsupervised)
- Clustering
- Collaborative Filtering
- Frequent Pattern Mining
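As an illustration of what one of the listed feature extractors computes, here is a small pure-Python TF-IDF sketch. It assumes the smoothed inverse document frequency log((m + 1) / (df + 1)) given in the Spark MLlib documentation, where m is the number of documents and df the number of documents containing the term. Unlike HashingTF it uses raw term counts keyed by the term itself rather than hashed buckets, and the sample documents are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    m = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for words in tokenized for term in set(words))
    # Smoothed IDF (assumed to match the formula in the Spark MLlib docs).
    idf = {term: math.log((m + 1) / (df + 1)) for term, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(words).items()}
        for words in tokenized
    ]


docs = ["spark ml pipeline", "spark streaming", "azure ml"]
vectors = tf_idf(docs)
for doc, vector in zip(docs, vectors):
    print(doc, {term: round(weight, 3) for term, weight in vector.items()})
```

Terms that occur in fewer documents ("pipeline") receive a larger weight than terms shared across documents ("spark"), which is the point of the IDF factor.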

Spark ML on HDInsight: Spark ML - Example

Spark ML on HDInsight: BigDL – Intel’s Distributed Deep Learning

Concluding Remarks
- Interactive Data Science Studio: Azure ML
- Extensibility: Spark on HDI, Azure ML, Microsoft R Server
- Pre-built Intelligence: Azure Cognitive Services, Azure Data Lake Analytics
- ML Pipelining: Spark on HDI, Azure Data Lake Analytics, SQL Server R Services, Data Mining SSAS
- Built-in Features: Azure ML, Spark on HDI
- Integration with Operational Apps: Azure ML, Azure Cognitive Services, Microsoft R Operationalization
- Rich Model Interpretability: SSAS Data Mining, Microsoft R Server
- Scalability (Big Data): Microsoft R Server, Spark on HDI

My Background
Applying Computational Intelligence in Data Mining
- Honorary Research Fellow, School of Computing, University of Kent.
- Ph.D. Computer Science, University of Kent, Canterbury, UK.
- 28 published journal and conference papers in the fields of AI and ML
https://www.researchgate.net/profile/Khalid 2017
