Microsoft Advanced Analytics - Grupo De Usuarios De R De Madrid

Transcription

Microsoft Advanced Analytics. Juan Carlos Rodriguez García, jurodr@microsoft.com, Data Platform Solution Architect

Value vs. difficulty (source: Gartner)

Omnichannel banking; MAX; maximize time; everything behavioral in store

"Dumb" drop-down lists are going to disappear. Agile methodologies allow knowledge to be applied to products very quickly.

Microsoft R Server: R Open; Microsoft R Server (DevelopR, DeployR); Azure Machine Learning; Cognitive Services

Challenges of R: data volume; parallelization; support; deployment

Microsoft R Server solutions: data streaming; single source, multi-thread, multi-node; MSFT support; deployment on cluster and cloud
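The "data streaming" answer to the data-volume challenge can be illustrated with a minimal base R sketch: a file is read in fixed-size chunks and only running totals are kept in memory. This is not Microsoft R Server code. The helper name chunked_mean, the chunk size and the demo data are assumptions made up for illustration.

```r
# Hypothetical helper (not part of any Microsoft package): compute the mean of
# one numeric column of a CSV file while holding only one chunk in memory.
chunked_mean <- function(path, column, chunk_rows = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header line; scan() strips the quotes that write.csv() adds.
  header <- scan(text = readLines(con, n = 1L), what = "", sep = ",", quiet = TRUE)
  total <- 0
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break          # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    values <- chunk[[column]]
    values <- values[!is.na(values)]
    total <- total + sum(values)            # running sum
    n <- n + length(values)                 # running count
  }
  total / n
}

# Demo on a small invented data set written to a temporary file.
tmp <- tempfile(fileext = ".csv")
demo <- data.frame(ArrDelay = c(5, 10, 15, 20, 25),
                   DayOfWeek = c("Mon", "Tue", "Wed", "Thu", "Fri"))
write.csv(demo, tmp, row.names = FALSE)
result <- chunked_mean(tmp, "ArrDelay", chunk_rows = 2L)
print(result)
```

Because the statistic is built from a sum and a count, chunks could also be processed by separate threads or nodes and combined afterwards, which is the general idea behind the multi-thread and multi-node items on the slide.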

Community: R Open. Commercial: SQL Server R Services; Windows; R Server; Linux; Hadoop; Teradata

Components: R Open; Microsoft R Server (DevelopR, DeployR, ScaleR)
R + CRAN: open source R interpreter (R 3.1.2); freely available, huge range of R algorithms; algorithms callable by RevoR; embeddable in R scripts; 100% compatible with existing R scripts, functions and packages
RevoR: performance-enhanced R interpreter; based on open source R; adds a high-performance math library to speed up linear algebra functions
ScaleR: ready-to-use, high-performance big data analytics; fully parallelized analytics; data prep & data distillation; descriptive statistics & statistical tests; range of predictive functions; user tools for distributing customized R algorithms across nodes; wide data sets supported (thousands of variables)
ConnectR: high-speed & direct connectors. Available for: high-performance XDF; SAS, SPSS, delimited & fixed-format text data files; Hadoop HDFS (text & XDF); Teradata Database & Aster; EDWs and ADWs; ODBC
DistributedR: distributed computing framework; delivers cross-platform portability

Data Step: data import (delimited, fixed, SAS, SPSS, ODBC); variable creation & transformation; recode variables; factor variables; missing value handling; sort, merge, split; aggregate by category (means, sums)
Descriptive Statistics: min / max, mean, median (approx.); quantiles (approx.); standard deviation; variance; correlation; covariance; sum of squares (cross product matrix for set variables); pairwise cross tabs; risk ratio & odds ratio; cross-tabulation of data (standard tables & long form); marginal summaries of cross tabulations
Statistical Tests: chi square test; Kendall rank correlation; Fisher's exact test; Student's t-test
Sampling: subsample (observations & variables); random sampling
Predictive Models: sum of squares (cross product matrix for set variables); covariance & correlation matrices; logistic regression; generalized linear models (GLM) with exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie), standard link functions (cauchit, identity, log, logit, probit) and user-defined distributions & link functions; classification & regression trees; predictions/scoring for models; residuals for all models
Variable Selection: stepwise regression
Simulation: simulation (e.g. Monte Carlo); parallel random number generation
Cluster Analysis: K-means
Classification: decision trees; decision forests; gradient boosted decision trees (new); naïve Bayes (new)
Combination: rxDataStep; rxExec; PEMA-R API custom algorithms

Local Parallel Processing
### SETUP LOCAL ENVIRONMENT VARIABLES ###
# Sets where the model will run
myLocalCC <- "localpar"
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext(myLocalCC)
### CREATE LINUX, DIRECTORY AND FILE OBJECTS ###
localFS <- RxNativeFileSystem()
AirlineDataSet <- RxXdfData("AirlineDemoSmall.xdf", fileSystem = localFS)
# Functional model, not affected by the compute contexts
### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = T)
### Linear Model and plot
hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsXdfArrLateLinMod$coefficients)
Copyright Microsoft Corporation. All rights reserved.

Local Parallel Processing
### SETUP LOCAL ENVIRONMENT VARIABLES ###
# Sets where the model will run
myLocalCC <- "localpar"
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext(myLocalCC)
### CREATE LINUX, DIRECTORY AND FILE OBJECTS ###
localFS <- RxNativeFileSystem()
AirlineDataSet <- RxXdfData("AirlineDemoSmall.xdf", fileSystem = localFS)
Distributed Processing
myHadoopCC <- RxHadoopMR()
rxSetComputeContext(myHadoopCC)
hdfsFS <- RxHdfsFileSystem()
# hdfsFS (callout on the slide, apparently shown in place of localFS in the fileSystem argument)
# Functional model, not affected by the compute contexts
### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = T)
### Linear Model and plot
hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsXdfArrLateLinMod$coefficients)
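The linear-model call on the code slides appears to use a formula without an intercept (ArrDelay ~ DayOfWeek + 0). A minimal base R sketch of what such a formula estimates is given here, using lm in place of rxLinMod, which requires Microsoft R Server. The data frame flights and its values are invented for illustration and are not the AirlineDemoSmall data.

```r
# Synthetic stand-in for the airline data (values invented for illustration).
flights <- data.frame(
  DayOfWeek = factor(c("Mon", "Mon", "Tue", "Tue", "Wed", "Wed"),
                     levels = c("Mon", "Tue", "Wed")),
  ArrDelay  = c(10, 20, 0, 4, 7, 9)
)

# "+ 0" drops the intercept, so each coefficient is the fitted delay for one day.
fit <- lm(ArrDelay ~ DayOfWeek + 0, data = flights)

# With a single factor predictor, those coefficients equal the per-day means.
day_means <- tapply(flights$ArrDelay, flights$DayOfWeek, mean)

print(coef(fit))
print(day_means)
```

Plotting the coefficients of such a model, as the slide does, therefore gives a quick view of the average arrival delay by day of the week.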

ices/en-us/apis

