Using The Splunk Machine Learning . - .conf21 Splunk

Transcription

Copyright 2016 Splunk Inc.
Using the Splunk Machine Learning Toolkit to Create Your Own Custom Models
Dr. Adam Oliner, Director of Engineering, Data Science, Splunk
Manish Sainani, Principal Product Manager, Splunk

Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Who are we?
Dr. Adam Oliner
– Director of Engineering, Data Science & Machine Learning
– Splunker for 2 years
– Embarrassingly overeducated
Manish Sainani
– Principal Product Manager, Machine Learning
– Splunker for 2 years
– First ML hire at Splunk!

What are we doing here?
Overview of Machine Learning
The Assistants: Guided Machine Learning
Examples
– DIY Anomaly Detector
– Customer Applications

Overview of ML at Splunk
Core Platform Search
Packaged Premium Solutions
Custom ML
Platform for Operational Intelligence

Splunk Machine Learning Toolkit
Extends Splunk platform functions and provides a guided modeling environment
Assistants: Guide model building, testing, & deploying for common objectives
Showcases: Interactive examples for typical IT, security, business, IoT use cases
Algorithms: 25 standard algorithms available prepackaged with the toolkit
SPL ML Commands: New commands to fit, test and operationalize models
Python for Scientific Computing Library: 300 open source algorithms available for use
Build custom analytics for any use case

What’s New since our 0.9 Beta Release (last year’s .conf)?
New name and abbreviation ;-)
No event limits (removal of 50K limit on fitting models)
Configurable resource caps via mlspl.conf
Search head clustering support
Distributed / streaming apply
Scheduled fit
New algorithms (next slide)
– Feature engineering and selection
– Stochastic gradient descent (e.g.)
– ARIMA
Multi-algorithm support across Assistants
Scatterplot matrix viz
Alerting
Tooltips
In-app tours
Cluster Numeric Events assistant
Videos videos videos for each assistant across IT, Security, IoT and Business Analytics
ML-SPL Cheat Sheet

Algorithms supported (v2.0, .conf2016)

The Assistants: Guided Machine Learning

Machine Learning
A process for generalizing from examples
Examples
– A, B, ... #
– A, B, ... a
– Xpast Xfuture
– like with like
– Xpredicted – Xactual (anomaly detection)

Machine Learning
(Process diagram; surviving labels: Evaluate, Explore/Visualize, Model)

Machine Learning Process with ...
(Process diagram; surviving labels: ...nf, transforms.conf, Datamodels; Add-ons from Splunkbase; Visualize; ML Toolkit; Model; Pivot, Table UI, SPL)

Custom Machine Learning – Success Formula
Set business/ops priorities
Domain Expertise
Identify use cases
Drive decisions
(IT, Security, )
Statistics / math background
SPL
Data prep
Operational ...
...m selection
Model building
Splunk ML Toolkit facilitates and simplifies via examples & guidance

Guided ML with the Assistants
Guides you through various analytics
– Prepare, fit, validate, and deploy
Automatically generates all the relevant SPL

Assistants: Fit

Assistants: Validate

Assistants: Deploy

The Assistants
1. Predict Numeric Fields
2. Predict Categorical Fields
3. Detect Numeric Outliers
4. Detect Categorical Outliers
5. Forecast Time Series
6. Cluster Numeric Events

Predict Numeric Fields
Algorithms
– LinearRegression (including Lasso, Ridge, and ...)
– ...ssor
– RandomForestRegressor
– SGDRegressor
Validation
– Four visualizations of prediction error
– R² and RMSE
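The two validation metrics named on this slide can be sketched in a few lines. This is a minimal illustration in plain Python, not the toolkit's own code; the function names and sample values are made up for the example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]
print("RMSE:", rmse(actual, predicted))
print("R^2:", r_squared(actual, predicted))
```

RMSE is in the same units as the predicted field, while R² is unitless and approaches 1 as the model explains more of the variance.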

Predict Categorical Fields
Algorithms
– ...lassifier
– SVM
– Naïve Bayes (BernoulliNB and GaussianNB)
Validation
– Precision, recall, accuracy, F1
– Confusion matrix
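For readers unfamiliar with the validation metrics on this slide, here is a minimal sketch of how they derive from the confusion matrix for one class of interest. The labels and function names are hypothetical; this is not the Assistant's implementation.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count true/false positives/negatives for one class of interest."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def classification_metrics(actual, predicted, positive):
    """Precision, recall, accuracy and F1 from the confusion counts."""
    c = confusion_counts(actual, predicted, positive)
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(actual)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical labels: did a device fail or not?
actual = ["fail", "ok", "fail", "ok", "ok", "fail"]
predicted = ["fail", "ok", "ok", "ok", "fail", "fail"]
print(confusion_counts(actual, predicted, positive="fail"))
print(classification_metrics(actual, predicted, positive="fail"))
```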

Detect Numeric Outliers
Methods
– Standard deviation
– Median absolute deviation
– Interquartile range
Validation:
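The three methods listed here differ only in how they measure "center" and "spread". The sketch below is one plain-Python reading of each, with assumed threshold multipliers (`k`) and made-up sample data; the Assistant's actual defaults may differ.

```python
import statistics


def stdev_outliers(values, k):
    """Flag values more than k standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


def mad_outliers(values, k):
    """Flag values more than k median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) > k * mad]


def iqr_outliers(values, k):
    """Flag values more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical response times with one obvious spike.
values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(stdev_outliers(values, k=2.0))
print(mad_outliers(values, k=3.0))
print(iqr_outliers(values, k=1.5))
```

The median-based and quartile-based variants are less sensitive to the outliers themselves, because a single extreme value inflates the standard deviation but barely moves the median or the quartiles.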

Detect Categorical Outliers
Statistical methods
Validation:

Forecast Time Series
Algorithms
– State-space method using Kalman filter
– ARIMA
Validation
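To give a feel for what a state-space forecast with a Kalman filter does, here is a minimal sketch for the simplest such model, a local level (a random walk observed with noise). The model choice, the variance parameters and the function names are assumptions made for illustration; the slide does not say which state-space model the toolkit uses.

```python
def kalman_local_level(observations, process_var=1.0, obs_var=1.0,
                       init_var=1e6):
    """Filter a series under a local-level model.

    State:       level_t = level_{t-1} + noise(process_var)
    Observation: y_t     = level_t     + noise(obs_var)
    Returns the filtered levels plus the final level estimate and variance.
    """
    mean = observations[0]
    var = init_var  # large initial variance: we trust the data, not the prior
    filtered = []
    for y in observations:
        var_pred = var + process_var           # predict step
        gain = var_pred / (var_pred + obs_var)  # Kalman gain
        mean = mean + gain * (y - mean)         # update with the new point
        var = (1.0 - gain) * var_pred
        filtered.append(mean)
    return filtered, mean, var


def forecast(mean, var, steps, process_var=1.0, obs_var=1.0):
    """Forecast future observations as (expected value, variance) pairs."""
    return [(mean, var + h * process_var + obs_var)
            for h in range(1, steps + 1)]


# Hypothetical series.
series = [10.0, 10.4, 9.8, 10.1, 10.6, 10.2]
filtered, level, level_var = kalman_local_level(series)
for expected, variance in forecast(level, level_var, steps=3):
    print(expected, variance)
```

For this model the point forecast stays flat at the last filtered level, and the forecast variance grows with the horizon, which is why forecast charts typically show a widening confidence band.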

Cluster Numeric Events
Algorithms
– ...ralClustering
Validation
– Scatterplot Matrix viz

Prepare

Data Gathering and Prep
Source: CrowdFlower

Splunk!
Leading platform for collecting, cleaning, and transforming data
Interactive Field Extractor
Datamodels
Hundreds of add-ons from Splunkbase
transforms.conf
props.conf
etc.

Feature Engineering
TFIDF (term-frequency x inverse document-frequency)
– Transform free-form text into numeric attributes
StandardScaler (i.e. normalization)
FieldSelector (i.e. choose k best features for regression/classification)
PCA and KernelPCA
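As a rough illustration of the first two transforms on this slide, the sketch below computes a basic TF-IDF matrix and standardizes a numeric column in plain Python. It uses the textbook weighting (raw term count times log(N / document frequency)); library implementations often add smoothing and normalization, so their numbers will differ. The function names and sample messages are made up.

```python
import math
import statistics
from collections import Counter


def tfidf(documents):
    """Turn free-form text into numeric columns, one per term."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(doc_freq)
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[t] * math.log(n_docs / doc_freq[t])
                     for t in vocab])
    return vocab, rows


def standardize(column):
    """Rescale a numeric column to zero mean and unit standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd if sd else 0.0 for v in column]


# Hypothetical log messages.
vocab, rows = tfidf(["error disk full", "error timeout", "login ok"])
print(vocab)
print(rows[0])
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

Terms that appear in many documents (here "error") get a lower weight than terms that single out one document (here "disk"), which is the point of the inverse document-frequency factor.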

Preprocessing in the Assistants

Fit

Fit: What’s New
No event limits
Configurable resource caps (mlspl.conf)
Search head clustering support
Scheduled fit
New algorithms

Fit: What’s New

Validate

Validate / Apply: What’s New
Configurable resource caps
Search head clustering support
Distributed / streaming apply
Scatterplot matrix viz

Scatterplot Matrix Viz

Deploy

Deploy anywhere in Splunk!
Scheduled training
Alerting
Reports and dashboards
Augmented search results
etc.

Deploy: What’s New
Distributed Apply
– Apply models to indexed data
– Streaming
Scheduled training
Alerting

What’s New: Scheduled Fit

What’s New: Alerting

Example: DIY Anomaly Detector

Let’s Build an Anomaly Detector!
We’ll use two Assistants
– Predict Numeric Fields
– Detect Numeric Outliers
Show automatically-generated intermediate SPL

Fit a Predictive Model

Set up Scheduled Training

Open Residuals in Search

Open Detect Numeric Outliers Assistant

Detect Outliers (Large Prediction Errors)

Schedule an Alert

Schedule an Alert

Schedule an Alert

Manage Your New Anomaly Detector

The Assistant Generated the SPL for You

The Assistant Generated the SPL for You

You Built an Anomaly Detector!
You built a predictive model of AC Power
When the prediction error from this model is an outlier compared to past errors, you generate an alert
This predictive model automatically retrains itself on a schedule you control
You didn’t have to type any SPL
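The logic of this detector (predict a value, then treat an unusually large prediction error as the anomaly) can be sketched outside Splunk as well. The example below is a plain-Python approximation under stated assumptions: a one-feature least-squares line stands in for the predictive model, a median-absolute-deviation rule stands in for the outlier step, and the data and names are invented. It is not the SPL the Assistants generate.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def prediction_error(model, x, y):
    """Residual: actual value minus the model's prediction."""
    slope, intercept = model
    return y - (slope * x + intercept)


def is_anomalous(error, past_errors, k=3.0):
    """True when the error is an outlier relative to past errors (MAD rule)."""
    center = statistics.median(past_errors)
    spread = statistics.median([abs(e - center) for e in past_errors])
    return abs(error - center) > k * spread


# Hypothetical training data: a predictor and the field being predicted.
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.0, 0.1]
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 * x + 1 + n for x, n in zip(xs, noise)]

model = fit_line(xs, ys)  # rerun on a schedule to retrain
past_errors = [prediction_error(model, x, y) for x, y in zip(xs, ys)]

for x, y in [(9, 19.1), (9, 30.0)]:
    error = prediction_error(model, x, y)
    print(x, y, "ALERT" if is_anomalous(error, past_errors) else "ok")
```

Separating the model from the outlier rule mirrors the two-Assistant workflow: either half can be swapped (a different regressor, a different outlier method) without touching the other.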

#winning

Machine Learning Customer Success
Network Optimization
Detect & Prevent Equipment Failure
Security / Fraud Prevention
Entertainment Company
Prevent Cell Tower Failure
Optimize Repair Operations
Prioritize Website Issues and Predict Root Cause
Machine Learning Consulting Services
Predict Gaming Outages
Fraud Prevention
Analytics App built on ML Toolkit
Optimizing operations and business results

Machine Learning Toolkit Customer Use Cases
Reducing customer service disruption with early identification of difficult-to-detect network incidents
Minimizing cell tower degradation and downtime with improved issue detection sensitivity
Speeding website problem resolution by automatically ranking actions for support engineers
Ensuring mobile device security by detecting anomalies in ID authentication
Entertainment Company
Predicting and averting potential gaming outage conditions with finer-grained detection
Preventing fraud by identifying malicious accounts and suspicious activities
Improving uptime and lowering costs by predicting/preventing cell tower failures and optimizing repair truck rolls

Detect Network Outliers
Reduced downtime, increased service availability, better customer satisfaction
ML Use Case
– Monitor noise rise for 20,000 cell towers to increase service and device availability, reduce MTTR
Technical overview
– A customized solution deployed in production based on outlier detection
– Leverage previous month data and voting algorithms
“The ability to model complex systems and alert on deviations is where IT and security operations are headed. Splunk Machine Learning has given us a head start.”

Reliable website updates
Proactive website monitoring leads to reduced downtime
ML Use Case
– Very frequent code and config updates (1000 daily) can cause site issues
– Find errors in server pools, then prioritize actions and predict root cause
Technical overview
– Custom outlier detection built using ML Toolkit Outlier assistant
– Built by Splunk Architect with no Data Science background
“Splunk ML helps us rapidly improve end-user experience by ranking issue severity, which helps us determine root causes faster, thus reducing MTTR and improving SLA”

What Now?
Get the Machine Learning Toolkit from Splunkbase: http://tiny.cc/splunkmlapp
Go watch Machine Learning videos on the Splunk YouTube channel: http://tiny.cc/splunkmlvideos
Go to Machine Learning talks:
– Advanced Machine Learning in SPL with the Machine Learning Toolkit by Jacob Leverich
– Extending SPL with Custom Search Commands and the Splunk SDK for Python by Jacob Leverich
Several Customer and Partner Talks
– Cisco, Scianta Analytics, Asian Telco, etc.
Early Adopter and Customer Advisory Program: mlprogram@splunk.com
Product Manager: Manish Sainani ms@splunk.com
Field Expert: Andrew Stein astein@splunk.com

THANK YOU
