Copyright 2016 Splunk Inc.
Using the Splunk Machine Learning Toolkit to Create Your Own Custom Models
Dr. Adam Oliner, Director of Engineering, Data Science, Splunk
Manish Sainani, Principal Product Manager, Splunk
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Who are we?
Dr. Adam Oliner
– Director of Engineering, Data Science & Machine Learning
– Splunker for 2 years
– Embarrassingly overeducated
Manish Sainani
– Principal Product Manager, Machine Learning
– Splunker for 2 years
– First ML hire at Splunk!
What are we doing here?
Overview of Machine Learning
The Assistants: Guided Machine Learning
Examples
– DIY Anomaly Detector
– Customer Applications
Overview of ML at Splunk
– Core Platform Search
– Packaged Premium Solutions
– Custom ML
Platform for Operational Intelligence
Splunk Machine Learning Toolkit
Extends Splunk platform functions and provides a guided modeling environment
Assistants: Guide model building, testing, & deploying for common objectives
Showcases: Interactive examples for typical IT, security, business, IoT use cases
Algorithms: 25 standard algorithms available prepackaged with the toolkit
SPL ML Commands: New commands to fit, test and operationalize models
Python for Scientific Computing Library: 300 open source algorithms available for use
Build custom analytics for any use case
What’s New since our 0.9 Beta Release (last year’s .conf)?
New name and abbreviation ;-)
No event limits (removal of the 50K limit on fitting models)
Configurable resource caps via mlspl.conf
Search head clustering support
Distributed / streaming apply
Scheduled fit
New algorithms (next slide)
– Feature engineering and selection
– Stochastic gradient descent (e.g., …)
– ARIMA
Multi-algorithm support across Assistants
Scatterplot matrix viz
Alerting
Tooltips
In-app tours
Cluster Numeric Events assistant
Videos videos videos for each assistant across IT, Security, IoT and Business Analytics
ML-SPL Cheat Sheet
Algorithms supported (v2.0, .conf2016)
The Assistants:Guided Machine Learning
Machine Learning
A process for generalizing from examples
Examples
– A, B, …
– A, B, …
– Xpast → Xfuture
– like with like
– Xpredicted – Xactual (anomaly detection)
Machine Learning Process
– …
– Evaluate
– Explore/Visualize
– Model
Machine Learning Process with Splunk
– …, transforms.conf, Datamodels, Add-ons from Splunkbase, …
– Explore/Visualize: Pivot, Table UI, SPL
– Model: ML Toolkit
Custom Machine Learning – Success Formula
Domain Expertise (IT, Security, …)
– Set business/ops priorities
– Identify use cases
– Drive decisions
SPL
– Data prep
– Operational …
Statistics / math background
– Algorithm selection
– Model building
Splunk ML Toolkit facilitates and simplifies via examples & guidance
Guided ML with the Assistants
Guides you through various analytics
– Prepare, fit, validate, and deploy
Automatically generates all the relevant SPL
Assistants: Fit
Assistants: Validate
Assistants: Deploy
The Assistants
1. Predict Numeric Fields
2. Predict Categorical Fields
3. Detect Numeric Outliers
4. Detect Categorical Outliers
5. Forecast Time Series
6. Cluster Numeric Events
Predict Numeric Fields
Algorithms
– LinearRegression, including Lasso, Ridge, and …
– RandomForestRegressor
– SGDRegressor
Validation
– Four visualizations of prediction error
– R² and RMSE
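For reference, R² and RMSE, the two numeric summaries listed under Validation, have standard textbook definitions. The following is a minimal Python sketch of those formulas, not the toolkit's own code; the function name `r2_and_rmse` and the sample numbers are hypothetical.

```python
import math


def r2_and_rmse(actual, predicted):
    """Return (R^2, RMSE) for paired sequences of actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual sum of squares: squared prediction errors
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when every actual value is identical")
    r2 = 1.0 - ss_res / ss_tot
    # RMSE is expressed in the same units as the predicted field
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


r2, rmse = r2_and_rmse([3, 5, 7, 9], [2.5, 5.5, 7, 9])
print(round(r2, 3), round(rmse, 3))  # 0.975 0.354
```

R² compares the model against always predicting the mean (1.0 is perfect), while RMSE states the typical error size in the field's own units, which is why the two are usually reported together.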
Predict Categorical Fields
Algorithms
– …Classifier
– SVM
– Naïve Bayes, including BernoulliNB and GaussianNB
Validation
– Precision, recall, accuracy, F1
– Confusion matrix
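The validation metrics listed above can all be read off a confusion matrix. The following Python sketch computes them for one class treated as "positive"; the function names and the spam/ham labels are hypothetical, and the toolkit may average per-class scores differently for multi-class fields.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def classification_scores(actual, predicted, positive):
    """Accuracy, plus precision, recall and F1 with `positive` as the class of interest."""
    cm = confusion_matrix(actual, predicted)
    labels = set(actual) | set(predicted)
    tp = cm[(positive, positive)]
    # Predicted positive but actually something else
    fp = sum(cm[(a, positive)] for a in labels if a != positive)
    # Actually positive but predicted as something else
    fn = sum(cm[(positive, p)] for p in labels if p != positive)
    accuracy = sum(cm[(label, label)] for label in labels) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


actual = ["spam", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "spam", "spam"]
print(confusion_matrix(actual, predicted)[("spam", "spam")])  # 2
print(classification_scores(actual, predicted, positive="spam"))
```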
Detect Numeric Outliers
Methods
– Standard deviation
– Median absolute deviation
– Interquartile range
Validation:
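Each of the three methods listed above turns a numeric field into a low/high band and flags values outside it. A minimal Python sketch using only the standard library; the multipliers (2 standard deviations, 3 MADs, 1.5 IQRs) are common conventions assumed here, not values taken from the assistant, and the function names are hypothetical.

```python
import statistics


def stdev_bounds(values, k=2.0):
    """Mean +/- k population standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return mu - k * sigma, mu + k * sigma


def mad_bounds(values, k=3.0):
    """Median +/- k median absolute deviations (robust to the outliers themselves)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med - k * mad, med + k * mad


def iqr_bounds(values, k=1.5):
    """Tukey fences [Q1 - k*IQR, Q3 + k*IQR]; statistics.quantiles needs Python 3.8+."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, bounds):
    low, high = bounds
    return [v for v in values if v < low or v > high]


readings = [10, 11, 12, 11, 10, 12, 11, 50]
for method in (stdev_bounds, mad_bounds, iqr_bounds):
    print(method.__name__, outliers(readings, method(readings)))  # each flags [50]
```

The extreme value inflates the mean and standard deviation it is judged against, which is why the median- and quartile-based bands are considered more robust.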
Detect Categorical Outliers
Statistical methods
Validation:
Forecast Time Series
Algorithms
– State-space method using Kalman filter
– ARIMA
Validation
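The state-space approach listed above can be illustrated with its simplest member, the local level (random walk plus noise) model, filtered with a scalar Kalman filter. This Python sketch assumes the two noise variances `q` and `r` are known; a real forecaster would estimate them from the data, and this is not the toolkit's implementation.

```python
def local_level_forecast(series, q, r, horizon):
    """Kalman filter for y_t = level_t + noise(r), level_t = level_{t-1} + noise(q).

    Returns a list of (forecast_mean, forecast_variance) for 1..horizon steps ahead.
    """
    level = series[0]  # initialise the state at the first observation (assumption)
    p = r              # and give it the observation noise as its initial variance
    for y in series[1:]:
        # Predict step: the level is a random walk, so its uncertainty grows by q
        p_pred = p + q
        # Update step: the Kalman gain weighs the new observation against the prediction
        gain = p_pred / (p_pred + r)
        level = level + gain * (y - level)
        p = (1 - gain) * p_pred
    # A local level model forecasts a flat line at the last filtered level,
    # with variance that grows linearly in the horizon h.
    return [(level, p + h * q + r) for h in range(1, horizon + 1)]


history = [100.0, 102.0, 101.0, 103.0, 104.0, 103.5]
for mean, var in local_level_forecast(history, q=0.5, r=1.0, horizon=3):
    print(round(mean, 2), round(var, 2))
```

A larger `q` relative to `r` makes the filter trust new observations more and track changes faster; a smaller one smooths more heavily.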
Cluster Numeric Events
Algorithms
– …
– SpectralClustering
Validation
– Scatterplot Matrix viz
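The algorithm list on this slide is only partly legible; SpectralClustering is the one name that survives. Spectral clustering needs an eigen-decomposition, so as a simpler stand-in that still shows what clustering numeric events means, here is plain k-means (Lloyd's algorithm) in Python. This is a swapped-in technique for illustration, not a reconstruction of the slide, and the sample events are hypothetical.

```python
import math


def kmeans(points, k, iterations=50):
    """Lloyd's k-means on equal-length tuples of floats.

    Initial centroids are points spread evenly through the input order (a simple,
    deterministic choice; real implementations usually use k-means++ or random restarts).
    """
    centroids = [tuple(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


events = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centres = kmeans(events, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```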
Prepare
Data Gathering and Prep
Source: CrowdFlower
Splunk!
Leading platform for collecting, cleaning, and transforming data
– Interactive Field Extractor
– Datamodels
– Hundreds of add-ons from Splunkbase
– transforms.conf
– props.conf
– etc.
Feature Engineering
TFIDF (term frequency × inverse document frequency)
– Transform free-form text into numeric attributes
StandardScaler (i.e., standardization to zero mean and unit variance)
FieldSelector (i.e., choose the k best features for regression/classification)
PCA and KernelPCA
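TFIDF and StandardScaler, the first two transforms listed above, are short enough to write out. This Python sketch uses the textbook formulas, weight = term count × ln(N / document frequency) and (x − mean) / standard deviation; the toolkit's versions may differ in details such as idf smoothing and vector normalization, and the function names and log lines are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """One {term: weight} dict per document, with weight = raw count * ln(N / df).

    Libraries differ in the details (smoothing, log base, normalisation), so treat
    this as the textbook form rather than any particular library's output.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def standard_scale(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]  # a constant field carries no information
    return [(v - mean) / std for v in values]


logs = ["error disk full", "error network timeout", "login ok"]
print(tfidf(logs)[0])  # "error" appears in 2 of 3 docs, so it scores below "disk"
print(standard_scale([2.0, 4.0, 6.0]))
```

A term that appears in every document gets weight ln(1) = 0, which is how TFIDF suppresses boilerplate words in free-form text.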
Preprocessing in the Assistants
Fit
Fit: What’s New
No event limits
Configurable resource caps (mlspl.conf)
Search head clustering support
Scheduled fit
New algorithms
Fit: What’s New
Validate
Validate / Apply: What’s New
Configurable resource caps
Search head clustering support
Distributed / streaming apply
Scatterplot matrix viz
Scatterplot Matrix Viz
Deploy
Deploy anywhere in Splunk!
– Scheduled training
– Alerting
– Reports and dashboards
– Augmented search results
– etc.
Deploy: What’s New
Distributed Apply
– Apply models to indexed data
– Streaming
Scheduled training
Alerting
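Distributed and streaming apply are possible because applying a fitted model to an event needs only the model and that one event. The Python sketch below illustrates that property in the abstract; the function names, the field `temp`, and the idea of one partition per indexer are hypothetical, and this is not how Splunk implements distributed apply.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; the 'model' is two numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def apply_stream(model, events, field, out_field):
    """Score events one at a time without buffering, so any partition can run it alone."""
    for event in events:
        scored = dict(event)  # leave the caller's event untouched
        scored[out_field] = model["slope"] * event[field] + model["intercept"]
        yield scored


# Fit once, centrally ...
model = fit_line([1, 2, 3], [2, 4, 6])
# ... then apply the same small model independently to each partition of events
# (hypothetically, one partition per indexer) and concatenate the results.
partitions = [[{"temp": 10}], [{"temp": 20}, {"temp": 30}]]
results = [e for part in partitions for e in apply_stream(model, part, "temp", "predicted")]
print([e["predicted"] for e in results])  # [20.0, 40.0, 60.0]
```

Fitting needs all the training data in one place, but applying does not, which is why apply can be pushed out to where the data lives while fit is run centrally (and on a schedule).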
What’s New: Scheduled Fit
What’s New: Alerting
Example: DIY Anomaly Detector
Let’s Build an Anomaly Detector!
We’ll use two Assistants
– Predict Numeric Fields
– Detect Numeric Outliers
Show the automatically generated intermediate SPL
Fit a Predictive Model
Set up Scheduled Training
Open Residuals in Search
Open Detect Numeric Outliers Assistant
Detect Outliers (Large Prediction Errors)
Schedule an Alert
Manage Your New Anomaly Detector
The Assistant Generated the SPL for You
You Built an Anomaly Detector!
You built a predictive model of AC Power
When the prediction error from this model is an outlier compared to past errors, you generate an alert
This predictive model automatically retrains itself on a schedule you control
You didn’t have to type any SPL
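The detector above chains two ideas: a regression model predicts AC Power, and an outlier rule decides when its prediction error is unusual. Here is a minimal stand-alone Python sketch of that logic, assuming a single hypothetical predictor (outside temperature), ordinary least squares, and a mean ± 3 standard deviations rule on the residuals. It mirrors the idea only; it is not the SPL the assistants generate, and all names and numbers are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def residuals(model, xs, ys):
    """Prediction error per event: actual minus predicted."""
    return [y - (model["slope"] * x + model["intercept"]) for x, y in zip(xs, ys)]


def train_detector(xs, ys, k=3.0):
    """Steps 1 and 2: fit the predictive model, then learn normal bounds for its errors."""
    model = fit_line(xs, ys)
    past = residuals(model, xs, ys)
    mu, sigma = statistics.mean(past), statistics.pstdev(past)
    return model, (mu - k * sigma, mu + k * sigma)


def alerts(model, bounds, xs, ys):
    """Step 3: return the new events whose prediction error falls outside the bounds."""
    low, high = bounds
    return [
        (x, y, r) for x, y, r in zip(xs, ys, residuals(model, xs, ys)) if r < low or r > high
    ]


# Hypothetical training window: outside temperature vs. AC power
temps = [20, 21, 22, 23, 24, 25, 26, 27]
power = [201, 209, 221, 229, 241, 249, 261, 269]
model, bounds = train_detector(temps, power)
# Re-running train_detector on a fresh window is the "scheduled training" step.
print(alerts(model, bounds, [24, 26, 24], [240, 262, 300]))  # only the 300 reading
```

Judging the residual rather than the raw reading is the key design choice: 300 might be a normal AC Power value on a hot day, but it is anomalous for the temperature that was actually observed.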
#winning
Machine Learning Customer Success
– Network Optimization
– Detect & Prevent Equipment Failure
– Security / Fraud Prevention
– Entertainment Company
– Prevent Cell Tower Failure
– Optimize Repair Operations
– Prioritize Website Issues and Predict Root Cause
– Machine Learning Consulting Services
– Predict Gaming Outages
– Fraud Prevention
– Analytics App built on ML Toolkit
Optimizing operations and business results
Machine Learning Toolkit Customer Use Cases
– Reducing customer service disruption with early identification of difficult-to-detect network incidents
– Minimizing cell tower degradation and downtime with improved issue detection sensitivity
– Speeding website problem resolution by automatically ranking actions for support engineers
– Ensuring mobile device security by detecting anomalies in ID authentication
– Entertainment Company: Predicting and averting potential gaming outage conditions with finer-grained detection
– Preventing fraud by identifying malicious accounts and suspicious activities
– Improving uptime and lowering costs by predicting/preventing cell tower failures and optimizing repair truck rolls
Detect Network Outliers
Reduced downtime, increased service availability, better customer satisfaction
ML Use Case
– Monitor noise rise for 20,000 cell towers to increase service and device availability and reduce MTTR
Technical overview
– A customized solution deployed in production based on outlier detection
– Leverages previous month data and voting algorithms
“The ability to model complex systems and alert on deviations is where IT and security operations are headed. Splunk Machine Learning has given us a head start.”
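The slide says only that the production solution leverages previous month data and voting algorithms; it does not describe the vote. One common reading is a majority vote across several detectors whose thresholds are learned from a baseline period. The Python sketch below does that with the three methods listed for the Detect Numeric Outliers assistant; the two-of-three rule, the multipliers, the function names, and the data are all assumptions, not the customer's actual design.

```python
import statistics


def build_detectors(baseline):
    """Three outlier tests whose thresholds come from a baseline window."""
    mu, sigma = statistics.mean(baseline), statistics.pstdev(baseline)
    med = statistics.median(baseline)
    mad = statistics.median([abs(v - med) for v in baseline])
    q1, _, q3 = statistics.quantiles(baseline, n=4)
    iqr = q3 - q1
    return [
        lambda v: abs(v - mu) > 3 * sigma,                   # standard deviation
        lambda v: abs(v - med) > 3 * mad,                    # median absolute deviation
        lambda v: v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr,  # interquartile range
    ]


def vote(baseline, current, min_votes=2):
    """Flag current values that at least `min_votes` of the detectors call outliers."""
    detectors = build_detectors(baseline)
    return [v for v in current if sum(test(v) for test in detectors) >= min_votes]


last_month = [10, 11, 12, 11, 10, 12, 11, 13, 12, 11]  # hypothetical baseline readings
this_month = [11, 13.9, 40]
print(vote(last_month, this_month))               # [40]: all three detectors agree
print(vote(last_month, this_month, min_votes=1))  # [13.9, 40]: only IQR flags 13.9
```

Requiring agreement trades a little sensitivity for fewer false alarms, since a value that only one method considers unusual (13.9 here) is not alerted on.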
Reliable website updates
Proactive website monitoring leads to reduced downtime
ML Use Case
– Very frequent code and config updates (1,000 daily) can cause site issues
– Find errors in server pools, then prioritize actions and predict root cause
Technical overview
– Custom outlier detection built using the ML Toolkit outlier assistant
– Built by a Splunk architect with no data science background
“Splunk ML helps us rapidly improve end-user experience by ranking issue severity, which helps us determine root causes faster, thus reducing MTTR and improving SLA.”
What Now?
Get the Machine Learning Toolkit from Splunkbase: http://tiny.cc/splunkmlapp
Go watch Machine Learning videos on the Splunk YouTube channel: http://tiny.cc/splunkmlvideos
Go to the Machine Learning talks:
– Advanced Machine Learning in SPL with the Machine Learning Toolkit by Jacob Leverich
– Extending SPL with Custom Search Commands and the Splunk SDK for Python by Jacob Leverich
– Several customer and partner talks: Cisco, Scianta Analytics, Asian Telco, etc.
Early Adopter and Customer Advisory Program: mlprogram@splunk.com
Product Manager: Manish Sainani, ms@splunk.com
Field Expert: Andrew Stein, astein@splunk.com
THANK YOU