Forecasting For Unified Communications Networks S9385

Transcription

S9385 AI-Based Anomaly Detections and Threat Forecasting for Unified Communications Networks
Kevin Riley – CTO, Ribbon
Tim Thornton - Director Software Engineering, Ribbon

About Ribbon
Ribbon is a global leader in secure real-time communications, providing software, cloud, core, and edge network infrastructure solutions to service providers and enterprises.
Ribbon Communications Confidential and Proprietary

About Ribbon
- Four Decades of Combined Leadership Experience in Real-Time Communications
- 2,300 Employees and Doing Business in 100 Countries
- 1,000 Service Provider and Enterprise Customers Globally
- #1 in VoIP Switching, #1 E-SBC, #2 CSP SBC, #1 in Media Gateways
- 800 Patents Worldwide
- Publicly Traded Company on NASDAQ
Leadership Ranking Source: IHS Research and Exact Ventures 3Q-2018 market share data (Ribbon includes GENBAND, Sonus, and Edgewater)

Where You Will Find Us
- More than 350 U.S. Department of Defense Locations
- The World's Leading Tier One Service Providers
- The Largest Banks, Airlines, Retailers and Manufacturers across the Globe

Ribbon Protect
Big-data Analytics to Secure Communications Networks
Use Cases: Toll Fraud, RTC Security, Continuous Monitoring, Improve Operations, Accelerate Investigations
Analytics: Added context to investigations, visualization, multi-sourced data collection, automation, drill down
Intelligent Operations: Consolidate RTC tools, NW policy enforcement, active monitoring, troubleshooting, SOC/SIEM integration
[Diagram labels: Protect, Big Data, High-speed data ingestion, Threat Intel Sharing, Hadoop, Data Enrichment, Incident Management, ML / Behavior Analytics, GPU Acceleration]
Communications Network Sensors / Enforcers: 3rd Party SBC, Firewall, IP-PBX

Goals: Use Deep Learning to model user / network
[Diagram labels: Call signatures, Analysis & Policy, Big Data, Automation, Self-Healing, Prediction, Real-time Communications Networks]

Modeling calls in a real-time Communication Network
Challenges
Network Complexity
- Network behavior varies greatly between operators.
- Machine learning models must be built and trained with each operator's data to capture the unique characteristics of their network.
- Feature significance varies from operator to operator and may change over time.
Data Dimensionality
- Input sources contain high-dimensional, text-based data that results in large feature sets.
- Metrics (KPIs) used for behavior models can number from 10s to 1000s, which presents significant resource challenges.
Analytics Scale
- Call rates per second (in the 10s of thousands) pose challenges for real-time modeling and detection.
- Billions of records per day for analytical processing.
- Security incidents and operational events can take significant time to detect.

The Approach
Parameterize: Apply machine learning techniques to create features for call flows, user behavior and endpoint information
Model: Leverage deep learning to model typical or normative behavior such that anomalies can be readily identified and acted on
Initial Key Focus Areas
Operational
- Forecasting and thresholding network KPIs
- Identifying anomalous behaviors on network resources
Security
- Behavioral modeling of subscribers' usage and network calling patterns
- Identifying security anomalies in subscribers' actions

SIP Call Signature
Hypothesis: Use call signaling information to create a “signature”
Applications
Service Assurance (Operational)
– Understand types of devices on network
– Onboarding new devices
– Determining distribution of devices
Network Security
– Identity Management
  - User activity monitoring (think bank and credit card)
  - Changes in user features as compared to corpus
  - Changes in user and device relationships
– Behavioral
  - Changes in users' calling patterns
  - Changes in network usage
[Workflow diagram: Datasets, Data Preparation (Feature Engineering, Feature Scaling), ML Algorithms, Modeling, Evaluation and Tuning, Deployment]

Unified Communications Data Sources
CDR – Call Detail Records
- Created at the beginning and end of calls (ATTEMPT, START, STOP)
- CSV format with 300 columns
- Contains summary information about the calls (duration, quality, packets)
- Typically used for operator billing
Logs/pCap (SIP Messages)
- Unstructured text
- Much higher data volume than CDR
- Requires protocol parsing to parameterize
- Minimum of 4 messages per call
Challenge in building machine learning solution
Lack of labelled data
- Getting access to enough training data
- Diversity of device types
Scope of data attributes
- Device types, call types, device configurations/options, network modifications

Session Initiation Protocol (SIP) Overview
    INVITE sip:17325551234@10.2.0.1:5060 SIP/2.0
    Via: SIP/2.0/UDP 192.168.1.1:0;branch=z9hG4bK-14243-27817-0
    From: 13155559999 <sip:13155559999@192.168.1.1:0>;tag=14243SIPpTag0027817
    To: 17325551234 <sip:17325551234@10.2.0.1:5060>
    Call-ID: 387A9EFB@192.168.1.1
    CSeq: 1 INVITE
    Contact: sip:13155559999@192.168.1.1:0
    Max-Forwards: 70
    Subject: Performance Test
    Content-Type: application/sdp
    Content-Length: 137

    v=0
    o=user1 53655765 2353687637 IN IP4 192.168.1.1
    s=-
    c=IN IP4 192.168.1.1
    t=0 0
    m=audio 6001 RTP/AVP 0
    a=rtpmap:0 PCMU/8000
What is SIP
- Text-based protocol
- Similar to HTTP
- “Soft” standard: syntax, parameters, extensibility
- Lends itself to vendor-specific implementations, which we can leverage
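Because SIP is text based and HTTP-like, a first parameterization step can be sketched in a few lines of Python. This is an illustration only, not Ribbon's parser: the function name parse_sip_message is hypothetical, the "=" and "<>" characters and the "s=-" line in the example are restored on the assumption that the extraction dropped them, and folded (multi-line) headers and compact header names are deliberately ignored.

```python
def parse_sip_message(raw):
    """Split a SIP message into (start_line, ordered header list, body)."""
    # SIP uses CRLF on the wire; normalize so the split below is simple.
    text = raw.replace("\r\n", "\n")
    # Headers and body are separated by the first empty line.
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")
    start_line = lines[0].strip()
    headers = []
    for line in lines[1:]:
        if not line.strip():
            continue
        # Split on the first colon only; values such as "192.168.1.1:0"
        # contain further colons that must stay in the value.
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


# The INVITE from the slide, with assumed separators restored.
EXAMPLE_INVITE = "\r\n".join([
    "INVITE sip:17325551234@10.2.0.1:5060 SIP/2.0",
    "Via: SIP/2.0/UDP 192.168.1.1:0;branch=z9hG4bK-14243-27817-0",
    "From: 13155559999 <sip:13155559999@192.168.1.1:0>;tag=14243SIPpTag0027817",
    "To: 17325551234 <sip:17325551234@10.2.0.1:5060>",
    "Call-ID: 387A9EFB@192.168.1.1",
    "CSeq: 1 INVITE",
    "Contact: sip:13155559999@192.168.1.1:0",
    "Max-Forwards: 70",
    "Subject: Performance Test",
    "Content-Type: application/sdp",
    "Content-Length: 137",
    "",
    "v=0",
    "o=user1 53655765 2353687637 IN IP4 192.168.1.1",
    "s=-",
    "c=IN IP4 192.168.1.1",
    "t=0 0",
    "m=audio 6001 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
])

start_line, headers, body = parse_sip_message(EXAMPLE_INVITE)
```

Keeping the headers as an ordered list of (name, value) pairs, rather than a dictionary, preserves header order, repetition and original spelling, which are the vendor-specific details the talk proposes to exploit.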

SIP Message – Device features
Identify “what” is making a call
Features highlighted on the example INVITE (the same message shown on the SIP overview slide):
- Header inclusion/exclusion
- Format, parameters
- Header order
- Syntax

SIP Message – User features
Identify “who” is making this call
Features highlighted on the example INVITE (the same message shown on the SIP overview slide):
- User identification
- User parameters
- Route (Via)
- IP information

SIP Message – Destination features
Identify “where” the call is going
Features highlighted on the example INVITE (the same message shown on the SIP overview slide):
- Destination information
- Type of call
- Media information

SIP Message – Call features
Identify details of this call
Features highlighted on the example INVITE (the same message shown on the SIP overview slide):
- Identity of specific call
- Calling, Called
- Call identification attributes: callId, Tags, Routing
- Type of call
- Statistics (duration, etc.)

Creating Machine Learning Features
Data Preparation
Examples of a few techniques used to create features from SIP messages:
- Header Presence – for each header in the message, count the number of occurrences
- Header Sequence – identifies the sequence or order of a header in the message
- Header Syntax – the original message syntax for the header name (upper/lower case)
- Masks – creates a format mask for implementation-specific SIP parameters. Typically helpful to identify a device-specific implementation
  Where: N – numeric, U – upper case, L – lower case, S – space, X – special character, Z – other
  Example: encoding the tag value contained in the From header
  From: ;tag=14243SIPpTag0027817 -> NNNNNUUULULLNNNNNNN
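The mask and header techniques can be sketched as follows. This is a minimal illustration under stated assumptions, not the speakers' implementation: the function names format_mask and header_features are hypothetical, "special character" is assumed to mean ASCII punctuation, and header syntax is represented by applying the same mask to the header name.

```python
import string
from collections import Counter


def format_mask(value):
    """Map each character to a class letter: N, U, L, S, X or Z."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append("N")          # numeric
        elif ch in string.ascii_uppercase:
            out.append("U")          # upper case
        elif ch in string.ascii_lowercase:
            out.append("L")          # lower case
        elif ch == " ":
            out.append("S")          # space
        elif ch in string.punctuation:
            out.append("X")          # special character (assumed: ASCII punctuation)
        else:
            out.append("Z")          # anything else
    return "".join(out)


def header_features(header_names):
    """Derive presence, sequence and syntax features from ordered header names."""
    # Presence: how many times each header occurs (case-insensitive key).
    presence = Counter(name.lower() for name in header_names)
    # Sequence: position of the first occurrence of each header.
    sequence = {}
    for position, name in enumerate(header_names):
        sequence.setdefault(name.lower(), position)
    # Syntax: mask of the header name as it was actually written.
    syntax = {name.lower(): format_mask(name) for name in header_names}
    return presence, sequence, syntax


tag_mask = format_mask("14243SIPpTag0027817")
presence, sequence, syntax = header_features(["Via", "Via", "From", "Call-ID"])
```

The mask keeps the shape of a value while discarding its content, so two calls from the same device model tend to share a mask even though their tag values differ.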

Choosing a Machine Learning Model
What can we do with the data we have?
- Limited to an unsupervised learning model
- Looked at various clustering models
- Neural network generative models looked promising
- Autoencoder seems to fit our problem
  - Tested with several autoencoder configurations
  - Variational autoencoder provided best results
Autoencoders are a specific type of feedforward neural network trained so that the output reproduces the input. They compress the input into a lower-dimensional code and then reconstruct the output from this representation. The code is a compact “summary” or “compression” of the input, also called the latent-space representation.
- Use SIP features to train multiple autoencoders
  - Device, User, Destination, Call models
- Use the latent space (compressed) as a digital signature for each feature

Implementing an Autoencoder
Creating a “signature”
Training phase
- Autoencoder minimizes loss between input features and output features of the training data
- Latent-space layer compresses the learned information from the input
Operational phase
- Trained model uses only the Encoder portion of the network
- Latent-space vector becomes the ‘signature’
[Diagram: inputs x1..xn are encoded into the latent-space vector [s1..sn] (the signature) and decoded into outputs y1..yn; Loss = x1..n – y1..n]
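The two phases can be illustrated with a deliberately tiny linear autoencoder in pure Python. This is a sketch only: the talk reports that a variational autoencoder built with TensorFlow and Keras gave the best results, whereas this example swaps in a plain linear encoder/decoder trained by stochastic gradient descent on squared reconstruction error. All names, the toy feature vectors and the hyperparameters are assumptions chosen for illustration.

```python
import random


def encode(enc, x):
    """Encoder half: map input features x to the latent-space vector s."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in enc]


def decode(dec, s):
    """Decoder half: reconstruct output features y from the latent vector s."""
    return [sum(w * si for w, si in zip(row, s)) for row in dec]


def reconstruction_loss(enc, dec, data):
    """Mean squared reconstruction error between inputs and outputs."""
    total = 0.0
    for x in data:
        y = decode(dec, encode(enc, x))
        total += sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return total / len(data)


def init_weights(n_features, code_size, seed=0):
    rng = random.Random(seed)
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in range(code_size)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(code_size)] for _ in range(n_features)]
    return enc, dec


def train(enc, dec, data, epochs=500, lr=0.02):
    """Training phase: minimize the loss between input and reconstructed output."""
    n, k = len(dec), len(enc)
    for _ in range(epochs):
        for x in data:
            s = encode(enc, x)
            y = decode(dec, s)
            err = [yi - xi for yi, xi in zip(y, x)]
            # Gradient with respect to the latent vector, taken before dec changes.
            grad_s = [sum(err[i] * dec[i][j] for i in range(n)) for j in range(k)]
            for i in range(n):
                for j in range(k):
                    dec[i][j] -= lr * err[i] * s[j]
            for j in range(k):
                for i in range(n):
                    enc[j][i] -= lr * grad_s[j] * x[i]
    return enc, dec


# Toy feature vectors (hypothetical), compressed from 4 features to a 2-value code.
data = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
enc, dec = init_weights(n_features=4, code_size=2)
loss_before = reconstruction_loss(enc, dec, data)
enc, dec = train(enc, dec, data)
loss_after = reconstruction_loss(enc, dec, data)

# Operational phase: only the encoder is used; its output is the signature.
signature = encode(enc, data[0])
```

After training, the decoder can be discarded: the signature for a new message is just the encoder output for its feature vector.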

Service Assurance
Using AI to enhance network operations
Provide visibility into the operator's network
- With Device features:
  - Mapping devices in network (metadata provides visibility into device attributes)
  - Determine density of device types
  - Notification of new device types in network
- With User and Destination features:
  - Identify call flow patterns
Operational actions
- Onboarding new device types
- Identify network interoperability requirements
- Network Resource Management
- Capacity planning and forecasting

Device Autoencoder
Signature visualization

Service Assurance
Application Examples
[Diagram labels: Onboarding Application, Enrichment Application, UC Messages, Extract Features, Encoder, GPU, Signature DB, Interop, Protocol converter, Adapter, Enriched Message, Device Info (Vendor, Software), Big Data (Analytics)]

Identity Management
Using AI to protect network and users
Provide insight to endpoints and users
- With Device features:
  - Identify malicious or misbehaving devices
  - Reporting new or unknown types of device
- With Device and User features:
  - Verification of user and device signatures
  - Location (geo)
  - Device type with this user
  - Detecting concurrent user instances
Security actions
- Block malicious devices and users
- Identify security vulnerabilities in network devices
- Feed anomalies into fraud applications
- Generate incidents to SIEM

User Autoencoder
Signature visualization

Detecting Anomalies
Combining signatures for more advanced [...]
[Diagram labels: [...]ance from Norm, User Signatures, User History, Anomaly]

Detecting Anomalies - example
Combining device and user signatures
Known Good: User signature, Device signature
Incoming Message: User signature, Device signature
Normal: incoming message signature distances are near known good signatures
  Minimum distance [0.2661066981234524, 0.08135181163868711]
Anomaly: incoming message signature distances are anomalous relative to known good signatures
  Minimum distance [3.211263703324535, 0.9243276898181685]
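The distance comparison can be sketched as below. The slide does not state which distance metric or threshold was used, so Euclidean distance and a threshold of 1.0 are assumptions, as are the function names and the toy signatures. The two-element result mirrors the pairs on the slide, read here as [user distance, device distance].

```python
import math


def euclidean(a, b):
    """Euclidean distance between two signature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def min_distance(signature, known_good):
    """Distance from a signature to its nearest known-good signature."""
    return min(euclidean(signature, good) for good in known_good)


def check_message(user_sig, device_sig, known_users, known_devices, threshold=1.0):
    """Return ([user distance, device distance], is_anomaly).

    The threshold is a hypothetical value; in practice it would be tuned
    per operator from the distribution of distances seen on normal traffic.
    """
    distances = [
        min_distance(user_sig, known_users),
        min_distance(device_sig, known_devices),
    ]
    return distances, any(d > threshold for d in distances)


# Toy two-dimensional signatures (hypothetical values).
known_users = [[0.0, 0.0], [1.0, 1.0]]
known_devices = [[2.0, 2.0]]

normal = check_message([0.1, 0.0], [2.0, 2.05], known_users, known_devices)
anomaly = check_message([4.0, 4.0], [2.0, 2.05], known_users, known_devices)
```

With this assumed threshold, the slide's normal pair (both values well under 1.0) would pass and its anomaly pair would be flagged by the first value alone.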

Identity Management
Application
[Diagram labels: Incident Management, Mitigation back loop – labelling data, Device/User DB, Security Applications, Behavioral]

Hypothesis to Deployment
Performance demands require GPUs
- Scaling to production volume is a significant challenge
- 10s of thousands of calls/sec, 10s of MB of data/sec
- Ingestion, extraction and predicting/encoding are all bottlenecks
- AI model complexity increases processing demands

Maximizing System Resources
Choosing the right tool for the job
How we split the AI pipeline
- Ingestion optimization and enrichment through distributed CPU nodes
- Data preparation and filtering as a common CPU service
- Model predictions/encoding through GPU
- Results processed by CPU-based applications
Nothing comes for free
- Data movement becomes the new bottleneck
- Using GPU, CPU memory consumption increases
[Chart: CPU Only vs. CPU/GPU]

Actual performance impact using a GPU
Increasing the volume - “Turn it up to 11”
Hardware
- i7-8700K 3.7 GHz, 6 cores
- 32 GB memory
- 1 TB SSD
- Nvidia GTX-1080
Software
- Python 3.6.6
- TensorFlow 1.11.0
- Keras
AI pipeline performance
- Model with 2.7M network parameters
- Varied encode batch size from 1 to 8000
Results
- Optimal batch size – 1000
- GPU 15,366 encodes/sec
- CPU 9,433 encodes/sec
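A batch-size sweep of this kind can be measured with a small harness like the one below. It is a sketch, not the benchmark behind the numbers above: encode_batch stands in for whatever model call is being timed (for example a Keras encoder's predict on a batch), and the stand-in function, the helper names and the batch sizes tried here are all hypothetical.

```python
import time


def batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encodes_per_second(encode_batch, items, batch_size):
    """Time encode_batch over all items and return the encode rate."""
    started = time.perf_counter()
    encoded = 0
    for batch in batches(items, batch_size):
        encoded += len(encode_batch(batch))
    elapsed = time.perf_counter() - started
    return encoded / elapsed if elapsed > 0 else float("inf")


def stand_in_encoder(batch):
    """Placeholder for the real model call; returns one 'signature' per input."""
    return [sum(features) for features in batch]


# Hypothetical workload: 10,000 feature vectors of 8 values each.
workload = [[float(i % 7)] * 8 for i in range(10000)]
rates = {
    size: encodes_per_second(stand_in_encoder, workload, size)
    for size in (1, 10, 100, 1000, 8000)
}
best_batch_size = max(rates, key=rates.get)
```

With a real GPU-backed encoder the rate usually climbs with batch size until data movement or memory limits dominate, which is consistent with the optimum at 1000 reported on the slide; the stand-in encoder here will not reproduce that curve.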

Summary
Ribbon is using AI to create new applications for Service Providers
- Initial focus on Service Assurance and Identity Management
- Signatures for call flows
AI is enabling innovative solutions and advanced analytic capabilities
- Anomaly detection
- Forecasting
Building knowledge through system deployment
- Labeling data
Ribbon & NVIDIA
- NVIDIA GPUs enhance Ribbon Protect to meet the scaling requirements of our customers and applications
- NVIDIA resources (tools and libraries) address many of our development hurdles
  - Lets Ribbon focus on value-added applications
  - NVIDIA Kubernetes distribution & NVIDIA Container runtime – easy integration into Ribbon Protect
  - RAPIDS – researching how RAPIDS can improve our AI pipeline processing
- NVIDIA has been a great partner in teaching, listening, and supporting Ribbon in its path down AI

Thank You
