UBA With ML - Final NO Comments

Machine Learning for User Behavior Anomaly Detection
EUGENE NEYOLOV, HEAD OF R&D

AUTHOR
Eugene Neyolov, HEAD OF R&D
Security engineer and analyst leading applied research projects in security monitoring, threat detection and user behavior analytics.
Current Interests: Building products for Cyber security with Data science and Hype

OUTLINE
• Why
  o ERP Security
  o User Behavior Analytics
  o Machine Learning
• What
  o Static Anomalies
  o Temporal Anomalies
• How
  o Data Preparation
  o Security Analytics
  o Security Data Science
  o Machine Learning
  o Anomaly Detection

ERP Security

ERP SECURITY
Blind Spot
• Endpoint security
• Network security
• Application security
• Intrusion detection
• Identity and access governance
• Business applications security
Diagram labels: Infrastructure focused prevention/detection; Where a real ERP attack happens

ERP SECURITY
Sweet Target
Enterprises
• HR Management
• Financial Accounting
• Sales and Distribution
• Materials Management
• Quality Management
• Production Planning
• Plant Maintenance
• Supply Chains
• ...
Attackers

User Behavior Analytics

USER BEHAVIOR ANALYTICS
Why?
• Legacy threat models
  o Users are the easiest attack vector
• Legacy incident monitoring
  o Infrastructure security focused analysis
• Legacy security alerts analysis
  o No business context enrichment

USER BEHAVIOR ANALYTICS
What?
• User security monitoring
• User-focused alert prioritization
• Advanced context enrichment
• User behavior vs. fraud analysis
  o UBA is about facts in the technical context
    - A developer must work with development server A but has accessed server B owned by the finance department
  o Fraud is about intentions in a business context
    - A salesman signs a contract with company A and not company B, because A is managed by a friend

USER BEHAVIOR ANALYTICS
How?
• Create a user-centered threat model
• Identify user-related data sources
• Build a user behavior baseline
• ?
• PROFIT!!!

Machine Learning

MACHINE LEARNING
Why?
• Escape postmortem rules and signatures
• Self-adjusted dynamic behavior patterns
• Find hidden patterns in user behavior

MACHINE LEARNING
What?
• ML y detection.
• Learning patterns from data
  o Supervised learning with labeled data
  o Unsupervised learning without labeled data
  o Semi-supervised learning with tips from data or humans
  o Reinforcement learning with a performance feedback loop
  o ...

MACHINE LEARNING
What?
• ML model
  o Codebase
  o Features structure
  o Model parameters (learned)
  o Model hyperparameters (architecture)
• ML features
  o Categorical (classes)
  o Statistical (counts)
  o Empirical (facts)
  o Continuous
  o Binary
  o ...

MACHINE LEARNING
How?
• Data Preparation
  o Collect event data
  o Normalize events
  o Enrich events
• Security Analytics
  o Categorize events
  o Build threat models
  o Map events to threats
• Security Data Science
  o Map threats to algorithms
  o Select and encode features
  o Define quality requirements
• Machine Learning
  o Build a model
  o Train a model
  o Optimize model parameters
• Incident Analysis
  o User behavior analysis
  o Peer group analysis
  o Threat classification
• Anomaly Detection
  o Feed real data
  o Detect anomalies
  o Prioritize anomalies

Data Preparation

DATA SOURCES
• APIs
• Log files
• Databases
• Log archives
• Log management tools
• Security monitoring tools
• ...

DATA FORMATS
• Syslog
• Custom mess
• Random key-value
• Proprietary key-value (CEF, LEEF, ...)
• Other terrible options (JSON, CSV, ...)

DATA NORMALIZATION
• Understand that mess
  o When, Who, did What, Where from, Where to, on What
• Bring all formats to the same convention
  o Implement a built-in converter for each format as a part of the solution (inside)
  o Create a separate converter tool and treat it as the data source for the model (outside)
  o Build event storage that allows event field mapping, like Splunk or ELK (infrastructure)
• Find duplicates and missing fields
  o One action generates several entries
  o System doesn't identify itself in its own logs
  o User's name is recorded, but not the IP (or vice versa)
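The normalization steps above can be sketched as follows. This is a minimal illustration, not the converter used in the talk: the two input formats, the source field names and the schema field names (`time`, `user`, `action`, `src`, `dst`, `object`) are all hypothetical stand-ins for the When / Who / What / Where from / Where to / on What convention.

```python
import json

# Hypothetical common schema: When, Who, did What, Where from, Where to, on What.
SCHEMA = ("time", "user", "action", "src", "dst", "object")

# Hypothetical per-format mappings from source field names to the schema.
FIELD_MAPS = {
    "kv": {"ts": "time", "usr": "user", "act": "action",
           "shost": "src", "dhost": "dst", "obj": "object"},
    "json": {"timestamp": "time", "username": "user", "event": "action",
             "client": "src", "server": "dst", "target": "object"},
}


def parse_raw(fmt, line):
    """Parse one raw line into a flat dict of source fields."""
    if fmt == "json":
        return json.loads(line)
    if fmt == "kv":
        # Whitespace-separated key=value pairs; values must not contain spaces.
        return dict(pair.split("=", 1) for pair in line.split())
    raise ValueError(f"unsupported format: {fmt}")


def normalize(fmt, line):
    """Convert one raw line to the common schema; absent fields become None."""
    raw = parse_raw(fmt, line)
    event = dict.fromkeys(SCHEMA)
    for source_name, value in raw.items():
        target = FIELD_MAPS[fmt].get(source_name)
        if target:
            event[target] = value
    return event


def deduplicate(events):
    """Drop exact duplicates (one action may generate several entries)."""
    seen, unique = set(), []
    for event in events:
        key = tuple(event[field] for field in SCHEMA)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


def missing_fields(event):
    """List schema fields the source did not provide (e.g. user but no IP)."""
    return [field for field in SCHEMA if event[field] is None]
```

Keeping one mapping table per format corresponds to the "inside" option on the slide; the same functions could equally live in a separate converter tool.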

DATA NORMALIZATION: BEFORE
SAP Security Audit Log ABAP
2AU520180313113209000030400001D1nsalab 00001D1nsalab SAP*SAPMSSY10001SLO6&SAPLSLO6&RSAU READ SAP*SESSION MANAGER 703002315800004D4MacBookSAP*SESSION MANAGER 4703002315800004D4MacBook-SAP*SESSION MANAGER 13114703002315800004D4MacBook-SAP*SESSION MANAGER RSRZLLG0 ACTUAL0011RSRZLLG0 200008D8MacBook-SAP*SE16SAPLSMTR 02&02&passedMacBook-ProNursulta

DATA NORMALIZATION: AFTER
SAP Security Audit Log ABAP
Time          | Title                     | User | Device               | Action | Context 1 | Context 2 | Context 3
3/13/18 11:32 | RFC/CPIC Logon Successful | SAP* | nsalab               | AU5    | F         | 0         |
3/13/18 11:32 | Successful RFC Call       | SAP* | nsalab               | AUK    | SLO6      | SAPLSLO6  | RSAU_READ_FILE
3/13/18 11:46 | Logon Failed              | SAP* | MacBook-Pro-Nursulta | AU2    | A         | 1         |
3/13/18 11:47 | Logon Successful          | SAP* | MacBook-Pro-Nursulta | AU1    | A         | 0         | P
3/13/18 11:51 | Transaction Started       | SAP* | MacBook-Pro-Nursulta | AU3    | SE16      |           |
3/13/18 11:51 | Read Table                | SAP* | MacBook-Pro-Nursulta | DU9    | USR02     | 2         | passed

Security Analytics

ERP SECURITY LOGGING
• Common business application logging
  o Event time
  o Event type
  o Server info
  o User info
  o ...

ERP SECURITY LOGGING
• SAP tracks 50 fields across 30 log formats
  o SAP system ID (business entity)
  o client number (company sandbox inside a system)
  o names of processes, transactions, programs or functions (runtime data)
  o affected user, file, document, table, program or system (context data)
  o amount of inbound and outbound traffic (network data)
  o severity, outcome and error messages (status data)
  o device that forwarded the event (infrastructure data)
  o ...

ERP SECURITY LOGGING
SAP Security Audit Log ABAP
• Short list of important fields
  o Time
  o Event type, class
  o System type (log source)
  o System ID, server hostname and IP
  o User name, device hostname and IP
  o Executed program name (transaction, report, remote call)

THREAT MODEL
Use Cases
• 10 Categories (why)
  o Data Exfiltration, Account Compromise, Regular Access Abuse, Privileged Access Abuse, ...
• 30 Classes (what)
  o Data Transfer, Account Sharing, Password Attack, Privilege Escalation, Lateral Movement, ...
• 100 Scenarios (how)
  o Login from multiple hosts, User upgrades its own privileges, Cover tracks via user deletion, ...

Security Data Science

ANOMALY TYPES
• Static anomalies
  o Unusual action (new or rare event)
  o Unusual context (server, device, ...)
  o ...
• Temporal anomalies
  o Unusual time
  o Unexpected event
  o Huge events volume
  o ...

ANOMALIES VS. THREATS
• Many anomalies are not malicious
• Anomalies are statistical deviations
• Big infrastructures always have anomalies

ANOMALIES VS. THREATS
Matrix Example
Threat model categories: Regular Access Abuse, Account Compromise, Data Exfiltration
Threat Model         | Temporal Anomalies                             | Static Anomalies
Class                | Unusual action | Unusual time | Unusual volume | New action | New server | New device
Unauthorized Access  | high           | medium       | low            | high       | medium     | low
Account Sharing      | low            | medium       | high           | low        | medium     | high
Password Attack      | medium         | low          | high           | low        | high       | high
Privilege Escalation | high           | medium       | low            | high       | medium     | low
Access Enumeration   | high           | low          | medium         | high       | medium     | low
Data Transfer        | low            | medium       | high           | low        | high       | medium
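One way such a matrix could be used is to rank threat classes by the anomaly types observed for a user. The sketch below copies the high/medium/low levels from the slide; the numeric weights (1, 2, 3), the identifier names and the idea of summing weights are assumptions made for illustration only.

```python
# Likelihood levels are copied from the matrix slide; the numeric weights
# and the summation rule are assumptions, not part of the talk.
WEIGHTS = {"low": 1, "medium": 2, "high": 3}

ANOMALIES = ("unusual_action", "unusual_time", "unusual_volume",
             "new_action", "new_server", "new_device")

MATRIX = {
    "Unauthorized Access":  ("high", "medium", "low", "high", "medium", "low"),
    "Account Sharing":      ("low", "medium", "high", "low", "medium", "high"),
    "Password Attack":      ("medium", "low", "high", "low", "high", "high"),
    "Privilege Escalation": ("high", "medium", "low", "high", "medium", "low"),
    "Access Enumeration":   ("high", "low", "medium", "high", "medium", "low"),
    "Data Transfer":        ("low", "medium", "high", "low", "high", "medium"),
}


def rank_threats(observed):
    """Rank threat classes by the summed weight of the observed anomaly types."""
    scores = {}
    for threat, levels in MATRIX.items():
        by_anomaly = dict(zip(ANOMALIES, levels))
        scores[threat] = sum(WEIGHTS[by_anomaly[name]] for name in observed)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_threats(["unusual_volume", "new_device"])
```

With these assumed weights, an unusual volume from a new device ranks Account Sharing and Password Attack above the other classes, which matches the "high" cells in those two rows.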

Static Anomalies

STATIC ANOMALY DETECTION
Plan
• Context building
• Context matching
• Anomaly analysis
Diagram components: Events Storage, Scoring Engine, Context Matching, Context Building, Context Storage, Anomalies Storage

CONTEXT BUILDING
• Whitelist known values for all users
• Define anomaly scores for all fields
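A minimal sketch of context building, assuming events are already normalized into dictionaries. The field names, the score values and the server name `erp01` are hypothetical; the talk does not specify them.

```python
from collections import defaultdict

# Hypothetical per-field anomaly scores: how suspicious an unknown value is.
FIELD_SCORES = {"device": 0.5, "action": 0.3, "server": 0.8}


def build_context(events):
    """Collect, per user, the set of values already seen in each scored field."""
    context = defaultdict(lambda: defaultdict(set))
    for event in events:
        for field in FIELD_SCORES:
            if field in event:
                context[event["user"]][field].add(event[field])
    return context


# Historical events for one user; "erp01" is a made-up server name.
history = [
    {"user": "SAP*", "device": "nsalab", "action": "AU5", "server": "erp01"},
    {"user": "SAP*", "device": "nsalab", "action": "AUK", "server": "erp01"},
]
context = build_context(history)
```

The resulting per-user sets act as the whitelist: any value outside them is a candidate static anomaly.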

CONTEXT THRESHOLD
• Problem
  o Log poisoning attacks
  o Anomalies in user context
• Solution
  o Importance amplification
  o Mean of squared

CONTEXT MATCHING
• Compare new events with the user context field by field
• Assign individual anomaly scores for unknown fields

ANOMALY ANALYSIS
• Get a total event anomaly score from all its fields
• Get a total user anomaly score from all its events
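Context matching and anomaly analysis can be sketched together. The slides say only that totals are computed; plain summation, the field scores and the hand-written context below (including the server name `erp01`) are assumptions for illustration.

```python
# Hypothetical field scores and a tiny hand-written user context.
FIELD_SCORES = {"device": 0.5, "action": 0.3, "server": 0.8}
CONTEXT = {"SAP*": {"device": {"nsalab"}, "action": {"AU5", "AUK"},
                    "server": {"erp01"}}}


def event_score(event, context=CONTEXT, field_scores=FIELD_SCORES):
    """Sum the scores of all present fields whose value is new for this user."""
    known = context.get(event["user"], {})
    return sum(score for field, score in field_scores.items()
               if field in event and event[field] not in known.get(field, set()))


def user_scores(events):
    """Total anomaly score per user, summed over that user's events."""
    totals = {}
    for event in events:
        user = event["user"]
        totals[user] = totals.get(user, 0.0) + event_score(event)
    return totals


new_events = [
    {"user": "SAP*", "device": "nsalab", "action": "AU5", "server": "erp01"},
    {"user": "SAP*", "device": "MacBook-Pro-Nursulta", "action": "AU2",
     "server": "erp01"},
]
totals = user_scores(new_events)
```

Here the second event scores on both the unknown device and the unknown action, while the first event matches the context and scores zero.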

Temporal Anomalies

TEMPORAL ANOMALY DETECTION
• Establish a normal behavior baseline
• Train to predict normal user actions
• Analyze incorrectly predicted actions
Diagram components: Events Storage, Features Encoding, RNN Engine, Anomaly Detection, Model Training, Weights Storage, Anomalies Storage

FEATURE ENGINEERING
• Feature selection
• Feature encoding

FEATURE SELECTION
Data
Time          | Title                     | User | Device               | Action | Context 1 | Context 2 | Context 3
3/13/18 11:32 | RFC/CPIC Logon Successful | SAP* | nsalab               | AU5    | F         | 0         |
3/13/18 11:32 | Successful RFC Call       | SAP* | nsalab               | AUK    | SLO6      | SAPLSLO6  | RSAU_READ_FILE
3/13/18 11:46 | Logon Failed              | SAP* | MacBook-Pro-Nursulta | AU2    | A         | 1         |
3/13/18 11:47 | Logon Successful          | SAP* | MacBook-Pro-Nursulta | AU1    | A         | 0         | P
3/13/18 11:51 | Transaction Started       | SAP* | MacBook-Pro-Nursulta | AU3    | SE16      |           |
3/13/18 11:51 | Read Table                | SAP* | MacBook-Pro-Nursulta | DU9    | USR02     | 2         | passed

FEATURE ENCODING
Vector
Time          | Title                     | User | Device | Action | Context 1 | Context 2 | Context 3
3/13/18 11:32 | RFC/CPIC Logon Successful | SAP* | nsalab | AU5    | F         | 0         |
3/13/18 11:32 | Successful RFC Call       | SAP* | nsalab | AUK    | SLO6      | SAPLSLO6  | RSAU_READ_FILE
[ 0.19248842592592594 0.7110773240660063 0.8366013071895425 ]

FEATURE ENCODING
Knowledge Base
• On-the-fly KB
• Security-focused KB
• Application-focused KB
  o Static (1/100000 scale)
  o Mapping (1/100 scale)
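A rough sketch of how a knowledge base might turn event fields into numbers. The slides do not define the encoding, so everything here is an assumption: the `KnowledgeBase` class, reading "scale" as the step between consecutive codes, and encoding time as a fraction of the day.

```python
from datetime import datetime


class KnowledgeBase:
    """On-the-fly knowledge base: each new value gets the next numeric code."""

    def __init__(self, scale=1 / 100):
        self.scale = scale      # assumed step between consecutive codes
        self.codes = {}

    def encode(self, value):
        """Return the code for a value, registering it on first sight."""
        if value not in self.codes:
            self.codes[value] = (len(self.codes) + 1) * self.scale
        return self.codes[value]


def encode_time(timestamp):
    """Encode time of day as a fraction of 24 hours (an assumed scheme)."""
    seconds = timestamp.hour * 3600 + timestamp.minute * 60 + timestamp.second
    return seconds / 86400


def encode_event(event, kbs):
    """Turn one normalized event into a numeric feature vector."""
    vector = [encode_time(event["time"])]
    for field in ("user", "device", "action"):
        vector.append(kbs[field].encode(event[field]))
    return vector


kbs = {field: KnowledgeBase() for field in ("user", "device", "action")}
event = {"time": datetime(2018, 3, 13, 11, 32), "user": "SAP*",
         "device": "nsalab", "action": "AU5"}
vector = encode_event(event, kbs)
```

With a 1/100 step the codes leave the [0, 1] range after 100 distinct values, which is one reason a field with many possible values would need a finer scale.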

Machine Learning

MODEL IMPLEMENTATION
• Find the right algorithm for a task
• Implement a model and its environment
• Optimize the model for the best accuracy

MODEL MEMORY
• Recurrent neural networks
  o Simple RNN
    - Forgets longer dependencies
  o Long Short-Term Memory
    - Proven track record
  o Gated Recurrent Unit
    - LSTM simplified
  o Neural Turing Machine
    - RNN on steroids
  o ...

MODEL
[Figure: model diagram; surviving labels: n, Predict, Program]

MODEL PARAMETERS
• Architecture
  o Number of layers, number of neurons, activation function, loss function, optimizer, ...
• Data
  o Features, knowledge base, sequence length, normalization, ...
• Training
  o Epochs, batch size, threshold, distance, smoothing, ...

SEQUENCE LENGTH
ABC DEFGHACKED
ABCD EFGHACKED
ABCDE FGHACKED
ABCDEFGHAC KED
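The slide appears to show the same event sequence viewed through history windows of different lengths. Under that reading, a sequence model would be trained on (window, next event) pairs like the ones built below; `make_windows` is a hypothetical helper, not code from the talk.

```python
def make_windows(sequence, length):
    """Split a sequence into (history window, next item) training pairs."""
    return [(sequence[i:i + length], sequence[i + length])
            for i in range(len(sequence) - length)]


events = "ABCDEFGHACKED"   # the example sequence from the slide
short_pairs = make_windows(events, 3)
long_pairs = make_windows(events, 10)
```

A short window yields many training pairs with little context each; a long window yields few pairs but lets the model see more history before predicting.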

KNOWLEDGE BASE SORTING
• Alphabet
• Criticality
• Frequency
Figure panels: Sorted by Alphabet, Sorted by Frequency
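The order of the knowledge base decides which numeric code each value receives, and therefore how "close" two values look to the model. A small sketch, assuming codes are spread evenly over (0, 1]; `build_codes` is a hypothetical helper, and criticality ordering is left out because it would need an expert ranking that the slides do not give.

```python
from collections import Counter


def build_codes(values, order="frequency"):
    """Assign evenly spaced codes in (0, 1] to the distinct values."""
    counts = Counter(values)
    if order == "alphabet":
        ranked = sorted(counts)
    elif order == "frequency":
        # Most frequent first; ties broken alphabetically for determinism.
        ranked = sorted(counts, key=lambda value: (-counts[value], value))
    else:
        raise ValueError(f"unknown order: {order}")
    return {value: (i + 1) / len(ranked) for i, value in enumerate(ranked)}


actions = ["AU3", "AU1", "AU3", "DU9", "AU3", "AU1"]
by_alphabet = build_codes(actions, "alphabet")
by_frequency = build_codes(actions, "frequency")
```

With frequency ordering, common actions cluster at one end of the range and rare ones at the other, so a rare action lands far from the usual predictions.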

ADAPTIVE THRESHOLD
• Error score
  o Distance-based
    - Predicted value (blue)
    - Actual value (green)
• Threshold
  o Max training error score
• Sensitivity
  o As is
  o Coefficient
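The threshold logic can be sketched in a few lines. Euclidean distance is an assumption (the slide says only "distance-based"), and the function names are hypothetical.

```python
import math


def error_score(predicted, actual):
    """Distance-based error: Euclidean distance between two feature vectors."""
    return math.dist(predicted, actual)


def fit_threshold(training_pairs, coefficient=1.0):
    """Threshold = max training error, scaled by a sensitivity coefficient."""
    return coefficient * max(error_score(p, a) for p, a in training_pairs)


def is_anomaly(predicted, actual, threshold):
    """Flag an event whose prediction error exceeds the threshold."""
    return error_score(predicted, actual) > threshold


# (predicted, actual) vector pairs collected on normal training data.
training = [([0.0, 0.0], [0.3, 0.4]), ([0.1, 0.1], [0.1, 0.1])]
threshold = fit_threshold(training)             # sensitivity "as is"
relaxed = fit_threshold(training, coefficient=2.0)
```

A coefficient of 1.0 corresponds to using the maximum training error "as is"; a larger coefficient trades missed anomalies for fewer false alarms.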

ANOMALY DETECTION
• Predict a potential user activity
• Report incorrectly predicted events above threshold

ANOMALY DETECTION
Prediction

ANOMALY DETECTION
Metrics
• Accuracy 95%
  o True Positives 71%
  o True Negatives 97%
• Errors 5%
  o False Positives 3%
  o False Negatives 29%
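The percentages read most naturally as rates: true and false positives are measured against actual anomalies (71% + 29%), true and false negatives against normal events (97% + 3%). Under that reading, an overall accuracy of 95% implies about twelve normal events per anomaly. The counts below are hypothetical, chosen only to reproduce the reported rates; they are not data from the talk.

```python
def rates(tp, fn, tn, fp):
    """Derive accuracy and per-class rates from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "false_positive_rate": fp / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
    }


# Hypothetical counts (not from the talk): 100 anomalies, 1200 normal events.
result = rates(tp=71, fn=29, tn=1164, fp=36)
```

Because normal events dominate, accuracy stays high even though 29% of anomalies are missed, which is why the per-class rates are worth reporting alongside it.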

CONCLUSIONS
• Security analytics is more important than machine learning
• ML-driven solutions must help analysts and not replace them
• Adjust accuracy and tolerance to false positives for your situation
• Build an ecosystem of ML models and advanced analytics on top of it

AI BLESS YOU
Eugene Neyolov, Head of R&D
neyolov@erpscan.com
Read our blog: erpscan.com/category/press-center/blog/
Join our
Subscribe to our newsletters: eepurl.com/bef7h1
USA: 228 Hamilton Avenue, Fl. 3, Palo Alto, CA. 94301
Phone 650.798.5255
EU: Luna ArenA 238 Herikerbergweg, 1101 CM Amsterdam
Phone 31 20 8932892
EU: Štětkova 1638/18, Prague 4 - Nusle, 140 00, Czech Republic
erpscan.com
inbox@erpscan.com