Root Cause Analysis Methods - USALearning

Transcription

Table of Contents
Root Cause Analysis Methods
5 Whys
5 Whys – Example
Another Approach
Ishikawa “Fishbone” Diagram
Fishbone – Example
Next Steps
Using Root Cause Analysis Results
Notices

Root Cause Analysis Methods

There are many different approaches, methods, and techniques for conducting root cause analysis in other fields and disciplines.

The analysis method used to identify the root cause(s) of a cybersecurity incident depends on
- the circumstances of the incident
- the information that is available/discoverable
- your specific incident taxonomy and types/categories of causes (or threat vectors)

Adapt your root cause analysis method to the incident being analyzed, as needed.
- You might use more than one method, or a hybrid approach.

A cause analysis process can guide analysts through the multiple questions and paths to identify the initiating cause(s) and threat vector(s) that enabled an attack to occur.

[Distribution Statement A] This material has been approved for public release and unlimited distribution.

**012 So there are a variety of different approaches, methods, techniques that you can use for conducting root cause analysis, and we can adapt some of the ones that are used in other fields outside of information security and see how we can apply these to cybersecurity incident root cause analysis.

So generally the analysis method is going to depend on specifically the type of incident that we're looking at. Root cause analysis of a privileged compromise incident is going to be different from a denial-of-service attack incident. So it depends on the type of activity that's occurring, what information is available to analyze, or accessible, or even in existence. If an intruder has gained privileged access on a particular vulnerable system, they may have deleted or tampered with some of the logs or evidence or information that's available to analyze on that system.

And then it's also going to depend, again, on your type of categories of the incidents or root causes or threat vectors in your particular process.

So again, there's a variety of different ways, different approaches that you're going to need to adapt as needed, depending on the type of incident, and even some of the examples that we used from other fields here in this example, you might have to use a hybrid or combination approach depending on the specific circumstances of that incident you're analyzing.

So the most important thing is coming up with some kind of process to help guide you or your analysts through the various questions that they need to address that are important to them for going to the next step and providing an appropriate response and course of actions to identify the underlying cause of the particular threat that allowed the incident to occur, and therefore provide a follow-up response appropriate and relevant to that particular underlying cause.

5 Whys

The 5 Whys (a.k.a. Five Whys) method iteratively asks “Why?” to identify the root cause of a problem.
- This method is used in many cause analysis techniques, including the Analyze phase of Six Sigma.

Continue asking “Why?” questions until the root cause is identified or until no further data/information is available (i.e., the cause is “unknown”).
- This method may require more or fewer than five iterations of questions, depending on the problem.

To answer the Why questions, you often also need to answer related What and How questions (as well as Who and When).

**013 So one method that's been used in a number of different fields is called the Five Whys approach, the Five Whys method, and basically what it does is repeatedly, iteratively asks the question why something occurred until you can identify the ultimate cause of the particular problem. So this technique is used, like I said, in other areas, such as the Analyze phase of Six Sigma, and basically you just keep on asking those questions until the cause is either unknown or until you get to the answer. And even though the technique is called Five Whys, that's just kind of a generalized number. In many cases, you may be able to get to the answer in fewer than five iterations, or in many cases, it may take more than five questions to address the particular type of answers for the root cause of this particular incident.

And as we mentioned earlier, to answer some of these why questions, maybe you can reword these or paraphrase them, or they might be related to the what and how questions, or even sometimes the who and the when.

5 Whys – Example

Why did the SIEM tool alert?
- It detected a large amount of outgoing PII data.
Why was the PII data being sent?
- The data was coming from a local desktop workstation.
Why was the system sending PII?
- It was a result of malware, running as a hidden process.
Why was the malware running on the system?
- Antivirus was disabled.
Why was malware installed?
- The user clicked on a link in a phishing email.

**014 So here's an example of applying the Five Whys approach to a cybersecurity incident. Your security information and event management (SIEM) tool might set off an alert, and the first why is why did this alert go off, and then looking at the details of it. Some amount of personally identifiable information (PII), some data being exfiltrated or going out, was detected by some rule set, and it set off this alarm.

So then further analysis, digging down into why did this alert go off, was the threshold set correctly, and you find that it was indeed a true positive and that data was coming from a local workstation within our network and it set off this threshold, this trigger-- so then the next question is why was this system sending PII data, and upon further analysis-- again, if those resources are available to you-- you may eventually discover that the PII was being sent by a hidden malicious code, malware process, that was undetected, and this malware program was actually sending out the PII data across the network.

So the next question may be: Well, how did this malware get on there? Why was it running undetected on the system? And further analysis found that for some reason the antivirus program that was expected to be running on this system had been disabled. Maybe it was part of the installation of the malware or some other process, or maybe the intruder manually went in and disabled the antivirus product.

And then another why question is: Well, why or how did the malware get installed in the first place? And again, further analysis of the different data sources, or perhaps talking with the users involved, ultimately identified that the user had clicked on a link in a phishing email that they received, and by following this link they unknowingly downloaded and installed this malware, which caused all the other processes, which triggered the cause and effect, ultimately leading to the initial detection of the SIEM tool setting off the alert.

So this is just one kind of simplified example of how you might keep on asking questions until you eventually get to the underlying, initiating cause of the problem, so then you can address all the different phases, the different steps in the process, to fully mitigate against the problem.
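The iteration described above (keep asking "Why?" until the root cause is identified or no further information is available) can be sketched in a few lines of Python. This is only an illustration, not a tool from the course: the names `ask_whys`, `Investigator`, and `EXAMPLE` are made up for this sketch, and the lookup table simply replays the questions and answers from the slide in place of real analysis.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type: an investigator takes a Why question and returns
# (answer, next_question). It returns None when no further data is
# available; next_question is None once the answer is the root cause.
Investigator = Callable[[str], Optional[Tuple[str, Optional[str]]]]


def ask_whys(first_question: str,
             investigate: Investigator,
             max_iterations: int = 10) -> List[Tuple[str, str]]:
    """Ask Why questions until a root cause is found or the cause is unknown."""
    chain: List[Tuple[str, str]] = []
    question = first_question
    for _ in range(max_iterations):
        result = investigate(question)
        if result is None:
            # No further data/information: record the cause as unknown.
            chain.append((question, "unknown"))
            break
        answer, next_question = result
        chain.append((question, answer))
        if next_question is None:
            break  # the analyst judged this answer to be the root cause
        question = next_question
    return chain


# The questions and answers from the slide, standing in for real analysis.
EXAMPLE = {
    "Why did the SIEM tool alert?":
        ("It detected a large amount of outgoing PII data.",
         "Why was the PII data being sent?"),
    "Why was the PII data being sent?":
        ("The data was coming from a local desktop workstation.",
         "Why was the system sending PII?"),
    "Why was the system sending PII?":
        ("It was a result of malware, running as a hidden process.",
         "Why was the malware running on the system?"),
    "Why was the malware running on the system?":
        ("Antivirus was disabled.", "Why was malware installed?"),
    "Why was malware installed?":
        ("The user clicked on a link in a phishing email.", None),
}

chain = ask_whys("Why did the SIEM tool alert?", EXAMPLE.get)
for question, answer in chain:
    print(f"{question}\n- {answer}")
```

The loop has no fixed count of five: it stops when the investigator reports a root cause, runs out of information, or reaches the `max_iterations` safety limit, which matches the point that the method may need more or fewer than five questions.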

Another Approach

Ask the 5W+H (Who, What, Where, When, Why, How) questions:
What happened?
- Outgoing PII data was detected by the SIEM tool.
Where did the traffic originate?
- It originated on a local desktop workstation.
Who/what was sending the data?
- Malware, running as hidden process, sent the data.
Why/how was malware installed?
- The user clicked on a link in a phishing email.

**015 Another approach, using a variation of Five Whys, is maybe the Five W's: who, what, where, when, why, and how questions. So again, this is just a slight variation, but again, asking questions to try to identify a particular cause.

So applying this approach to the previous scenario: What happened? Well, there was PII data that was detected by your SIEM tool. Where did that particular traffic originate? In this case, the where, we're identifying it as a particular host workstation within our network. Who or what was sending the data? Now, the who, the person, sometimes this may never be known fully. Attribution of who the intruders are can often be difficult, and maybe the best you can hope is you might be able to trace it back to a particular IP address or host name coming from a location, and that may just be one link in a chain of other systems that the intruder may have used. But in this case, what was sending the data is a particular piece of malicious code, this malware, and it was running as a hidden process on the system.

And then asking the question of how or why did the malware get installed, again, we come back to the answer that it was due to user involvement. They received a phishing email and they clicked on a malicious link in there to install the malicious code. So that's applying a slight variation of asking different questions to come to identify the underlying root cause.
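One way to keep track of which of the six questions an analysis has covered is to record each finding with the question words it addresses. The following Python sketch does that for the example above. The names `Finding`, `FINDINGS`, and `open_question_words` are hypothetical, chosen only for this illustration.

```python
from typing import List, NamedTuple, Tuple

QUESTION_WORDS = ("who", "what", "where", "when", "why", "how")


class Finding(NamedTuple):
    words: Tuple[str, ...]  # question words this finding addresses
    question: str
    answer: str


# The four question/answer pairs from the slide.
FINDINGS = [
    Finding(("what",), "What happened?",
            "Outgoing PII data was detected by the SIEM tool."),
    Finding(("where",), "Where did the traffic originate?",
            "It originated on a local desktop workstation."),
    # Answered here with "what" (the malware); the person behind it
    # may never be fully known.
    Finding(("who", "what"), "Who/what was sending the data?",
            "Malware, running as hidden process, sent the data."),
    Finding(("why", "how"), "Why/how was malware installed?",
            "The user clicked on a link in a phishing email."),
]


def open_question_words(findings: List[Finding]) -> List[str]:
    """Return the question words that no finding has addressed yet."""
    covered = {word for finding in findings for word in finding.words}
    return [word for word in QUESTION_WORDS if word not in covered]


print(open_question_words(FINDINGS))
```

For the slide's example this reports that only the "when" question has not been addressed, which is a prompt for further analysis and not a finding in itself.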

Ishikawa “Fishbone” Diagram

The Ishikawa diagram is also known as a cause-and-effect diagram.
The primary “bones” in the diagram are categories of related causes.
Each bone in the diagram can branch out into further categories/subdivisions, down to the specific root causes.
For example, these are categories used in service industries:
- policies
- procedures
- people
- technology

**016 Another approach, used in other types of fields, sectors, and service industries, is called the cause-and-effect, or Ishikawa, or fishbone diagram, and the reason it's called a fishbone diagram, as we'll see in the next slide, is it looks like the outline of a skeleton of a fish. So the primary bones in the particular cause-and-effect diagram are categories of related causes, and then each bone in that category can then have subcauses and branch out into further subdivisions and subtrees in this diagram, and depending on, again, how you define your different threats or causes--

Fishbone – Example

[Figure: fishbone diagram. Labels recovered: Technology, Causes, Effect.]

**017 Will affect how you address or use this process.

Ishikawa “Fishbone” Diagram

**016 In many service industries, an example is the high-level-- they distinguish between different policies, procedures, people and technology, and then they have, underneath each of these higher-level categories, different detailed subdescriptions on how these different causes might be used to identify various effects.

Fishbone – Example

[Figure: fishbone diagram. Labels recovered: Technology, Causes, Effect.]

**017 So applying the fishbone root cause analysis methodology to a cybersecurity incident, we can still take those same high-level categories of policies, procedures, people and technology and map different information assurance or information security characteristics to these.

So you might have in your organization different policies that prohibit sharing of user accounts. You have to have your own account, use a password, and you're not supposed to share that with other people. You may have a policy saying that you're not supposed to connect unauthorized devices to the corporate network or the enterprise, so you wouldn't be able to connect your own laptop or connect to the organization's wireless network, or perhaps you might not be allowed to connect USB or flash drives to a particular system. So you may have policies that are in place, and the policies may have been violated.

There may be other types of procedural factors that are in place, such as you typically would have a patch management program, but for some other reason the procedure wasn't followed or it failed, and particular security patches weren't installed on a system that allowed it to be vulnerable and exploited. Perhaps account management procedures are in place, but a failure to follow those procedures or a violation of those procedures could allow different accounts to be created or set up or to be not tracked or monitored appropriately, and this could cause an incident to occur.

People, the people factor. There may be a variety of different interactions or involvement that they might have. It could be that they were socially engineered or received, again, a phishing email message or were in some other way deceived or impersonated, and they took some action to allow the initial foothold into the systems that caused the incident.

Or it may be not a malicious but an accidental action or inaction that a person committed which could have inadvertently leaked information or caused some other problem that allowed the incident to occur.

And then there's a whole variety of different technological issues that you will want to try to categorize and identify, things such as configuration problems or misconfigurations that could allow an intrusion to occur, different vulnerabilities that were undetected, and how these are categorized and mitigated. So this is, again, just one way of putting together a higher-level taxonomy or approach for addressing, identifying as much information as you can about the various factors that could allow an incident to occur, and then identifying what that underlying cause is.
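The fishbone structure (an effect, primary bones for categories, and specific causes branching off each bone) maps naturally onto a nested data structure. The Python sketch below is one possible representation, using the four categories and the examples mentioned in the narration. The names `FISHBONE`, `render`, and `categories_for` are assumptions made for this illustration, and the cause wording is paraphrased.

```python
from typing import Dict, List

# Effect plus primary "bones" (categories), each with candidate causes
# paraphrased from the narration above.
FISHBONE = {
    "effect": "Cybersecurity incident",
    "causes": {
        "Policies": [
            "Shared user accounts despite policy",
            "Unauthorized device connected to the corporate network",
            "USB or flash drive connected to a restricted system",
        ],
        "Procedures": [
            "Patch management procedure not followed or failed",
            "Account management procedure not followed",
        ],
        "People": [
            "User socially engineered by a phishing email",
            "Accidental action or inaction leaked information",
        ],
        "Technology": [
            "Misconfiguration allowed an intrusion",
            "Undetected vulnerability",
        ],
    },
}


def render(diagram: Dict) -> str:
    """Render the diagram as an indented text outline."""
    lines = [f"Effect: {diagram['effect']}"]
    for category, causes in diagram["causes"].items():
        lines.append(f"  {category}")
        lines.extend(f"    - {cause}" for cause in causes)
    return "\n".join(lines)


def categories_for(diagram: Dict, keyword: str) -> List[str]:
    """Return the categories whose causes mention the keyword."""
    keyword = keyword.lower()
    return [
        category
        for category, causes in diagram["causes"].items()
        if any(keyword in cause.lower() for cause in causes)
    ]


print(render(FISHBONE))
print(categories_for(FISHBONE, "phishing"))
```

A single incident can touch several bones at once. Searching for "account", for instance, matches both a policy violation and a procedural failure, which is why the diagram is used to collect all contributing factors before settling on the underlying cause.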

Next Steps

After identifying the root cause, an appropriate course of action can be taken to mitigate the incident, leverage new indicators, and improve future detection.
- Remember that failure to mitigate the root cause(s) can allow new or repeat incidents to occur.

The following are response actions:
- Mitigate the root cause(s) (e.g., patches, workarounds, changes).
- Recover and secure the affected system(s).
- Communicate/coordinate with others.
- Track any follow-up information.
- Close the incident.
- Conduct a post mortem, lessons learned meeting.

The following are appropriate prevention/detection actions:
- Implement mitigation actions on other vulnerable systems.
- Add new attack signature/indicators to existing prevention/detection processes.

**018 So once you've identified what the causes are, the next steps are to feed into the response process and sometimes, like we mention also, it can also feed back into the preventative and detection processes too. So if you do not take a comprehensive approach at addressing, mitigating, eliminating the particular vulnerabilities, you may have the intruder come back and repeat the incident or occurrence. So if you don't understand what the underlying-- the root cause of the problem was, a superficial approach, such as changing the administrator or root password, is not going to lock the intruder out. It may require, again, taking the system offline, completely rebuilding it, restoring data, patching it, installing other security tools, and some of the other types of activities or courses of action that you would do in various types of response incidents.

And then feeding back to identifying the information-- if this is a new type of attack or a zero day vulnerability has been exploited, feeding that back into new indicators for detecting future incidents or preventing incidents from occurring in the prevent and detect processes.

Using Root Cause Analysis Results

Use the results of the root cause analysis to assist your constituency. Example tasks include the following:
1. Assist your constituency by explaining the method used for root cause analyses.
2. Assist your constituency by distilling actions from the analysis and identifying improvements in the infrastructure, processes, and designs.
3. Use the findings of the analysis to enrich publications for your constituency.

(Source: [DRAFT] FIRST SIRT Services Framework, Tasks and Sub-Tasks for Function 2.4 Vulnerability/Exploitation Analysis – Sub-Function 2.4.2 Root cause analysis (Task 2.4.2.2))

**019 So again, looking at the FIRST Services Framework, some of the follow-up actions that might happen after the root cause analysis are helping-- perhaps one of the things you're doing is to advise your own constituency on how to perform root cause analysis, especially if you're a coordinating CSIRT, and you don't have access to a lot of the data sources, giving them some guidance on how they can perform a root cause analysis locally.

In addition, if they do perform their own local root cause analysis, you may be able to help the constituents in providing, again, recommended course of actions and response for eradicating and cleaning up after the incident, recovering from it, as well as future improvements such as detecting and preventing incidents with their own systems themselves.

And then you can also use the results of root cause analysis in providing better communications and general information and guidance to your constituents as far as outreach and communications.
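Returning to the prevention/detection actions listed under Next Steps, the two feedback steps (adding new indicators to detection, and applying the mitigation to other vulnerable systems) can be sketched as simple set operations. Everything in this Python sketch is hypothetical: the function names, the indicator strings, the host names, and the mitigation label are placeholders, not data from the incident.

```python
from typing import Dict, List, Set


def add_indicators(existing: Set[str], new: Set[str]) -> Set[str]:
    """Return the detection set extended with indicators from the incident."""
    return existing | new


def systems_needing_mitigation(applied: Dict[str, Set[str]],
                               mitigation: str) -> List[str]:
    """List systems on which the given mitigation has not been applied yet."""
    return sorted(name for name, done in applied.items()
                  if mitigation not in done)


# Placeholder indicators already used by detection, and new ones learned
# from the root cause analysis.
detection = {"known-bad.example"}
incident_indicators = {"phish-link.example", "hidden-process-name"}
detection = add_indicators(detection, incident_indicators)

# Placeholder inventory: which mitigations each system already has.
applied = {
    "ws-01": {"antivirus-enabled"},
    "ws-02": set(),
    "ws-03": {"antivirus-enabled"},
}
todo = systems_needing_mitigation(applied, "antivirus-enabled")

print(sorted(detection))
print(todo)
```

The point of the sketch is the direction of the data flow: findings from one incident update the shared detection set and drive a check of every other system, not only the one that was compromised.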

Notices

Copyright 2016 Carnegie Mellon University

[Distribution Statement A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.

This material is based upon work funded and supported by Department of Homeland Security under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center sponsored by the United States Department of Defense.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

This material is distributed by the Software Engineering Institute (SEI) only to course attendees for their own individual study. Except for the U.S. government purposes described below, this material SHALL NOT be reproduced or used in any other manner without requesting formal permission from the Software Engineering Institute at permission@sei.cmu.edu.

The U.S. Government's rights to use, modify, reproduce, release, perform, display, or disclose this material are restricted by the Rights in Technical Data-Noncommercial Items clauses (DFAR 252-227.7013 and DFAR 252-227.7013 Alternate I) contained in the above identified contract. Any reproduction of this material or portions thereof marked with this legend must also reproduce the disclaimers contained on this slide.

Although the rights granted by contract do not require course attendance to use this material for U.S. Government purposes, the SEI recommends attendance to ensure proper understanding.

Carnegie Mellon, CERT and CERT Coordination Center are registered marks of Carnegie Mellon University.

DM-0003588
