Introduction To Malware Analysis - Zeltser

Transcription

My popular SANS Institute malware analysis course has helped IT administrators, security professionals, and malware specialists fight malicious code in their organizations. In this briefing, I introduce the process of reverse-engineering malicious software. I cover the behavioral and code analysis phases in a way that makes this topic accessible even to individuals with limited exposure to programming concepts. You’ll learn the fundamentals and associated tools to get started with malware analysis.

Security incident responders benefit from knowing how to reverse-engineer malware, because this process helps in assessing the event’s scope, severity, and repercussions. It also assists in containing the incident and in planning recovery steps. Those who perform forensic investigations also benefit from mastering this topic, because they learn how to understand key characteristics of malware present on compromised systems.

Copyright 2009-2010 Lenny Zeltser

How relevant malware has become in the context of computer intrusions! Almost every publicly announced data breach, it seems, involves some form of malicious software, such as backdoors, trojans, network worms, exploits, and so on.

In this session, I will introduce you to the approaches for analyzing malware, so you can turn malicious executables inside out to understand their inner workings.

When such an intrusion occurs at your organization, will you be able to quickly assess the threat? Knowing how to analyze malware can help you understand the context of the incident, its severity, and its repercussions. It can help you plan your response to contain the incident’s scope and, in some cases, understand what entities might be behind the intrusion.

Perhaps that is why the individuals who are looking to acquire malware analysis skills are no longer just anti-virus and threat researchers, but also system and network administrators, as well as general security professionals. More and more often, these individuals are being asked to understand the capabilities of malware that their organizations discover.

Knowing how to analyze malware can bring an element of control into the otherwise chaotic environment that surrounds a security incident. It’s also a critical aspect of modern forensic analysis, because investigators all too frequently discover malware on compromised systems.

The approach to reverse-engineering that has worked for many analysts involves two key phases: behavioral analysis and code analysis. During behavioral analysis, we examine how the specimen interacts with its environment. The code analysis phase allows us to learn about the specimen’s capabilities by examining the code that makes up the program.

You’ll see this approach in action in the upcoming slides.

I find that the best way to learn malware analysis is by going through examples. The malicious executable from which we’ll learn in this session is captured on this slide. It’s a trojan copy of Windows Live Messenger: a fake instant messenger client that was being distributed to victims via email. Many such trojans have the capability of capturing the victims’ logon credentials, and may have other “undocumented” features.

Let’s see what capabilities are built into this malicious executable. As I lead you through the analysis, I’ll introduce the tools and techniques that will help with the reverse-engineering process.

Note that in this example, as with the majority of malicious incidents you’ll probably encounter, we’ll be examining a compiled Windows executable for which we have no source code.

I typically start examining a malicious executable with behavioral analysis, because it comes more easily to me than code analysis. If your strength is in programming and x86 assembly, then you may prefer to start with the code analysis phase instead.

When performing behavioral analysis, we’re going to infect a laboratory system with the specimen. Then we’ll observe how the malicious executable accesses the file system, the registry, and the network. As we learn about the program’s expectations of its runtime environment, we will slightly adjust the laboratory infrastructure to evoke additional behavior from the program. We will also attempt to interact with the program to discover additional characteristics it may exhibit.

When performing malware analysis, it’s convenient to use virtualization software when setting up your lab. Such tools typically simulate the underlying hardware, allowing you to run multiple instances of “virtual” machines simultaneously. For instance, you could use Windows 7 as your base OS, while running a separate instance of Windows XP in one window and a Linux instance in another. Each virtual machine behaves much like a “real” physical system, in that it has its own set of I/O peripherals, RAM, network settings, and so on. All these aspects of the virtual machine are, well, virtualized.

The convenience of a virtualized lab comes, in part, from the flexibility of having multiple instances of various operating systems available to you within a single physical system. Virtualization software can even emulate a network, so that your lab doesn’t need to be connected to a physical network at all. Yet the virtual machines will be able to communicate with each other over the simulated network, blissfully unaware that the network is not “real.”

I typically use VMware for virtualization. Other choices include Microsoft Virtual PC, Sun VirtualBox, etc.

One of the most convenient aspects of using virtualization software is its support for snapshots. They allow you to preserve the current state of the virtual machine with a click of a button, and return to it with another click. VMware Workstation supports multiple snapshots, which comes in very handy for “bookmarking” different stages of your analysis, so you can move back and forth during your experiments without losing important runtime details.

Snapshot capabilities are also very useful for reverting to the system’s pristine state after you’ve completed your research and want to prepare the lab for your next analysis. Save the state of the virtual machine after you’ve installed the OS, patched it, and set up the necessary tools. Once you’re done with your analysis, click a button to revert to that state. Very convenient!

Malware may have defenses that prevent it from executing properly in a virtualized environment. In these cases, the easiest step might be to use a set of physical systems instead. To mimic snapshot functionality when you’re unable to use virtualization software, use disk cloning tools such as dd and Norton Ghost.

Any malware analysis lab carries the risk of malware finding a way to escape from your sandbox. This risk is greater with a virtualized lab, because the isolation it provides is not as reliable as the literal air gap between physical systems. Since virtualization software is written by human beings, it will have bugs in it. Some of these bugs are vulnerabilities that malicious software may use in an attempt to escape the sandbox around your laboratory system. To address this risk, I suggest dedicating a single physical system to your virtualized lab: run several virtual machines in it, but don’t use that system for any other purpose. Also, don’t connect the laboratory box to your production network unless required for performing specific tasks.

It’s also very important to keep your virtualization software up to date on security patches, even though they’re sometimes a pain to download and install. If you notice anything suspicious in the lab environment when performing your analysis, restore the physical system from a backup copy, and keep a close eye on the environment.

Let’s see this approach in action. Let’s say you have a suspicious executable that you’d like to analyze. You bring it into your lab, possibly via a removable USB disk, and place it on the desktop of the virtual machine you’re about to infect. Now what?

First, take a snapshot of the state of the machine’s file system and the registry. This will allow you to quickly see what major changes have occurred on the system after you infect it.

I like the free tool called RegShot for this purpose (http://sourceforge.net/projects/regshot). To use it, enable the “Scan dir1” option, and in the corresponding window type “C:\”. This will allow the tool to scan the registry and the full C: drive. Click “1st shot”. After RegShot takes the first snapshot, launch the malicious executable. Interact with it a bit (e.g., try logging into it). Then kill the process, if you can. Next, click the “2nd shot” button in RegShot, and click the “Compare” button. You’ll see a report that describes the major changes to the system’s state. In this case, we see that two files were added to the system.
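
RegShot does the comparison for you, but the underlying idea is simple enough to sketch. The Python below is a minimal illustration, not RegShot’s implementation: it covers only the file system (no registry), and the function names snapshot() and compare() are hypothetical. It records a SHA-256 digest for every readable file under a directory, then reports which paths were added, deleted, or modified between two snapshots.

```python
import hashlib
import os


def snapshot(root):
    """Map every readable file under root to the SHA-256 digest of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            try:
                with open(path, "rb") as fh:
                    # Read in chunks so large files don't have to fit in memory.
                    for chunk in iter(lambda: fh.read(65536), b""):
                        digest.update(chunk)
            except OSError:
                # Locked or vanished files (common on a live Windows system) are skipped.
                continue
            state[path] = digest.hexdigest()
    return state


def compare(before, after):
    """Report which paths were added, deleted, or modified between two snapshots."""
    added = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return {"added": added, "deleted": deleted, "modified": modified}
```

In the lab you would call snapshot("C:\\") before and after launching the specimen and pass both results to compare(); for this trojan, the “added” list would be expected to include pas.txt and msnsettings.dat. Hashing an entire drive is slow, which is one reason a dedicated tool such as RegShot is more practical.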

The two files that appeared on the system after we infected it are pas.txt and msnsettings.dat. Take a look at them using Notepad.

It looks like pas.txt has captured the logon credentials we used when logging into the malicious executable. That makes sense, because we received reports that this executable is a trojan copy of Windows Live Messenger.

The msnsettings.dat file looks like a configuration file of some sort.

Another free tool that can help us understand how the malicious program interacted with the file system and the registry is Process Monitor (…nals/bb896645.aspx).

To use Process Monitor, run it while infecting the system. I typically launch the tool right after taking the first RegShot snapshot. Remember to pause capture in Process Monitor before taking the second RegShot snapshot.

Process Monitor records the file system and registry operations it observes on the system. It shows the details of how programs create, delete, read, or modify the local environment. In the screenshot on this slide, you see attempts by our malware specimen to create the pas.txt file and to locate the msnsettings.dat file.

Process Monitor’s log is very comprehensive. However, it is also very noisy. I use RegShot to make sure that I don’t miss anything critical, while I rely on Process Monitor to present a comprehensive perspective on the specimen’s interactions with the file system and the registry.
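
Process Monitor can save its log in CSV format, and a few lines of code can then cut the noise down to the specimen’s own file and registry activity. This is a sketch under assumptions: it expects Process Monitor’s default columns, including “Process Name”, “Operation”, “Path”, and “Result”; the set of operations it keeps is my own choice; and specimen.exe is a hypothetical process name.

```python
import csv
import io

# Operations kept by this sketch; adjust the set to suit your investigation.
INTERESTING_OPS = {"CreateFile", "WriteFile", "SetDispositionInformationFile",
                   "RegCreateKey", "RegSetValue", "RegDeleteKey", "RegDeleteValue"}


def specimen_activity(csv_text, process_name):
    """Return (operation, path, result) rows for one process, keeping interesting operations."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Windows process names are case-insensitive, so compare them that way.
        if row.get("Process Name", "").lower() != process_name.lower():
            continue
        if row.get("Operation") in INTERESTING_OPS:
            rows.append((row["Operation"], row.get("Path", ""), row.get("Result", "")))
    return rows
```

Read the exported file with encoding="utf-8-sig" and newline="" and pass its text to specimen_activity(text, "specimen.exe"). Process Monitor reports file opens as CreateFile operations, so a failed attempt to locate msnsettings.dat would typically show up as a CreateFile row with a result such as NAME NOT FOUND.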

Reverse-engineering malware can help you become better at incident response and forensic analysis. In our scenario, we have already discovered that the Windows Live Messenger trojan makes use of the msnsettings.dat file. Now you know to look for it on the compromised system, even if you didn’t initially realize that this file was important.

Once you have a copy of msnsettings.dat, you can open it to see whether it reveals additional details about the program. On this slide, I’ve highlighted several lines from that file.

One is the string “test,” which we may be able to use later when trying to understand how the trojan processes the msnsettings.dat file. Another line, “gsmtp185.google.com,” specifies an SMTP mail server; this suggests that our specimen has the ability to send email. The file also includes an email address, “mastercleanex@gmail.com.” This may be the recipient of the information that the trojan might attempt to send out. Of course, these are just theories at this point. We’ll need to confirm or refute them during subsequent analysis steps.
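
Pulling candidate indicators out of a file like msnsettings.dat is easy to automate. The sketch below is an illustration under assumptions: we don’t know the file’s real format, so it treats the contents as raw bytes and uses deliberately loose regular expressions for email addresses and hostnames. The function name extract_indicators() is hypothetical.

```python
import re

# Deliberately loose patterns: good for triage, not for validation.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HOST_RE = re.compile(rb"\b(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b")


def extract_indicators(data):
    """Return (emails, hostnames) found anywhere in a blob of bytes."""
    emails = {m.decode("ascii") for m in EMAIL_RE.findall(data)}
    hosts = {m.decode("ascii") for m in HOST_RE.findall(data)}
    # The domain half of each email address also matches HOST_RE; don't report it twice.
    hosts -= {e.split("@", 1)[1] for e in emails}
    return sorted(emails), sorted(hosts)
```

The hostname pattern also matches file names such as pas.txt, so treat the output as leads to review, not as confirmed network indicators.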

It helps to have several tools to observe the malicious program’s interactions with its environment. Another very useful and free tool I’d like to tell you about is CaptureBAT. CaptureBAT is similar to Process Monitor in that it records local processes’ interactions with their environment. CaptureBAT’s logs tend to be less noisy than those created by Process Monitor. This is because CaptureBAT comes with filters that eliminate the majority of standard, non-malicious activities from the logs. You can customize these filters to your liking, as they are text files located in the directory where you install CaptureBAT.

If you launch CaptureBAT with the “-c” parameter, it will capture any files deleted in the background, allowing you to look at and restore even those files that the Windows Recycle Bin cannot capture. Launching CaptureBAT with the “-n” parameter tells the tool to capture network traffic, like a sniffer would, saving the result into a local .cap file.

As you can see on this slide, CaptureBAT confirmed our earlier findings about the malware specimen.

You can load the .cap file created by CaptureBAT into a full-featured network sniffer, such as Wireshark (http://www.wireshark.org). If you don’t like using CaptureBAT, you could also use Wireshark to capture traffic directly off the laboratory network.

As you can see on this slide, the sniffer shows that the infected system has issued a DNS query, attempting to resolve the hostname “gsmtp185.google.com”. The “smtp” in the hostname suggests that the malware specimen is looking for a mail server to connect to, reinforcing our earlier theory of how the trojan might use this hostname.
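
To see what the sniffer is decoding, here is a minimal Python sketch of the DNS wire format defined in RFC 1035: build_query() constructs a query like the one the infected system sent, and parse_question() recovers the queried name and record type from a raw DNS payload. Both function names are hypothetical, and the sketch handles only the simple single-question case without name compression, which a lone question does not need.

```python
import struct


def build_query(name, query_id=0x1234):
    """Build a standard recursive query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!6H", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in name.split("."))
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_question(payload):
    """Return (name, qtype) from the first question of a raw DNS message."""
    qdcount = struct.unpack("!H", payload[4:6])[0]
    if qdcount < 1:
        raise ValueError("message has no question section")
    labels = []
    offset = 12  # the question section starts right after the fixed 12-byte header
    while True:
        length = payload[offset]
        offset += 1
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(payload[offset:offset + length].decode("ascii"))
        offset += length
    qtype = struct.unpack("!H", payload[offset:offset + 2])[0]
    return ".".join(labels), qtype
```

For example, parse_question(build_query("gsmtp185.google.com")) returns ("gsmtp185.google.com", 1), where 1 is the A record type.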

To confirm how the specimen wishes to use “gsmtp185.google.com”, allow the trojan to resolve this hostname. Once it can resolve it, it will presumably attempt to connect to it, and you will be able to use a network sniffer to see what service the specimen is trying to access.

To set up name resolution, insert an entry for the hostname into the “hosts” file on the infected system. A faster alternative is to use a tool called Fake DNS, available as part of the Malcode Analysis Pack toolkit from iDefense.

Fake DNS is a DNS server that you can configure to answer any DNS query with a single IP address of your choice. Which IP address should you use? I suggest picking the IP address of some system in your lab on which you can run the service that malware may look for. This will redirect the connection to the host where you’ve set up the listener, allowing the connection to be completed so you can learn about its purpose.

In our example, captured on this slide, the network sniffer confirmed that the infected system is attempting to connect to TCP port 25 on “gsmtp185.google.com”.
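
Fake DNS is a ready-made tool; the Python below only sketches the idea of answering every query with one lab IP address. It rests on several assumptions: it answers every question with an A record regardless of the record type requested, it handles only single-question queries, and fake_response() and serve() are hypothetical names. The hosts-file alternative amounts to a single line such as “192.168.1.10 gsmtp185.google.com” in the infected system’s hosts file, where 192.168.1.10 is a hypothetical lab address.

```python
import socket
import struct


def fake_response(query, answer_ip):
    """Answer the first question in a raw DNS query with an A record for answer_ip."""
    query_id, flags = struct.unpack("!HH", query[:4])
    # Walk the QNAME labels to find where the question ends (no compression expected).
    offset = 12
    while query[offset] != 0:
        offset += 1 + query[offset]
    question = query[12:offset + 5]  # QNAME's zero byte plus QTYPE and QCLASS
    # QR=1 (response) and RA=1, copy the client's RD bit, RCODE=0 (no error).
    response_flags = 0x8080 | (flags & 0x0100)
    header = struct.pack("!6H", query_id, response_flags, 1, 1, 0, 0)
    # Answer: pointer to the name at offset 12, TYPE=A, CLASS=IN, TTL=60 s, 4-byte address.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 60, 4) + socket.inet_aton(answer_ip)
    return header + question + answer


def serve(answer_ip, bind_addr="0.0.0.0", port=53):
    """Answer every DNS query arriving on UDP port 53 with answer_ip (runs until interrupted)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    while True:
        query, client = sock.recvfrom(512)
        try:
            sock.sendto(fake_response(query, answer_ip), client)
        except (IndexError, struct.error):
            continue  # ignore malformed packets
```

You would run serve("192.168.1.10") on the lab host that will provide the service (binding UDP port 53 usually requires administrator rights) and point the infected system’s DNS setting at that host.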

Now that you know malware is looking for an SMTP server, you can provide that service to it within your lab. An easy way to do this is to use the Mailpot tool, which is part of the previously mentioned Malcode Analysis Pack.

Mailpot pretends to be a mail server, happily accepting SMTP messages from clients, but not sending them out. Instead, it stores the messages locally for your review.

To use Mailpot, run it on the host to which you have redirected the SMTP server’s hostname using Fake DNS, as shown on the previous slide.

Now you can see the contents of the message that the trojan is mailing to the attacker. As highlighted on this slide, the message includes the victim’s Messenger username and password.
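
Mailpot is the convenient choice; to show what such a mail sink does, here is a minimal Python sketch. SmtpSession is a hypothetical name for a tiny SMTP state machine that says yes to every command, collects whatever arrives after DATA, and never relays it. It assumes the specimen speaks plain SMTP; a client that insists on AUTH or STARTTLS would not be satisfied by this sketch. The hypothetical serve() function would run it on TCP port 25 of the host that your fake DNS answers point to.

```python
import socketserver


class SmtpSession:
    """Tiny SMTP state machine: accepts every message, stores it, never relays it."""

    def __init__(self):
        self.in_data = False
        self.lines = []
        self.messages = []  # raw text of each message the client submitted

    def greeting(self):
        return "220 lab-mailsink ESMTP ready"

    def handle(self, line):
        """Process one client line (CRLF already stripped); return the reply, or None."""
        if self.in_data:
            if line == ".":
                self.in_data = False
                self.messages.append("\r\n".join(self.lines))
                self.lines = []
                return "250 OK: stored locally, not relayed"
            # Undo SMTP dot-stuffing: a leading "." was added by the client (RFC 5321, 4.5.2).
            self.lines.append(line[1:] if line.startswith(".") else line)
            return None
        verb = line.split(" ", 1)[0].upper()
        if verb in ("HELO", "EHLO"):
            return "250 lab-mailsink"
        if verb in ("MAIL", "RCPT", "RSET", "NOOP"):
            return "250 OK"
        if verb == "DATA":
            self.in_data = True
            return "354 End data with <CR><LF>.<CR><LF>"
        if verb == "QUIT":
            return "221 Bye"
        return "502 Command not implemented"


class SmtpHandler(socketserver.StreamRequestHandler):
    """Runs one SmtpSession per TCP connection and prints what was captured."""

    def handle(self):
        session = SmtpSession()
        self.wfile.write((session.greeting() + "\r\n").encode("ascii"))
        for raw in self.rfile:
            reply = session.handle(raw.decode("latin-1").rstrip("\r\n"))
            if reply is not None:
                self.wfile.write((reply + "\r\n").encode("ascii"))
                if reply.startswith("221"):
                    break
        for message in session.messages:
            print("---- captured message ----")
            print(message)


def serve(host="0.0.0.0", port=25):
    """Listen for SMTP clients until interrupted (port 25 usually needs admin rights)."""
    with socketserver.TCPServer((host, port), SmtpHandler) as server:
        server.serve_forever()
```

Keeping the protocol logic in SmtpSession, separate from the socket handling, makes it easy to check the dialogue by feeding it lines directly before exposing it to a live specimen.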

How can we generalize the behavioral analysis process we’ve been following? As you observe a characteristic of the specimen, you typically notice an element of the environment that the program is looking for, yet does not find in your lab. For instance, the executable may be attempting to resolve a hostname. To evoke new characteristics, you provide the specimen with the service it needs, allowing it to perform further actions and fulfill its true potential.

With every service you add to the environment, you learn more about the specimen. Note that if you change too many environmental characteristics at the same time, your malware may perform too many new actions at once. This will speed up your analysis at the expense of knowing exactly which change was responsible for which observed characteristic.

When do you stop molding the laboratory environment to match the specimen’s expectations and dependencies? When there are no more changes to introduce into the lab to evoke previously unseen behavioral characteristics. That’s typically the point when you will want to start the next phase of the reverse-engineering process: code analysis.

Behavioral analysis can be insightful and relatively fast. However, it will rarely tell you everything you need to know about malware of moderate and advanced complexity. That’s where code analysis can help. It can reinforce your behavioral findings, and can shine light on additional properties of the specimen that you may not have discovered behaviorally.

Code analysis can be tricky and time-consuming, because in the world of malware you almost never have the luxury of seeing the source code of the program you’re analyzing. Instead, you need to reverse-engineer the compiled executable’s functionality by examining its code at the assembly level. A debugger and a disassembler can help you in this task. A disassembler converts the specimen’s instructions from their binary form into the human-readable assembly form. A debugger lets you step through the most interesting parts of the code, interacting with it and observing the effects of its instructions to understand their purpose.

OllyDbg is among my favorite tools for performing code analysis. It’s free, very powerful, and includes both a disassembler and a debugger. You can download OllyDbg from: http://www.ollydbg.de/

A good way to start analyzing the specimen’s code often involves looking at the strings embedded in its executable. To do this with OllyDbg, first load the malicious executable into OllyDbg via File > Open. Then, right-click on the code you will see in the disassembler window, and select Search for > All referenced text strings.

OllyDbg will then bring up a new window that will show the strings it discovered, as you can see on this slide. Notice that we have seen some of these strings during behavioral analysis! Some of them look like contents of the default msnsettings.dat file that our specimen creates when infecting the system.
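
OllyDbg’s “referenced text strings” view lists strings that the code actually points to. A cruder but often useful alternative is to scan the whole file for runs of printable characters, which is what the classic strings utility does. The Python sketch below takes that simpler approach, not OllyDbg’s: it returns ASCII strings and UTF-16LE strings (the latter are common in Windows programs) of at least min_len characters. The function name extract_strings() is hypothetical.

```python
import re


def extract_strings(data, min_len=4):
    """Return printable ASCII strings, then UTF-16LE strings, of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE text: each printable ASCII character is followed by a zero byte.
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.decode("ascii") for m in ascii_re.findall(data)]
    found += [m.decode("utf-16-le") for m in utf16_re.findall(data)]
    return found
```

To use it, read the executable in binary mode and print each result of extract_strings(data). Unlike OllyDbg’s listing, this scan cannot tell you which instruction uses a string, so it complements the debugger rather than replacing it.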

We may be interested in looking at the embedded strings because the string listing might include a reference to a malicious characteristic or a behavioral trait that we would like to understand. In this case, consider the screenshot on this slide. We got here by highlighting one of the instances of the “msnsettings.dat” string, as shown on the previous slide, and pressing Enter. Now OllyDbg shows us how the program makes use of this string.

If we wanted to pursue this path of analysis further, we could now set a breakpoint on this instruction, run the trojan in the debugger, and see what it does. We’re not going to investigate this particular aspect of the malicious program, because I want to show you another, more interesting technique.

You may recall that the version of msnsettings.dat on the victim’s system was slightly different from the version that the trojan created on our laboratory system when we first ran it. Specifically, in our case, the file contained the string “hello”, while the victim’s version had the string “test” instead. What’s that about?

The string “test” is not visible anywhere within the body of the malicious executable when it’s not running. That’s probably because the trojan loads this string from msnsettings.dat during run time. To understand how the trojan uses the string “test,” we will search for it in the memory of the running trojan.

Once we locate the string in the trojan’s memory, we will set an access breakpoint there. A breakpoint is a condition that tells the debugger when to pause the normal execution of the debugged program. Once the execution is paused, the debugger will give us a chance to review the debugged program’s run-time environment to understand what it is doing. This is probably the most useful feature of a debugger in the context of reverse-engineering malware.

To make use of this technique, load the malicious program into OllyDbg, then run it. Once the trojan is running, press Alt+M to bring up the memory map in OllyDbg. This shows the listing of the memory segments mapped and used by the currently debugged executable. To search the executable’s memory for a particular string, press Ctrl+B in OllyDbg; then, enter your string. In this case, we’ll enter “test” in the ASCII field of the dialog box. Then press Enter.

It is possible that your string will be located in several memory areas. The one you’re interested in won’t necessarily be the first one. To repeat your search, click on the memory map window, then press Ctrl+L. (Don’t forget to click on the memory map window!)

In the case of our example, we’ll need to perform the initial search via Ctrl+B. This will find an instance of “test” that is not promising. We will repeat the search by pressing Ctrl+L once.
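
The Ctrl+B / Ctrl+L routine is simply a search that keeps going past the first hit. The sketch below models it in Python, assuming you already have each memory region’s bytes (a debugger, or an OS call such as Windows’ ReadProcessMemory, is what would supply them in practice). Here the regions are just a dictionary mapping hypothetical base addresses to byte strings, and find_all() is a hypothetical name.

```python
def find_all(regions, needle):
    """Yield (address, region_base) for every occurrence of needle across memory regions.

    regions maps each region's base address to the bytes it contains, a stand-in
    for the segments listed in OllyDbg's memory map.
    """
    for base in sorted(regions):
        data = regions[base]
        start = data.find(needle)
        while start != -1:
            yield base + start, base
            # Resume one byte later so overlapping matches are not skipped.
            start = data.find(needle, start + 1)
```

Each item the generator yields corresponds to one more press of Ctrl+L. For instance, searching {0x150000: b"xxtestyytest"} for b"test" yields hits at 0x150002 and then 0x150008; as in the slide’s example, the first hit is not necessarily the interesting one.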

Now that we’ve located the string “test” in the trojan’s memory, we can set a breakpoint there. In this case we’ll be setting a memory access breakpoint, so that OllyDbg pauses the program’s execution whenever it attempts to access this particular memory area. Effectively, this will allow us to catch the trojan while it is attempting to use the “test” string; we will then be able to see how it makes use of the string.

To set the breakpoint, highlight the exact characters of the string “test”, then right-click and select Breakpoint > Memory, on access.

The trojan will continue to run. Now we can either wait for it to try using the string, or attempt to interact with the program to cause it to use the string.

We can try interacting with the trojan by typing some text into its first field, the one labeled “E-mail address”. If you type any character there after setting our memory breakpoint, you will immediately trigger the breakpoint, as you can see on the next slide.
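
To make the idea of a memory access breakpoint concrete, here is a toy Python model. It is not how OllyDbg implements the feature; it simply wraps a bytearray so that any read or write overlapping a watched range calls a function before the access completes, which mirrors what the debugger does for us when the trojan touches “test”. The WatchedMemory class is hypothetical.

```python
class WatchedMemory:
    """Toy memory access breakpoint: any read or write that overlaps the watched
    range calls on_access(kind, start, length) before the access completes."""

    def __init__(self, data, watch_start, watch_len, on_access):
        self._mem = bytearray(data)
        self._watch = range(watch_start, watch_start + watch_len)
        self._on_access = on_access

    def _check(self, start, length, kind):
        # Two half-open ranges overlap when each one starts before the other ends.
        if start < self._watch.stop and self._watch.start < start + length:
            self._on_access(kind, start, length)

    def read(self, start, length):
        self._check(start, length, "read")
        return bytes(self._mem[start:start + length])

    def write(self, start, data):
        self._check(start, len(data), "write")
        self._mem[start:start + len(data)] = data
```

With memory b"AAAAtestBBBB" and the watch set on offsets 4 through 7, reading offsets 0 and 1 passes silently, while reading offset 4 fires the callback first, much as OllyDbg pauses the trojan the moment it touches the keyword.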

As you can see on the left side of this slide, I entered a character into the field. I picked a letter at random: “g”. Right away, OllyDbg comes to the foreground, because we just triggered an attempt by the trojan to somehow use the string “test”. You can now interact with the code, looking at its environment, and even running it as slowly as one instruction at a time.

To execute one instruction, press F8. To examine the run-time environment of the program, look at its registers in the top right corner of the OllyDbg window. A register is a small, very fast storage location on the CPU.

What’s going on in this part of the code? Don’t worry if you don’t understand much of the assembly code you see there: this is just an introduction to malware analysis, so I’ll walk you through the most important parts. OllyDbg has highlighted the instruction that will be executed next by the program, “CMP CL, BL”. This compares the contents of two registers, CL and BL. CL refers to the lowest byte of ECX; BL refers to the lowest byte of EBX, so it’s an efficient way of comparing parts of the ECX and EBX registers.

Double-click the registers to see their contents. ECX contains the character we entered, “g”. EBX contains the string that our input is being compared to, “test” (it appears backwards because x86 processors store multi-byte values in little-endian order, with the first character in the lowest byte).
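
The “backwards” view of EBX and the meaning of CL and BL can be reproduced in a few lines of Python. This is a sketch for illustration only: load_dword() is a hypothetical helper that zero-pads short input, and the slides don’t show how the trojan actually placed “g” into ECX.

```python
import struct


def load_dword(data):
    """Load up to four bytes into a 32-bit value the way x86 does: little-endian.
    Zero-padding short input is an assumption made for this illustration."""
    return struct.unpack("<I", data[:4].ljust(4, b"\x00"))[0]


def low_byte(reg):
    """CL is bits 0-7 of ECX; BL is bits 0-7 of EBX."""
    return reg & 0xFF


ecx = load_dword(b"g")     # the character typed into the "E-mail address" field
ebx = load_dword(b"test")  # the keyword the trojan presumably read from msnsettings.dat

print(hex(ebx))                                # 0x74736574
print(struct.pack(">I", ebx))                  # b'tset': the register read high byte first
print(chr(low_byte(ecx)), chr(low_byte(ebx)))  # g t  (what CMP CL, BL compares)
```

The first character of “test” lands in the lowest byte, which is why CL and BL hold the first characters of the two strings.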

Press F9 to continue executing the trojan. Delete the “g” character you entered previously. This time, let the program match the first character of the “test” string, and see how it compares the second character. To do this, enter “ta” in the “E-mail address” box. If you keep triggering the breakpoint, press F9 to continue. You want to pause right after you’ve had a chance to type “ta”.

Press F8 to execute one instruction after you’ve triggered the breakpoint, just like you did previously. This time, if you look at the contents of the ECX and EBX registers, you’ll notice that the trojan is comparing the character “a” that we entered to the character “e” that it seems to expect. That’s because the CH register refers to the second-lowest byte of ECX, and the BH register refers to the second-lowest byte of EBX.
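
The two comparisons we watched, CMP CL, BL on the first byte and CMP CH, BH on the second, can be emulated in Python to show why typing a correct first character is what exposes the second comparison. This is an illustration under assumptions: the slides don’t show how the trojan’s code proceeds past the second character, and load_dword(), byte_of(), and compare_first_two() are hypothetical helpers.

```python
import struct


def load_dword(data):
    """Little-endian load of up to four bytes, zero-padded (illustrative assumption)."""
    return struct.unpack("<I", data[:4].ljust(4, b"\x00"))[0]


def byte_of(reg, index):
    """Byte 0 is CL/BL (bits 0-7); byte 1 is CH/BH (bits 8-15)."""
    return (reg >> (8 * index)) & 0xFF


def compare_first_two(typed, keyword):
    """Emulate CMP CL, BL then CMP CH, BH, stopping at the first mismatch."""
    ecx, ebx = load_dword(typed), load_dword(keyword)
    steps = []
    for index, regs in ((0, "CL/BL"), (1, "CH/BH")):
        got, expected = chr(byte_of(ecx, index)), chr(byte_of(ebx, index))
        steps.append((regs, got, expected))
        if got != expected:
            break
    return steps
```

Typing “g” stops at the first step, [("CL/BL", "g", "t")], while typing “ta” gets one step further: [("CL/BL", "t", "t"), ("CH/BH", "a", "e")], matching what the registers showed in the debugger.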

So, the trojan seems to be looking for the string “test” in the “E-mail address” field. Exit the debugger, launch the trojan by itself, and enter “test” to see what happens.

Voila! When you enter “test”, the trojan brings you to a brand-new screen that seems to allow you to configure the trojan’s operation. As you can see on this slide, the configuration options let you define the passphrase to activate this screen, the address where the trojan will send captured logon credentials, etc.

It’s time to wrap up our analysis. What have we learned about the trojan through the steps I demonstrated? We established that the malware specimen captures the victim’s Windows Live credentials entered into the trojan version of Windows Live Messenger. It saves the username and password to a local file, and then sends them to the attacker via Gmail. We also identified a file, msnsettings.dat, which the trojan uses to store its configuration. The attacker can customize the configuration by typing “test” into the “E-mail address” field of the trojan; this keyword is based on the previously saved contents of msnsettings.dat.

Great, we learned a bunch of details about a malware sample. What’s the point? My goal was not to teach you about this particular trojan’s capabilities. Instead, I wanted to use it as the context for introducing you to the key concepts behind reverse-engineering malicious software.

The results of malware analysis are very useful for security, systems, and network professionals. The findings can help during incident response and forensic investigation. They can also help you fine-tune your defensive mechanisms, and help you create intrusion detection signatures for locating the specimen across your enterprise.

The general malware analysis approach, which I described in this presentation, included behavioral and code analysis phases.

We began by observing the specimen’s behavior in an isolated lab using several monitoring tools. We used our observations to determine how to interact with the trojan, which produced additional results. We were able to evoke additional malicious characteristics by gradually molding the laboratory environment to match the world within which the specimen expected to operate.

Armed with an initial understanding of the program’s capabilities, we employed code analysis to further understand the program’s characteristics. We began this phase by looking at how the program uses interesting strings, and employed memory access breakpoints to identify areas of the code worth examining further.

The best way to reinforce the techniques I discussed here is to try the analysis on your own. This document includes links to the tools I used. You can also download a copy of the trojan from the website that hosts this presentation and the corresponding webcast: http://tinyurl.com/malcast. The full version of the URL is: …-webcast.html.

To help you master malware reverse-engineering skills, I created a one-page cheat sheet, which you can download and customize freely. It’s available at http://tinyurl.com/reverse-malware-sheet. The full version of the URL is: …re-cheat-sheet.html.

You may also find my other security cheat sheets useful. You’ll find them at: http://zeltser.com/cheat-sheets.

My hope is that you’ll find this topic as fascinating as I do. If you’d like to learn more about how to reverse-engineer malware, consider taking the course, which I teach at SANS Institute. It’s called Reverse-Engineering Malware: Malware Analysis Tools and Techniques, and you can read all about it at: http://LearnREM.com.

The REM course teaches how to understand key characteristics of malware that runs on or targets Microsoft Windows systems. This includes both executable files compiled to run natively on Windows and browser-based malware, such as malicious JavaScript or Flash files.

If you decide to sign up, you’re welcome to use my 10% discount code: COINS-LZ.

If you have any questions about malware analysis, please get in touch with me; I’ll be glad to hear from you! If you’re interested in malware, you might like the updates I post on Twitter. You can find me there at http://twitter.com/lennyzeltser.

About The Author:

Lenny Zeltser leads the security consulting practice at Savvis. He is also a board of directors member at SANS Technology Institute, a SANS faculty member, and an incident handler at the Internet Storm Center. Lenny frequently speaks on information security and related business topics at conferences and private events, writes articles, and has co-authored several books.

Lenny is one of the few individuals in the world who have earned the highly regarded GIAC Security Expert (GSE) designation. He also holds the CISSP certification. Lenny has an MBA degree from MIT Sloan and a computer science degree from the University of Pennsylvania. For more information about his projects, see www.zeltser.com.
