AutoFuzz: Automated Network Protocol Fuzzing Framework

Transcription

IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010239AutoFuzz: Automated Network ProtocolFuzzing FrameworkSerge Gorbunov and Arnold Rosenbloomserge.gorbunov@utoronto.ca, arnold@cs.toronto.eduDepartment of Mathematical and Computational Sciences,University of Toronto Mississauga,Mississauga, Ontario, Canada L5L 1C6AbstractAssessing software security involves steps such as code review,risk analysis, penetration testing and fuzzing. During the fuzzingphase, the tester‟s goal is to find flaws in software by sendingunexpected input to the target application and monitoring itsbehavior. In this paper we introduce the AutoFuzz [1] extendable, open source framework used for testing networkprotocol implementations. AutoFuzz is a „smart‟, man-in-themiddle, semi-deterministic network protocol fuzzing framework.AutoFuzz learns a protocol implementation by constructing aFinite State Automaton (FSA) which captures the observedcommunications between a client and a server [5]. In addition,AutoFuzz learns individual message syntax, including fields andprobable types, by applying the bioinformatics techniques of [2].Finally, AutoFuzz can fuzz client or server protocolimplementations by intelligently modifying the communicationsessions between them using the FSA as a guide. AutoFuzz wasapplied to a variety of File Transfer Protocol (FTP) serverimplementations, confirming old and discovering newvulnerabilities.Key words:Automated Fuzzing, Software Security, Vulnerability Detectionclients or servers. However, 'dumb' fuzzing is measured tobe 50% less effective than 'smart' fuzzing [11]. Oneexample of a 'dumb' fuzzer is ProxyFuzz [17]. ProxyFuzzis a man-in-the-middle non-deterministic network fuzzer.It randomly changes the network traffic [17] between aconnected client and server. Fuzzers of the second type,'smart' fuzzers, have a pre-programmed understanding ofthe protocol implemented by the targets they fuzz. Theytypically understand the protocol‟s state machine,messages syntax and field types and use this to efficientlyfuzz deep into target implementation code. Peach is anexample of a „smart‟ fuzzer [16]. Disadvantages of „smart‟fuzzers include their reliance on the availability of aprotocol‟s specification documents and the degree towhich a target implementation conforms to the publishedspecification. In addition, „smart‟ fuzzers require manualadaptation to customize them for each new protocol theyare to apply to. Therefore, its application to new protocolsis labour intensive and tedious.1.2 Previous Work1. Introduction1.1 BackgroundFlaws in the implementations of network protocols aresome of the most serious security problems. One such flawcould allow a malicious user to attack vulnerable systemsremotely over the Internet. Approximately 85% of allvulnerabilities reported by the National VulnerabilityDatabase [15] in the last 3 years can be exploitedremotely.A fuzzer is a tool used to discover implementation flawsby sending the target implementation unusual inputs inhopes of producing unexpected behavior. A protocolfuzzer can be classified as 'smart' or 'dumb' depending onits knowledge of the network protocol implemented by itstargets. A 'dumb' fuzzer sends random inputs to its target.It has no knowledge of the communication protocolimplemented by the target. „Dumb‟ fuzzers are easy todevelop and are immediately applicable to any protocolsA number of attempts have been made to automaticallyextract protocol specifications for „smart‟ fuzzers[2][4][5]. In [5] the automatic extraction of the protocol‟sspecification is based on synthesizing an abstractbehavioral model of a protocol implementation. Thebehavioral model is realized as a Finite State Automaton(FSA) constructed from the recorded conversationsbetween a client and a server. The FSA represents, in asuccinct way, the key states and transitions of a protocolimplementation and can be used to systematically guidethe flaw detection process. The main algorithm proposedin [5] for synthesizing an abstract behavioral model of aprotocol implementation is based on passive synthesiswith partial FSA reduction. Given a large collection ofnetwork traces the algorithm constructs and minimizes aFSA. The construction of a FSA relies on an abstractionfunction. An abstraction function is a simple function usedto map similar messages to a unique abstractrepresentation. For example, SMTP client requests can be

240IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010abstracted to their first four characters. That is, :account@test.com‟ are abstracted to „mail‟. Also, SMTPserver replies can be abstracted to their first threecharacters. For example, messages “550 Permissiondenied”, “221 Bye!” and “230 User anonymous loggedin” are abstracted to “550”, “221” and “230”respectively. The tester must supply two abstractionfunctions, one for the input messages to the target beingfuzzed, the other for the output messages. In [4], theauthors focus on automated protocol specificationextractions by constructing the protocol‟s FSA anddetermining message types. However, their technique ofFSA construction is substantially different from thetechnique presented in [5]. Their final system can be usedto extract protocol specifications. However, to the best ofour knowledge, neither of the systems [4] nor [5] isavailable publically for future development or research. In[2], the authors try to determine fields of individualprotocol messages by using bioinformatics algorithms. Inorder to determine message fields, similar messagesamples are aligned using multiple string alignmentalgorithms and their consensus sequences are analyzed tounderstand the beginning and the end of fields in themessage [2]. Their open-source tool can be used todetermine message fields for a collection of protocolmessages.1.3 The New Fuzzing FrameworkThis paper introduces the AutoFuzz. This open sourcefuzzing framework is a „smart‟, man-in-the-middle fuzzer.For simplicity in the discussion that follows we assumethat AutoFuzz is used to fuzz the server side of a networkprotocol implementation. More specifically, the messagescoming from the client to the server are denoted as inputmessages, and the messages coming from the server to theclient are denoted as output messages. However, AutoFuzzcan be applied with equal effectiveness to fuzz the clientside. First, AutoFuzz extracts specifications of a networkprotocol implementation from conversations recorded byacting as a man-in-the-middle between server/clientsessions. As in [5] AutoFuzz constructs a FSA whichcaptures the sampled conversations, and so, understandsthe protocol at a high level. AutoFuzz can be extended tounderstand any protocol by importing appropriateabstraction functions. Then, using the techniquespresented in [2], AutoFuzz finds the fields of individualmessages. In addition, it derives the type information ofthe variable data fields of individual messages, and so,understands the protocol at a lower level.Morespecifically, for each message of the sampledconversations, AutoFuzz associates a Generic MessageSequence (GMS) that is used to capture the syntaxinformation of the message. A GMS is a representation ofa message that separates static from variable data fieldsand associates variable data fields with type and lengthinformation. By using GMSs, AutoFuzz eliminates theneed for protocol specific fuzzing functions as required by[5]. Fuzzing functions can now be performed on GMSrepresentations instead of individual messages and bebased on the derived type or length information of thestatic or variable data fields. AutoFuzz can also beextended with new fuzzing functions. Finally, AutoFuzzintelligently fuzzes server or client network protocolimplementations acting as a man-in-the-middle and usingthe constructed FSA as a guide during the vulnerabilitydetection process. AutoFuzz was successfully applied toseveral File Transfer Protocol (FTP) implementationswhere it found both existing and new vulnerabilities.2. Framework Overview2.1 Main ComponentsThe main components of AutoFuzz are (1) AutoFuzzGraphical User Interface (GUI), (2) Proxy Server, (3)Protocol Specifications Extractor and (4) FuzzingEngine. We elaborate on each below.(1) AutoFuzz GUI allows testers to easily interact withthe fuzzer and control its actions. It is constructed usingthe JAVA Swing library [13]. To visualize a protocol‟sFSA AutoFuzz uses JUNG graphing library [14].(2) Proxy Server. AutoFuzz works as a proxy serverbetween a client and a server. It records and modifies theapplication level traffic to extract protocol specificationsand perform fuzzing operations. The proxy server is basedon the JAVA Socks server [6], but has been modified toallow direct manipulation of the application level traffic.Original inputModified inputServerAutoFuzzOutputClientOutputFigure 1. AutoFuzz Proxy Model(3) Protocol Specifications Extractor. The specificationsextractor extracts the FSA of a network protocolimplementation from a sample of communication sessionsbetween a client and a server. AutoFuzz can understandany application level protocol implementation afterappropriate input/output abstraction functions are importedin it. It also extracts GMSs using the algorithm outlined inthe Generic Message Sequence Construction section tounderstand to the syntax of individual messages.

IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010(4) Fuzzing Engine. The fuzzing engine modifies thecommunication traffic between a server and a client byapplying fuzzing functions. We elaborate more on how thetraffic is modified in the Fuzzing Algorithm Section. Thecurrent set of fuzzing functions contains both deterministicand non-deterministic functions. Deterministic functionsinsert preprogrammed data into the GMSs such as largestrings, maximum/minimum integer values and others.Non-deterministic functions randomly skip static orvariable data fields of GMSs, take random transitions inthe FSA and insert random data into the GMSs. Thefuzzing engine can be extended with new fuzzingfunctions. All actions during the fuzzing process arerecorded in the logs files. This allows testers to determinethe state in the communication and the exact inputmessage modifications that were performed during theunexpected application behavior.2.2 Process Work FlowThe process flow involved in fuzzing using AutoFuzz ispresented in Figure 2.Step 1: Protocol traces are recorded using AutoFuzz‟sbuilt-in proxy server. The traces can manually be edited bythe tester, exported or imported at any point of time.Step 2: Protocol‟s behavior model is constructed based onthe passive synthesis with partial Finite State Automaton(FSA) reduction proposed in [5].Step 3: Individual message syntax is extracted and storedin GMS. We extend the use of the abstraction functionfrom [5] to generate clusters of input messages for GMSconstruction. Hence, each cluster represents a collection ofsimilar input messages. The detailed algorithm ispresented in Generic Message Sequence constructionsection. Intuitively, given the abstraction function for theinput messages, similar input messages are clusteredtogether using this abstraction function. Next, sequencealignment algorithms are applied to generate GMS foreach cluster. Finally, we traverse the protocol‟s FSA andassociate each transition with the appropriate GMS.Step 4: Fuzzing functions are applied by modifying livecommunication sessions between the client and the server.The fuzzing engine is responsible for assigning a fuzzingfunction. Which fuzzing function is performed isdetermined by the current state in the FSA, input messageand which functions have already been applied. Thecomplete algorithm is presented in the fuzzing algorithmsection.1. Collect largenumber of traces2. Construct andminimize FSA3. ConstructGeneric MessageSequencesFigure 2. AutoFuzz Fuzzing Processes4. Perform fuzzingfunctions on eachtransition in the FSA2413. Generic Message Sequence ConstructionWe present a complete algorithm used to extract GenericMessage Sequences (GMSs). Remember, GMS is arepresentation of a message that separates static fromvariable data fields and associates variable data fields withtype and length information. A cluster is denoted as acollection of similar messages. Step 1: Similar messagesare clustered together using a new clustering technique.Step 2: Multiple sequence alignment algorithm describedin [2] is performed on each cluster. Step 3: GMS isconstructed for each cluster. Step 4: Each transition in theprotocol‟s FSA is associated with the corresponding GMS.Step1: First, we present a new technique used to clustersimilar messages. Remember, that for simplicity, wedenote all messages coming from the client to the server asinput messages, and all messages coming from the serverto the client as output messages.Define a set of input messages as. Letdenote the abstraction function on the inputmessages. The algorithm returns clusters of similar inputmessages using thefunction. Morespecifically, for all, define as follows:The algorithm returns.Consider the following set of sample input messages ofthe Simple Mail Transfer Protocol (SMTP). LetFor any input message in , letreturn itsfirst four characters. Applying the algorithm on thisexample it returns a set of two clusterswhereand.Step 2: After input messages are clustered we performmultiple sequence alignment algorithm on each clusterproposed in [2]. For each cluster the algorithm returns alist of aligned messages. Alignment of the input messagesis performed using the Needleman-Wunsch algorithm [8]based on the progressive alignment technique.For example, applying the algorithm on the cluster ,presented in Figure 3, we obtain three aligned inputmessages presented in Figure 4. The result is three inputmessages that have the same length where “-” represents asequence gap.

242ma i lma i lma i lIJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010f r o m : a n o n y m o u s @d o m a i n . c a m a i l f r o m : s @ . Figure 6. The final GMS obtained from 3 SMTP messages presentedf r o m : s a m p l e @ p c . r u in Figure 4. Consecutive “ ” s correspond to alpha-numeric variable dataf r o m : t e s t @t e s t . c o m fields.Figure 3. Sample SMTP Input Messagesma i l f r o m: a n o n y mo u s - - - - - @d o ma i n . c - a m a i l f r o m : - - - - - - - - s a mp l e @ - - - - pc . - r u m a i l f r o m : - - - - - - t e s - - - - t @ - - t e s t . c om Figure 4. Aligned Sample SMTP Client Requests. “-“ representssequence gap.Step 4: Finally, we traverse the protocol‟s FSA,abstracting each message at a transition and assigning itthe corresponding GMS. We now have the protocol‟s FSAgenerated from the large sample of network traces wherewith each transition has a specific GMS assigned.4. Fuzzing AlgorithmStep 3: Next, we construct a Generic Message Sequence(GMS) for each cluster. On the implementation level aGMS is an array list of message blocks, where a blockcorresponds to either static or variable data field.First, we identify the beginning and the end of the staticand variable data fields. Intuitively the algorithm looks atcharacters at the same position across all messages and, ifall characters are the same, it marks that position as staticposition in the resulting GMS, otherwise variable position.Consecutive static and variable positions in the GMS aredenoted as static and variable data fields, respectively.More formally, defineas a setof aligned messages where for all,isan input message and for all,is its‟th character. Define for allas the‟th symbol in the GMS. We defineas follows:Once the network protocol specifications are extracted byconstructing its FSA and GMSs, the fuzzing is started. Inaddition to the FSA and associated GMSs the fuzzingengine is loaded with an extendable list of fuzzingfunctions. Initially, the fuzzing engine sets its state to theroot of the protocol‟s FSA. It then monitors the inputtraffic, making appropriate transitions and applyingfuzzing functions. The abstract version of the algorithm ispresented in Figure 7.Note, that the current implementation does not comparethe server output messages to the modified responsesagainst the associated transition in the FSA. Ideally, theoutput messages should be compared to the outputmessages associated with the transition in the FSA todetermine whether a specific type of an unexpectedbehavior has occurred. For that, the FSA should be awareof the typical negative server responses, such as invalidsyntax.5. Experimental ResultsThe algorithm returns.Note,should be replaced by some unique character notseen otherwise in any of the sequences. Consecutivesin the resulting GMS correspond to variable data fields.Applying the algorithm on the aligned SMTP inputmessages presented in Figure 4, we obtain the GMSpresented in Figure 5.ma i l f r o m : µ µ µ µ µ µ µ µ s µ µ µ µ µ @µ µ µ µ µ µ . µ µ µ Figure 5. The intermediate GMS obtained from 3 SMTP messagespresented in Figure 4.Next, for each variable data field, identified as consecutives in the GMS, we associate the type information bylooking over each character at those positions in thealigned sequences and checking which type set theycorresponds to. The final GMS applied to our example ispresented in Figure 6.We applied AutoFuzz to extract protocol specification ofthe File Transfer Protocol (FTP) and fuzz multiple FTPserver implementations. This section provides an overviewof FTP, describes the setup environment and our findings.5.1 File Transfer ProtocolFile Transfer Protocol (FTP) is an application Internet Protocol (TCP/IP) networks for fileexchange. The original specifications of FTP wereproposed in 1971 [3], but have been modified many timessince then. Most commonly, the FTP is implemented asfollows. First, a client connects to the server on port 21,called the control port. The client requests, including thelogin process, are sent using this socket in ASCII. Whenthe client requests to transfer data, a new socket is

IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010typically opened on port 20 with the server. Port 20 iscalled the data connection port.Most client requests to the FTP server consist of a fourletter message type followed by the actual message.Commands CWD, RWD, MKD and PWD are the onlythree letter message type commands [10]. The serverresponses are also in ASCII with first three digitscorresponding to a status code following by an optionalmessage.Fuzzing Algorithm FlowchartInput: Protocol‟s FSA with each transition associated witha GMSBEGINLoad the fuzzingfunctions5.2 Setup EnvironmentStep 1: We write and import the abstraction functions forthe FTP server implementations. All input messagescoming from the client to the server are abstracted to itsfirst four characters, except for the messages beginningwith CWD, RWD, MKD and PWD, which are abstractedto its first three characters. All output messages comingfrom the server to the client are abstracted to its first threecharacters.Step 2: We install an FTP server implementation that willbe fuzzed, such as Firezilla FTP Server [12].Step 3: We setup a proxifier to redirect all traffic ofWindows ftp.exe client to AutoFuzz proxy server. (Note,since AutoFuzz works as a proxy server between theserver and the client, the client connections must beencapsulated in SOCKS5 sessions [7]. One can run aProxifier [18] on a process to encapsulate its traffic inSOCKS5 protocol and redirect it to a specific SOCKS5proxy server.)Step 4: Next, we run AutoFuzz and start its proxy server.Step 5: We manually connect to the FTP server usingftp.exe client and perform common FTP requests. Forexample, we connect to the server using different logincredentials, download and upload different files, createand remove directories. Each session is identified as aseparate network trace. In total, we record 23 networktraces.Step 6: We build the FSA corresponding to the networktraces, which is presented in Figure 8. We also constructGMSs and associate them with the appropriate FSAtransitions (Figure 9).Step 7: We start the fuzzing engine. Finally, we run asmall FTP client, written in JAVA, to automaticallyperform multiple sessions with the server and executevarious requests, while AutoFuzz automatically followsthe fuzzing algorithm presented in Figure 9.243ENDNOIs the fuzzer turnedON?YESSet the currentstate to the root ofthe protocol’s FSAYESReset the currentstate to the root ofthe protocol’s FSA?NORead the inputmessage from theclient to the serverDoes the current state have atransition for the abstractrepresentation of the input?NOYESModify the inputmessage by applyingthe next fuzzingfuncitonUpdate the currentstateSend the inputmessage to theserverFigure 7. Fuzzing Algorithm Flowchart. The fuzzer is turned on/off bythe tester. The tester also sets when the current state should be reset to theroot of the protocol‟s FSA.

244IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010unexpected behavior instances involves crashing Open andCompact FTP Server by sending arbitrary long stringsprior to the authentication on USER, PASS and PORTcommands, and sending „\r\n‟ string prior to or afterauthentication at any state of the server. The first denial ofservice attack was already known to the public, while thesecond attack was new. Another set of unexpectedbehavior instances involves arbitrary command executionon the server prior to authentication. This attack is evenmore dangerous since a malicious user does not need toknow how to write any shellcode to completely gaincontrol over the server. This attack was also unknown. Thedevelopers of Open and Compact FTP Server 1.2 werenotified of both vulnerabilities.0USER1PASSLEDEDKXMDW DXPMXRDCW2USERPORTPORT3NLSTRETRS TORCWDXMKD DELE4QUIT5Figure 8. FTP Finite State Automaton constructed from 23 networktraces.StateID012Input QUITGMSUSERPASSPORT192,168,192,1XRMDXMKDSTOR test.txt 2,1QUIT4,4XMKDXMKD4DELEDELE4CWDCWD5USERUSERFigure 9. FTP Generic Message Sequences. Consecutivecorrespondsto a variable data field of any type. Consecutivecorresponds to avariable data field of a Long Integer.5.3 ResultsWe applied AutoFuzz to automatically fuzz three differentFTP server implementations: Firezilla FTP Server 0.9.34,Open and Compact FTP Server 1.2 and Wing FTP Server3.5.2 [12][9][19]. We were unable to find any unexpectedbehavior instances of Firezilla or Wing FTP servers, butwere able to find numerous unexpected behavior instancesof Open and Compact FTP Server 1.2. A first set of6. Conclusion and Future WorkThis paper presented a new framework intended toautomatically extract specifications of network protocolimplementations and test it for implementation flaws. Weexplained how the framework extracts protocolspecifications by learning its behavior model andconstructing a corresponding FSA. The framework alsoextracts individual message syntax allowing abstractingthe set of fuzzing functions to apply to any protocolimplementation. The framework was applied to multipleFTP server implementations and succeeded in finding oldand new vulnerabilities.There is still a lot of work to be done towards creating afully automated fuzzing system. Our framework can beextended by incorporating new abstraction and fuzzingfunctions. It can also be extended by implementingadditional fuzzing models. For example, the proxy servercan be improved to automatically replay previouslyrecorded traffic. In addition, the framework should betested on other than ASCII protocol implementations andcompared with other fuzzing tools. Different automatedsolutions aiming to replace the abstraction function shouldbe considered, such as use of similarity scoring techniquesof sequence alignment algorithms.In addition, the framework can be used as a start towardsautomated honeypot construction. That is, using ourframework it is possible to automatically extract protocolspecifications which can be incorporated with a separatetool that uses these specifications to mimic real protocolimplementations, hence interacting with potentialattackers.

IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 12][13][14][15][16][17][18][19]AutoFuzz: Automated Network Protocol Fuzzing Framework.http://autofuzz.sourceforge.net/M. A. Beddoe, Network Protocol Analysis Using BioinformaticsAlgorithms. Available: http://www.4tphi.net/ awalters/PI/pi.pdf.A.K. Bhushan, “Request for Comments: 114”, Network WorkingGroup, 1971. Available: http://www.faqs.org/rfcs/rfc114.html.P. M. Comparetti, G. Wondracek, C. Kruegel, E. Kirda, “Prospex:Protocol Specification Extraction”, Proceedings of the 2009 30thIEEE Symposium on Security and Privacy, p.110-125, May 17-20,2009.Y. Hsu, G. Shu and D. Lee, “A Model-based Approach to SecurityFlaw Detection of Network Protocol Implementation”, IEEE ICNP,2008.JAVA SOCKS Server. http://jsocks.sourceforge.net/.M. Leech, M. Ganis, Y. Lee, R. Kuris, D. Koblas and L. Jones,“Request for Comments: 1928”, Network Working Group, 1996.Available: http://www.ietf.org/rfc/rfc1928.txt.S. B. Needleman and C. D. Wunsch. “A general method applicableto the search for similarities in the amino acid sequence of twoproteins.” Journal of Molecular Biology, 48:444-453, 1970.Open & Compact FTP Server. http://sourceforge.net/projects/openftpd/.J. Postel and J. Reynolds, “Request for Comments: 959”, NetworkWorking Group, 1985. akanen , J. DeMott and C. Miller, "Fuzzing for SoftwareSecurity Testing and Quality Assurance", Artech House, Inc.,Norwood, MA, 2008The Firezilla Project. http://filezilla-project.org/.The JAVA Swing x/swing/packagesummary.html.The Java Universal Network/Graph Framework (JUNG).http://jung.sourceforge.net/.The National Vulnerability Database. http://web.nvd.nist.gov.The Peach Project. http://peachfuzzer.com/.The ProxyFuzz Project. http://theartoffuzzing.com/.Windows Proxifier. http://www.proxifier.com/Wing FTP Server. http://www.wftpserver.com/.Serge Gorbunov is working towards B.S. atthe University of Toronto Mississauga,specializing in Information Security. Heworked at IBM Canada lab as a SoftwareDeveloper Intern during summer of 2009 andhas been a member of the Canadian HoneynetProject since May of 2009.Arnold Rosenbloom is a Senior Lecturer atthe Department of Mathematical andComputational Sciences, University ofToronto at Mississauga, home of the UofTInformation Security Program. His interestsrange from Information Security to WebProgramming,fromComputationalComplexity to first year pedagogy.245

applied to a variety of File Transfer Protocol (FTP) server implementations, confirming old and discovering new vulnerabilities. Key words: Automated Fuzzing, Software Security, Vulnerability Detection 1. Introduction 1.1 Background Flaws in the implementations of network protocols are som