An Intrusion-Detection Model - Carnegie Mellon University

AN INTRUSION-DETECTION MODEL

Dorothy E. Denning
SRI International
333 Ravenswood Ave.
Menlo Park, CA 94025

CH2292-1/86/0000/0118$01.00 © 1986 IEEE

A model of a real-time intrusion-detection expert system capable of detecting break-ins, penetrations, and other forms of computer abuse is described. The model is based on the hypothesis that security violations can be detected by monitoring a system’s audit records for abnormal patterns of system usage. The model includes profiles for representing the behavior of subjects with respect to objects in terms of metrics and statistical models, and rules for acquiring knowledge about this behavior from audit records and for detecting anomalous behavior. The model is independent of any particular system, application environment, system vulnerability, or type of intrusion, thereby providing a framework for a general-purpose intrusion-detection expert system.

1. Introduction

This paper describes a model for a real-time intrusion-detection expert system that aims to detect a wide range of security violations ranging from attempted break-ins by outsiders to system penetrations and abuses by insiders. The development of a real-time intrusion-detection system is motivated by four factors: (1) most existing systems have security flaws that render them susceptible to intrusions, penetrations, and other forms of abuse; finding and fixing all these deficiencies is not feasible for technical and economic reasons; (2) existing systems with known flaws are not easily replaced by systems that are more secure -- mainly because the systems have attractive features that are missing in the more-secure systems, or else they cannot be replaced for economic reasons; (3) developing systems that are absolutely secure is extremely difficult, if not generally impossible; and (4) even the most secure systems are vulnerable to abuses by insiders who misuse their privileges.

The model is based on the hypothesis that exploitation of a system’s vulnerabilities involves abnormal use of the system; therefore, security violations could be detected from abnormal patterns of system usage. The following examples illustrate:

• Attempted break-in -- Someone attempting to break into a system might generate an abnormally high rate of password failures with respect to a single account or the system as a whole.

• Masquerading or successful break-in -- Someone logging into a system through an unauthorized account and password might have a different login time, location, or connection type from that of the account’s legitimate user. In addition, the penetrator’s behavior may differ considerably from that of the legitimate user; in particular, he might spend most of his time browsing through directories and executing system status commands, whereas the legitimate user might concentrate on editing or compiling and linking programs. Many break-ins have been discovered by security officers or other users on the system who have noticed the alleged user behaving strangely.

• Penetration by legitimate user -- A user attempting to penetrate the security mechanisms in the operating system might execute different programs or trigger more protection violations from attempts to access unauthorized files or programs. If his attempt succeeds, he will have access to commands and files not normally permitted to him.

• Leakage by legitimate user -- A user trying to leak sensitive documents might log into the system at unusual times or route data to remote printers not normally used.

• Inference by legitimate user -- A user attempting to obtain unauthorized data from a database through aggregation and inference might retrieve more records than usual.

• Trojan horse -- The behavior of a Trojan horse planted in or substituted for a program may differ from the legitimate program in terms of its CPU time or I/O activity.

• Virus -- A virus planted in a system might cause an increase in the frequency of executable files rewritten, storage used by executable files, or a particular program being executed as the virus spreads.

• Denial-of-Service -- An intruder able to monopolize a resource (e.g., network) might have abnormally high activity with respect to the resource, while activity for all other users is abnormally low.

Of course, the above forms of aberrant usage can also be linked with actions unrelated to security. They could be a sign of a user changing work tasks, acquiring new skills, or making typing mistakes; software updates; or changing workload on the system. An important objective of our current research is to determine what activities and statistical measures provide the best discriminating power; that is, have a high rate of detection and a low rate of false alarms.

2. Overview of Model

The model is independent of any particular system, application environment, system vulnerability, or type of intrusion, thereby providing a framework for a general-purpose intrusion-detection expert system, which we have called IDES. A more detailed description of the design and application of IDES is given in our final report [1].

The model has six main components:

• Subjects -- initiators of activity on a target system -- normally users.

• Objects -- resources managed by the system -- files, commands, devices, etc.

• Audit records -- generated by the target system in response to actions performed or attempted by subjects on objects -- user login, command execution, file access, etc.

• Profiles -- structures that characterize the behavior of subjects with respect to objects in terms of statistical metrics and models of observed activity. Profiles are automatically generated and initialized from templates.

• Anomaly records -- generated when abnormal behavior is detected.

• Activity rules -- actions taken when some condition is satisfied, which update profiles, detect abnormal behavior, relate anomalies to suspected intrusions, and produce reports.

The model can be regarded as a rule-based pattern matching system. When an audit record is generated, it is matched against the profiles. Type information in the matching profiles then determines what rules to apply to update the profiles, check for abnormal behavior, and report anomalies detected. The security officer assists in establishing profile templates for the activities to monitor, but the rules and profile structures are largely system-independent.

The basic idea is to monitor the standard operations on a target system: logins, command and program executions, file and device accesses, etc., looking only for deviations in usage. The model does not contain any special features for dealing with complex actions that exploit a known or suspected security flaw in the target system; indeed, it has no knowledge of the target system’s security mechanisms or its deficiencies. Although a flaw-based detection mechanism may have some value, it would be considerably more complex and would be unable to cope with intrusions that exploit deficiencies that are not suspected or with personnel-related vulnerabilities. By detecting the intrusion, however, the security officer may be better able to locate vulnerabilities.

The remainder of this paper describes components of the model in more detail.

3. Subjects and Objects

Subjects are the initiators of actions in the target system. A subject is typically a terminal user, but might also be a process acting on behalf of users or groups of users, or might be the system itself. All activity arises through commands initiated by subjects. Subjects may be grouped into different classes (e.g., user groups) for the purpose of controlling access to objects in the system. User groups may overlap.

Objects are the receptors of actions and typically include such entities as files, programs, messages, records, terminals, printers, and user- or program-created structures. When subjects can be recipients of actions (e.g., electronic mail), then those subjects are also considered to be objects in the model. Objects are grouped into classes by type (program, text file, etc.). Additional structure may also be imposed, e.g., records may be grouped into files or database relations; files may be grouped into directories. Different environments may require different object granularity; e.g., for some database applications, granularity at the record level may
be desired, whereas for most applications, granularity at the file or directory level may suffice.

4. Audit Records

Audit Records are 6-tuples representing actions performed by subjects on objects:

<Subject, Action, Object, Exception-Condition, Resource-Usage, Time-stamp>

where

• Action -- operation performed by the subject on or with the object, e.g., login, logout, read, execute.

• Exception-Condition -- denotes which, if any, exception condition is raised on the return. This should be the actual exception condition raised by the system, not just the apparent exception condition returned to the subject.

• Resource-Usage -- list of quantitative elements, where each element gives the amount used of some resource, e.g., number of lines or pages printed, number of records read or written, CPU time or I/O units used, session elapsed time.

• Time-stamp -- unique time/date stamp identifying when the action took place.

We assume that each field is self-identifying, either implicitly or explicitly; e.g., the action field either implies the type of the expected object field or else the object field itself specifies its type. If audit records are collected for multiple systems, then an additional field is needed for a system identifier.

Since each audit record specifies a subject and object, it is conceptually associated with some cell in an “audit matrix” whose rows correspond to subjects and columns to objects. The audit matrix is analogous to the “access-matrix” protection model, which specifies the rights of subjects to access objects; that is, the actions that each subject is authorized to perform on each object. Our intrusion-detection model differs from the access-matrix model by substituting the concept of “action performed” (as evidenced by an audit record associated with a cell in the matrix) for “action authorized” (as specified by an access right in the matrix cell). Indeed, since activity is observed without regard for authorization, there is an implicit assumption that the access controls in the system permitted an action to occur. The task of intrusion detection is to determine whether activity is unusual enough to suspect an intrusion. Every statistical measure used for this purpose is computed from audit records associated with one or more cells in the matrix.

Most operations on a system involve multiple objects. For example, file copying involves the copy program, the original file, and the copy. Compiling involves the compiler, a source program file, an object program file, and possibly intermediate files and additional source files referenced through “include” statements. Sending an electronic mail message involves the mail program, possibly multiple destinations in the “To” and “cc” fields, and possibly “include” files.

Our model decomposes all activity into single-object actions so that each audit record references only one object. File copying, for example, is decomposed into an execute operation on the copy command, a read operation on the source file, and a write operation on the destination file. The following illustrates the audit records generated in response to a command

COPY GAME.EXE TO <Library>GAME.EXE

issued by user Smith to copy an executable GAME file into the <Library> directory; the copy is aborted because Smith does not have write permission to <Library>:

(Smith, execute, <Library>COPY.EXE, 0, CPU=00002, 11058521678)
(Smith, read, <Smith>GAME.EXE, 0, RECORDS=0, 11058521679)
(Smith, write, <Library>GAME.EXE, write-viol, RECORDS=0, 11058521680)

Decomposing complex actions has three advantages: First, since objects are the protectable entities of a system, the decomposition is consistent with the protection mechanisms of systems. Thus, IDES can potentially discover both attempted subversions of the access controls (by noting an abnormality in the number of exception conditions returned) and successful subversions (by noting an abnormality in the set of objects accessible to the subject). Second, single-object audit records greatly simplify the model and its application. Third, the audit records produced by existing systems generally contain a single object, though some systems provide a way of linking together the audit records associated with a “job step” (e.g., copy or compile) so that all files accessed during execution of a program can be identified.

The target system is responsible for auditing and for transmitting audit records to the intrusion-detection system for analysis (it may also keep an independent audit trail). The time at which audit records are generated determines what type of data is available. If the audit record for some action is generated at the time an action is requested, it is possible to measure both successful and unsuccessful attempts to perform the activity, even if the action should abort (e.g., because of a protection violation) or cause a system crash. If it is generated when the action completes, it is possible to measure the resources consumed by the action and exception conditions that may cause the action to
terminate abnormally (e.g., because of resource overflow). Thus, auditing an activity after it completes has the advantage of providing more information, but the disadvantage of not allowing immediate detection of abnormalities, especially those related to break-ins and system crashes. Thus, activities such as login, execution of high risk commands (e.g., to acquire special “superuser” privileges), or access to sensitive data should be audited when they are attempted so that penetrations can be detected immediately; if resource-usage data are also desired, additional auditing can be performed on completion as well. For example, access to a database containing highly sensitive data may be monitored when the access is attempted and then again when it completes to report the number of records retrieved or updated.

Most existing audit systems monitor session activity at both initiation (login), when the time and location of login are recorded, and termination (logout), when the resources consumed during the session are recorded. They do not, however, monitor both the start and finish of command and program execution or file accesses. IBM’s System Management Facilities (SMF) [2], for example, audit only the completion of these activities.

Although the auditing mechanisms of existing systems approximate the model, they are typically deficient in terms of the activities monitored and record structures generated. For example, Berkeley 4.2 UNIX [3] monitors command usage but not file accesses or file protection violations. Some systems do not record all login failures. Programs, including system programs, invoked below the command level are not explicitly monitored (their activity is included in that for the main program). The level at which auditing should take place, however, is unclear, since too much auditing could severely degrade performance on the target system or overload the intrusion-detection system.

Deficiencies in the record structures are also present. Most SMF audit records, for example, do not contain a subject field; the subject must be reconstructed by linking together the records associated with a given job. Protection violations are sometimes provided through separate record formats rather than as an exception condition in a common record; VM password failures at login, for example, are handled this way (there are separate records for successful logins and password failures).

Another problem with existing audit records is that they contain little or no descriptive information to identify the values contained therein. Every record type has its own structure, and the exact format of each record type must be known to interpret the values. A uniform record format with self-identifying data would be preferable so that the intrusion-detection software can be system-independent. This could be achieved either by modifying the software that produces the audit records in the target system, or by writing a filter that translates the records into a standard format.

5. Profiles

An activity profile characterizes the behavior of a given subject (or set of subjects) with respect to a given object (or set thereof), thereby serving as a signature or description of normal activity for its respective subject(s) and object(s). Observed behavior is characterized in terms of a statistical metric and model. A metric is a random variable x representing a quantitative measure accumulated over a period. The period may be a fixed interval of time (minute, hour, day, week, etc.), or the time between two audit-related events (i.e., between login and logout, program initiation and program termination, file open and file close, etc.). Observations (sample points) $x_i$ of x obtained from the audit records are used together with a statistical model to determine whether a new observation is abnormal. The statistical model makes no assumptions about the underlying distribution of x; all knowledge about x is obtained from observations.

Before describing the structure, generation, and application of profiles, we shall first discuss statistical metrics and models.

5.1. Metrics

We define three types of metrics:

• Event Counter -- x is the number of audit records satisfying some property occurring during a period (each audit record corresponds to an event). Examples are number of logins during an hour, number of times some command is executed during a login session, and number of password failures during a minute.

• Interval Timer -- x is the length of time between two related events; i.e., the difference between the time-stamps in the respective audit records. An example is the length of time between successive logins into an account.

• Resource Measure -- x is the quantity of resources consumed by some action during a period as specified in the Resource-Usage field of the audit records. Examples are the total number of pages printed by a user per day and total amount of CPU time consumed by some program during a single execution. Note that a resource measure in our intrusion-
detection model is implemented as an event counter or interval timer on the target system. For example, the number of pages printed during a login session is implemented on the target system as an event counter that counts the number of print events between login and logout; CPU time consumed by a program as an interval timer that runs between program initiation and termination. Thus, whereas event counters and interval timers measure events at the audit-record level, resource measures acquire data from events on the target system that occur at a level below the audit records. The Resource-Usage field of audit records thereby provides a means of data reduction so that fewer events need be explicitly recorded in audit records.

5.2. Statistical Models

Given a metric for a random variable x and n observations $x_1, \ldots, x_n$, the purpose of a statistical model of x is to determine whether a new observation $x_{n+1}$ is abnormal with respect to the previous observations. The following models may be included in IDES:

1. Operational Model. This model is based on the operational assumption that abnormality can be decided by comparing a new observation of x against fixed limits. Although the previous sample points for x are not used, presumably the limits are determined from prior observations of the same type of variable. The operational model is most applicable to metrics where experience has shown that certain values are frequently linked with intrusions. An example is an event counter for the number of password failures during a brief period, where more than 10, say, suggests an attempted break-in.

2. Mean and Standard Deviation Model. This model is based on the assumption that all we know about $x_1, \ldots, x_n$ are mean and standard deviation as determined from its first two moments:

\[
\begin{aligned}
\mathit{sum} &= x_1 + \cdots + x_n \\
\mathit{sumsquares} &= x_1^2 + \cdots + x_n^2 \\
\mathit{mean} &= \mathit{sum} / n \\
\mathit{stdev} &= \sqrt{\frac{\mathit{sumsquares}}{n-1} - \mathit{mean}^2}
\end{aligned}
\]

A new observation $x_{n+1}$ is defined to be abnormal if it falls outside a confidence interval that is d standard deviations from the mean for some parameter d:

\[
\mathit{mean} \pm d \times \mathit{stdev}
\]

By Chebyshev’s inequality, the probability of a value falling outside this interval is at most $1/d^2$; for d = 4, for example, it is at most .0625. Note that 0 (or null) occurrences should be included so as not to bias the data.

This model is applicable to event counters, interval timers, and resource measures accumulated over a fixed time interval or between two related events. It has two advantages over an operational model: First, it requires no prior knowledge about normal activity in order to set limits; instead, it learns what constitutes normal activity from its observations, and the confidence intervals automatically reflect this increased knowledge. Second, because the confidence intervals depend on observed data, what is considered to be normal for one user can be considerably different from another.

A slight variation on the mean and standard deviation model is to weight the computations, with greater weights placed on more recent values.

3. Multivariate Model. This model is similar to the mean and standard deviation model except that it is based on correlations among two or more metrics. This model would be useful if experimental data show that better discriminating power can be obtained from combinations of related measures rather than individually -- e.g., CPU time and I/O units used by a program, login frequency and session elapsed time (which may be inversely related).

4. Markov Process Model. This model, which applies only to event counters, regards each distinct type of event (audit record) as a state variable, and uses a state transition matrix to characterize the transition frequencies between states (rather than just the frequencies of the individual states -- i.e., audit records -- taken separately). A new observation is defined to be abnormal if its probability as determined by the previous state and the transition matrix is too low. This model might be useful for looking at transitions between certain commands where command sequences were important.

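The mean and standard deviation model (model 2) needs only a count, a sum, and a sum of squares per metric. The sketch below, in Python, maintains those three values and applies the d-standard-deviation test with the stdev formula exactly as the text gives it. The names `MeanStdProfile` and `chebyshev_bound`, and the choice to flag nothing until two observations exist, are assumptions for illustration only.

```python
import math


class MeanStdProfile:
    """Running count, sum, and sum of squares for one metric (hypothetical helper)."""

    def __init__(self, d):
        self.d = d             # threshold: number of standard deviations
        self.count = 0
        self.total = 0.0       # sum of observations
        self.sumsquares = 0.0  # sum of squared observations

    def mean(self):
        return self.total / self.count

    def stdev(self):
        # stdev = sqrt(sumsquares/(n-1) - mean^2), as written in the text.
        # This is never smaller than the usual sample standard deviation.
        variance = self.sumsquares / (self.count - 1) - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))

    def is_abnormal(self, x):
        # Assumption: with fewer than two observations no interval exists yet.
        if self.count < 2:
            return False
        return abs(x - self.mean()) > self.d * self.stdev()

    def update(self, x):
        self.count += 1
        self.total += x
        self.sumsquares += x * x


def chebyshev_bound(d):
    """Upper bound on the probability of falling outside mean +/- d * stdev."""
    return 1.0 / (d * d)


# Usage: pages printed per session, including any sessions with 0 pages.
profile = MeanStdProfile(d=4)
for pages in [10, 12, 9, 11, 10, 13, 8, 11]:
    profile.update(pages)

typical = profile.is_abnormal(12)
extreme = profile.is_abnormal(40)
bound = chebyshev_bound(4)
```

Because only the three running values are stored, the profile can be updated as each audit record arrives without retaining the individual observations.
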
5. Time Series Model. This model, which uses an interval timer together with an event counter or resource measure, takes into account the order and inter-arrival times of the observations $x_1, \ldots, x_n$, as well as their values. A new observation is abnormal if its probability of occurring at that time is too low. A time series has the advantage of measuring trends of behavior over time and detecting gradual but significant shifts in behavior, but the disadvantage of being more costly than mean and standard deviation.

Other statistical models can be considered, for example, models that use more than the first two moments but less than the full set of values.

5.3. Profile Structure

An activity profile contains information that identifies the statistical model and metric of a random variable, as well as the set of audit events measured by the variable. The structure of a profile contains 10 components, the first 7 of which are independent of the specific subjects and objects measured:

<Variable-Name, Action-Pattern, Exception-Pattern, Resource-Usage-Pattern, Period, Variable-Type, Threshold, Subject-Pattern, Object-Pattern, Value>

Subject- and Object-Independent Components:

• Variable-Name -- name of variable.

• Action-Pattern -- pattern that matches zero or more actions in the audit records, e.g., ‘login’, ‘read’, ‘execute’.

• Exception-Pattern -- pattern that matches on the Exception-Condition field of an audit record.

• Resource-Usage-Pattern -- pattern that matches on the Resource-Usage field of an audit record.

• Period -- time interval for measurement, e.g., day, hour, minute (expressed in terms of clock units). This component is null if there is no fixed time interval; i.e., the period is the duration of the activity.

• Variable-Type -- name of abstract data type that defines a particular type of metric and statistical model, e.g., event counter with mean and standard deviation model.

• Threshold -- parameter(s) defining limit(s) used in statistical test to determine abnormality. This field and its interpretation is determined by the statistical model (Variable-Type). For the operational model, it is an upper (and possibly lower) bound on the value of an observation; for the mean and standard deviation model, it is the number of standard deviations from the mean.

Subject- and Object-Dependent Components:

• Subject-Pattern -- pattern that matches on the Subject field of audit records.

• Object-Pattern -- pattern that matches on the Object field of audit records.

• Value -- value of current (most recent) observation and parameters used by the statistical model to represent distribution of previous values. For the mean and standard deviation model, these parameters are count, sum, and sum-of-squares (first two moments). The operational model requires no parameters.

A profile is uniquely identified by Variable-Name, Subject-Pattern, and Object-Pattern. All components of a profile are invariant except for Value.

Although the model leaves unspecified the exact format for patterns, we have identified the following SNOBOL-like constructs as being useful:

‘string’      String of characters.
*             Wild card matching any string.
#             Match any numeric string.
IN(list)      Match any string in list.
p → name      The string matched by p is associated with name.
p1 p2         Match pattern p1 followed by p2.
p1 | p2       Match pattern p1 or p2.
p1 , p2       Match pattern p1 and p2.
¬ p           Match anything but pattern p.

Examples of patterns are:

‘Smith’ -- match the string ‘Smith’
* → User -- match any string and assign to User
‘<Library>*’ -- match files in Library directory
IN(Special-Files) -- match files in Special-Files
‘CPU=’ # → Amount -- match string ‘CPU=’ followed by integer; assign integer to Amount

The following is a sample profile for measuring the quantity of output to user Smith’s terminal on a session basis. The variable type ResourceByActivity denotes a resource measure using the mean and standard deviation model.

Variable-Name:           SessionOutput
Action-Pattern:          ‘logout’
Exception-Pattern:       0
Resource-Usage-Pattern:  ‘SessionOutput=’ # → Amount
Period:
Variable-Type:           ResourceByActivity
Threshold:               4
Subject-Pattern:         ‘Smith’
Object-Pattern:          *
Value:                   record of ...

Whenever the intrusion-detection system receives an audit record that matches a variable’s patterns, it updates the variable’s distribution and checks for abnormality. The distribution of values for a variable is thus derived -- i.e., learned -- as audit records matching the profile patterns are processed.

5.4. Profiles for Classes

Profiles can be defined for individual subject-object pairs (i.e., where the Subject and Object patterns match specific names, e.g., Subject ‘Smith’ and Object ‘Foo’), or for aggregates of subjects and objects (i.e., where the Subject and Object patterns match sets of names) as shown in Figure 5-1. For example, file-activity profiles could be created for pairs of individual users and files, for groups of users with respect to specific files, for individual users with respect to classes of files, or for groups of users with respect to file classes. The nodes in the lattice are interpreted as follows:

• Subject-Object: actions performed by single subject on single object -- e.g., user Smith - file Foo.

• Subject-Object Class: actions performed by single subject aggregated over all objects in the class. The class of objects might be represented as a pattern match on a subfield of the Object field that specifies the object’s type (class), as a pattern match directly on the object’s name (e.g., the pattern ‘*.EXE’ for all executable files), or as a pattern match that tests whether the object is in some list (e.g., “IN(hit-list)”).

• Subject Class - Object: actions performed on single object aggregated over all subjects in the class -- e.g., privileged users - directory file <Library>, nonprivileged users - directory file <Library>.

• Subject Class - Object Class: actions aggregated over all subjects in the class and objects in the class -- privileged users - system files, nonprivileged users - system files.

• Subject: actions performed by single subject aggregated over all objects -- e.g., user session activity.

• Object: actions performed on a single object aggregated over all subjects -- e.g., password file activity.

• Subject Class: actions aggregated over all subjects in the class -- e.g., privileged user activity, nonprivileged user activity.

• Object Class: actions aggregated over all objects in the class -- e.g., executable file activity.

• System: actions aggregated over all subjects and objects.

Figure 5-1: Hierarchy of Subjects and Objects (a lattice running from System at the top, through Subject Class and Object Class, down to Subject-Object at the bottom).

The random variable represented by a profile for a class can aggregate activity for the class in two ways:

• Class-as-a-whole activity -- The set of all subjects or objects in the class is treated as a single entity, and each observation of the random variable represents aggregate activity for the entity. An example is a profile for the class of all users representing the average number of logins into the system per day, where all users are treated as a single entity.

• Aggregate individual activity -- The subjects or objects in the class are treated as distinct entities, and each observation of the random variable represents activity for some member of the class. An example is a profile for the class of all users characterizing the average number of logins by any one user per day. Thus, the profile represents a ‘typical’ member of the class.

[...]

The first approach has the obvious disadvantage of requiring manual intervention on the part of the security officer. The second approach overcomes this disadvantage, but introduces two others. The first is that it does not automatically deal with startup conditions, where there will be many existing subjects and objects. The second is that it requires a subject-object profile to be generated for any pair that is a candidate for monitoring, even if the subject never uses the particular object. This could cause many more profiles than necessary to be generated. For example, suppose file accesses are monitored at the level of individual users and files. Consider a system with 1000 users, where each user has an average of 200 files, giving 200,000 files total and 200,000,000 possible combinations of user
