Introduction To Static Analysis - Carnegie Mellon University

Transcription

Introduction to Static Analysis17-654: Analysis of Software ArtifactsJonathan Aldrich2/21/201117-654: Analysis of Software ArtifactsStatic AnalysisFind the Bug!1Source: Engler et al., Checking System RulesUsing System-Specific, Programmer-WrittenCompiler Extensions, OSDI ’00.disable interruptsERROR: returningwith interrupts disabledre-enable interrupts2/21/201117-654: Analysis of Software ArtifactsStatic Analysis21

Metal Interrupt AnalysisSource: Engler et al., Checking System RulesUsing System-Specific, Programmer-WrittenCompiler Extensions, OSDI ’00.enable err(double enable)is enabledenabledisableis disabledend path err(end pathwith/intrdisabled)disable err(double disable)2/21/201117-654: Analysis of Software ArtifactsStatic AnalysisApplying the Analysis3Source: Engler et al., Checking System RulesUsing System-Specific, Programmer-WrittenCompiler Extensions, OSDI ’00.initial state is enabledtransition to is disabledfinal state is disabled: ERROR!transition to is enabledfinal state is enabled is OK2/21/201117-654: Analysis of Software ArtifactsStatic Analysis42

Outline Why static analysis? The limits of testing and inspection What is static analysis?How does static analysis work?AST AnalysisDataflow Analysis2/21/201117-654: Analysis of Software ArtifactsStatic Analysis52/21/201117-654: Analysis of Software ArtifactsStatic Analysis63

Process, Cost, and QualityProcess intervention,testing, and inspectionyield first-ordersoftware qualityimprovementSlide: William ScherlisAdditional technologyand tools are needed toclose the gapPerfection(unattainable)Critical SystemsAcceptabilitySoftwareQualityCMM:12S&S, Agile, RUP, etc: less rigorous2/21/20113.45ProcessRigor, Costmore rigorous17-654: Analysis of Software ArtifactsStatic Analysis7Root Causes of Errors Requirements problems Don’t fit user needsStatic Analysis ContributionsHard Design flaws Does design achieve goals?Is design implemented right?Implementation errors Security HardLacks required qualities Is data initialized?Is dereference/indexing valid?Are threads synchronized?Are interface semantics followed?Are invariants maintained?Taxonomy: [Chillarege et al., Orthogonal Defect Classification]2/21/201117-654: Analysis of Software ArtifactsStatic Analysis84

Existing Approaches Testing: is the answerright? Verifies features workFinds algorithmicproblems Inspection: is the qualitythere? Missing requirementsDesign problemsStyle issuesApplication logic2/21/2011LimitationsNon-local interactionsUncommon pathsNon-determinismStatic analysis: will I getan answer? Verifies non-localconsistencyChecks all pathsConsiders all nondeterministic choices17-654: Analysis of Software ArtifactsStatic Analysis9Static Analysis Finds “Mechanical” Errors Defects that result from inconsistently following simple,mechanical design rules Security vulnerabilities Memory errors Resource leaks Violations of API or framework rules Exceptions Encapsulation violations Race conditions 2/21/2011Buffer overruns, unvalidated inputKNull dereference, uninitialized dataKMemory, OS resourcesKe.g. Windows device drivers; real time libraries; GUI frameworksArithmetic/library/user-definedAccessing internal data, calling private functionsKTwo threads access the same data without synchronization17-654: Analysis of Software ArtifactsStatic Analysis105

Empirical Results on Static Analysis Nortel study [Zheng et al. 2006] 3 C/C projects3 million LOC totalEarly generation static analysis tools Conclusions 2/21/2011Cost per fault of static analysis 61-72% comparedto inspectionsEffectively finds assignment, checking faultsCan be used to find potential securityvulnerabilities17-654: Analysis of Software ArtifactsStatic Analysis11Empirical Results on Static Analysis InfoSys study [Chaturvedi 2005] 5 projects Average 700 function pointseach Compare inspection with andwithout static analysis Conclusions Fewer defects Higher productivityAdapted from [Chaturvedi 2005]2/21/201117-654: Analysis of Software ArtifactsStatic Analysis126

Quality Assurance at Microsoft (Part 1) Original process: manual code inspection Effective when system and team are smallToo many paths to consider as system grewEarly 1990s: add massive system and unit testing Tests took weeks to run Inefficient detection of common patterns, security holes Diversity of platforms and configurationsSheer volume of testsNon-local, intermittent, uncommon path bugsWas treading water in Windows Vista developmentEarly 2000s: add static analysis More on this later2/21/201117-654: Analysis of Software ArtifactsStatic Analysis13Outline Why static analysis? What is static analysis? Abstract state space exploration How does static analysis work? What do practical tools look like? How does it fit into an organization?2/21/201117-654: Analysis of Software ArtifactsStatic Analysis147

Static Analysis Definition Static program analysis is the systematicexamination of an abstraction of a program’sstate space Metal interrupt analysis Abstraction 2 states: enabled and disabled All program information—variable values, heap contents—isabstracted by these two states, plus the program counterSystematic Examines all paths through a function Each path explored for each reachable state 2/21/2011What about loops? More laterKAssume interrupts initially enabled (Linux practice)Since the two states abstract all program information, theexploration is exhaustive17-654: Analysis of Software ArtifactsStatic Analysis15Outline Why static analysis? What is static analysis? How does static analysis work? Termination AST Analysis Dataflow Analysis2/21/201117-654: Analysis of Software ArtifactsStatic Analysis168

How can Analysis Search All Paths? How many paths are in a program? Exponential # paths with if statementsInfinite # paths with loopsHow could we possibly cover them all? Secret weapon: Abstraction Finite number of (abstract) statesIf you come to a statement and you’ve alreadyexplored a state for that statement, stop. The analysis depends only on the code and the currentstateContinuing the analysis from this program point and statewould yield the same results you got beforeIf the number of states isn’t finite, too bad Your analysis may not terminate2/21/201117-654: Analysis of Software ArtifactsStatic Analysis17Example1. void foo(int x) {2.if (x 0)3.bar(); cli();4.else5.baz(); cli();6.while (x 0) {7.sti();8.do work();9.cli();10.}11.sti();12. }2/21/2011Path 1 (before stmt): true/no loop2: is enabled3: is enabled6: is disabled11: is disabled12: is enabledno errors17-654: Analysis of Software ArtifactsStatic Analysis189

Example1. void foo(int x) {2.if (x 0)3.bar(); cli();4.else5.baz(); cli();6.while (x 0) {7.sti();8.do work();9.cli();10.}11.sti();12. }2/21/2011Path 2 (before stmt): true/1 loop2: is enabled3: is enabled6: is disabled7: is disabled8: is enabled9: is enabled11: is disabledalready been here17-654: Analysis of Software ArtifactsStatic Analysis19Example1. void foo(int x) {2.if (x 0)3.bar(); cli();4.else5.baz(); cli();6.while (x 0) {7.sti();8.do work();9.cli();10.}11.sti();12. }2/21/2011Path 3 (before stmt): true/2 loops2: is enabled3: is enabled6: is disabled7: is disabled8: is enabled9: is enabled6: is disabledalready been here17-654: Analysis of Software ArtifactsStatic Analysis2010

Example1. void foo(int x) {2.if (x 0)3.bar(); cli();4.else5.baz(); cli();6.while (x 0) {7.sti();8.do work();9.cli();10.}11.sti();12. }2/21/2011Path 4 (before stmt): false2: is enabled5: is enabled6: is disabledalready been hereall of state space has beenexplored17-654: Analysis of Software ArtifactsStatic Analysis21Outline Why static analysis?What is static analysis?How does static analysis work?AST Analysis Abstract Syntax Tree Representation Simple Bug Finders: FindBugs Dataflow Analysis2/21/201117-654: Analysis of Software ArtifactsStatic Analysis2211

Representing Programs To analyze software automatically, wemust be able to represent it precisely Some representations 2/21/2011Source codeAbstract syntax treesControl flow graphBytecodeAssembly codeBinary code17-654: Analysis of Software ArtifactsStatic Analysis23Abstract Syntax Trees A tree representation of source code Based on the language grammar One type of node for each production S :: x : a : xa S :: while b do S whilebS2/21/201117-654: Analysis of Software ArtifactsStatic Analysis2412

Parsing: Source to AST Parsing process (top down)1. Determine the top-level production to use2. Create an AST element for that production3. Determine what text corresponds to eachchild of the AST element4. Recursively parse each child Algorithms have been studied in detail For this course you only need the intuitionDetails covered in compiler courses2/21/201117-654: Analysis of Software ArtifactsStatic Analysis25Parsing Exampley : x;z : 1;while y 1 doz : z * y;y : y – 1 Top-level production? What are the parts?2/21/201117-654: Analysis of Software ArtifactsStatic Analysis2613

Parsing Exampley : x;z : 1;while y 1 doz : z * y;y : y – 1 Top-level production? ;S1; S2What are the parts?2/21/201117-654: Analysis of Software ArtifactsStatic Analysis27Parsing Example;y : x;z : 1;while y 1 doz : z * y;y : y – 1 z : 1; whileKTop-level production? y : xS1; S2What are the parts? 2/21/2011y : xz : 1; while K17-654: Analysis of Software ArtifactsStatic Analysis2814

Parsing Exampley : x;z : 1;while y 1 doz : z * y;y : y – 1 : yz : 1; whileKxTop-level production? ;S1; S2What are the parts? y : xz : 1; while K2/21/201117-654: Analysis of Software ArtifactsStatic Analysis29Parsing Example;y : x;z : 1;while y 1 doz : z * y;y : y – 1 y;x z : 1whileKTop-level production? : S1; S2What are the parts? 2/21/2011y : xz : 1; while K17-654: Analysis of Software ArtifactsStatic Analysis3015

Parsing Exampley : x;z : 1;while y 1 doz : z * y;y : y – 1 : y;xTop-level production? ;: zwhileK1S1; S2What are the parts? y : xz : 1; while K2/21/201117-654: Analysis of Software ArtifactsStatic Analysis31Parsing Example;y : x;z : 1;while y 1 doz : z * y;y : y – 1 yTop-level production? : ;x: zwhile1 y 1z : .S1; S2What are the parts? 2/21/2011y : xz : 1; while K17-654: Analysis of Software ArtifactsStatic Analysis3216

Parsing Exampley : x;z : 1;while y 1 doz : z * y;y : y – 1 : y;xTop-level production? ;: z1 z : .S1; S2What are the parts? whiley 1y : xz : 1; while K2/21/201117-654: Analysis of Software ArtifactsStatic Analysis33Parsing Example;y : x;z : 1;while y 1 doz : z * y;y : y – 1 yTop-level production? : ;x: z2/21/2011 ;S1; S2What are the parts? 1whiley : xz : 1; while K17-654: Analysis of Software ArtifactsStatic Analysisy 1 z: z*y y: y-13417

Parsing Exampley : x;z : 1;while y 1 doz : z * y;y : y – 1 : y;xTop-level production? ;: z1 ;S1; S2What are the parts? whiley 1y : xz : 1; while Kz2/21/2011: y: y-1z*y17-654: Analysis of Software ArtifactsStatic Analysis35Parsing Example;y : x;z : 1;while y 1 doz : z * y;y : y – 1 yTop-level production? : ;x: z2/21/20111 ;S1; S2What are the parts? whiley : xz : 1; while K17-654: Analysis of Software ArtifactsStatic Analysisy 1: z*y: y-136zy18

Parsing Exampley : x;z : 1;while y 1 doz : z * y;y : y – 1 : y;xTop-level production? ;: z1 ;S1; S2What are the parts? whiley : xz : 1; while K2/21/2011y 1: z*: yy-13717-654: Analysis of Software ArtifactsStatic AnalysiszyParsing Example;y : x;z : 1;while y 1 doz : z * y;y : y – 1 yTop-level production? : ;x: z2/21/20111 ;S1; S2What are the parts? whiley : xz : 1; while K17-654: Analysis of Software ArtifactsStatic Analysisy 1: z*: y38zyy119

Quick QuizDraw a parse tree for the function below. You can assume that the“for” statement is at the top of the parse tree.void copy bytes(char dest[], char source[], int n) {for (int i 0; i n; i)dest[i] source[i];}2/21/201117-654: Analysis of Software ArtifactsStatic Analysis39Matching AST against Bug Patterns AST Walker Analysis Walk the AST, looking for nodes of a particular typeCheck the immediate neighborhood of the node for a bug patternWarn if the node matches the patternSemantic grep Like grep, looking for simple patternsUnlike grep, consider not just names, but semantic structure ofAST Makes the analysis more preciseCommon architecture based on Visitors 2/21/2011class Visitor has a visitX method for each type of AST node XDefault Visitor code just descends the AST, visiting each nodeTo find a bug in AST element of type X, override visitX17-654: Analysis of Software ArtifactsStatic Analysis4020

Behavioral Patterns: Visitor Applicability Structure with many classes Want to perform operationsthat depend on classes Set of classes is stable Want to define newoperations Consequences Easy to add new operations Groups related behavior inVisitor Adding new elements ishard Visitor can store state Elements must exposeinterface2/21/201117-654: Analysis of Software ArtifactsStatic Analysis41Example: Shifting by more than 31 bitsclass BadShiftAnalysis extends VisitorvisitShiftExpression(ShiftExpression e) {if (type of e’s left operand is int)if (e’s right operand is a constant)if (value of constant 0 or 31)warn(“Shifting by less than 0 or morethan 31 is /201117-654: Analysis of Software ArtifactsStatic Analysis4221

Example: String concatenation in a loopclass StringConcatLoopAnalysis extends Visitorprivate int loopLevel 0;visitStringConcat(StringConcat e) {if (loopLevel 0)warn(“Performance issue: String concatenation in loop (useStringBuffer instead)”)super.visitStringConcat(e);// visits AST children}visitWhile(While e) {loopLevel ;super.visitWhile(e);// visits AST childrenloopLevel--;}// similar for other looping constructs2/21/201117-654: Analysis of Software ArtifactsStatic Analysis43Example Tool: FindBugs Origin: research project at U. Maryland Checks over 250 “bug patterns” Now freely available as open sourceStandalone tool, plugins for Eclipse, etc.Over 100 correctness bugsMany style issues as wellIncludes the two examples just shownFocus on simple, local checks Similar to the patterns we’ve seenBut checks bytecode, not AST Harder to write, but more efficient and doesn’t require 654: Analysis of Software ArtifactsStatic Analysis4422

Example FindBugs Bug Patterns Correct equals()Use of Closing streamsIllegal castsNull pointer dereferenceInfinite loopsEncapsulation problemsInconsistent synchronizationInefficient String useDead store to variable2/21/201117-654: Analysis of Software ArtifactsStatic Analysis45FindBugs Experiences Useful for learning idioms of Java Rules about libraries and interfaces e.g. equals() Customization is important Many warnings may be irrelevant, others may beimportant – depends on domain e.g. embedded system vs. web application Useful for pointing out things to examine 2/21/2011Not all are real defectsTurn off false positive warnings for future analyseson codebase17-654: Analysis of Software ArtifactsStatic Analysis4623

Outline Why static analysis?What is static analysis?How does static analysis work?AST AnalysisDataflow Analysis Control Flow Graph Representation Simple Flow Analysis: Zero/Null Values2/21/201117-654: Analysis of Software ArtifactsStatic Analysis47Motivation: Dataflow Analysis Catch interesting errors Non-local: x is null, x is written to y, y isdereferenced Optimize code Reduce run time, memory usageK Soundness required Safety-critical domain Assure lack of certain errorsCannot optimize unless it is proven safe Correctness comes before performance Automation required 2/21/2011Dramatically decreases costMakes cost/benefit worthwhile for far morepurposes17-654: Analysis of Software ArtifactsStatic Analysis4824

Dataflow analysis Tracks value flow through program Can distinguish order of operations Did you read the file after you closed it?Does this null value flow to that dereference?Differs from AST walker Walker simply collects information or checks patternsTracking flow allows more interesting properties Abstracts values Chooses abstraction particular to property Is a variable null?Is a file open or closed?Could a variable be 0?Where did this value come from?More specialized than Hoare logic 2/21/2011Hoare logic allows any property to be expressedSpecialization allows automation and soundness17-654: Analysis of Software ArtifactsStatic Analysis49Zero Analysis Could variable x be 0? Useful to know if you have an expression y/xIn C, useful for null pointer analysis Program semantics η maps every variable to an integer Semantic abstraction σ maps every variable to non zero (NZ), zero(Z), or maybe zero (MZ)Abstraction function for integers αZI : We may not know if a value is zero or not 2/21/2011αZI(0) ZαZI(n) NZ for all n 0Analysis is always an approximationNeed MZ option, too17-654: Analysis of Software ArtifactsStatic Analysis5025

Zero Analysis Exampleσ []x : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/201117-654: Analysis of Software ArtifactsStatic Analysis51Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x αZI(10)]17-654: Analysis of Software ArtifactsStatic Analysis5226

Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]17-654: Analysis of Software ArtifactsStatic Analysis53Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y σ(x)]17-654: Analysis of Software ArtifactsStatic Analysis5427

Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]17-654: Analysis of Software ArtifactsStatic Analysis55Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z αZI(0)]17-654: Analysis of Software ArtifactsStatic Analysis5628

Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z Z]17-654: Analysis of Software ArtifactsStatic Analysis57Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z Z]σ [x NZ,y NZ,z Z]17-654: Analysis of Software ArtifactsStatic Analysis5829

Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z Z]σ [x NZ,y NZ,z Z]σ [x NZ,y NZ,z Z]17-654: Analysis of Software ArtifactsStatic Analysis59Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z Z]σ [x NZ,y NZ,z Z]σ [x NZ,y NZ,z Z]σ [x NZ,y MZ,z Z]17-654: Analysis of Software ArtifactsStatic Analysis6030

Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z Z]σ [x NZ,y NZ,z Z]σ [x NZ,y NZ,z Z]σ [x NZ,y MZ,z Z]σ [x NZ,y MZ,z NZ]17-654: Analysis of Software ArtifactsStatic Analysis61Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z Z]σ [x NZ,y MZ,z MZ]σ [x NZ,y NZ,z Z]σ [x NZ,y MZ,z Z]σ [x NZ,y MZ,z NZ]17-654: Analysis of Software ArtifactsStatic Analysis6231

Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z Z]σ [x NZ,y MZ,z MZ]σ [x NZ,y MZ,z MZ]σ [x NZ,y MZ,z Z]σ [x NZ,y MZ,z NZ]17-654: Analysis of Software ArtifactsStatic Analysis63Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;2/21/2011σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z Z]σ [x NZ,y MZ,z MZ]σ [x NZ,y MZ,z MZ]σ [x NZ,y MZ,z MZ]σ [x NZ,y MZ,z NZ]17-654: Analysis of Software ArtifactsStatic Analysis6432

Zero Analysis Examplex : 10;y : x;z : 0;while y -1 dox : x / y;y : y-1;z : 5;σ []σ [x NZ]σ [x NZ,y NZ]σ [x NZ,y NZ,z Z]σ [x NZ,y MZ,z MZ]σ [x NZ,y MZ,z MZ]σ [x NZ,y MZ,z MZ]σ [x NZ,y MZ,z NZ]Nothing more happens!2/21/201117-654: Analysis of Software ArtifactsStatic Analysis65Zero Analysis Termination The analysis values will not change, no matter howmany times we execute the loop 2/21/2011Proof: our analysis is deterministicWe run through the loop with the current analysis values,none of them changeTherefore, no matter how many times we run the loop, theresults will remain the sameTherefore, we have computed the dataflow analysis resultsfor any number of loop iterations17-654: Analysis of Software ArtifactsStatic Analysis6633

Zero Analysis Termination The analysis values will not change, no matter howmany times we execute the loop Proof: our analysis is deterministicWe run through the loop with the current analysis values,none of them changeTherefore, no matter how many times we run the loop, theresults will remain the sameTherefore, we have computed the dataflow analysis resultsfor any number of loop iterationsWhy does this work If we simulate the loop, the data values could (in principle)keep changing indefinitely There are an infinite number of data values possibleNot true for 32-bit integers, but might as well be true Counting to 232 is slow, even on today’s processorsDataflow analysis only tracks 2 possibilities!So once we’ve explored them all, nothing more will changeThis is the secret of abstractionWe will make this argument more precise later2/21/201117-654: Analysis of Software ArtifactsStatic Analysis67Using Zero Analysis Visit each division in the programGet the results of zero analysis for the divisorIf the results are definitely zero, report an errorIf the results are possibly zero, report a warning2/21/201117-654: Analysis of Software ArtifactsStatic Analysis6834

Quick Quiz Fill in the table to show how what information zeroanalysis will compute for the function given.Program StatementAnalysis Info after that statement0: beginning of program 1: x : 02: y : 13: if (z 0)4:x : x y5: else y : y – 16: w : y2/21/201117-654: Analysis of Software ArtifactsStatic Analysis69Outline Why static analysis?What is static analysis?How does static analysis work?AST AnalysisDataflow AnalysisFurther Examples and Discussion2/21/201117-654: Analysis of Software ArtifactsStatic Analysis7035

Static Analysis Definition Static program analysis is the systematic examinationof an abstraction of a program’s state space Simple model checking for data races Data Race defined:[From Savage et al., Eraser: A Dynamic Data Race Detector forMultithreaded Programs] Two threads access the same variableAt least one access is a writeNo explicit mechanism prevents the accesses from beingsimultaneousAbstraction Program counter of each thread, state of each lock Abstract away heap and program variablesSystematic Examine all possible interleavings of all threads Flag error if no synchronization between accessesExploration is exhaustive, since abstract state abstracts all concreteprogram state2/21/201117-654: Analysis of Software ArtifactsStatic Analysis71Model Checking for Data Racesthread1() {read x;}thread2() {lock();write x;unlock();}Thread 1Thread 2read xlockwrite xunlockInterleaving 1: OK2/21/201117-654: Analysis of Software ArtifactsStatic Analysis7236

Model Checking for Data Racesthread1() {read x;}thread2() {lock();write x;unlock();}Thread 1Thread 2lockwrite xunlockread xInterleaving 1: OKInterleaving 2: OK2/21/201117-654: Analysis of Software ArtifactsStatic Analysis73Model Checking for Data Racesthread1() {read x;}thread2() {lock();write x;unlock();}Thread 1Thread 2lockread xwrite xunlockInterleaving 1: OKInterleaving 2: OKInterleaving 3: Race2/21/201117-654: Analysis of Software ArtifactsStatic Analysis7437

Model Checking for Data RacesThread 1thread1() {read x;}thread2() {lock();write terleaving2/21/2011Thread 2lockwrite xread xunlock1: OK2: OK3: Race4: Race17-654: Analysis of Software ArtifactsStatic Analysis75Compare Analysis to Testing, Inspection Why might it be hard to test/inspect for: Null pointer errors? Forgetting to re-enable interrupts? Race conditions?2/21/201117-654: Analysis of Software ArtifactsStatic Analysis7638

Compare Analysis to Testing, Inspection Null Pointers, Interrupts Testing Errors typically on uncommon paths or uncommon inputDifficult to exercise these pathsInspection Non-local and thus easy to miss Object allocation vs. dereferenceDisable interrupts vs. return statement Finding Data Races Testing Cannot force all interleavingsInspection Too many interleavings to considerCheck rules like “lock protects x” instead 2/21/2011But checking is non-local and thus easy to miss a case17-654: Analysis of Software ArtifactsStatic Analysis77Sound Analyses A sound analysis never misses an error [of the relevant error category]No false negatives (missed errors)Requires exhaustive exploration of state spaceInductive argument for soundness 2/21/2011Start program with abstract state for all possible initialconcrete statesAt each step, ensure new abstract state covers all concretestates that could result from executing statement on anyconcrete state from previous abstract stateOnce no new abstract states are reachable, by induction allconcrete program executions have been considered17-654: Analysis of Software ArtifactsStatic Analysis7839

Soundness and PrecisionProgram state covered in actual executionunsound(false negative)imprecise(false positive)Program state covered by abstractexecution with analysis2/21/201117-654: Analysis of Software ArtifactsStatic Analysis79Soundness and PrecisionProgram state covered in actual executionunsound(false negative)imprecise(false positive)Program state covered by abstractexecution with analysis2/21/201117-654: Analysis of Software ArtifactsStatic Analysis8040

Abstraction and Soundness Consider “Sound Testing” [testing that finds every bug]Requires executing program on every input (and on all interleavings of threads)Infinite number of inputs for realistic programs Therefore impossible in practice Abstraction 2/21/2011Infinite state space finite set of statesCan achieve soundness by exhaustive exploration17-654: Analysis of Software ArtifactsStatic Analysis81Zero Analysis Precision1.2.3.4.5.void foo(unsigned n) {int x -1;x x 2;int y 10/x;}What will be the result of staticanalysis?Path 1 (after stmt):1: 2: x NZ3: x MZwarning: possible divide by zero atline 4False positive! (not a real error)What went wrong? Before statement 3 we only knowx is nonzero We need to know that x is -12/21/201117-654: Analysis of Software ArtifactsStatic Analysis8241

Regaining Zero Analysis Precision Keep track of exact value of variables Infinite states or 232, close enough Add a -1 state Not general enough Track formula for every variable Undecidable for arbitrary formulas Track restricted formulas Decent solution in practice 2/21/2011Presburger arithmetic17-654: Analysis of Software ArtifactsStatic Analysis83Analysis as an Approximation Analysis must approximate in practice May report errors where there are really none False positives May not report errors that really exist All analysis tools have either false negatives or falsepositives False negatives Approximation strategy Find a pattern P for correct code which is feasible to check (analysis terminates quickly),covers most correct code in practice (low false positives),which implies no errors (no false negatives) Analysis can be pretty good in practice Many tools have low false positive/negative ratesA sound tool has no false negatives 2/21/2011Never misses an error in a category that it checks17-654: Analysis of Software ArtifactsStatic Analysis8442

Attribute-Specific Analysis Analysis is specific to A quality attribute Race conditionBuffer overflow, divide by zeroUse after freeA strategy for verifying that attribute Protect each shared piece of data with a lockPresburger arithmetic decision procedure for arrayindexes, zero analysisOnly one variable points to each memory location Analysis is inappropriate for some attributes Approach to assurance is ad-hoc and follows noclear patternNo known decision procedure for checking anassurance pattern that is followedExamples?2/21/201117-654: Analysis of Software ArtifactsStatic Analysis85Soundness Tradeoffs Sound Analysis Assurance that nobugs are left Of the target errorclass Can focus otherQA resources onother errors May have morefalse positives2/21/2011 Unsound Analysis No assurance thatbugs are gone Must still applyother QAtechniques May have fewerfalse positives17-654: Analysis of Software ArtifactsStatic Analysis8643

Which to Choose? Cost/Benefit tradeoff Benefit: How valuable is the bug? How much does it cost if not found?How expensive to find using testing/inspection?Cost: How much did the analysis cost? Effort spent running analysis, interpreting results –includes false positivesEffort spent finding remaining bugs (for unsound analysis) Rule of thumb For critical bugs that testing/inspection can’t find, asound analysis is worth it As long as false positive rate is acceptableFor other bugs, maximize engineer productivity2/21/201117-654: Analysis of Software ArtifactsStatic Analysis8717-654: Analysis of Software ArtifactsStatic Analysis88Questions?2/21/201144

Additional Slides/Examples2/21/201117-654: Analysis of Software ArtifactsStatic Analysis89Static Analysis Definition Static program analysis is the systemati

Empirical Results on Static Analysis Nortel study [Zheng et al. 2006] 3 C/C projects 3 million LOC total Early generation static analysis tools Conclusions Cost per fault of static analysis 61-72% compared to inspections Effectively finds assignment, checking faults Can be used to find potential security .