On Rounding Percentages. Persi Diaconis; David Freedman


Journal of the American Statistical Association, Vol. 74, No. 366 (Jun., 1979), pp. 359-364.

On Rounding Percentages
PERSI DIACONIS and DAVID FREEDMAN*

We assess the probability that a table of rounded percentages adds to 100 percent. This extends work of Mosteller, Youtz, and Zahn (1967), who found that the chance of rounding to 100 percent was about .75 with three categories, .66 with four categories, and $(6/\pi c)^{1/2}$ with a large number $c$ of categories. We give a mathematical treatment of this phenomenon when the table is drawn from a multinomial distribution or from a mixture of multinomial distributions. We discuss the very different small-sample behavior and treat Benford's leading digit data as an example.

KEY WORDS: Rounding error; Table of counts; Multinomial; Mixture of multinomials; Limit theorem; Leading digit.

* Persi Diaconis is Associate Professor, Department of Statistics, Stanford University, Stanford, CA 94305, and member, technical staff, Bell Laboratories, 1978-79. David Freedman is Professor of Statistics, Department of Statistics, University of California, Berkeley, CA 94720. This work was supported in part by NSF Grant MPS 74-21416 and by the Energy Research and Development Administration under Contract EY-76-C-03-0515 and (second author) by NSF Grant GP43085.

© Journal of the American Statistical Association, June 1979, Volume 74, Number 366, Theory and Methods Section

1. INTRODUCTION

Sums of rounded proportions often fail to add to 1. For example, consider Table 1. This table has three categories; the number of categories is denoted by $c$. The proportions are rounded to the nearest .001, or 1 in 1,000. This 1,000 is the rounding number $n$. In Table 1, the rounded proportions add to 1.001 instead of 1. The chance that rounded proportions add to 1 depends on the number of categories, the rounding number $n$, and the probability model for generating the table. Our object in this article is to compute this chance.

Failure to add to 1 occurs so frequently that if many sums of proportions add to exactly 1 in a reported set of tables, one begins to suspect the reporter of forcing the proportions to add to 1. An example is discussed in Section 6. Sometimes altering the sample proportions in this way matters: For instance, it can cause large changes in the chi-squared statistic, as shown by an example in Section 6.

In their 1967 article, Mosteller, Youtz, and Zahn (MYZ) proposed several probability models for generating tables. They computed the chance that the rounded proportions add to 1. They concluded that the chance did not depend very much on the rounding number but did depend strongly on the number of categories: They found that with two categories the chance is 100 percent, with three categories the chance is 75 percent, with four categories the chance is 66.66 percent, and with a large number $c$ of categories the chance is about $(6/\pi c)^{1/2}$. Although persuasive and backed by extensive empirical evidence (rounding behavior of 565 tables in the National Halothane Study), the MYZ argument was only heuristic.

In this article we give a rigorous treatment of the main mathematical issues raised by MYZ. To begin with, we consider the multinomial model for generating tables. Due to the law of large numbers, the sample proportions round essentially the same way as the theoretical probabilities. Thus, the chance of getting rounded sample proportions that total 1 tends to 0 or 1 as the sample size tends to infinity. The same type of behavior holds for many other models for generating a single table. This analysis is given in Section 2.

Next consider a collection of many tables. Suppose the $j$th table is drawn from a multinomial distribution with theoretical probability vector $p(j) = (p_1(j), \ldots, p_c(j))$ and sample size $N_j$. It is natural to consider models in which the different $p(j)$'s are randomly chosen from a distribution on the simplex
$$\{p : p_i \ge 0,\ i = 1, 2, \ldots, c;\ \textstyle\sum_{i=1}^{c} p_i = 1\}. \qquad (1)$$
If the sample size in the $j$th table is large, the proportions in the $j$th sample will round in the same way as $p(j)$. Thus, the fraction of tables that round to 1 will be close to the probability that the random vector $p$ rounds to 1.

The principal result of this article concerns the MYZ broken-stick model, in which a uniform distribution is put on the simplex (1). It is convenient to introduce the symbol $r_n(x)$ for the result of rounding $x$ to the nearest $1/n$:
if $m/n \le x < (m + .5)/n$, then $r_n(x) = m/n$;
if $(m + .5)/n < x \le (m + 1)/n$, then $r_n(x) = (m + 1)/n$;
if $x = (m + .5)/n$, then $r_n(x) = m/n$ or $(m + 1)/n$ according as $m$ is even or odd.
A discussion of $r_n(x)$ can be found in Wallis and Roberts (1956, p. 175). In the broken-stick model, the sum of the rounded proportions $\sum_{i=1}^{c} r_n(p_i)$ is a random variable with possible values $1$, $1 \pm 1/n$, $1 \pm 2/n$, and so on. We want to find the chance that $\sum_{i=1}^{c} r_n(p_i) = 1$.
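The rounding rule $r_n$ defined above can be written down directly. The sketch below is our own illustration (the function name `r_n` is hypothetical, not from the paper). It uses exact rational arithmetic via Python's `fractions.Fraction`, whose built-in `round` sends exact halves to the even neighbor, which is the tie rule stated above.

```python
from fractions import Fraction

def r_n(x, n):
    """Round x to the nearest multiple of 1/n.

    An exact tie x = (m + 1/2)/n goes to m/n when m is even and to (m + 1)/n
    when m is odd; Fraction.__round__ rounds halves to even, which is this rule.
    Accepts an int, a Fraction, or a decimal string such as "0.35".
    """
    return Fraction(round(Fraction(x) * n), n)

# Ties: .35 -> .4 (m = 3 is odd), .45 -> .4 (m = 4 is even), .25 -> .2 (m = 2 is even)
print(r_n("0.35", 10), r_n("0.45", 10), r_n("0.25", 10))

# Three equal categories rounded to the nearest .001: the sum misses 1 by 1/n.
third = Fraction(1, 3)
print(sum(r_n(third, 1000) for _ in range(3)))
```

Working with `Fraction` instead of floats matters only at exact ties such as .35 and .45, which binary floating point cannot represent exactly.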

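The broken-stick figures quoted from MYZ can be checked by a quick Monte Carlo sketch. It assumes that a uniform draw on the simplex (1) is generated by normalizing independent exponential variables, which is a standard construction of the flat Dirichlet distribution. The helper names are ours. For large $n$ the estimates should sit near .75 for $c = 3$ and near 2/3 for $c = 4$.

```python
import random

def uniform_on_simplex(c, rng):
    """Draw (p_1, ..., p_c) uniformly on the simplex (1): normalized i.i.d. exponentials."""
    e = [rng.expovariate(1.0) for _ in range(c)]
    s = sum(e)
    return [x / s for x in e]

def chance_rounded_sum_is_one(c, n, trials, seed=0):
    """Monte Carlo estimate of P(sum_i r_n(p_i) = 1) in the broken-stick model.

    Python's round() on floats also sends exact halves to the even integer;
    exact ties have probability zero here, so float arithmetic is adequate.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = uniform_on_simplex(c, rng)
        # the r_n(p_i) add to 1 exactly when the integers round(n * p_i) add to n
        if sum(round(x * n) for x in p) == n:
            hits += 1
    return hits / trials

for c in (2, 3, 4):
    print(c, chance_rounded_sum_is_one(c, n=1000, trials=20_000))
```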
Table 1. Distribution of White Families by Type, United States, 1970 (columns: Type; Number (1,000); Proportion. Rows: Husband-Wife; Other Male Head; Female Head). Source: Statistical Abstract, 1976, Table 53.

To solve this problem it is convenient to introduce $c - 1$ mutually independent random variables $V_i$, each of which is uniformly distributed over the interval $[-.5, .5]$. Theorem 1 shows that when the rounding number $n$ is large, $\sum_{i=1}^{c} r_n(p_i) - 1$ has approximately the same distribution as $r_1(V_1 + \cdots + V_{c-1})/n$. In particular, the chance that the rounded proportions sum to 1 converges to the chance that $-.5 < \sum_{i=1}^{c-1} V_i < .5$. The second theorem shows that this last chance is approximately equal to $(6/\pi(c - 1))^{1/2}$. These theorems verify a conjecture on p. 857 of MYZ.

Theorem 1: Suppose $p_1, \ldots, p_c$ are uniformly distributed over the simplex (1). As the rounding number $n$ approaches infinity, $n\{\sum_{i=1}^{c} r_n(p_i) - 1\}$ converges in distribution to $r_1(\sum_{i=1}^{c-1} V_i)$, where the $V_i$ are independent and uniformly distributed on $[-.5, .5]$.

Theorem 1 will be proved in Section 3. The assumption that $p_1, \ldots, p_c$ are uniformly distributed over the simplex (1) is not critical. Any absolutely continuous distribution will do, as shown in Section 4.

When $c$ is large, the Edgeworth expansion may be used to approximate the distribution of $r_1(V_1 + \cdots + V_{c-1})$.

Theorem 2: Suppose $V_1, \ldots, V_{c-1}$ are independent and uniformly distributed over $[-.5, .5]$. Let $c \to \infty$ and let $j = O(c^{1/2})$. Then $r_1(V_1 + \cdots + V_{c-1}) = j$ with probability $(6/\pi(c - 1))^{1/2} e^{-6j^2/(c-1)} + O(1/c^{3/2})$. In particular, $-.5 < V_1 + \cdots + V_{c-1} < .5$ with probability $(6/\pi(c - 1))^{1/2} + O(1/c^{3/2})$.

In Section 5 we look at the small-sample behavior of the models discussed before. This behavior can be quite different from what large-sample theory predicts. The final section contains an application of our analysis to Benford's (1938) leading digit data.

2. FIXED CELL PROBABILITY MODELS

First consider the model in which $X_1, X_2, \ldots, X_c$ have a joint multinomial distribution with theoretical probabilities $p_1, p_2, \ldots, p_c$ and sample size $N$. Then, by the law of large numbers as in Feller (1968, p. 152), the vector of sample proportions $(X_1/N, \ldots, X_c/N)$ must concentrate in smaller and smaller neighborhoods of $(p_1, \ldots, p_c)$ as $N \to \infty$. Thus, $\sum_{i=1}^{c} r_n(X_i/N)$ behaves like the constant $\sum_{i=1}^{c} r_n(p_i)$. There is one exceptional case that is discussed at the end of this section. It follows that for fixed $n$ and $c$, the rounded proportions add to a constant multiple of $1/n$ with probability tending to 1 as $N \to \infty$.

Similar results hold for many other models. For example, suppose $X_1, \ldots, X_c$ are independent Poisson variables with parameters $\lambda_1, \ldots, \lambda_c$. Let $N = \sum_i X_i$ and $\Lambda = \sum_i \lambda_i$. If the $\lambda_i$ go to infinity in such a way that $\lambda_i/\Lambda$ tends to a limit, a standard argument shows that $(X_1/N, \ldots, X_c/N)$ tends in probability to the limit of $(\lambda_1/\Lambda, \ldots, \lambda_c/\Lambda)$.

We have been ignoring one exceptional case, which we discuss in the multinomial model for definiteness. Suppose $p_1, \ldots, p_c$ is in the simplex (1), the rounding number is $n$, and $np_i - [np_i] = .5$ for one or more $i$'s. If $X_i/N$ exceeds $p_i$, we have to round up; if $X_i/N$ is below $p_i$, we have to round down. This introduces some randomness into the asymptotic distribution of $\sum_{i=1}^{c} r_n(X_i/N)$.

An example will make the issue clear. Suppose $c = 3$, $p_1 = .35$, $p_2 = .45$, $p_3 = .20$, and $n = 10$. The rounded $p_i$'s are .4, .4, .2, and sum to 1. For large $N$, $X_3/N$ is nearly .2 and rounds to .2; so the asymptotic behavior of $\sum_{i=1}^{3} r_{10}(X_i/N)$ depends on whether $X_i/N$ is just above or just below $p_i$ for $i = 1, 2$. The four possibilities are shown in Table 2.

Table 2. Four Cases: $\sum_{i=1}^{3} r_{10}(X_i/N)$
                  X_2/N < .45   X_2/N > .45
  X_1/N < .35         .9            1.0
  X_1/N > .35        1.0            1.1

Asymptotically, the two variables
$$\frac{N^{1/2}(X_1/N - p_1)}{(p_1(1 - p_1))^{1/2}}, \qquad \frac{N^{1/2}(X_2/N - p_2)}{(p_2(1 - p_2))^{1/2}}$$
are jointly normal, with mean 0, variance 1, and correlation
$$-\left(\frac{p_1 p_2}{(1 - p_1)(1 - p_2)}\right)^{1/2} = -\left(\frac{.35 \times .45}{.65 \times .55}\right)^{1/2} \approx -.66.$$
The chance that the rounded proportions sum to 1 therefore converges to the chance that two centered normal variables with correlation $-.66$ are of opposite sign.

3. PROOF OF THE MAIN THEOREM IN THE BROKEN-STICK MODEL

In this model, the number of categories $c$ is fixed, $p_1, \ldots, p_c$ are uniformly distributed over the simplex (1), the rounding number $n$ is large, and $r_n(x)$ denotes $x$ rounded to the nearest $1/n$. The random variables $V_1, \ldots, V_{c-1}$ referred to in Theorem 1 may be taken as the rounding errors, defined as follows, for $1 \le i \le c - 1$:
$$V_i = n\,r_n(p_i) - n\,p_i. \qquad (3.1)$$
So far, the $V_i$ are neither independent nor uniform. As is easily seen, however:

Lemma 1: If $p_i$ is uniformly distributed over $[m/n, (m + 1)/n]$, then $V_i$ is uniformly distributed over $[-.5, .5]$.
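Theorem 2's approximation can be set beside the exact limiting chance from Theorem 1. Because $V_1 + \cdots + V_{c-1} + (c-1)/2$ is a sum of $c - 1$ independent uniform variables on $[0, 1]$, its distribution function is the classical Irwin-Hall formula. The sketch below evaluates that formula in exact rational arithmetic; the helper names are ours. It recovers the MYZ values 3/4 for $c = 3$ and 2/3 for $c = 4$.

```python
import math
from fractions import Fraction

def irwin_hall_cdf(x, m):
    """Exact P(U_1 + ... + U_m <= x) for independent U_i uniform on [0, 1]."""
    if x <= 0:
        return Fraction(0)
    if x >= m:
        return Fraction(1)
    total = Fraction(0)
    for k in range(math.floor(x) + 1):
        total += (-1) ** k * math.comb(m, k) * (x - k) ** m
    return total / math.factorial(m)

def limiting_chance(c):
    """Exact P(-.5 < V_1 + ... + V_{c-1} < .5), V_i independent uniform on [-.5, .5]."""
    m = c - 1
    center = Fraction(m, 2)   # adding 1/2 to each V_i gives a uniform on [0, 1]
    half = Fraction(1, 2)
    return irwin_hall_cdf(center + half, m) - irwin_hall_cdf(center - half, m)

def theorem2_approx(c):
    """Leading term (6 / (pi (c - 1)))^(1/2) of Theorem 2 with j = 0."""
    return math.sqrt(6 / (math.pi * (c - 1)))

for c in (3, 4, 9, 20, 50):
    exact = limiting_chance(c)
    print(f"c={c:2d}  exact={float(exact):.4f}  approx={theorem2_approx(c):.4f}")
```

For small $c$ the leading term overstates the exact chance. At $c = 9$, for example, it gives about .489 against an exact .453. The gap shrinks as $c$ grows, in line with the $O(1/c^{3/2})$ error term.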

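The exceptional case at the end of Section 2 can also be checked numerically. The sketch below is our own. The perturbation `eps` is a hypothetical stand-in for "just above" or "just below" $p_i$. The code reproduces the four entries of Table 2 under the half-even rule. It then evaluates the limiting chance that two standard normal variables with correlation $\rho \approx -.66$ have opposite signs, using the classical orthant formula $P(\text{same sign}) = 1/2 + \arcsin(\rho)/\pi$.

```python
import math
from fractions import Fraction

def r_n(x, n):
    # round to the nearest 1/n, exact halves to the even neighbor (Fraction.__round__)
    return Fraction(round(x * n), n)

p1, p2, p3 = Fraction(35, 100), Fraction(45, 100), Fraction(20, 100)
n = 10
eps = Fraction(1, 10**6)  # hypothetical small offset standing in for X_i/N - p_i

# Table 2: sign of X_1/N - p_1 (rows) and of X_2/N - p_2 (columns);
# X_3/N = 1 - X_1/N - X_2/N stays near .2 and always rounds to .2
table2 = {}
for s1 in (-1, 1):
    for s2 in (-1, 1):
        x1, x2 = p1 + s1 * eps, p2 + s2 * eps
        table2[(s1, s2)] = r_n(x1, n) + r_n(x2, n) + r_n(1 - x1 - x2, n)
print({k: float(v) for k, v in table2.items()})

# Limiting correlation of the standardized X_1/N and X_2/N under the multinomial
rho = -math.sqrt((p1 * p2) / ((1 - p1) * (1 - p2)))
# The sum equals 1 in the two off-diagonal cases, i.e. when the signs differ
p_opposite = 0.5 - math.asin(rho) / math.pi
print(round(rho, 3), round(p_opposite, 3))
```

The limit is about .73, so in this exceptional case the asymptotic chance is neither 0 nor 1.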
These rounding errors $V_i$ are defined only for $1 \le i \le c - 1$; now the rounding error for $p_c$ must be considered. Here is a preliminary fact:
$$r_n((m + u)/n) = m/n + r_1(u)/n \quad \text{for integer } m \text{ and real } u \qquad (3.2)$$
(apart from ties, where $u - .5$ is an integer; these have probability 0 when the $p_i$ have a continuous distribution). Now (3.1) implies
$$p_c = 1 - \sum_{i=1}^{c-1} p_i = 1 - \sum_{i=1}^{c-1} r_n(p_i) + \frac{1}{n} \sum_{i=1}^{c-1} V_i.$$
Using relation (3.2),
$$r_n(p_c) = 1 - \sum_{i=1}^{c-1} r_n(p_i) + \frac{1}{n}\, r_1\Big(\sum_{i=1}^{c-1} V_i\Big).$$
Thus,
$$r_n(p_c) - p_c = \frac{1}{n}\Big\{ r_1\Big(\sum_{i=1}^{c-1} V_i\Big) - \sum_{i=1}^{c-1} V_i \Big\}. \qquad (3.3)$$

Lemma 2: If $r_n(x)$ is $x$ rounded to the nearest $1/n$, and the rounding errors $V_i$ are defined by (3.1), then
$$\sum_{i=1}^{c} r_n(p_i) = 1 + \frac{1}{n}\, r_1\Big(\sum_{i=1}^{c-1} V_i\Big).$$

Summing relation (3.1) for $i = 1$ to $c - 1$ and adding relation (3.3) to this sum, the terms in $\sum_{i=1}^{c-1} V_i$ cancel. This proves Lemma 2.

So far, the assumption that the $p_i$'s are uniformly distributed has not been used; it will be needed to compute the distribution of the $V_i$. Let $m_1, \ldots, m_{c-1}$ be nonnegative integers whose sum is at most $n - (c - 1)$. Let $A(m_1, \ldots, m_{c-1})$ be the event that $m_i/n \le p_i < (m_i + 1)/n$ for $1 \le i \le c - 1$. Let $A_n$ be the union of the sets $A(m_1, \ldots, m_{c-1})$ over all such choices of $m_1, \ldots, m_{c-1}$. The probability of $A_n$ tends to 1 as $n \to \infty$. In fact, a geometric argument that we omit shows that the chance of $A_n$ is $n(n - 1) \cdots (n - c + 2)/n^{c-1}$. Theorem 1, then, can be proved by demonstrating that given $A_n$, the random variables $V_1, \ldots, V_{c-1}$ are conditionally independent and uniformly distributed over $[-.5, .5]$. In fact, a little more is true:

Lemma 3: Given $A(m_1, \ldots, m_{c-1})$, the random variables $V_1, \ldots, V_{c-1}$ are conditionally independent and uniformly distributed over $[-.5, .5]$.

The proof of Lemma 3 is direct: The distribution of $(p_1, \ldots, p_{c-1})$ is uniform over the region
$$x_i \ge 0 \text{ for } 1 \le i \le c - 1, \qquad \sum_{i=1}^{c-1} x_i \le 1. \qquad (3.4)$$
Also, the hypercube defining $A(m_1, \ldots, m_{c-1})$ is wholly contained in the region (3.4). Therefore, given $A(m_1, \ldots, m_{c-1})$, the first $c - 1$ of the $p_i$'s are independent, each being uniformly distributed over its edge of the hypercube. Now Lemma 1 completes the proof. This completes the argument for Theorem 1.

Recall that the argument connecting the distribution of the rounded $p_i$ to tables of counts is as follows: If we observe a multinomial vector $(X_1, X_2, \ldots, X_c)$ drawn with parameters $(p_1, p_2, \ldots, p_c)$ and $N$, then, because of the law of large numbers, the sample proportions will be close enough to the vector of $p_i$'s that the sample proportions round in the same way that the $p_i$'s round. When the $p_i$'s are chosen from the uniform distribution on the simplex (1), another approach is available. The distribution of $(X_1, X_2, \ldots, X_c)$ averaged over the simplex (1) follows so-called Bose-Einstein statistics (Feller 1968, p. 40; Hill 1970). Under Bose-Einstein statistics, all partitions of $N$ into $c$ parts are equally likely. Thus, the possible sample vectors are all the lattice points in the simplex $\sum_{i=1}^{c} x_i = N$, $x_i \ge 0$, and all points in this simplex are equally likely. The points that round to a fixed multiple of the rounding number are contained in certain fixed regions of this simplex. As $N$ tends to infinity, the proportion of points in a given region tends to the area of the region. It is essentially this area that was computed in Lemma 3. The approach we have followed seems preferable because it leads to the generalization of the next section.

We conclude this section with a proof of Theorem 2.

Proof: Let $m = c - 1$. Let $g_m(\cdot)$ be the probability density of $V_1 + \cdots + V_m$. Note that $E(V_1) = 0$ and $\operatorname{var}(V_1) = 1/12$. The Edgeworth expansion, as in Section 16.4 of Feller (1971), shows that $g_m$ differs from the normal density with mean 0 and variance $m/12$, namely $(6/\pi m)^{1/2} e^{-6x^2/m}$, by $O(1/m^{3/2})$ uniformly in $x$. Integrating from $j - .5$ to $j + .5$, we find that the chance that $r_1(V_1 + \cdots + V_m) = j$ is $\int_{j-.5}^{j+.5} (6/\pi m)^{1/2} e^{-6x^2/m}\,dx + O(1/m^{3/2})$. By assumption, $j = O(m^{1/2})$; by calculus, this integral equals $(6/\pi m)^{1/2} e^{-6j^2/m} + O(1/m^{3/2})$. This completes the proof.
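Two exact statements in Section 3 lend themselves to a direct check. Lemma 2 is an identity that needs no probability model, so it can be tested on rational points of the simplex. The sketch below avoids exact ties, that is, values of $np_i$ or of $\sum V_i$ that are half-integers, where the half-even rule can break the identity. It does so by using coordinates with a large prime denominator $D$; this choice is our assumption. The stated chance of $A_n$ is checked by counting the admissible hypercubes. Each has volume $n^{-(c-1)}$ under a density equal to $(c-1)!$ on the region (3.4). All helper names are hypothetical.

```python
import itertools
import math
import random
from fractions import Fraction

def r_n(x, n):
    # nearest multiple of 1/n, exact halves to the even neighbor
    return Fraction(round(x * n), n)

def random_rational_simplex_point(c, rng, D=1_000_000_007):
    """Hypothetical helper: a point of the simplex (1) whose coordinates have prime denominator D.

    With D an odd prime larger than 2n, neither n * p_i nor any partial sum of the
    rounding errors can be a half-integer, so no ties arise.
    """
    cuts = sorted(rng.sample(range(1, D), c - 1))
    edges = [0] + cuts + [D]
    return [Fraction(b - a, D) for a, b in zip(edges, edges[1:])]

def check_lemma2(c, n, trials=1000, seed=3):
    rng = random.Random(seed)
    for _ in range(trials):
        p = random_rational_simplex_point(c, rng)
        V = [n * r_n(pk, n) - n * pk for pk in p[:-1]]    # rounding errors (3.1)
        assert all(-Fraction(1, 2) <= v <= Fraction(1, 2) for v in V)
        lhs = sum(r_n(pk, n) for pk in p)
        rhs = 1 + Fraction(round(sum(V)), n)                # 1 + r_1(sum of V_i)/n
        if lhs != rhs:
            return False
    return True

def chance_of_A_n(n, c):
    """(c-1)! * (number of admissible hypercubes) / n^(c-1), counted by brute force."""
    k = c - 1
    count = sum(1 for m in itertools.product(range(n), repeat=k) if sum(m) <= n - k)
    return Fraction(math.factorial(k) * count, n ** k)

print(check_lemma2(c=5, n=100))
for n, c in ((10, 3), (10, 4), (20, 3)):
    print(n, c, chance_of_A_n(n, c), Fraction(math.perm(n, c - 1), n ** (c - 1)))
```

The count of admissible hypercubes is $\binom{n}{c-1}$, and multiplying by $(c-1)!/n^{c-1}$ gives exactly $n(n-1)\cdots(n-c+2)/n^{c-1}$.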
4. THE GENERALIZATION TO THE ABSOLUTELY CONTINUOUS CASE

Theorem 1 is generalized in the following paragraphs.

Theorem 3: Let $\mu$ be a probability measure on the simplex (1). Suppose $\mu$ is absolutely continuous with respect to the uniform distribution on (1). Then, as the rounding number $n \to \infty$, the $\mu$ distribution of $n\{\sum_{i=1}^{c} r_n(p_i) - 1\}$ converges to the distribution of $r_1(\sum_{i=1}^{c-1} V_i)$, the $V_i$ being independent and uniformly distributed over $[-.5, .5]$.
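Theorem 3 can be illustrated by simulation with a law on the simplex that is absolutely continuous but not uniform. In the sketch below the Dirichlet parameters (2, 3, 5) are an arbitrary choice of ours. The Dirichlet draw is built from independent gamma variables via `random.gammavariate`. For $c = 3$ the theorem predicts that the chance of rounding to 1 approaches .75 as $n$ grows, just as in the broken-stick model.

```python
import random

def dirichlet(alphas, rng):
    """Dirichlet draw from independent Gamma(alpha_i, 1) variables, normalized."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

def chance_rounded_sum_is_one(alphas, n, trials, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = dirichlet(alphas, rng)
        if sum(round(x * n) for x in p) == n:   # the r_n(p_i) add to exactly 1
            hits += 1
    return hits / trials

alphas = (2.0, 3.0, 5.0)  # illustrative, absolutely continuous, far from uniform
for n in (2, 10, 100, 1000):
    print(n, chance_rounded_sum_is_one(alphas, n, trials=40_000))
```

For small $n$ the answer depends on where the mass of the density sits. The limit does not, because each small hypercube carries a nearly uniform conditional distribution.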

This can be proved by the argument of the previous section because, given that the $p_i$'s lie in a typical small hypercube, their conditional distribution must be almost uniform. To make this precise, it is convenient to use the idea of Lebesgue points (Dunford and Schwartz 1957, pp. 210-218).

The distribution $\theta$ of $p_1, \ldots, p_{c-1}$ is a probability on the region (3.4) and is absolutely continuous with respect to $(c-1)$-dimensional Lebesgue measure $\lambda$. As a result, it admits a derivative $f$. By definition, $x$ is a Lebesgue point of $f$ provided that
$$\frac{1}{\lambda(C)} \int_C |f(y) - f(x)|\, \lambda(dy) \to 0$$
as the $(c-1)$-dimensional hypercube $C$ shrinks to $x$. As Lebesgue proved, almost all $x$ have this property. Of course, if $x$ is a Lebesgue point for $f$, then
$$\theta(C)/\lambda(C) \to f(x)$$
as the $(c-1)$-dimensional hypercube $C$ shrinks to $x$.

To complete the argument, the following theorem will be useful. Recall that $\theta$ is the distribution of $p_1, \ldots, p_{c-1}$ and $f = d\theta/d\lambda$.

Theorem 4: Suppose $f(x) > 0$ and $x$ is a Lebesgue point of $f$. Let the hypercube $C$ shrink to $x$. Let $\lambda_C$ be the uniform distribution over $C$, and let $\theta_C$ be $\theta$ conditioned on $C$. Let $\|\cdot\|$ denote variation norm. Then $\|\theta_C - \lambda_C\| \to 0$.

Proof: The following computation is standard:
$$\|\theta_C - \lambda_C\| = \int_C \left| \frac{f(y)}{\theta(C)} - \frac{1}{\lambda(C)} \right| \lambda(dy) \le \frac{\lambda(C)}{\theta(C)} \left\{ \frac{1}{\lambda(C)} \int_C |f(y) - f(x)|\, \lambda(dy) + \left| f(x) - \frac{\theta(C)}{\lambda(C)} \right| \right\},$$
and the right side tends to 0 because $x$ is a Lebesgue point and $\theta(C)/\lambda(C) \to f(x) > 0$.

We shall say that a statement is true for $\theta$-almost all $x$ if it is true for a set of values of $x$ that has probability 1 under $\theta$.

Corollary 1: Under the conditions of Theorem 4, $\|\theta_C - \lambda_C\| \to 0$ as $C$ shrinks to $x$, for $\theta$-almost all $x$.

Corollary 1 is closely related to the martingale proofs of the Radon-Nikodym Theorem. Some references that make the connection clear are Blackwell and Dubins (1962) and Meyer (1966, p. 153).

Returning to the argument for Theorem 3, let $[r]$ denote the greatest integer in $r$, and define a subset $B_n$ of the region (3.4) by the requirement that $\sum_{i=1}^{c-1} [n x_i] \le n - (c - 1)$. For $x = (x_1, \ldots, x_{c-1}) \in B_n$, let $C_n(x)$ be the $(c-1)$-dimensional hypercube $\prod_{i=1}^{c-1} \big[[n x_i]/n,\ ([n x_i] + 1)/n\big)$. As Corollary 1 implies, $\|\theta_{C_n(x)} - \lambda_{C_n(x)}\| \to 0$ as $n \to \infty$ for $\theta$-almost all $x$. In particular, if $\varepsilon > 0$, then for all sufficiently large $n$,
$$\theta\{x : x \in B_n \text{ and } \|\theta_{C_n(x)} - \lambda_{C_n(x)}\| > \varepsilon\} < \varepsilon. \qquad (4.1)$$
Now the argument for Theorem 1 applies almost verbatim. Because $\|\theta_{C_n(x)} - \lambda_{C_n(x)}\|$ is constant over the hypercube $C_n(x)$, what (4.1) says is that, except for a set of hypercubes $A(m_1, \ldots, m_{c-1})$ of total probability $\varepsilon$, the conditional distribution of $p_1, \ldots, p_{c-1}$ is within $\varepsilon$ of being uniform over $A(m_1, \ldots, m_{c-1})$. This completes the argument for Theorem 3.

Remark: There is an easy $L^1$ argument. Let $C$ run over a partition into hypercubes; then
$$\sum_C \theta(C)\, \|\theta_C - \lambda_C\| = \int |f - f_C|\, d\lambda,$$
where $f_C(y) = \theta(C)/\lambda(C)$ for $y \in C$. The right-hand side goes to 0 as $\operatorname{mesh}(C) \to 0$: approximate $f$ in $L^1$ by a smooth $f^*$, and observe that $\|f_C - f^*_C\| \le \|f - f^*\|$.

For some approximations, the conditional argument may not be useful. For any probability $\mu$ on the simplex (1), the distribution of $n\{\sum_{i=1}^{c} r_n(p_i) - 1\}$ may be computed exactly, as follows. Let $\mu^*$ be the joint distribution of $np_i - [np_i]$ for $1 \le i \le c - 1$, a probability measure on the unit cube $K^{c-1}$ in $(c-1)$-dimensional space. For $x = (x_1, \ldots, x_{c-1}) \in K^{c-1}$, let $U_i(x) = r_1(x_i) - x_i$.

Theorem 5: The $\mu$ distribution of $n\{\sum_{i=1}^{c} r_n(p_i) - 1\}$ coincides with the $\mu^*$ distribution of $r_1(\sum_{i=1}^{c-1} U_i)$.

This is immediate from Lemma 2. The point is that $\mu^*$ will be almost uniform over $K^{c-1}$ even for many singular measures $\mu$. An example of the use of $\mu^*$ is given at the end of the next section.

5. SMALL-SAMPLE RESULTS

MYZ also consider the multinomial model with $p_1, \ldots, p_c$ fixed rather than random. On p. 856 of their article, MYZ seem to assert that with large samples the multinomial model behaves like the broken-stick model. The argument of Section 2 shows that this cannot be correct. For example, consider the trinomial with equally likely cells: $c = 3$ and $p_1 = p_2 = p_3 = 1/3$. For $n = 10$, or any other decimal rounding, the rounded $p$'s add to one unit less than 1:
$$.3 + .3 + .3 = 1 - \tfrac{1}{10}.$$
If $X_1, X_2, X_3$ are the counts in this model, then as $N \to \infty$, the chance that $\sum_{i=1}^{3} r_{10}(X_i/N) = 1$ must approach 0.
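Returning briefly to Theorem 5, the remark that $\mu^*$ can be nearly uniform even when $\mu$ is singular can be illustrated with a measure concentrated on a line segment inside the simplex. The segment $p = (t, \sqrt{2}\,t, 1 - (1 + \sqrt{2})\,t)$, with $t$ uniform on $[0, .4]$, is our own example. The irrational slope lets the fractional parts $(np_1 - [np_1], np_2 - [np_2])$ spread over the unit square for large $n$. The sketch computes the chance of rounding to 1 in two ways: directly, and through the $\mu^*$ recipe of Theorem 5 with $U_i(x) = r_1(x_i) - x_i$.

```python
import math
import random

SLOPE = math.sqrt(2.0)   # irrational slope chosen for illustration

def point_on_segment(rng):
    t = rng.uniform(0.0, 0.4)               # keeps the third coordinate positive
    return (t, SLOPE * t, 1.0 - (1.0 + SLOPE) * t)

def chances(n, trials, seed=2):
    rng = random.Random(seed)
    direct = via_mu_star = 0
    for _ in range(trials):
        p = point_on_segment(rng)
        # direct: do the rounded proportions add to exactly 1?
        if sum(round(x * n) for x in p) == n:
            direct += 1
        # Theorem 5: x_i = n p_i - [n p_i], U_i = r_1(x_i) - x_i; test r_1(U_1 + U_2) == 0
        xs = [n * pk - math.floor(n * pk) for pk in p[:2]]
        u_sum = sum(round(x) - x for x in xs)
        if round(u_sum) == 0:
            via_mu_star += 1
    return direct / trials, via_mu_star / trials

print(chances(n=1000, trials=40_000))
```

Both estimates land near the broken-stick value .75, although this $\mu$ has no density on the simplex.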

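For the equal-cell trinomial just described, the chance that the rounded proportions add to 1 can be computed exactly for a finite sample, along with the measure $\mu^*$ of Theorem 5. The sketch below is our own. It enumerates all 5,151 outcomes of a trinomial with $N = 100$ and cell probabilities 1/3 and applies the half-even tie rule in exact arithmetic. Its output can be set beside the .74 and 14 percent figures and the near-uniform Table 4 discussed in this section.

```python
import math
from collections import defaultdict
from fractions import Fraction

N = 100  # sample size in the example

def outcomes(N):
    """Yield (x1, x2, x3, probability) for a trinomial with cell probabilities 1/3."""
    for a in range(N + 1):
        for b in range(N + 1 - a):
            prob = math.comb(N, a) * math.comb(N - a, b) / 3 ** N
            yield a, b, N - a - b, prob

def chance_rounded_sum_is_one(N, n):
    """P(sum_i r_n(X_i / N) = 1), ties rounded half to even."""
    total = 0.0
    for a, b, c, prob in outcomes(N):
        if sum(round(Fraction(x, N) * n) for x in (a, b, c)) == n:
            total += prob
    return total

def mu_star(N, n):
    """Joint law of (n X_1/N - [n X_1/N], n X_2/N - [n X_2/N])."""
    dist = defaultdict(float)
    for a, b, _, prob in outcomes(N):
        dist[(Fraction(n * a, N) % 1, Fraction(n * b, N) % 1)] += prob
    return dist

print(round(chance_rounded_sum_is_one(N, 10), 3))   # rounding to tenths
print(round(chance_rounded_sum_is_one(N, 2), 3))    # rounding to halves
cells = mu_star(N, 10)
print(len(cells), round(max(abs(v - 0.01) for v in cells.values()), 5))
```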
MYZ's Table 3 shows, however, that for samples of size $N = 1$ to 20, this chance is close to 73 percent. We now explain this.

We began by recomputing the table. To our dismay, it checked perfectly. Some further numerical exploration, however, suggested a tentative answer: From the point of view of rounding calculations, the law of large numbers works very slowly. Even when $N = 100$ in the preceding example, the distribution of $(X_1/100, X_2/100, X_3/100)$ is much closer to uniform than it is to a point mass at $(1/3, 1/3, 1/3)$, and $P\{\sum_{i=1}^{3} r_{10}(X_i/100) = 1\} \approx .74$!

The joint distribution of $(X_1/100, X_2/100, X_3/100)$ is shown in Table 3. The values of $X_1/100$ appear at the left of the table; the values of $X_2/100$ are across the top. Of course, $X_3/100 = 1 - X_1/100 - X_2/100$. The corresponding probabilities are reported in the body of the table, rounded to integer multiples of 1/1,000. For instance, the chance that $X_1/100 = .33$ and $X_2/100 = .33$, so that $X_3/100 = .34$ and the rounded proportions add to .9, is about 8/1,000. The chance that $X_1/100 = .33$ and $X_2/100 = .36$, so that $X_3/100 = .31$ and the rounded proportions add to 1, is about 7/1,000. A zero in the table means that the corresponding chance is below .0005. For instance, the chance that $X_1/100 = .23$ and $X_2/100 = .30$ is shown as 0; in fact, it is .0004.

Table 3. Joint Distribution of $(X_1/100, X_2/100, X_3/100)$ When $(X_1, X_2, X_3)$ Have a Multinomial Distribution With Parameters 100 and $(1/3, 1/3, 1/3)$*
* All entries should be multiplied by 1/1,000.

Table 3 demonstrates that $(X_1/100, X_2/100, X_3/100)$ spreads out around $(1/3, 1/3, 1/3)$ in a way that really matters when rounding to tenths. The discussion leading up to Theorem 5 suggests examining the measure $\mu^*$, which in this case amounts to the joint distribution of $(X_1/10 - [X_1/10],\ X_2/10 - [X_2/10])$. This distribution is shown in Table 4, the values of the first variable being given along the left edge, the values of the second variable across the top, and the corresponding probability in the body of the table, rounded to an integer multiple of 1/1,000. It is essentially uniform.

Table 4. Joint Distribution of $\mu^*$ When $(X_1, X_2, X_3)$ Have a Multinomial Distribution With Parameters 100 and $(1/3, 1/3, 1/3)$. All Entries Should Be Multiplied by 1/1,000.

In this example, we have been rounding to tenths. When rounding to halves, for instance, the spread in Table 3 would be relatively small, and $\sum_{i=1}^{3} r_2(X_i/100) = 1$ with probability only 14 percent, compared with the 75 percent predicted by the broken-stick model on p. 856 of MYZ.

6. AN EXAMPLE

While investigating the behavior of leading digits in typical data, Benford (1938) (see also Diaconis 1977, Raimi 1976, and Ylvisaker 1977) collected a sample of size 20,229 from a total of 20 sources. These data are presented in Table 5. For example, Benford looked at the areas of 335 rivers and found that 31.0 percent of the areas began with 1, 16.4 percent began with 2, and so on.

Each row in Table 5 adds to 100 percent. How likely is this? On the broken-stick model, the chance of a given row rounding to 100 percent is approximately $(6/8\pi)^{1/2} \approx .49$. Numerical calculations show that this approximation is quite accurate. Assuming the rows are independent, the chance of all rows simultaneously rounding to 100 percent is astronomically small. We conclude that Benford's table does not follow the broken-stick model or any of the probability models introduced in Sections 2, 3, or 4. This raises the suspicion that Benford manipulated the data to make the rows round properly. This suspicion is not hard to verify. Consider the first row of Table 5. The percentage of numbers with leading digit 7 is reported as 5.5, with a total of 335 cases. The only proportions compatible with 5.5 are 18/335, which rounds to 5.4, or 19/335, which rounds to 5.7: There is no proportion possible that rounds to 5.5.

The bottom row of averages also rounds to 100 percent. Direct calculation shows that the entries in columns 3 and 9 have been incorrectly rounded.
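The two computations in the preceding paragraphs can be reproduced in a few lines. The sketch below is ours. It evaluates the broken-stick approximation for a row with $c = 9$ leading digits and raises it to the 20th power, as the independence assumption allows. It then searches all counts $k = 0, \ldots, 335$ for a proportion $k/335$ that rounds to 5.5 percent at one decimal place.

```python
import math
from fractions import Fraction

c, rows = 9, 20
per_row = math.sqrt(6 / ((c - 1) * math.pi))   # (6 / 8 pi)^(1/2), Theorem 2 with c - 1 = 8
all_rows = per_row ** rows                     # all 20 independent rows round to 100
print(round(per_row, 3), f"{all_rows:.1e}")

def percent_rounded(k, total, places=1):
    """100 k / total rounded to the given number of decimals (exact, half to even)."""
    return round(Fraction(100 * k, total), places)

target = Fraction(55, 10)                      # the reported 5.5 percent
compatible = [k for k in range(336) if percent_rounded(k, 335) == target]
print(compatible)                              # no count works
print(float(percent_rounded(18, 335)), float(percent_rounded(19, 335)))
```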

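The arithmetic behind Benford's leading-digit law and the chi-squared comparison of Table 7 in this section can be checked the same way. The sketch below is ours. It computes $100 \log_{10}(1 + 1/i)$ and rounds it to whole percents. It then re-rounds the two misrounded column averages, 12.26 and 4.775, at one decimal. Finally it evaluates $\chi^2 = N \sum (\hat p_i - p_i)^2/p_i$ with $N = 20{,}229$ on the rounded rows of Table 7 (data 29, 20, 13, 9, 8, 6, 5, 6, 5; theory 30, 18, 12, 10, 8, 7, 6, 5, 5), before and after subtracting 1 percent from the data row in the eighth position and from the theory row in the seventh position.

```python
import math
from fractions import Fraction

N = 20_229  # total number of observations in Benford's data

# Theoretical leading-digit percentages 100 log10(1 + 1/i), i = 1..9
theory = [100 * math.log10(1 + 1 / i) for i in range(1, 10)]
theory_rounded = [round(t) for t in theory]
print([round(t, 2) for t in theory], sum(theory_rounded))

# Correct one-decimal rounding of the two misrounded column averages
print(float(round(Fraction("12.26"), 1)), float(round(Fraction("4.775"), 1)))

def chi_squared(data_pct, theory_pct, n_obs=N):
    """chi^2 = N * sum (phat_i - p_i)^2 / p_i, with percentages converted to proportions."""
    return n_obs * sum((d / 100 - t / 100) ** 2 / (t / 100) for d, t in zip(data_pct, theory_pct))

data7 = [29, 20, 13, 9, 8, 6, 5, 6, 5]       # Table 7, data row (sums to 101)
theory7 = [30, 18, 12, 10, 8, 7, 6, 5, 5]    # Table 7, theory row (sums to 101)
before = chi_squared(data7, theory7)

data_adj = list(data7)
data_adj[7] -= 1                             # subtract 1 in the eighth position
theory_adj = list(theory7)
theory_adj[6] -= 1                           # subtract 1 in the seventh position
after = chi_squared(data_adj, theory_adj)
print(round(before), round(after))
```

The rounded theory row matches the rounded law, and the two statistics come out near 192 and 118.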
Table 5. Benford Data. Rows A through T: Rivers, Area; Population; Constants; Newspapers; Spec. Heat; Pressure; H. P. Lost; Molecular Weight; Drainage; Atomic Weight; $n^{-1}, \sqrt{n}, \ldots$; Design; Digest; Cost Data; X-ray Volts; Am. League; Black Body; Addresses; $n^1, n^2, \ldots, n!$; Death Rate; followed by a row of Averages. Entries are the percentages of numbers in each source with leading digit 1, 2, ..., 9.

Benford was trying to show that $\hat p_i$, the proportion of numbers that begin with the leading digit $i$, follows the theoretical leading digit law: $p_i = 100 \log_{10}(1 + 1/i)$. It turns out that in both columns 3 and 9, Benford incorrectly rounded toward the theoretical proportions $p_i$. For column 3, 12.26 was rounded to 12.4. For column 9, 4.775 was rounded to 4.7. The theoretical percentages are $p_3 \approx 12.5$ and $p_9 \approx 4.6$.

Changes in rounded proportions to make tables round to 100 percent can affect the results of statistical tests such as chi-square. The chi-squared statistic for goodness of fit of $c$ sample proportions $\hat p_i$, based on a sample of size $N$, to theoretical probabilities $p_i$ is $\chi^2 = N \sum_{i=1}^{c} (\hat p_i - p_i)^2/p_i$. If the $\hat p_i$ do not sum to 1, then adjusting the $\hat p_i$ that correspond to small $p_i$ can change the value of $\chi^2$ appreciably for large $N$. Of course, it becomes easier to change the value of $\chi^2$ as the rounding number decreases.

For example, consider Benford's data in Table 5. The proportion of all 20,229 numbers that begin with a 1 can be found by taking a weighted average of the proportions in the first column. Doing this for each digit yields Table 6.

Table 6. Proportion of Benford Data Beginning With Leading Digit $i$ and Theoretical Proportions $100 \log_{10}(1 + 1/i)$

Table 7. Numbers in Table 6 Rounded to Nearest 1 Percent
  i        1   2   3   4   5   6   7   8   9
  Data    29  20  13   9   8   6   5   6   5
  Theory  30  18  12  10   8   7   6   5   5

Ylvisaker (1977) gives $\chi^2$ from Table 6 as 85. To show the effect of rounding, Table 7 gives the results of rounding the numbers in Table 6 to the nearest 1 percent. The $\chi^2$ statistic for goodness of fit of data to theory is approximately 192. Both rows of Table 7 add to 101 percent. If 1 percent is subtracted from the data row in the eighth position and 1 percent is subtracted from the theory row in the seventh position, so that both rows sum to 100 percent, the $\chi^2$ statistic becomes approximately 118. Thus, rounding to help the data fit the theory can make a difference. This example also shows that it is important to calculate with many-digit accuracy when computing $\chi^2$ for large sample sizes.

[Received January 1978. Revised December 1978.]

REFERENCES

Benford, Frank (1938), "The Law of Anomalous Numbers," Proceedings of the American Philosophical Society, 78, 551-572.
Blackwell, David, and Dubins, Lester (1962), "Merging of Opinions With Increasing Information," Annals of Mathematical Statistics, 33, 882-886.
Diaconis, Persi (1977), "The Distribution of Leading Digits and Uniform Distribution Mod 1," Annals of Probability, 5, 72-81.
Dunford, Nelson, and Schwartz, Jacob (1957), Linear Operators, New York: Wiley Interscience.
Feller, William (1968), An Introduction to Probability Theory and Its Applications, Vol. I (3rd ed.), New York: John Wiley & Sons.
Feller, William (1971), An Introduction to Probability Theory and Its Applications, Vol. II (2nd ed.), New York: John Wiley & Sons.
Hill, Bruce (1970), "Zipf's Law and Prior Distributions for the Composition of a Population," Journal of the American Statistical Association, 65, 1220-1232.
Meyer, Paul Andre (1966), Probability and Potentials, Mass.: Blaisdell.
Mosteller, Frederick, Youtz, Cleo, and Zahn, Douglas (1967), "The Distribution of Sums of Rounded Percentages," Demography, 4, 850-858.
Raimi, Ralph (1976), "The First Digit Problem," American Mathematical Monthly, 83, 521-538.
Wallis, Allen, and Roberts, Harry (1956), Statistics: A New Approach, Chicago: Free Press.
Ylvisaker, Donald (1977), "Test Resistance," Journal of the American Statistical Association, 72, 551-556.
