MATH 2P82 MATHEMATICAL STATISTICS (Lecture Notes)


© Jan Vrbik


Contents

1 PROBABILITY REVIEW
   Basic Combinatorics
      Binomial expansion
      Multinomial expansion
   Random Experiments (Basic Definitions)
      Sample space
      Events
      Set Theory
      Boolean Algebra
   Probability of Events
      Probability rules
      Important result
      Probability tree
      Product rule
      Conditional probability
      Total-probability formula
      Independence
   Discrete Random Variables
      Bivariate (joint) distribution
      Conditional distribution
      Independence
      Multivariate distribution
   Expected Value of a RV
      Expected values related to X and Y
      Moments (univariate)
      Moments (bivariate or 'joint')
      Variance of aX + bY + c
   Moment generating function
      Main results
   Probability generating function
   Conditional expected value
   Common discrete distributions
      Binomial
      Geometric
      Negative Binomial
      Hypergeometric
      Poisson
      Multinomial
      Multivariate Hypergeometric
   Continuous Random Variables
      Univariate probability density function (pdf)
      Distribution Function
      Bivariate (multivariate) pdf
      Marginal Distributions
      Conditional Distribution
      Mutual Independence
      Expected value
   Common Continuous Distributions
   Transforming Random Variables
      Examples

2 Transforming Random Variables
   Univariate transformation
      Distribution-Function (F) Technique
      Probability-Density-Function (f) Technique
   Bivariate transformation
      Distribution-Function Technique
      Pdf (Shortcut) Technique

3 Random Sampling
   Sample mean
      Central Limit Theorem
   Sample variance
      Sampling from N(µ, σ)
   Sampling without replacement
   Bivariate samples

4 Order Statistics
   Univariate pdf
   Sample median
   Bivariate pdf
   Special Cases

5 Estimating Distribution Parameters
   A few definitions
   Cramér-Rao inequality
   Sufficiency
   Method of moments
      One Parameter
      Two Parameters
   Maximum-likelihood technique
      One Parameter
      Two Parameters

6 Confidence Intervals
   CI for mean µ
      σ unknown
      Large-sample case
      Difference of two means
   Proportion(s)
   Variance(s)
      σ ratio

7 Testing Hypotheses
   Tests concerning mean(s)
   Concerning variance(s)
   Concerning proportion(s)
   Contingency tables
   Goodness of fit

8 Linear Regression and Correlation
   Simple regression
      Maximum likelihood method
      Least-squares technique
      Normal equations
      Statistical properties of the estimators
      Confidence intervals
   Correlation
   Multiple regression
      Various standard errors

9 Analysis of Variance
   One-way ANOVA
   Two-way ANOVA
      No interaction
      With interaction

10 Nonparametric Tests
   Sign test
   Signed-rank test
   Rank-sum tests
      Mann-Whitney
      Kruskal-Wallis
   Run test
   (Spearman's) rank correlation coefficient


Chapter 1 PROBABILITY REVIEW

Basic Combinatorics

Number of permutations of $n$ distinct objects: $n!$

When the objects are not all distinct, such as, for example, aaabbc:
$$\binom{6}{3,2,1} \overset{\text{def}}{=} \frac{6!}{3!\,2!\,1!}$$
or, in general,
$$\binom{N}{n_1, n_2, n_3, \ldots, n_k} \overset{\text{def}}{=} \frac{N!}{n_1!\,n_2!\,n_3! \cdots n_k!}$$
where $N = \sum_{i=1}^{k} n_i$ is the total word length (multinomial coefficient).

Selecting $r$ out of $n$ objects (without duplication), counting all possible arrangements:
$$P_r^n \overset{\text{def}}{=} n(n-1)(n-2) \cdots (n-r+1) = \frac{n!}{(n-r)!}$$
(number of permutations). Forget their final arrangement:
$$C_r^n \overset{\text{def}}{=} \frac{P_r^n}{r!} = \frac{n!}{(n-r)!\,r!}$$
(number of combinations). This will also be called the binomial coefficient.

If we can duplicate (any number of times), and count the arrangements: $n^r$.

Binomial expansion
$$(x+y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i$$

Multinomial expansion
$$(x+y+z)^n = \sum_{\substack{i,j,k \geq 0 \\ i+j+k=n}} \binom{n}{i,j,k}\, x^i y^j z^k$$
$$(x+y+z+w)^n = \sum_{\substack{i,j,k,\ell \geq 0 \\ i+j+k+\ell=n}} \binom{n}{i,j,k,\ell}\, x^i y^j z^k w^\ell$$
etc.

Random Experiments (Basic Definitions)

Sample space
is a collection of all possible outcomes of an experiment. The individual (complete) outcomes are called simple events.
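These counting formulas are easy to sanity-check numerically; here is a minimal sketch in Python (math.perm and math.comb are standard-library functions, available in Python 3.8+):

    from math import factorial, perm, comb

    # Permutations of 6 distinct objects: 6! = 720
    print(factorial(6))                                                   # 720

    # Arrangements of "aaabbc": the multinomial coefficient 6!/(3! 2! 1!) = 60
    print(factorial(6) // (factorial(3) * factorial(2) * factorial(1)))   # 60

    # Selecting r = 2 out of n = 4 objects:
    print(perm(4, 2))    # 12 ordered selections,   P_2^4 = 4!/(4-2)!
    print(comb(4, 2))    # 6 unordered selections,  C_2^4 = 4!/((4-2)! 2!)

    # Allowing duplication, the count of arrangements is n^r:
    print(4 ** 2)        # 16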

Events
are subsets of the sample space ($A, B, C, \ldots$).

Set Theory

The old notion of:                                is (are) now called:
  Universal set $\Omega$                            Sample space
  Elements of $\Omega$ (its individual 'points')    Simple events (complete outcomes)
  Subsets of $\Omega$                               Events
  Empty set $\emptyset$                             Null event

We continue to use the word intersection (notation: $A \cap B$, representing the collection of simple events common to both $A$ and $B$), union ($A \cup B$, simple events belonging to either $A$ or $B$ or both), and complement ($\bar{A}$, simple events not in $A$). One should be able to visualize these using Venn diagrams, but when dealing with more than 3 events at a time, one can tackle problems only with the help of

Boolean Algebra

Both $\cap$ and $\cup$ (individually) are commutative and associative.

Intersection is distributive over union: $A \cap (B \cup C \cup \cdots) = (A \cap B) \cup (A \cap C) \cup \cdots$. Similarly, union is distributive over intersection: $A \cup (B \cap C \cap \cdots) = (A \cup B) \cap (A \cup C) \cap \cdots$.

Trivial rules: $A \cap \Omega = A$, $A \cap \emptyset = \emptyset$, $A \cap A = A$, $A \cup \Omega = \Omega$, $A \cup \emptyset = A$, $A \cup A = A$, $A \cap \bar{A} = \emptyset$, $A \cup \bar{A} = \Omega$, $\bar{\bar{A}} = A$.

Also, when $A \subset B$ ($A$ is a subset of $B$, meaning that every element of $A$ also belongs to $B$), we get: $A \cap B = A$ (the smaller event) and $A \cup B = B$ (the bigger event).

DeMorgan Laws: $\overline{A \cap B} = \bar{A} \cup \bar{B}$ and $\overline{A \cup B} = \bar{A} \cap \bar{B}$, or in general
$$\overline{A \cap B \cap C \cap \cdots} = \bar{A} \cup \bar{B} \cup \bar{C} \cup \cdots$$
and vice versa (i.e. with $\cap$ and $\cup$ interchanged).

$A$ and $B$ are called (mutually) exclusive or disjoint when $A \cap B = \emptyset$ (no overlap).

Probability of Events

Simple events can be assigned a probability (relative frequency of their occurrence in a long run). It's obvious that each of these probabilities must be a non-negative number. To find the probability of any other event $A$ (not necessarily simple), we then add the probabilities of the simple events $A$ consists of. This immediately implies that probabilities must follow a few basic rules:
$$\Pr(A) \geq 0$$
$$\Pr(\emptyset) = 0$$
$$\Pr(\Omega) = 1$$
(the relative frequency of all of $\Omega$ is obviously 1).

We should mention that $\Pr(A) = 0$ does not necessarily imply that $A = \emptyset$.
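Since events are just sets, these identities can be checked on small examples, e.g. with Python's built-in set type (the sample space Omega and the events A, B, C below are arbitrary choices for illustration):

    # Verify DeMorgan laws and distributivity on a small arbitrary example.
    Omega = set(range(10))
    A = {1, 2, 3, 4}
    B = {3, 4, 5, 6}
    C = {2, 6, 8}

    def complement(E):
        return Omega - E

    # DeMorgan: complement of an intersection is the union of complements.
    assert complement(A & B) == complement(A) | complement(B)
    assert complement(A | B) == complement(A) & complement(B)

    # Intersection distributes over union, and vice versa.
    assert A & (B | C) == (A & B) | (A & C)
    assert A | (B & C) == (A | B) & (A | C)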

Probability rules

$\Pr(A \cup B) = \Pr(A) + \Pr(B)$, but only when $A \cap B = \emptyset$ (disjoint). This implies that $\Pr(\bar{A}) = 1 - \Pr(A)$ as a special case.

This also implies that $\Pr(A \cap \bar{B}) = \Pr(A) - \Pr(A \cap B)$.

For any $A$ and $B$ (possibly overlapping) we have
$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$$
This can be extended to: $\Pr(A \cup B \cup C) = \Pr(A) + \Pr(B) + \Pr(C) - \Pr(A \cap B) - \Pr(A \cap C) - \Pr(B \cap C) + \Pr(A \cap B \cap C)$.

In general,
$$\Pr(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_k) = \sum_{i=1}^{k} \Pr(A_i) - \sum_{i<j} \Pr(A_i \cap A_j) + \sum_{i<j<\ell} \Pr(A_i \cap A_j \cap A_\ell) - \cdots \pm \Pr(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_k)$$
The formula computes the probability that at least one of the $A_i$ events happens.

The probability of getting exactly one of the $A_i$ events is similarly computed by:
$$\sum_{i=1}^{k} \Pr(A_i) - 2\sum_{i<j} \Pr(A_i \cap A_j) + 3\sum_{i<j<\ell} \Pr(A_i \cap A_j \cap A_\ell) - \cdots \pm k\,\Pr(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_k)$$

Important result

The probability of any (Boolean) expression involving events $A, B, C, \ldots$ can always be converted to a linear combination of probabilities of the individual events and their simple (non-complemented) intersections ($A \cap B$, $A \cap B \cap C$, etc.) only.

Probability tree

is a graphical representation of a two-stage (three-stage) random experiment; effectively of its sample space, each complete path being a simple event. The individual branch probabilities (usually simple to figure out) are the so-called conditional probabilities.

Product rule

$$\Pr(A \cap B) = \Pr(A) \cdot \Pr(B \mid A)$$
$$\Pr(A \cap B \cap C) = \Pr(A) \cdot \Pr(B \mid A) \cdot \Pr(C \mid A \cap B)$$
$$\Pr(A \cap B \cap C \cap D) = \Pr(A) \cdot \Pr(B \mid A) \cdot \Pr(C \mid A \cap B) \cdot \Pr(D \mid A \cap B \cap C)$$
etc.

Conditional probability

The general definition:
$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$$
All basic formulas of probability remain true, conditionally, e.g.: $\Pr(\bar{B} \mid A) = 1 - \Pr(B \mid A)$, $\Pr(B \cup C \mid A) = \Pr(B \mid A) + \Pr(C \mid A) - \Pr(B \cap C \mid A)$, etc.
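As a quick numeric illustration of the inclusion-exclusion formula (the die and the three events below are made-up choices), the three-event case can be checked exactly with Python fractions:

    from fractions import Fraction

    # A fair die: six equally likely simple events.
    Omega = set(range(1, 7))

    def Pr(E):
        return Fraction(len(E), len(Omega))

    A = {1, 2, 3}        # "at most 3"
    B = {2, 4, 6}        # "even"
    C = {3, 4, 5, 6}     # "at least 3"

    lhs = Pr(A | B | C)
    rhs = (Pr(A) + Pr(B) + Pr(C)
           - Pr(A & B) - Pr(A & C) - Pr(B & C)
           + Pr(A & B & C))
    assert lhs == rhs == 1   # here A, B, C happen to cover all of Omega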

Total-probability formula

A partition represents chopping the sample space into several smaller events, say $A_1, A_2, A_3, \ldots, A_k$, so that they

(i) don't overlap (i.e. are all mutually exclusive): $A_i \cap A_j = \emptyset$ for any $1 \leq i < j \leq k$;

(ii) cover the whole $\Omega$ (i.e. 'no gaps'): $A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_k = \Omega$.

For any partition, and an unrelated event $B$, we have
$$\Pr(B) = \Pr(B \mid A_1) \cdot \Pr(A_1) + \Pr(B \mid A_2) \cdot \Pr(A_2) + \cdots + \Pr(B \mid A_k) \cdot \Pr(A_k)$$
(a numeric sketch of this formula follows at the end of this subsection).

Independence

of two events is a very natural notion (we should be able to tell from the experiment): when one of these events happens, it does not affect the probability of the other. Mathematically, this is expressed by either
$$\Pr(B \mid A) = \Pr(B)$$
or, equivalently, by
$$\Pr(A \cap B) = \Pr(A) \cdot \Pr(B)$$
Similarly, for three events, their mutual independence means
$$\Pr(A \cap B \cap C) = \Pr(A) \cdot \Pr(B) \cdot \Pr(C)$$
(together with the analogous product rule for each pair), etc.

Mutual independence of $A, B, C, D, \ldots$ implies that any event built out of $A, B, \ldots$ must be independent of any event built out of $C, D, \ldots$ [as long as the two sets are distinct].

Another important result is: to compute the probability of a Boolean expression (itself an event) involving only mutually independent events, it is sufficient to know the events' individual probabilities.

Discrete Random Variables

A random variable yields a number for every possible outcome of a random experiment.

A table (or a formula, called the probability function) summarizing the information about

1. the possible outcomes of the RV (numbers, arranged from the smallest to the largest), and

2. the corresponding probabilities

is called the probability distribution.

Similarly, the distribution function $F_X(k) = \Pr(X \leq k)$ computes cumulative probabilities.
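Returning to the total-probability formula above, here is a minimal numeric sketch in Python. The numbers are made up: three machines (the partition) produce 50%, 30% and 20% of all items, with defect rates 1%, 2% and 3% respectively.

    # Partition A_1, A_2, A_3 (which machine made the item), and the
    # conditional probabilities of B = "item is defective" given each A_i.
    pr_A = [0.50, 0.30, 0.20]          # Pr(A_i); must sum to 1 (a partition)
    pr_B_given_A = [0.01, 0.02, 0.03]  # Pr(B | A_i)

    # Pr(B) = sum over i of Pr(B | A_i) * Pr(A_i)
    pr_B = sum(pb * pa for pb, pa in zip(pr_B_given_A, pr_A))
    print(pr_B)                         # 0.017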

Bivariate (joint) distribution

of two random variables is similarly specified via the corresponding probability function
$$f(i,j) = \Pr(X = i \cap Y = j)$$
together with the range of possible $i$ and $j$ values. One of the two ranges is always 'marginal' (its limits are constant), the other one is 'conditional' (i.e. both of its limits may depend on the value of the other random variable).

Based on this, one can always find the corresponding marginal distribution of $X$:
$$f_X(i) = \Pr(X = i) = \sum_{j} f(i,j)$$
(summing over the conditional range of $j$, given $i$) and, similarly, the marginal distribution of $Y$.

Conditional distribution

of $X$, given an (observed) value of $Y$, is defined by
$$f_X(i \mid Y = j) = \Pr(X = i \mid Y = j) = \frac{\Pr(X = i \cap Y = j)}{\Pr(Y = j)}$$
where $i$ varies over its conditional range of values (given $Y = j$).

A conditional distribution has all the properties of an ordinary distribution.

Independence

of $X$ and $Y$ means that the outcome of $X$ cannot influence the outcome of $Y$ (and vice versa) - something we can gather from the experiment. This implies that $\Pr(X = i \cap Y = j) = \Pr(X = i) \cdot \Pr(Y = j)$ for every possible combination of $i$ and $j$.

Multivariate distribution

is a distribution of three or more RVs; conditional distributions can get rather tricky.

Expected Value of a RV

also called its mean or average, is a number which corresponds (empirically) to the average value of the random variable when the experiment is repeated, independently, infinitely many times (i.e. it is the limit of such averages).
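A minimal sketch of how the marginal and conditional distributions come out of a joint probability function; the joint table below is made up for illustration:

    # Joint probability function f(i, j) = Pr(X = i and Y = j) as a dict.
    f = {(0, 0): 0.10, (0, 1): 0.20,
         (1, 0): 0.30, (1, 1): 0.25,
         (2, 1): 0.15}                  # entries sum to 1

    # Marginal of X: f_X(i) = sum over j of f(i, j)
    f_X = {}
    for (i, j), p in f.items():
        f_X[i] = f_X.get(i, 0.0) + p
    print(f_X)                          # {0: 0.30, 1: 0.55, 2: 0.15}

    # Conditional distribution of X given Y = 1:
    pr_Y1 = sum(p for (i, j), p in f.items() if j == 1)           # Pr(Y = 1)
    f_X_given_Y1 = {i: p / pr_Y1 for (i, j), p in f.items() if j == 1}
    print(f_X_given_Y1)                 # {0: 0.333..., 1: 0.4166..., 2: 0.25}

    # Independence would require f(i, j) = f_X(i) * f_Y(j) for all i and j
    # (not the case here).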

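The 'long-run average' interpretation of the expected value just described can be sketched by simulation; a minimal example in Python, using the made-up marginal distribution of X from the previous sketch:

    import random

    values = [0, 1, 2]
    probs = [0.30, 0.55, 0.15]

    # Exact expected value: E(X) = sum over x of x * Pr(X = x)
    exact = sum(v * p for v, p in zip(values, probs))             # 0.85

    # Empirical average over many independent repetitions of the experiment.
    sample = random.choices(values, weights=probs, k=100_000)
    print(exact, sum(sample) / len(sample))    # the two should be close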