Lecture 5: Estimation


Goals
- Basic concepts of estimation
- Statistical approaches for estimating parameters
- Parametric interval estimation
- Nonparametric interval estimation (bootstrap)

“Central Dogma” of Statistics
[Figure: diagram; recoverable labels: Sample, Inferential Statistics]

Estimation
- Estimator: statistic whose calculated value is used to estimate a population parameter, $\theta$
- Estimate: a particular realization of an estimator, $\hat{\theta}$
- Types of estimators:
  - point estimate: a single number that can be regarded as the most plausible value of $\theta$
  - interval estimate: a range of numbers, called a confidence interval, that can be regarded as likely to contain the true value of $\theta$

Properties of Good Estimators
- In the frequentist world view, parameters are fixed; statistics are random variables (rv) and vary from sample to sample (i.e., have an associated sampling distribution)
- In theory, there are many potential estimators for a population parameter
- What are the characteristics of good estimators?

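Statistical Jargon for Good Estimators
Good estimators are:
- Consistent: as the sample size increases, $\hat{\theta}$ gets closer to $\theta$, i.e. for every $\epsilon > 0$, $\lim_{n \to \infty} P(|\hat{\theta} - \theta| > \epsilon) = 0$
- Unbiased: $E[\hat{\theta}] = \theta$
- Precise: the sampling distribution of $\hat{\theta}$ should have a small standard error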
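A small simulation can make the "unbiased" property concrete. The sketch below is an illustration, not part of the lecture: it assumes normally distributed data with true variance 1 and compares two variance estimators, one dividing by n and one dividing by n - 1. The function name and all settings are hypothetical.

```python
import random

def simulate_variance_estimators(n, n_reps=20000, seed=1):
    """Average two variance estimators over many simulated samples of size n."""
    rng = random.Random(seed)
    divide_by_n_total = 0.0
    divide_by_n_minus_1_total = 0.0
    for _ in range(n_reps):
        # Assumption for illustration: standard normal data, so the true variance is 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        divide_by_n_total += sum_sq / n
        divide_by_n_minus_1_total += sum_sq / (n - 1)
    return divide_by_n_total / n_reps, divide_by_n_minus_1_total / n_reps

biased_mean, unbiased_mean = simulate_variance_estimators(n=5)
print("average of divide-by-n estimates:    ", round(biased_mean, 3))
print("average of divide-by-(n-1) estimates:", round(unbiased_mean, 3))
```

With n = 5, the divide-by-n estimator should average near (n - 1)/n = 0.8 of the true variance, while the divide-by-(n - 1) estimator should average near 1.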

Bias Versus Precision
[Figure: panels labeled Precise, Biased, Unbiased, Imprecise]

Methods of Point Estimation
1. Method of Moments
2. Maximum Likelihood
3. Bayesian

Method of Moments
- Advantage: simplest approach for constructing an estimator
- Disadvantage: usually not the “best” estimators possible
- Principle: equate the kth population moment $E[X^k]$ with the kth sample moment $\frac{1}{n}\sum_{i=1}^{n} X_i^k$ and solve for the unknown parameter
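As a minimal sketch of the principle (not an example from the lecture), take exponentially distributed data with unknown rate lambda. The first population moment is E[X] = 1/lambda, so equating it with the sample mean gives the estimator 1 / (sample mean). The true rate, sample size, and function name below are hypothetical.

```python
import random

def mom_exponential_rate(sample):
    """Method-of-moments estimate of an exponential rate: solve 1/lambda = sample mean."""
    sample_mean = sum(sample) / len(sample)
    return 1.0 / sample_mean

rng = random.Random(42)
true_rate = 2.0  # hypothetical value, used only to simulate data
sample = [rng.expovariate(true_rate) for _ in range(10000)]

rate_hat = mom_exponential_rate(sample)
print("method-of-moments estimate of the rate:", round(rate_hat, 3))
```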

Method of Moments Example
- How can I estimate the scaled population mutation rate, $\theta = 4N_e\mu$?
- Brief (very brief) exposé of coalescent theory:
  [Figure: time axis with coalescent intervals $T_2$, $T_3$, $T_4$]
- Coalescent times follow a geometric distribution: $E[T_i] = \frac{4N}{i(i-1)}$
- $T_c = \sum_{i=2}^{n} i\,T_i$

Method of Moments Example
$$E[T_c] = \sum_{i=2}^{n} i\,E[T_i] = \sum_{i=2}^{n} \frac{4N\,i}{i(i-1)} = 4N \sum_{i=2}^{n} \frac{1}{i-1}$$
$$E[S_n] = \mu\,E[T_c]$$
$$E[S_n] = \mu\,4N \sum_{i=2}^{n} \frac{1}{i-1}$$
$$E[S_n] = \theta \sum_{i=2}^{n} \frac{1}{i-1}$$
$$\hat{\theta}_{mom} = \frac{S_n}{\sum_{i=2}^{n} \frac{1}{i-1}}$$
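The final estimator is simple to compute. The sketch below assumes that S_n is the number of segregating sites observed in a sample of n sequences (the usual meaning in this setting, where the estimator is commonly called Watterson's estimator). The function name and the input values are hypothetical.

```python
def theta_mom(num_segregating_sites, sample_size):
    """Method-of-moments estimate: S_n divided by sum_{i=2}^{n} 1/(i-1)."""
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    harmonic_sum = sum(1.0 / (i - 1) for i in range(2, sample_size + 1))
    return num_segregating_sites / harmonic_sum

# Hypothetical data: 10 segregating sites in a sample of 5 sequences.
# The denominator is 1 + 1/2 + 1/3 + 1/4 = 25/12, so the estimate is about 4.8.
theta_hat = theta_mom(num_segregating_sites=10, sample_size=5)
print("theta_hat (method of moments):", round(theta_hat, 4))
```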

Methods of Point Estimation
1. Method of Moments
2. Maximum Likelihood
3. Bayesian

Introduction to Likelihood
- Before an experiment is performed, the outcome is unknown. Probability allows us to predict unknown outcomes based on known parameters: $P(\text{Data} \mid \theta)$
- For example: $P(x \mid n, p) = \binom{n}{x} p^{x} (1-p)^{n-x}$

Introduction to Likelihood
- After an experiment is performed, the outcome is known. Now we talk about the likelihood that a parameter would generate the observed data: $L(\theta \mid \text{Data})$
- $L(\theta \mid \text{Data}) = P(\text{Data} \mid \theta)$
- For example: $L(p \mid n, x) = \binom{n}{x} p^{x} (1-p)^{n-x}$
- Estimation proceeds by finding the value of $\theta$ that makes the observed data most likely
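The same binomial formula can be read two ways, and a short sketch may help. With p fixed, it is a probability distribution over x and sums to 1. With x fixed, it is a likelihood function of p, which need not integrate to 1. The values n = 10 and x = 3 are hypothetical.

```python
import math

def binomial_likelihood(p, n, x):
    """Binomial formula, read as a function of p for fixed data (n, x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, x = 10, 3  # hypothetical data: 3 successes in 10 trials

# Probability view: fix p, sum over all possible outcomes x = 0..n.
total_probability = sum(binomial_likelihood(0.3, n, k) for k in range(n + 1))

# Likelihood view: fix the data, vary p over a grid.
grid = [i / 100 for i in range(1, 100)]
likelihoods = [binomial_likelihood(p, n, x) for p in grid]
best_p = grid[likelihoods.index(max(likelihoods))]
area_under_likelihood = sum(likelihoods) * 0.01  # crude numerical integral over p

print("sum over outcomes for fixed p:", round(total_probability, 6))
print("grid value of p with the highest likelihood:", best_p)
print("area under the likelihood curve:", round(area_under_likelihood, 4))
```

The area under the likelihood curve is not 1, which is one way to see that a likelihood is not a probability distribution over the parameter.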

Let’s Play T/F
- True or False: The maximum likelihood estimate (mle) of $\theta$ gives us the probability of $\hat{\theta}$
  False - why?
- True or False: The mle of $\theta$ is the most likely value of $\hat{\theta}$
  False - why?
- True or False: Maximum likelihood is cool!

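Formal Statement of ML
- Let $x_1, x_2, \ldots, x_n$ be a sequence of n independent observed variables
- Joint probability:
  $$P(x_1, x_2, \ldots, x_n \mid \theta) = P(X_1 = x_1) P(X_2 = x_2) \cdots P(X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i)$$
- Likelihood is then:
  $$L(\theta \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(X_i = x_i)$$
  $$\text{Log } L(\theta \mid x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \log[P(X_i = x_i)]$$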

MLE Example
- I want to estimate the recombination fraction between locus A and B from 5 heterozygous (AaBb) parents. I examine 30 gametes for each and observe 4, 3, 5, 6, and 7 recombinant gametes in the five parents. What is the mle of the recombination fraction?
- The probability of observing $X = r$ recombinant gametes for a single parent is binomial: $P(X = r) = \binom{n}{r} \theta^{r} (1-\theta)^{n-r}$

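MLE Example: Specifying Likelihood
Probability:
$$P(r_1, r_2, \ldots, r_5 \mid \theta, n) = P(R_1 = r_1) P(R_2 = r_2) \cdots P(R_5 = r_5)$$
$$P(r_1, r_2, \ldots, r_5 \mid \theta, n) = \binom{n}{r_1}\theta^{r_1}(1-\theta)^{n-r_1} \binom{n}{r_2}\theta^{r_2}(1-\theta)^{n-r_2} \cdots \binom{n}{r_5}\theta^{r_5}(1-\theta)^{n-r_5}$$
Likelihood:
$$L(\theta \mid r_1, r_2, \ldots, r_5, n) = \prod_{i=1}^{5} \binom{n}{r_i} \theta^{r_i} (1-\theta)^{n-r_i}$$
$$\text{Log } L = \sum_{i=1}^{5} \left[ \log\binom{n}{r_i} + r_i \log\theta + (n - r_i)\log(1-\theta) \right]$$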

MLE Example: Maximizing the Likelihood
- Want to find the value of $\theta$ such that Log L is maximized:
  $$\text{Log } L = \sum_{i=1}^{5} \left[ \log\binom{n}{r_i} + r_i \log\theta + (n - r_i)\log(1-\theta) \right]$$
- How?
  1. Graphically
  2. Calculus
  3. Numerically
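Two of these routes can be sketched for the recombination data from the example (4, 3, 5, 6, and 7 recombinants out of 30 gametes each). For the calculus route, setting the derivative of Log L with respect to theta to zero gives theta-hat = (sum of r_i) / (5n). For the numerical route, the sketch uses a plain grid search, chosen here only for simplicity; the grid spacing and function name are assumptions.

```python
import math

recombinants = [4, 3, 5, 6, 7]  # recombinant gametes observed in the five parents
n_gametes = 30                  # gametes examined per parent

def log_likelihood(theta, counts, n):
    """Log L = sum over parents of log C(n, r) + r log(theta) + (n - r) log(1 - theta)."""
    return sum(
        math.log(math.comb(n, r)) + r * math.log(theta) + (n - r) * math.log(1 - theta)
        for r in counts
    )

# Calculus: d(Log L)/d(theta) = sum(r_i)/theta - sum(n - r_i)/(1 - theta) = 0
theta_closed_form = sum(recombinants) / (len(recombinants) * n_gametes)

# Numerical: evaluate Log L on a grid over (0, 0.5) and keep the best value.
grid = [i / 1000 for i in range(1, 500)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, recombinants, n_gametes))

print("closed-form mle:", round(theta_closed_form, 4))
print("grid-search mle:", theta_grid)
```

Both routes should agree up to the grid spacing, at 25/150, or roughly 0.167.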

MLE Example: Finding the mle of $\theta$
[Figure: plot with axis label $\theta$]

Methods of Point Estimation
1. Method of Moments
2. Maximum Likelihood
3. Bayesian

World View According to Bayesians
- The classic philosophy (frequentist) assumes parameters are fixed quantities that we want to estimate as precisely as possible
- The Bayesian perspective is different: parameters are random variables, with probabilities assigned to particular parameter values to reflect the degree of evidence for each value

Revisiting Bayes Theorem
$$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$$
Discrete: $$P(B) = \sum_{i=1}^{n} P(B \mid A_i) P(A_i)$$
Continuous: $$P(B) = \int P(B \mid A) P(A)\, dA$$

Bayesian Estimation
- In order to make probability statements about $\theta$ given some observed data, D, we make use of Bayes theorem:
  $$f(\theta \mid D) = \frac{f(\theta) f(D \mid \theta)}{f(D)} = \frac{f(\theta) L(\theta \mid D)}{\int_{\theta} f(\theta) f(D \mid \theta)\, d\theta}$$
- The prior is the probability of the parameter and represents what was thought before seeing the data.
- The likelihood is the probability of the data given the parameter and represents the data now available.
- The posterior represents what is thought given both prior information and the data just seen.

Bayesian Estimation: “Simple” Example
- I want to estimate the recombination fraction between locus A and B from 5 heterozygous (AaBb) parents. I examine 30 gametes for each and observe 4, 3, 5, 6, and 7 recombinant gametes in the five parents. What is the mle of the recombination fraction?
- Tedious to show the Bayesian analysis. Let’s simplify and ask what the recombination fraction is for parent three, who had 5 observed recombinant gametes.

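Specifying The Posterior Density
$$f(\theta \mid n = 30, r = 5) = \frac{f(\theta)\, f(r = 5 \mid \theta, n = 30)}{\int_0^{0.5} f(r = 5 \mid \theta, n = 30)\, f(\theta)\, d\theta}$$
- prior: $f(\theta) = \text{uniform}[0, 0.5] = \frac{1}{0.5}$
- likelihood: $P(r = 5 \mid \theta, n = 30) = \binom{30}{5} \theta^{5} (1-\theta)^{25}$
- normalizing constant: $\int_0^{0.5} P(r = 5 \mid \theta, n = 30)\, f(\theta)\, d\theta = \frac{1}{0.5} \binom{30}{5} \int_0^{0.5} \theta^{5} (1-\theta)^{25}\, d\theta$ (value transcribed as 6531)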

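Specifying The Posterior Density
$$f(\theta \mid n = 30, r = 5) = \frac{f(\theta)\, f(r = 5 \mid \theta, n = 30)}{\int_0^{0.5} f(r = 5 \mid \theta, n = 30)\, f(\theta)\, d\theta}$$
$$f(\theta \mid n = 30, r = 5) = \frac{\frac{1}{0.5} \binom{30}{5} \theta^{5} (1-\theta)^{25}}{\text{normalizing constant}}$$ (normalizing constant transcribed as 6531)
Ta da!
[Figure: posterior density $f(\theta \mid n = 30, r = 5)$ plotted against $\theta$]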
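The posterior for parent three can also be approximated numerically. The sketch below is one possible approach, not the method used in the lecture: it evaluates prior times likelihood on a grid of midpoints over [0, 0.5] and normalizes by a Riemann-sum approximation of the integral. It assumes the uniform[0, 0.5] prior has density 1/0.5 on that interval; the grid size and variable names are hypothetical.

```python
import math

def unnormalized_posterior(theta, n=30, r=5):
    """Prior times likelihood for a uniform[0, 0.5] prior and a binomial likelihood."""
    prior = 1.0 / 0.5 if 0.0 <= theta <= 0.5 else 0.0  # assumed density of uniform[0, 0.5]
    likelihood = math.comb(n, r) * theta**r * (1 - theta) ** (n - r)
    return prior * likelihood

n_steps = 5000
step = 0.5 / n_steps
grid = [(i + 0.5) * step for i in range(n_steps)]  # midpoints of each sub-interval

weights = [unnormalized_posterior(t) for t in grid]
normalizing_constant = sum(weights) * step          # approximates the integral over [0, 0.5]
posterior = [w / normalizing_constant for w in weights]

posterior_mode = grid[weights.index(max(weights))]
posterior_mean = sum(t * d for t, d in zip(grid, posterior)) * step

print("posterior mode:", round(posterior_mode, 4))
print("posterior mean:", round(posterior_mean, 4))
```

Because the prior is flat on [0, 0.5], the posterior mode should sit at the same place as the single-parent mle, 5/30.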

Interval Estimation
- In addition to point estimates, we also want to understand how much uncertainty is associated with them
- One option is to report the standard error
- Alternatively, we might report a confidence interval
- Confidence interval: an interval of plausible values for the parameter being estimated, where the degree of plausibility is specified by a “confidence level”

Interpreting a 95% CI
- We calculate a 95% CI for a hypothetical sample mean to be between 20.6 and 35.4. Does this mean there is a 95% probability the true population mean is between 20.6 and 35.4?
- NO! Correct interpretation relies on the long-run frequency interpretation of probability
  [Figure: labeled $\mu$]
- Why is this so?
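The long-run interpretation can be illustrated by simulation: draw many samples from a population with a fixed mean, build a 95% interval from each, and count how often the interval covers the fixed mean. The sketch below assumes normal data with a known standard deviation, so the interval is sample mean plus or minus 1.96 standard errors. The population mean, standard deviation, and sample size are hypothetical.

```python
import math
import random

def ci_coverage(true_mean=28.0, sd=10.0, n=30, n_reps=5000, seed=7):
    """Fraction of simulated 95% intervals that contain the fixed true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sd / math.sqrt(n)  # known-sd normal interval (an assumption)
    covered = 0
    for _ in range(n_reps):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        # The parameter is fixed; the interval is what varies from sample to sample.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            covered += 1
    return covered / n_reps

coverage = ci_coverage()
print("fraction of intervals covering the true mean:", coverage)
```

About 95% of the intervals should cover the true mean. Any single interval either contains it or does not, which is why the 95% describes the procedure and not one particular interval.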
