The Expected Value of a Random Variable

Transcription

Expected Value

The expected value of a random variable indicates its weighted average.

Ex. How many heads would you expect if you flipped a coin twice?
X = number of heads, taking values in {0, 1, 2}
p(0) = 1/4, p(1) = 1/2, p(2) = 1/4
Weighted average = 0*(1/4) + 1*(1/2) + 2*(1/4) = 1
(Draw the pdf.)

Definition: Let X be a random variable assuming the values x_1, x_2, x_3, ... with corresponding probabilities p(x_1), p(x_2), p(x_3), .... The mean or expected value of X is defined by

E(X) = \sum_k x_k p(x_k).

Interpretations:
(i) The expected value measures the center of the probability distribution (the center of mass).
(ii) Long-run frequency (law of large numbers; we'll get to this soon).
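As a quick check of the weighted-average idea, here is a minimal Python sketch (not part of the original notes) that computes E(X) for the two-coin-flip example directly from its pmf.

```python
# Expected value of X = number of heads in two fair coin flips,
# computed as the probability-weighted average sum_k x_k * p(x_k).
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)  # 1.0, matching 0*(1/4) + 1*(1/2) + 2*(1/4)
```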

Expectations can be used to describe the potential gains and losses from games.

Ex. Roll a die. If the side that comes up is odd, you win the equivalent of that side. If it is even, you lose 4.

Let X = your earnings.
P(X = 1) = P({1}) = 1/6
P(X = 3) = P({3}) = 1/6
P(X = 5) = P({5}) = 1/6
P(X = -4) = P({2, 4, 6}) = 3/6

E(X) = 1*(1/6) + 3*(1/6) + 5*(1/6) + (-4)*(1/2) = 1/6 + 3/6 + 5/6 - 2 = -1/2

Ex. Lottery: You pick 3 different numbers between 1 and 12. If you pick all the numbers correctly you win 100. What are your expected earnings if it costs 1 to play?

Let X = your earnings, so X = 100 - 1 = 99 if you win and X = -1 otherwise.
P(X = 99) = 1/\binom{12}{3} = 1/220
P(X = -1) = 1 - 1/220 = 219/220
E(X) = 99*(1/220) + (-1)*(219/220) = -120/220 ≈ -0.55
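Both games reduce to the same weighted sum. A small Python sketch (illustrative only; the fractions module is used just to keep the answers exact):

```python
from fractions import Fraction as F
from math import comb

# Die game: win the face value on odd rolls (1, 3, 5), lose 4 on even rolls.
die_game = {1: F(1, 6), 3: F(1, 6), 5: F(1, 6), -4: F(3, 6)}
print(sum(x * p for x, p in die_game.items()))   # -1/2

# Lottery: pick 3 of 12 numbers; net winnings are 99 if all 3 are right, -1 otherwise.
p_win = F(1, comb(12, 3))                        # 1/220
print(99 * p_win + (-1) * (1 - p_win))           # -6/11, about -0.55
```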

Expectation of a function of a random variable

Let X be a random variable assuming the values x_1, x_2, x_3, ... with corresponding probabilities p(x_1), p(x_2), p(x_3), .... For any function g, the mean or expected value of g(X) is defined by

E(g(X)) = \sum_k g(x_k) p(x_k).

Ex. Roll a fair die. Let X = number of dots on the side that comes up. Calculate E(X^2).

E(X^2) = \sum_{i=1}^{6} i^2 p(i) = 1^2 p(1) + 2^2 p(2) + 3^2 p(3) + 4^2 p(4) + 5^2 p(5) + 6^2 p(6) = (1/6)(1 + 4 + 9 + 16 + 25 + 36) = 91/6

E(X) is the expected value or 1st moment of X.
E(X^n) is called the nth moment of X.

Calculate E(\sqrt{X}) = \sum_{i=1}^{6} \sqrt{i} p(i).
Calculate E(e^X) = \sum_{i=1}^{6} e^i p(i).
(Do at home.)

Ex. An indicator variable for the event A is defined as the random variable that takes on the value 1 when the event A happens and 0 otherwise:
I_A = 1 if A occurs, 0 if A^c occurs.
P(I_A = 1) = P(A) and P(I_A = 0) = P(A^c).
The expectation of this indicator is E(I_A) = 1*P(A) + 0*P(A^c) = P(A).
This gives a one-to-one correspondence between expectations and probabilities.

If a and b are constants, then E(aX + b) = aE(X) + b.
Proof: E(aX + b) = \sum_k (a x_k + b) p(x_k) = a \sum_k x_k p(x_k) + b \sum_k p(x_k) = aE(X) + b.
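A short sketch of this "expectation of a function" formula for the fair-die example, including the two moments left as homework (purely illustrative; a fair die with p(i) = 1/6 is assumed):

```python
from fractions import Fraction as F
from math import sqrt, exp

faces = range(1, 7)
p = F(1, 6)  # fair die: each face has probability 1/6

def expect(g):
    """E(g(X)) = sum_k g(x_k) p(x_k) for the fair-die pmf."""
    return sum(g(i) * p for i in faces)

print(expect(lambda i: i**2))            # 91/6, the second moment E(X^2)
print(float(expect(lambda i: sqrt(i))))  # E(sqrt(X)) ≈ 1.81 (homework check)
print(float(expect(lambda i: exp(i))))   # E(e^X) ≈ 106.1 (homework check)
```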

Variance

We often seek to summarize the essential properties of a random variable in as simple terms as possible. The mean is one such property.

Let X = 0 with probability 1.
Let Y = -2 with prob. 1/3, -1 with prob. 1/6, 1 with prob. 1/6, 2 with prob. 1/3.

Both X and Y have the same expected value, but are quite different in other respects. One such respect is their spread. We would like a measure of spread.

Definition: If X is a random variable with mean E(X), then the variance of X, denoted by Var(X), is defined by Var(X) = E((X - E(X))^2).

A small variance indicates a small spread.

Useful identity: Var(X) = E(X^2) - (E(X))^2.

Var(X) = E((X - E(X))^2) = \sum_x (x - E(X))^2 p(x) = \sum_x (x^2 - 2x E(X) + E(X)^2) p(x)
       = \sum_x x^2 p(x) - 2E(X) \sum_x x p(x) + E(X)^2 \sum_x p(x) = E(X^2) - 2E(X)^2 + E(X)^2 = E(X^2) - (E(X))^2

Ex. Roll a fair die. Let X = number of dots on the side that comes up.
Var(X) = E(X^2) - (E(X))^2
E(X^2) = 91/6
E(X) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 21/6 = 7/2
Var(X) = 91/6 - (7/2)^2 = 91/6 - 49/4 = (182 - 147)/12 = 35/12
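A minimal check of the shortcut Var(X) = E(X^2) - (E(X))^2 against the definitional form, again for the fair-die example (illustrative sketch):

```python
from fractions import Fraction as F

faces = range(1, 7)
p = F(1, 6)

mean = sum(i * p for i in faces)                          # 7/2
var_definition = sum((i - mean) ** 2 * p for i in faces)  # E((X - E(X))^2)
var_shortcut = sum(i**2 * p for i in faces) - mean**2     # E(X^2) - (E(X))^2

print(mean, var_definition, var_shortcut)  # 7/2  35/12  35/12
```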

If a and b are constants then Var(aX + b) = a^2 Var(X):
E(aX + b) = aE(X) + b
Var(aX + b) = E[(aX + b - (aE(X) + b))^2] = E(a^2 (X - E(X))^2) = a^2 E((X - E(X))^2) = a^2 Var(X)

The square root of Var(X) is called the standard deviation of X:
SD(X) = \sqrt{Var(X)}; it measures the scale of X.

Means, modes, and medians

Best estimate under squared loss: the mean. That is, the number m that minimizes E[(X - m)^2] is m = E(X). Proof: expand and differentiate with respect to m.

Best estimate under absolute loss: the median. That is, m = median minimizes E[|X - m|]. Proof in the book. Note that the median is non-unique in general.

Best estimate under 1 - 1(X = x) loss: the mode. That is, choosing the mode maximizes the probability of being exactly right. The proof is easy for discrete r.v.'s; a limiting argument is required for continuous r.v.'s, since P(X = x) = 0 for any x.
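The three "best estimate" claims can be checked numerically. The sketch below (an illustration, not part of the notes; the pmf is made up so that mean, median, and mode differ) scans candidate values m and reports which m minimizes squared loss and absolute loss, and which value maximizes the chance of being exactly right.

```python
import numpy as np

# A small skewed pmf chosen so that mean (1.8), median (1), and mode (1) are visible.
values = np.array([0, 1, 2, 10])
probs = np.array([0.3, 0.4, 0.2, 0.1])

candidates = np.linspace(-2, 12, 1401)
sq_loss = [(probs * (values - m) ** 2).sum() for m in candidates]
abs_loss = [(probs * np.abs(values - m)).sum() for m in candidates]

print("argmin squared loss:", candidates[np.argmin(sq_loss)])   # ≈ mean = 1.8
print("argmin absolute loss:", candidates[np.argmin(abs_loss)]) # ≈ median = 1
print("mode:", values[np.argmax(probs)])                        # 1, maximizes P(X = m)
```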

Moment Generating Functions

The moment generating function of the random variable X, denoted M_X(t), is defined for all real values of t by

M_X(t) = E(e^{tX}) = \sum_x e^{tx} p(x)                      if X is discrete with pmf p(x)
M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x) dx  if X is continuous with pdf f(x)

The reason M_X(t) is called a moment generating function is that all the moments of X can be obtained by successively differentiating M_X(t) and evaluating the result at t = 0.

First moment:
(d/dt) M_X(t) = (d/dt) E(e^{tX}) = E((d/dt) e^{tX}) = E(X e^{tX})
M'_X(0) = E(X)
(For any of the distributions we will use, we can move the derivative inside the expectation.)

Second moment:
M''_X(t) = (d/dt) M'_X(t) = (d/dt) E(X e^{tX}) = E((d/dt)(X e^{tX})) = E(X^2 e^{tX})
M''_X(0) = E(X^2)

kth moment:
M^{(k)}_X(t) = E(X^k e^{tX})
M^{(k)}_X(0) = E(X^k)
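To illustrate the moment-generating recipe, the sketch below (using sympy, purely as an illustration) builds M_X(t) for a fair six-sided die and recovers E(X) and E(X^2) by differentiating at t = 0.

```python
import sympy as sp

t = sp.symbols('t')

# M_X(t) = E(e^{tX}) for a fair six-sided die: sum over faces of e^{t*i} / 6.
M = sum(sp.exp(t * i) for i in range(1, 7)) / 6

first_moment = sp.diff(M, t).subs(t, 0)      # M'_X(0)  = E(X)
second_moment = sp.diff(M, t, 2).subs(t, 0)  # M''_X(0) = E(X^2)

print(first_moment)   # 7/2
print(second_moment)  # 91/6
```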

Ex. Binomial random variable with parameters n and p. Calculate M_X(t):

M_X(t) = E(e^{tX}) = \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k (1-p)^{n-k} = \sum_{k=0}^{n} \binom{n}{k} (p e^t)^k (1-p)^{n-k} = (p e^t + 1 - p)^n

M'_X(t) = n (p e^t + 1 - p)^{n-1} p e^t
M''_X(t) = n(n-1)(p e^t + 1 - p)^{n-2} (p e^t)^2 + n (p e^t + 1 - p)^{n-1} p e^t

E(X) = M'_X(0) = n (p e^0 + 1 - p)^{n-1} p e^0 = np
E(X^2) = M''_X(0) = n(n-1)(p e^0 + 1 - p)^{n-2} (p e^0)^2 + n (p e^0 + 1 - p)^{n-1} p e^0 = n(n-1)p^2 + np
Var(X) = E(X^2) - (E(X))^2 = n(n-1)p^2 + np - (np)^2 = np(1 - p)

Later we'll see an even easier way to calculate these moments, by using the fact that a binomial X is the sum of n i.i.d. simpler (Bernoulli) r.v.'s.
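The same differentiation, done symbolically on the binomial MGF derived above, recovers np and np(1 - p). A sympy sketch, assuming nothing beyond the formula (p e^t + 1 - p)^n:

```python
import sympy as sp

t, n, p = sp.symbols('t n p', positive=True)

# Binomial MGF derived above: M_X(t) = (p e^t + 1 - p)^n.
M = (p * sp.exp(t) + 1 - p) ** n

EX = sp.simplify(sp.diff(M, t).subs(t, 0))       # E(X)   = n*p
EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))   # E(X^2) = n(n-1)p^2 + np
var = sp.simplify(EX2 - EX**2)                   # np(1 - p)

print(EX, var)  # n*p and n*p*(1 - p), up to sympy's preferred form
```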

Fact: Suppose that for two random variables X and Y the moment generating functions exist and are given by M_X(t) and M_Y(t), respectively. If M_X(t) = M_Y(t) for all values of t, then X and Y have the same probability distribution.

If the moment generating function of X exists and is finite in some region about t = 0, then the distribution is uniquely determined.

Properties of Expectation

Proposition: If X and Y have a joint probability mass function p_XY(x, y), then

E(g(X, Y)) = \sum_y \sum_x g(x, y) p_XY(x, y).

If X and Y have a joint probability density function f_XY(x, y), then

E(g(X, Y)) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f_XY(x, y) dx dy.

It is important to note that if the function g(x, y) depends only on x or only on y, the formula above reverts to the one-dimensional case.

Ex. Suppose X and Y have a joint pdf f_XY(x, y). Calculate E(X).

E(X) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f_XY(x, y) dy dx = \int_{-\infty}^{\infty} x [ \int_{-\infty}^{\infty} f_XY(x, y) dy ] dx = \int_{-\infty}^{\infty} x f_X(x) dx

Ex. An accident occurs at a point X that is uniformly distributed on a road of length L. At the time of the accident an ambulance is at location Y that is also uniformly distributed on the road. Assuming that X and Y are independent, find the expected distance between the ambulance and the point of the accident.

Compute E(|X - Y|). Both X and Y are uniform on the interval (0, L), so the joint pdf is f_XY(x, y) = 1/L^2 for 0 < x < L, 0 < y < L, and

E(|X - Y|) = \int_0^L \int_0^L |x - y| (1/L^2) dy dx = (1/L^2) \int_0^L \int_0^L |x - y| dy dx.

For the inner integral,

\int_0^L |x - y| dy = \int_0^x (x - y) dy + \int_x^L (y - x) dy = x^2/2 + (L^2/2 - xL + x^2/2) = L^2/2 - xL + x^2.

Therefore

E(|X - Y|) = (1/L^2) \int_0^L (L^2/2 - xL + x^2) dx = (1/L^2) [L^2 x/2 - x^2 L/2 + x^3/3]_0^L = (1/L^2)(L^3/2 - L^3/2 + L^3/3) = L/3.
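The L/3 answer for the ambulance example is easy to sanity-check by simulation. A minimal Monte Carlo sketch (illustrative; L = 1 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 1.0          # road length (an arbitrary choice for the check)
n = 1_000_000

# X = accident location, Y = ambulance location, both Uniform(0, L), independent.
x = rng.uniform(0, L, n)
y = rng.uniform(0, L, n)

print(np.mean(np.abs(x - y)))  # ≈ L/3 ≈ 0.333
print(L / 3)
```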

Expectation of sums of random variables

Ex. Let X and Y be continuous random variables with joint pdf f_XY(x, y). Assume that E(X) and E(Y) are finite. Calculate E(X + Y).

E(X + Y) = \int \int (x + y) f_XY(x, y) dx dy = \int \int x f_XY(x, y) dx dy + \int \int y f_XY(x, y) dx dy
         = \int x f_X(x) dx + \int y f_Y(y) dy = E(X) + E(Y)

The same result holds in the discrete case.

Proposition: In general, if E(X_i) is finite for all i = 1, ..., n, then
E(X_1 + ... + X_n) = E(X_1) + ... + E(X_n).
Proof: Use the example above and prove by induction.

Let X_1, ..., X_n be independent and identically distributed random variables having distribution function F_X and expected value µ. Such a sequence of random variables is said to constitute a sample from the distribution F_X. The quantity \bar{X}, defined by

\bar{X} = \sum_{i=1}^{n} X_i / n,

is called the sample mean.

Calculate E(\bar{X}). We know that E(X_i) = µ, so

E(\bar{X}) = E(\sum_{i=1}^{n} X_i / n) = (1/n) E(\sum_{i=1}^{n} X_i) = (1/n) \sum_{i=1}^{n} E(X_i) = µ.

When the mean of a distribution is unknown, the sample mean is often used in statistics to estimate it. (It is an unbiased estimate.)

Ex. Let X be a binomial random variable with parameters n and p. X represents the number of successes in n trials. We can write X as follows:

X = X_1 + X_2 + ... + X_n,   where X_i = 1 if trial i is a success and X_i = 0 if trial i is a failure.

The X_i's are Bernoulli random variables with parameter p.
E(X_i) = 1*p + 0*(1 - p) = p
E(X) = E(X_1) + E(X_2) + ... + E(X_n) = np

Ex. A group of N people throw their hats into the center of a room. The hats are mixed up, and each person randomly selects one. Find the expected number of people that select their own hat.

Let X = the number of people who select their own hat. Number the people from 1 to N and let X_i = 1 if person i chooses his own hat and X_i = 0 otherwise; then X = X_1 + X_2 + ... + X_N.

Each person is equally likely to select any of the N hats, so P(X_i = 1) = 1/N and
E(X_i) = 1*(1/N) + 0*(1 - 1/N) = 1/N.
Hence, E(X) = E(X_1) + E(X_2) + ... + E(X_N) = N*(1/N) = 1.
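The answer E(X) = 1 does not depend on N, which a quick simulation illustrates (a sketch, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_matches(N, trials=100_000):
    """Average number of people who get their own hat when N hats are assigned at random."""
    total = 0
    for _ in range(trials):
        perm = rng.permutation(N)              # a random assignment of hats to people
        total += np.sum(perm == np.arange(N))  # fixed points = own-hat matches
    return total / trials

for N in (2, 5, 20):
    print(N, mean_matches(N))  # all close to 1
```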

Ex. Twenty people, consisting of 10 married couples, are to be seated at five different tables, with four people at each table. If the seating is done at random, what is the expected number of married couples that are seated at the same table?

Let X = the number of married couples seated at the same table. Number the couples from 1 to 10 and let X_i = 1 if couple i is seated at the same table and X_i = 0 otherwise. Then X = X_1 + X_2 + ... + X_{10}.

To calculate E(X) we need to know E(X_i). Consider the table where husband i is sitting. There is room for three other people at his table, and there are a total of 19 possible people who could be seated there, so

P(X_i = 1) = \binom{1}{1} \binom{18}{2} / \binom{19}{3} = 3/19.

E(X_i) = 1*(3/19) + 0*(16/19) = 3/19.

Hence, E(X) = E(X_1) + E(X_2) + ... + E(X_{10}) = 10*(3/19) = 30/19.
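The 30/19 ≈ 1.58 answer can also be checked by simulating random seatings. A rough sketch (my own illustration; tables are taken to be the five consecutive blocks of a shuffled list of the 20 people, with couple i being persons 2i and 2i+1):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 50_000
counts = []

for _ in range(trials):
    seating = rng.permutation(20)           # seat positions 0..19 filled by a random person each
    table = np.empty(20, dtype=int)
    table[seating] = np.arange(20) // 4     # person seated at position k sits at table k // 4
    same = sum(table[2 * i] == table[2 * i + 1] for i in range(10))
    counts.append(same)

print(np.mean(counts), 30 / 19)             # both ≈ 1.58
```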

Proposition: If X and Y are independent, then for any functions h and g,
E(g(X) h(Y)) = E(g(X)) E(h(Y)).

Proof:
E(g(X) h(Y)) = \int \int g(x) h(y) f_XY(x, y) dx dy = \int \int g(x) h(y) f_X(x) f_Y(y) dx dy
             = \int g(x) f_X(x) dx \int h(y) f_Y(y) dy = E(g(X)) E(h(Y))

In fact, this is an equivalent way to characterize independence: if E(g(X) h(Y)) = E(g(X)) E(h(Y)) for any functions g(X) and h(Y) (but not any function f(X, Y)), then X and Y are independent. To see this, just use indicator functions.

Fact: The moment generating function of the sum of independent random variables equals the product of the individual moment generating functions.
Proof: M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX}) E(e^{tY}) = M_X(t) M_Y(t)

Covariance and correlation

Previously, we have discussed the absence or presence of a relationship between two random variables, i.e. independence or dependence. But if there is in fact a relationship, the relationship may be either weak or strong.

Ex.
(a) Let X = the weight of a sample of water and Y = the volume of the same sample of water. There is an extremely strong relationship between X and Y.
(b) Let X = a person's weight and Y = the same person's height. There is a relationship between X and Y, but not as strong as in (a).

We would like a measure that can quantify this difference in the strength of a relationship between two random variables.

Definition: The covariance between X and Y, denoted by Cov(X, Y), is defined by

Cov(X, Y) = E[(X - E(X))(Y - E(Y))].

Similarly as with the variance, we can rewrite this equation:

Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E[XY - E(X)Y - X E(Y) + E(X)E(Y)]
          = E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y) = E(XY) - E(X)E(Y)

Note that if X and Y are independent,
Cov(X, Y) = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0.
The converse is however NOT true.

Counter-example: Define X and Y so that
P(X = 0) = P(X = 1) = P(X = -1) = 1/3
and
Y = 0 if X ≠ 0, and Y = 1 if X = 0.
X and Y are clearly dependent. But XY = 0 always, so E(XY) = E(X) = 0, and therefore Cov(X, Y) = E(XY) - E(X)E(Y) = 0.
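The counter-example (dependence with zero covariance) can be written out explicitly; a minimal sketch over the three equally likely values of X:

```python
from fractions import Fraction as F

# X takes the values -1, 0, 1 with probability 1/3 each;
# Y = 1 when X = 0 and Y = 0 otherwise, so Y is a deterministic function of X.
outcomes = [(-1, 0), (0, 1), (1, 0)]
p = F(1, 3)

E_X = sum(x * p for x, _ in outcomes)
E_Y = sum(y * p for _, y in outcomes)
E_XY = sum(x * y * p for x, y in outcomes)

print(E_XY - E_X * E_Y)  # 0: Cov(X, Y) = 0 even though X and Y are dependent
```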

Proposition:
(i) Cov(X, Y) = Cov(Y, X)
(ii) Cov(X, X) = Var(X)
(iii) Cov(aX, Y) = a Cov(X, Y)
(iv) Cov(\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} Cov(X_i, Y_j)

Proof: (i)-(iii): verify yourselves.
(iv) Let µ_i = E(X_i) and ν_j = E(Y_j). Then E(\sum_i X_i) = \sum_i µ_i and E(\sum_j Y_j) = \sum_j ν_j, so

Cov(\sum_i X_i, \sum_j Y_j) = E[(\sum_i X_i - \sum_i µ_i)(\sum_j Y_j - \sum_j ν_j)] = E[\sum_i (X_i - µ_i) \sum_j (Y_j - ν_j)]
                            = E[\sum_i \sum_j (X_i - µ_i)(Y_j - ν_j)] = \sum_i \sum_j E[(X_i - µ_i)(Y_j - ν_j)] = \sum_i \sum_j Cov(X_i, Y_j)

Proposition: Var(\sum_{i=1}^{n} X_i) = \sum_{i=1}^{n} Var(X_i) + 2 \sum_{i<j} Cov(X_i, X_j). In particular,
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).

Proof:
Var(\sum_i X_i) = Cov(\sum_i X_i, \sum_j X_j) = \sum_i \sum_j Cov(X_i, X_j) = \sum_i Var(X_i) + \sum_{i≠j} Cov(X_i, X_j)
                = \sum_i Var(X_i) + 2 \sum_{i<j} Cov(X_i, X_j)

If X_1, ..., X_n are pairwise independent, so that Cov(X_i, X_j) = 0 for i ≠ j, then Var(\sum_i X_i) = \sum_i Var(X_i).

Ex. Let X_1, ..., X_n be independent and identically distributed random variables having expected value µ and variance σ^2. Let \bar{X} = \sum_{i=1}^{n} X_i / n be the sample mean. The random variable

S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)

is called the sample variance. Calculate (a) Var(\bar{X}) and (b) E(S^2).

(a) We know that Var(X_i) = σ^2, so

Var(\bar{X}) = Var(\sum_{i=1}^{n} X_i / n) = (1/n)^2 Var(\sum_{i=1}^{n} X_i) = (1/n)^2 \sum_{i=1}^{n} Var(X_i) = σ^2 / n.

(b) Rewrite the sum portion of the sample variance:

\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - µ + µ - \bar{X})^2 = \sum_{i=1}^{n} ((X_i - µ) - (\bar{X} - µ))^2
  = \sum_{i=1}^{n} (X_i - µ)^2 + n(\bar{X} - µ)^2 - 2(\bar{X} - µ) \sum_{i=1}^{n} (X_i - µ)
  = \sum_{i=1}^{n} (X_i - µ)^2 + n(\bar{X} - µ)^2 - 2(\bar{X} - µ) n(\bar{X} - µ)
  = \sum_{i=1}^{n} (X_i - µ)^2 - n(\bar{X} - µ)^2

Hence

E(S^2) = (1/(n-1)) [\sum_{i=1}^{n} E((X_i - µ)^2) - n E((\bar{X} - µ)^2)] = (1/(n-1)) [n σ^2 - n Var(\bar{X})] = (1/(n-1)) (n - 1) σ^2 = σ^2.

(The sample variance is an unbiased estimate of the variance.)
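The n - 1 denominator is exactly what makes E(S^2) = σ^2. The simulation sketch below contrasts it with the naive n denominator (illustrative only; the normal distribution and σ^2 = 4 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 5, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
xbar = samples.mean(axis=1, keepdims=True)
ss = ((samples - xbar) ** 2).sum(axis=1)  # sum of squared deviations from the sample mean

print(np.mean(ss / (n - 1)))  # ≈ 4.0: dividing by n-1 is unbiased for sigma^2
print(np.mean(ss / n))        # ≈ 3.2: dividing by n underestimates by the factor (n-1)/n
```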

Ex. A group of N people throw their hats into the center of a room. The hats are mixed up, and each person randomly selects one.

Let X = the number of people who select their own hat. Number the people from 1 to N and let X_i = 1 if person i chooses his own hat and X_i = 0 otherwise; then X = X_1 + X_2 + ... + X_N. We showed last time that E(X) = 1.

Calculate Var(X).

Var(X) = Var(\sum_{i=1}^{N} X_i) = \sum_{i=1}^{N} Var(X_i) + 2 \sum_{i<j} Cov(X_i, X_j)

Recall that since each person is equally likely to select any of the N hats, P(X_i = 1) = 1/N. Hence

E(X_i) = 1*(1/N) + 0*(1 - 1/N) = 1/N
and
E(X_i^2) = 1^2*(1/N) + 0^2*(1 - 1/N) = 1/N,
so
Var(X_i) = E(X_i^2) - (E(X_i))^2 = 1/N - (1/N)^2 = (N - 1)/N^2.

For the covariances, Cov(X_i, X_j) = E(X_i X_j) - E(X_i) E(X_j), where X_i X_j = 1 if both persons i and j choose their own hats and 0 otherwise, and

P(X_i = 1, X_j = 1) = P(X_i = 1 | X_j = 1) P(X_j = 1) = (1/(N - 1)) * (1/N) = 1/(N(N - 1)).

E(X_i X_j) = 1 * 1/(N(N - 1)) + 0 * (1 - 1/(N(N - 1))) = 1/(N(N - 1))

Cov(X_i, X_j) = 1/(N(N - 1)) - (1/N)^2 = 1/(N^2 (N - 1))

Hence,

Var(X) = N * (N - 1)/N^2 + 2 \binom{N}{2} * 1/(N^2 (N - 1)) = (N - 1)/N + 1/N = 1.
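Both moments of the hat-matching count, E(X) = 1 and Var(X) = 1, are independent of N; a quick simulation sketch (illustrative, with N = 10 chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 10, 100_000

matches = np.array([
    np.sum(rng.permutation(N) == np.arange(N))  # number of fixed points of a random permutation
    for _ in range(trials)
])

print(matches.mean(), matches.var())  # both ≈ 1, matching E(X) = 1 and Var(X) = 1
```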

Definition: The correlation between X and Y, denoted by ρ(X, Y), is defined, as long as Var(X) and Var(Y) are positive, by

ρ(X, Y) = Cov(X, Y) / \sqrt{Var(X) Var(Y)}.

It can be shown that -1 ≤ ρ(X, Y) ≤ 1, with equality only if Y = aX + b (assuming E(X^2) and E(Y^2) are both finite). This is the "Cauchy-Schwarz" inequality.

Proof: It suffices to prove (E(XY))^2 ≤ E(X^2) E(Y^2). The basic idea is to look at the expectations E[(aX + bY)^2] and E[(aX - bY)^2], using the usual rules for expanding squares:

0 ≤ E[(aX + bY)^2] = a^2 E(X^2) + b^2 E(Y^2) + 2ab E(XY)
0 ≤ E[(aX - bY)^2] = a^2 E(X^2) + b^2 E(Y^2) - 2ab E(XY)

Now let a^2 = E(Y^2) and b^2 = E(X^2). Then the two inequalities above read

0 ≤ 2a^2 b^2 + 2ab E(XY)
0 ≤ 2a^2 b^2 - 2ab E(XY);

dividing by 2ab gives

E(XY) ≥ -\sqrt{E(X^2) E(Y^2)}
E(XY) ≤ \sqrt{E(X^2) E(Y^2)},

and, applied to the centered variables X - E(X) and Y - E(Y), this is equivalent to the inequality -1 ≤ ρ(X, Y) ≤ 1. For equality to hold, either E[(aX + bY)^2] = 0 or E[(aX - bY)^2] = 0, i.e., X and Y are linearly related with a negative or positive slope, respectively.

The correlation coefficient is therefore a measure of the degree of linearity between X and Y. If ρ(X, Y) = 0, this indicates no linearity, and X and Y are said to be uncorrelated.

Conditional Expectation

Recall that if X and Y are discrete random variables, the conditional mass function of X, given Y = y, is defined for all y such that P(Y = y) > 0, by

p_{X|Y}(x|y) = P(X = x | Y = y) = p_XY(x, y) / p_Y(y).

Definition: If X and Y are discrete random variables, the conditional expectation of X, given Y = y, is defined for all y such that P(Y = y) > 0, by

E(X | Y = y) = \sum_x x P(X = x | Y = y) = \sum_x x p_{X|Y}(x|y).

Similarly, if X and Y are continuous random variables, the conditional pdf of X given Y = y is defined for all y such that f_Y(y) > 0 by

f_{X|Y}(x|y) = f_XY(x, y) / f_Y(y).

Definition: If X and Y are continuous random variables, the conditional expectation of X, given Y = y, is defined for all y such that f_Y(y) > 0, by

E(X | Y = y) = \int_{-\infty}^{\infty} x f_{X|Y}(x|y) dx.

Conditional expectations are themselves random variables. The conditional expectation of X given Y = y is just the expected value on a reduced sample space consisting only of outcomes where Y = y. E(X | Y = y) is a function of y.

It is important to note that conditional expectations satisfy all the properties of regular expectations:

1. E[g(X) | Y = y] = \sum_x g(x) p_{X|Y}(x|y) if X and Y are discrete.
2. E[g(X) | Y = y] = \int_{-\infty}^{\infty} g(x) f_{X|Y}(x|y) dx if X and Y are continuous.

3. E[\sum_{i=1}^{n} X_i | Y = y] = \sum_{i=1}^{n} E[X_i | Y = y]

Proposition: E(X) = E(E(X | Y)).
If Y is discrete: E(X) = E(E(X | Y)) = \sum_y E(X | Y = y) p_Y(y).
If Y is continuous: E(X) = E(E(X | Y)) = \int E(X | Y = y) f_Y(y) dy.

Proof (discrete case):

E(E(X | Y)) = \sum_y E(X | Y = y) p_Y(y) = \sum_y [\sum_x x p_{X|Y}(x|y)] p_Y(y) = \sum_y \sum_x x (p_XY(x, y)/p_Y(y)) p_Y(y)
            = \sum_x x \sum_y p_XY(x, y) = \sum_x x p_X(x) = E(X)
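A small numerical check of the tower property E(X) = E(E(X | Y)) on a joint pmf (the table below is made up purely for illustration):

```python
import numpy as np

# Joint pmf p_XY(x, y) with x in {0, 1, 2} (rows) and y in {0, 1} (columns); entries sum to 1.
p_xy = np.array([[0.10, 0.05],
                 [0.20, 0.25],
                 [0.15, 0.25]])
x_vals = np.array([0.0, 1.0, 2.0])

p_y = p_xy.sum(axis=0)                                     # marginal pmf of Y
cond_mean = (x_vals[:, None] * p_xy).sum(axis=0) / p_y     # E(X | Y = y) for each y

print((cond_mean * p_y).sum())          # E(E(X | Y)) = 1.25
print((x_vals[:, None] * p_xy).sum())   # E(X) computed directly; the two agree
```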
