Chapter 3: Multiple Linear Regression Model


We consider the problem of regression when the study variable depends on more than one explanatory (or independent) variable; this is called a multiple linear regression model. This model generalizes the simple linear regression model in two ways: it allows the mean function $E(y)$ to depend on more than one explanatory variable, and it allows shapes other than a straight line, although it does not allow arbitrary shapes.

The linear model:
Let $y$ denote the dependent (or study) variable that is linearly related to $k$ independent (or explanatory) variables $X_1, X_2, \ldots, X_k$ through the parameters $\beta_1, \beta_2, \ldots, \beta_k$, and we write

$$y = X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + \varepsilon.$$

This is called the multiple linear regression model. The parameters $\beta_1, \beta_2, \ldots, \beta_k$ are the regression coefficients associated with $X_1, X_2, \ldots, X_k$ respectively, and $\varepsilon$ is the random error component reflecting the difference between the observed and fitted linear relationship. There can be various reasons for such a difference, e.g., the joint effect of variables not included in the model, random factors that cannot be accounted for in the model, etc.

Note that the $j$th regression coefficient $\beta_j$ represents the expected change in $y$ per unit change in the $j$th independent variable $X_j$. Assuming $E(\varepsilon) = 0$,

$$\beta_j = \frac{\partial E(y)}{\partial X_j}.$$

Linear model:
A model is said to be linear when it is linear in the parameters. In such a case $\partial y/\partial \beta_j$ (or equivalently $\partial E(y)/\partial \beta_j$) should not depend on any $\beta$'s. For example,

i) $y = \beta_0 + \beta_1 X$ is a linear model, as it is linear in the parameters.

ii) $y = \beta_0 X^{\beta_1}$ can be written as

$$\log y = \log \beta_0 + \beta_1 \log X,$$
$$y^* = \beta_0^* + \beta_1 x^*,$$

which is linear in the parameters $\beta_0^*$ and $\beta_1$, but nonlinear in the variables $y^* = \log y$, $x^* = \log X$. So it is a linear model.

iii) $y = \beta_0 + \beta_1 X + \beta_2 X^2$ is linear in the parameters $\beta_0, \beta_1, \beta_2$ but nonlinear in the variable $X$. So it is a linear model.

iv) $y = \beta_0 + \dfrac{\beta_1}{X - \beta_2}$ is nonlinear in the parameters and the variables both. So it is a nonlinear model.

v) $y = \beta_0 + \beta_1 X^{\beta_2}$ is nonlinear in the parameters and the variables both. So it is a nonlinear model.

vi) $y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3$ is a cubic polynomial model which can be written as

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3,$$

which is linear in the parameters $\beta_0, \beta_1, \beta_2, \beta_3$ and linear in the variables $X_1 = X$, $X_2 = X^2$, $X_3 = X^3$. So it is a linear model.

Example:
The income and education of a person are related. It is expected that, on average, a higher level of education provides a higher income. So a simple linear regression model can be expressed as

$$\text{income} = \beta_0 + \beta_1\,\text{education} + \varepsilon.$$

Note that $\beta_1$ reflects the change in income per unit change in education, and $\beta_0$ reflects the income when education is zero, as it is expected that even an illiterate person can have some income. Further, this model neglects that most people have higher income when they are older than when they are young, regardless of education. So $\beta_1$ will overstate the marginal impact of education. If age and education are positively correlated, then the regression model will associate all the observed increase in income with an increase in education. So a better model is

$$\text{income} = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{age} + \varepsilon.$$

Often it is observed that income tends to rise less rapidly in the later earning years than in the early years. To accommodate such a possibility, we might extend the model to

$$\text{income} = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{age} + \beta_3\,\text{age}^2 + \varepsilon.$$

This is how we proceed with regression modelling in a real-life situation. One needs to consider the experimental conditions and the phenomenon before deciding how many explanatory variables to use, why, and how to choose the dependent and independent variables.
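As a minimal illustration of fitting such a model by least squares, the following sketch uses entirely made-up income, education and age values (not data from the notes) and the numpy library; it only shows the mechanics of including the quadratic age term.

```python
import numpy as np

# Hypothetical data: income (thousands), years of education, age.
# These numbers are invented purely to illustrate fitting
# income = b0 + b1*education + b2*age + b3*age^2 + error by least squares.
education = np.array([8, 10, 12, 12, 14, 16, 16, 18, 20, 12])
age       = np.array([25, 30, 28, 45, 38, 40, 55, 50, 60, 35])
income    = np.array([22, 30, 34, 48, 46, 55, 62, 70, 75, 40])

# Design matrix with an intercept column and the age^2 term.
X = np.column_stack([np.ones_like(age), education, age, age**2])

# Ordinary least squares fit (lstsq handles the normal equations internally).
b, *_ = np.linalg.lstsq(X, income, rcond=None)
print("b0, b1, b2, b3 =", np.round(b, 4))
```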

Model set up:
Let an experiment be conducted $n$ times, and the data be obtained as follows:

Observation number | Response $y$ | Explanatory variables $X_1, X_2, \ldots, X_k$
1 | $y_1$ | $x_{11}, x_{12}, \ldots, x_{1k}$
2 | $y_2$ | $x_{21}, x_{22}, \ldots, x_{2k}$
$\vdots$ | $\vdots$ | $\vdots$
$n$ | $y_n$ | $x_{n1}, x_{n2}, \ldots, x_{nk}$

Assuming that the model is

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon,$$

the $n$ tuples of observations are also assumed to follow the same model. Thus they satisfy

$$y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \varepsilon_1,$$
$$y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \varepsilon_2,$$
$$\vdots$$
$$y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \varepsilon_n.$$

These $n$ equations can be written as

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

or $y = X\beta + \varepsilon$.

In general, the model with $k$ explanatory variables can be expressed as

$$y = X\beta + \varepsilon,$$

where $y = (y_1, y_2, \ldots, y_n)'$ is an $n \times 1$ vector of $n$ observations on the study variable,

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}$$

is an $n \times k$ matrix of $n$ observations on each of the $k$ explanatory variables, $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is a $k \times 1$ vector of regression coefficients, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random error components or disturbance terms.

If an intercept term is present, take the first column of $X$ to be $(1, 1, \ldots, 1)'$.
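A short sketch of this set-up in numpy, with placeholder values standing in for actual observations; it only shows how the response vector and the design matrix with an intercept column are assembled.

```python
import numpy as np

# A minimal sketch of setting up y and X for the model y = X*beta + eps,
# assuming two explanatory variables and an intercept, so the first
# column of X is all ones. The data values here are arbitrary placeholders.
y  = np.array([3.1, 4.0, 5.2, 6.1, 7.3])
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.5, 3.5, 3.0, 4.5])

X = np.column_stack([np.ones(len(y)), x1, x2])   # n x k matrix with intercept column
n, k = X.shape
print(X)
print("n =", n, ", number of columns k =", k)
```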

Assumptions in multiple linear regression model:
Some assumptions are needed in the model $y = X\beta + \varepsilon$ for drawing statistical inferences. The following assumptions are made:

(i) $E(\varepsilon) = 0$
(ii) $E(\varepsilon\varepsilon') = \sigma^2 I_n$
(iii) $\operatorname{Rank}(X) = k$
(iv) $X$ is a non-stochastic matrix
(v) $\varepsilon \sim N(0, \sigma^2 I_n)$.

These assumptions are used to study the statistical properties of the estimator of the regression coefficients. The following assumption is required to study, in particular, the large-sample properties of the estimators:

(vi) $\lim_{n \to \infty} \dfrac{X'X}{n}$ exists and is a non-stochastic and nonsingular matrix (with finite elements).

The explanatory variables can also be stochastic in some cases. We assume that $X$ is non-stochastic unless stated separately.

We consider the problems of estimation and testing of hypotheses on the regression coefficient vector under the stated assumptions.

Estimation of parameters:
A general procedure for the estimation of the regression coefficient vector is to minimize

$$\sum_{i=1}^{n} M(\varepsilon_i) = \sum_{i=1}^{n} M(y_i - x_{i1}\beta_1 - x_{i2}\beta_2 - \cdots - x_{ik}\beta_k)$$

for a suitably chosen function $M$. Some examples of the choice of $M$ are

$$M(x) = |x|,$$
$$M(x) = x^2,$$
$$M(x) = |x|^p \text{ in general}.$$

We consider the principle of least squares, which corresponds to $M(x) = x^2$, and the method of maximum likelihood estimation for the estimation of the parameters.

Principle of ordinary least squares (OLS):
Let $B$ be the set of all possible vectors $\beta$. If there is no further information, then $B$ is the $k$-dimensional real Euclidean space. The objective is to find a vector $b' = (b_1, b_2, \ldots, b_k)$ from $B$ that minimizes the sum of squared deviations of the $\varepsilon_i$'s, i.e.,

$$S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta)$$

for given $y$ and $X$. A minimum will always exist, as $S(\beta)$ is a real-valued, convex and differentiable function. Write

$$S(\beta) = y'y + \beta'X'X\beta - 2\beta'X'y.$$

Differentiating $S(\beta)$ with respect to $\beta$,

$$\frac{\partial S(\beta)}{\partial \beta} = 2X'X\beta - 2X'y,$$
$$\frac{\partial^2 S(\beta)}{\partial \beta^2} = 2X'X \quad \text{(at least non-negative definite)}.$$

The normal equations are

$$\frac{\partial S(\beta)}{\partial \beta} = 0 \;\Rightarrow\; X'Xb = X'y,$$

where the following result is used:

Result: If $f(z) = Z'AZ$ is a quadratic form, $Z$ is an $m \times 1$ vector and $A$ is any $m \times m$ symmetric matrix, then $\dfrac{\partial f(z)}{\partial z} = 2Az$.

Since it is assumed that $\operatorname{rank}(X) = k$ (full rank), $X'X$ is positive definite and the unique solution of the normal equations is

$$b = (X'X)^{-1}X'y,$$

which is termed the ordinary least squares estimator (OLSE) of $\beta$. Since $\dfrac{\partial^2 S(\beta)}{\partial \beta^2}$ is at least non-negative definite, $b$ minimizes $S(\beta)$.
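A small numerical sketch of the OLSE, assuming simulated data with an arbitrary "true" coefficient vector; solving the normal equations with a linear solver is used here instead of forming $(X'X)^{-1}$ explicitly, which is numerically preferable but algebraically equivalent.

```python
import numpy as np

# Sketch of the OLSE b = (X'X)^{-1} X'y on simulated data.
rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])           # assumed "true" coefficients
y = X @ beta_true + rng.normal(scale=0.3, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)            # OLSE from the normal equations
print("OLSE b =", np.round(b, 4))
```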

In case $X$ is not of full rank, then

$$b = (X'X)^{-}X'y + \left[I - (X'X)^{-}X'X\right]\omega,$$

where $(X'X)^{-}$ is the generalized inverse of $X'X$ and $\omega$ is an arbitrary vector. The generalized inverse $(X'X)^{-}$ of $X'X$ satisfies

$$X'X(X'X)^{-}X'X = X'X,$$
$$X(X'X)^{-}X'X = X,$$
$$X'X(X'X)^{-}X' = X'.$$

Theorem:
(i) Let $\hat{y} = Xb$ be the empirical predictor of $y$. Then $\hat{y}$ has the same value for all solutions $b$ of $X'Xb = X'y$.
(ii) $S(\beta)$ attains its minimum for any solution of $X'Xb = X'y$.

Proof:
(i) Let $b$ be any member in

$$b = (X'X)^{-}X'y + \left[I - (X'X)^{-}X'X\right]\omega.$$

Since $X(X'X)^{-}X'X = X$, we have

$$Xb = X(X'X)^{-}X'y + X\left[I - (X'X)^{-}X'X\right]\omega = X(X'X)^{-}X'y,$$

which is independent of $\omega$. This implies that $\hat{y}$ has the same value for all solutions $b$ of $X'Xb = X'y$.

(ii) Note that for any $\beta$,

$$S(\beta) = \left[y - Xb + X(b - \beta)\right]'\left[y - Xb + X(b - \beta)\right]$$
$$= (y - Xb)'(y - Xb) + (b - \beta)'X'X(b - \beta) + 2(b - \beta)'X'(y - Xb)$$
$$= (y - Xb)'(y - Xb) + (b - \beta)'X'X(b - \beta) \quad (\text{using } X'Xb = X'y)$$
$$\geq (y - Xb)'(y - Xb) = S(b)$$
$$= y'y - 2y'Xb + b'X'Xb = y'y - b'X'Xb = y'y - \hat{y}'\hat{y}.$$
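The invariance of the fitted values across solutions can be seen numerically. The sketch below is an assumption-laden illustration: it builds a deliberately rank-deficient $X$ (one column duplicated), uses numpy's pseudo-inverse as one choice of generalized inverse, and checks that two different solutions of the normal equations give the same $Xb$.

```python
import numpy as np

# Sketch: when X is not of full column rank, solutions of the normal
# equations are not unique, but the fitted values Xb are. Here the third
# column copies the second, so rank(X) < k. pinv gives one generalized-
# inverse solution; adding a null-space component gives another solution
# with identical fitted values.
rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x1])        # deliberately rank deficient
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.1, size=n)

XtX_ginv = np.linalg.pinv(X.T @ X)
b1 = XtX_ginv @ X.T @ y                          # one solution
w = np.array([0.0, 1.0, -1.0])                   # arbitrary vector (hypothetical)
b2 = b1 + (np.eye(3) - XtX_ginv @ X.T @ X) @ w   # another solution

print(np.allclose(X @ b1, X @ b2))               # True: same fitted values
```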

Fitted values:
If $\hat\beta$ is any estimator of $\beta$ for the model $y = X\beta + \varepsilon$, then the fitted values are defined as

$$\hat{y} = X\hat\beta,$$

where $\hat\beta$ is any estimator of $\beta$. In the case of $\hat\beta = b$,

$$\hat{y} = Xb = X(X'X)^{-1}X'y = Hy,$$

where $H = X(X'X)^{-1}X'$ is termed the hat matrix, which is
(i) symmetric,
(ii) idempotent (i.e., $HH = H$), and
(iii) $\operatorname{tr} H = \operatorname{tr}\left[X(X'X)^{-1}X'\right] = \operatorname{tr}\left[X'X(X'X)^{-1}\right] = \operatorname{tr} I_k = k$.

Residuals:
The difference between the observed and fitted values of the study variable is called the residual. It is denoted as

$$e = y - \hat{y} = y - Xb = y - Hy = (I - H)y = \bar{H}y,$$

where $\bar{H} = I - H$. Note that
(i) $\bar{H}$ is a symmetric matrix,
(ii) $\bar{H}$ is an idempotent matrix, i.e., $\bar{H}\bar{H} = (I - H)(I - H) = (I - H) = \bar{H}$, and
(iii) $\operatorname{tr}\bar{H} = \operatorname{tr}I_n - \operatorname{tr}H = n - k$.
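A quick numerical check of these properties, a sketch on simulated data with arbitrary coefficients; it verifies symmetry, idempotency, and the two trace identities stated above.

```python
import numpy as np

# Sketch: the hat matrix H = X(X'X)^{-1}X', its complement Hbar = I - H,
# and the stated properties (symmetry, idempotency, traces k and n - k).
rng = np.random.default_rng(2)
n, k = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix
Hbar = np.eye(n) - H

y_hat = H @ y                                    # fitted values
e = Hbar @ y                                     # residuals

print(np.allclose(H, H.T), np.allclose(H @ H, H))           # symmetric, idempotent
print(np.isclose(np.trace(H), k), np.isclose(np.trace(Hbar), n - k))
```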

Properties of OLSE:

(i) Estimation error:
The estimation error of $b$ is

$$b - \beta = (X'X)^{-1}X'y - \beta = (X'X)^{-1}X'(X\beta + \varepsilon) - \beta = (X'X)^{-1}X'\varepsilon.$$

(ii) Bias:
Since $X$ is assumed to be non-stochastic and $E(\varepsilon) = 0$,

$$E(b - \beta) = (X'X)^{-1}X'E(\varepsilon) = 0.$$

Thus OLSE is an unbiased estimator of $\beta$.

(iii) Covariance matrix:
The covariance matrix of $b$ is

$$V(b) = E\left[(b - \beta)(b - \beta)'\right] = E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\right] = (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1} = \sigma^2(X'X)^{-1}X'IX(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$

(iv) Variance:
The variance of $b$ can be obtained as the sum of the variances of $b_1, b_2, \ldots, b_k$, which is the trace of the covariance matrix of $b$. Thus

$$\operatorname{Var}(b) = \operatorname{tr}\left[V(b)\right] = \sum_{i=1}^{k} E(b_i - \beta_i)^2 = \sum_{i=1}^{k} \operatorname{Var}(b_i).$$

Estimation of $\sigma^2$:
The least-squares criterion cannot be used to estimate $\sigma^2$ because $\sigma^2$ does not appear in $S(\beta)$. Since $E(\varepsilon_i^2) = \sigma^2$, we use the residuals $e_i$ to estimate $\sigma^2$ as follows:

$$e = y - \hat{y} = y - X(X'X)^{-1}X'y = \left[I - X(X'X)^{-1}X'\right]y = \bar{H}y.$$

Consider the residual sum of squares

$$SS_{res} = \sum_{i=1}^{n} e_i^2 = e'e = (y - Xb)'(y - Xb) = y'(I - H)(I - H)y = y'(I - H)y = y'\bar{H}y.$$

Also,

$$SS_{res} = (y - Xb)'(y - Xb) = y'y - 2b'X'y + b'X'Xb = y'y - b'X'y \quad (\text{using } X'Xb = X'y).$$

Further,

$$SS_{res} = y'\bar{H}y = (X\beta + \varepsilon)'\bar{H}(X\beta + \varepsilon) = \varepsilon'\bar{H}\varepsilon \quad (\text{using } \bar{H}X = 0).$$

Since $\varepsilon \sim N(0, \sigma^2 I)$, we have $y \sim N(X\beta, \sigma^2 I)$. Hence $y'\bar{H}y / \sigma^2 \sim \chi^2(n - k)$. Thus

$$E\left[y'\bar{H}y\right] = (n - k)\sigma^2, \quad \text{or} \quad E\left[\frac{y'\bar{H}y}{n - k}\right] = \sigma^2,$$

i.e., $E\left[MS_{res}\right] = \sigma^2$, where $MS_{res} = \dfrac{SS_{res}}{n - k}$ is the mean sum of squares due to residuals.

Thus an unbiased estimator of $\sigma^2$ is $\hat\sigma^2 = MS_{res} = s^2$ (say), which is a model-dependent estimator.
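A minimal sketch, on simulated data with an assumed error standard deviation, of computing $s^2 = SS_{res}/(n-k)$ and the estimated covariance matrix $s^2(X'X)^{-1}$ of $b$.

```python
import numpy as np

# Sketch: the unbiased estimator s^2 = SS_res/(n-k) of sigma^2 and the
# estimated covariance matrix s^2 (X'X)^{-1} of the OLSE b.
rng = np.random.default_rng(3)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
sigma = 0.7                                       # assumed error std. deviation
y = X @ np.array([2.0, 1.0, -1.5]) + rng.normal(scale=sigma, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
ss_res = e @ e
s2 = ss_res / (n - k)                             # MS_res, unbiased for sigma^2
cov_b = s2 * np.linalg.inv(X.T @ X)               # estimated V(b)

print("s^2 =", round(s2, 4), " (true sigma^2 =", sigma**2, ")")
print("std. errors of b:", np.round(np.sqrt(np.diag(cov_b)), 4))
```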

Variance of $\hat{y}$:
The variance of $\hat{y}$ is

$$V(\hat{y}) = V(Xb) = XV(b)X' = \sigma^2 X(X'X)^{-1}X' = \sigma^2 H.$$

Gauss-Markov theorem:
The ordinary least squares estimator (OLSE) is the best linear unbiased estimator (BLUE) of $\beta$.

Proof: The OLSE of $\beta$ is

$$b = (X'X)^{-1}X'y,$$

which is a linear function of $y$. Consider an arbitrary linear estimator

$$b^* = a'y$$

of the linear parametric function $\ell'\beta$, where the elements of $a$ are arbitrary constants. Then for $b^*$,

$$E(b^*) = E(a'y) = a'X\beta,$$

and so $b^*$ is an unbiased estimator of $\ell'\beta$ when

$$E(b^*) = a'X\beta = \ell'\beta, \quad \text{i.e., } a'X = \ell'.$$

Since we wish to consider only those estimators that are linear and unbiased, we restrict ourselves to those estimators for which $a'X = \ell'$. Further,

$$\operatorname{Var}(a'y) = a'\operatorname{Var}(y)a = \sigma^2 a'a,$$
$$\operatorname{Var}(\ell'b) = \ell'\operatorname{Var}(b)\ell = \sigma^2 a'X(X'X)^{-1}X'a \quad (\text{using } \ell' = a'X).$$

Consider

$$\operatorname{Var}(a'y) - \operatorname{Var}(\ell'b) = \sigma^2\left[a'a - a'X(X'X)^{-1}X'a\right] = \sigma^2 a'\left[I - X(X'X)^{-1}X'\right]a = \sigma^2 a'(I - H)a.$$

Since $(I - H)$ is a positive semi-definite matrix,

$$\operatorname{Var}(a'y) - \operatorname{Var}(\ell'b) \geq 0.$$

This reveals that if $b^*$ is any linear unbiased estimator, then its variance must be no smaller than that of $b$. Consequently, $b$ is the best linear unbiased estimator, where 'best' refers to the fact that $b$ is efficient within the class of linear and unbiased estimators.

Maximum likelihood estimation:
In the model $y = X\beta + \varepsilon$, it is assumed that the errors are normally and independently distributed with constant variance $\sigma^2$, i.e., $\varepsilon \sim N(0, \sigma^2 I)$. The normal density function for the errors is

$$f(\varepsilon_i) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2\sigma^2}\varepsilon_i^2\right), \quad i = 1, 2, \ldots, n.$$

The likelihood function is the joint density of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$, given as

$$L(\beta, \sigma^2) = \prod_{i=1}^{n} f(\varepsilon_i) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\varepsilon_i^2\right) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\varepsilon'\varepsilon\right) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$$

Since the log transformation is monotonic, we maximize $\ln L(\beta, \sigma^2)$ instead of $L(\beta, \sigma^2)$:

$$\ln L(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$

The maximum likelihood estimators (m.l.e.) of $\beta$ and $\sigma^2$ are obtained by equating the first-order derivatives of $\ln L(\beta, \sigma^2)$ with respect to $\beta$ and $\sigma^2$ to zero as follows:

$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \beta} = \frac{1}{2\sigma^2}\,2X'(y - X\beta) = 0,$$
$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(y - X\beta)'(y - X\beta) = 0.$$

The likelihood equations are given by

$$X'X\beta = X'y,$$
$$\sigma^2 = \frac{1}{n}(y - X\beta)'(y - X\beta).$$

Since $\operatorname{rank}(X) = k$, the unique m.l.e.'s of $\beta$ and $\sigma^2$ are obtained as

$$\tilde\beta = (X'X)^{-1}X'y,$$
$$\tilde\sigma^2 = \frac{1}{n}(y - X\tilde\beta)'(y - X\tilde\beta).$$

Further, to verify that these values maximize the likelihood function, we find

$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta^2} = -\frac{1}{\sigma^2}X'X,$$
$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}(y - X\beta)'(y - X\beta),$$
$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta\, \partial \sigma^2} = -\frac{1}{\sigma^4}X'(y - X\beta).$$

Thus the Hessian matrix of second-order partial derivatives of $\ln L(\beta, \sigma^2)$ with respect to $\beta$ and $\sigma^2$ is

$$\begin{pmatrix} \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta^2} & \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta\, \partial \sigma^2} \\[1ex] \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \sigma^2\, \partial \beta} & \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial (\sigma^2)^2} \end{pmatrix},$$

which is negative definite at $\beta = \tilde\beta$ and $\sigma^2 = \tilde\sigma^2$. This ensures that the likelihood function is maximized at these values.

Comparing with the OLSEs, we find that
(i) the OLSE and the m.l.e. of $\beta$ are the same, so the m.l.e. of $\beta$ is also an unbiased estimator of $\beta$;
(ii) the OLS-based estimator of $\sigma^2$ is $s^2$, which is related to the m.l.e. of $\sigma^2$ as $\tilde\sigma^2 = \dfrac{n - k}{n}s^2$. So the m.l.e. of $\sigma^2$ is a biased estimator of $\sigma^2$.
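The following sketch, on simulated data with assumed coefficients, illustrates the comparison: the m.l.e. of $\beta$ coincides with the OLSE, while the m.l.e. of $\sigma^2$ divides $SS_{res}$ by $n$ rather than $n - k$ and is therefore biased downward.

```python
import numpy as np

# Sketch: for normal errors the m.l.e. of beta equals the OLSE, while the
# m.l.e. of sigma^2 divides SS_res by n instead of n - k, so it is biased.
rng = np.random.default_rng(4)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)             # OLSE = m.l.e. of beta
ss_res = (y - X @ b) @ (y - X @ b)
s2 = ss_res / (n - k)                             # unbiased estimator
sigma2_mle = ss_res / n                           # m.l.e., biased downward

print("s^2 =", round(s2, 4), ", m.l.e. =", round(sigma2_mle, 4))
print("ratio m.l.e./s^2 =", round(sigma2_mle / s2, 4), "= (n-k)/n =", (n - k) / n)
```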

Consistency of estimators:

(i) Consistency of $b$:
Under the assumption that $\lim_{n\to\infty}\dfrac{X'X}{n}$ exists as a non-stochastic and nonsingular matrix (with finite elements), we have

$$\lim_{n\to\infty} V(b) = \sigma^2 \lim_{n\to\infty} \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1} = 0,$$

since $\left(\frac{X'X}{n}\right)^{-1}$ tends to a finite nonsingular limit. This implies that the OLSE converges to $\beta$ in quadratic mean. Thus the OLSE is a consistent estimator of $\beta$. This holds true for the maximum likelihood estimator also.

The same conclusion can also be proved using the concept of convergence in probability. An estimator $\hat\theta_n$ converges to $\theta$ in probability if

$$\lim_{n\to\infty} P\left[\,\lvert\hat\theta_n - \theta\rvert \geq \delta\,\right] = 0 \quad \text{for any } \delta > 0,$$

and this is denoted as $\operatorname{plim}(\hat\theta_n) = \theta$.

The consistency of the OLSE can be obtained under the weaker assumptions that

$$\operatorname{plim}\left(\frac{X'X}{n}\right) = \Delta^*$$

exists and is a nonsingular and non-stochastic matrix, and that

$$\operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) = 0.$$

Since

$$b - \beta = (X'X)^{-1}X'\varepsilon = \left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n},$$

we have

$$\operatorname{plim}(b - \beta) = \operatorname{plim}\left[\left(\frac{X'X}{n}\right)^{-1}\right]\operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) = \Delta^{*-1}\cdot 0 = 0.$$

Thus $b$ is a consistent estimator of $\beta$. The same is true for the m.l.e. also.
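A small Monte Carlo sketch suggesting this behaviour, with an assumed true slope of 2.0 and invented data: as $n$ grows, the OLSE of the slope concentrates around the true value.

```python
import numpy as np

# Sketch: a simulation illustration of consistency. As n grows, the OLSE of
# the slope concentrates around the assumed true value 2.0.
rng = np.random.default_rng(5)

def ols_slope(n):
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

for n in (20, 200, 2000):
    slopes = np.array([ols_slope(n) for _ in range(500)])
    print(f"n={n:5d}  mean={slopes.mean():.4f}  std={slopes.std():.4f}")
```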

(ii) Consistency of $s^2$:
Now we look at the consistency of $s^2$ as an estimator of $\sigma^2$. We have

$$s^2 = \frac{1}{n - k}e'e = \frac{1}{n - k}\varepsilon'\bar{H}\varepsilon = \left(1 - \frac{k}{n}\right)^{-1}\frac{1}{n}\left[\varepsilon'\varepsilon - \varepsilon'X(X'X)^{-1}X'\varepsilon\right] = \left(1 - \frac{k}{n}\right)^{-1}\left[\frac{\varepsilon'\varepsilon}{n} - \frac{\varepsilon'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}\right].$$

Note that $\dfrac{\varepsilon'\varepsilon}{n} = \dfrac{1}{n}\sum_{i=1}^{n}\varepsilon_i^2$, and $\{\varepsilon_i^2,\ i = 1, 2, \ldots, n\}$ is a sequence of independently and identically distributed random variables with mean $\sigma^2$. Using the law of large numbers,

$$\operatorname{plim}\left(\frac{\varepsilon'\varepsilon}{n}\right) = \sigma^2,$$
$$\operatorname{plim}\left[\frac{\varepsilon'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}\right] = \operatorname{plim}\left(\frac{\varepsilon'X}{n}\right)\left[\operatorname{plim}\left(\frac{X'X}{n}\right)\right]^{-1}\operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) = 0\cdot\Delta^{*-1}\cdot 0 = 0.$$

Hence

$$\operatorname{plim}(s^2) = (1 - 0)^{-1}(\sigma^2 - 0) = \sigma^2.$$

Thus $s^2$ is a consistent estimator of $\sigma^2$. The same holds true for the m.l.e. also.

Cramér-Rao lower bound:
Let $\theta = (\beta, \sigma^2)'$. Assume that both $\beta$ and $\sigma^2$ are unknown. If $E(\hat\theta) = \theta$, then the covariance matrix of $\hat\theta$ is greater than or equal to the matrix inverse of the information matrix

$$I(\theta) = -E\begin{pmatrix} \dfrac{\partial^2 \ln L(\theta)}{\partial \beta\,\partial \beta'} & \dfrac{\partial^2 \ln L(\theta)}{\partial \beta\,\partial \sigma^2} \\[1ex] \dfrac{\partial^2 \ln L(\theta)}{\partial \sigma^2\,\partial \beta'} & \dfrac{\partial^2 \ln L(\theta)}{\partial (\sigma^2)^2} \end{pmatrix} = \begin{pmatrix} E\left(\dfrac{X'X}{\sigma^2}\right) & E\left(\dfrac{X'(y - X\beta)}{\sigma^4}\right) \\[1ex] E\left(\dfrac{(y - X\beta)'X}{\sigma^4}\right) & E\left(\dfrac{(y - X\beta)'(y - X\beta)}{\sigma^6} - \dfrac{n}{2\sigma^4}\right) \end{pmatrix} = \begin{pmatrix} \dfrac{X'X}{\sigma^2} & 0 \\[1ex] 0 & \dfrac{n}{2\sigma^4} \end{pmatrix}.$$

Then

$$\left[I(\theta)\right]^{-1} = \begin{pmatrix} \sigma^2(X'X)^{-1} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{pmatrix}$$

is the Cramér-Rao lower bound matrix for $\beta$ and $\sigma^2$.

The covariance matrix of the OLS-based estimators of $\beta$ and $\sigma^2$ is

$$\Sigma_{OLS} = \begin{pmatrix} \sigma^2(X'X)^{-1} & 0 \\ 0 & \dfrac{2\sigma^4}{n - k} \end{pmatrix},$$

which means that the Cramér-Rao lower bound is attained for the covariance of $b$ but not for $s^2$.

Standardized regression coefficients:
Usually it is difficult to compare the regression coefficients because the magnitude of $\hat\beta_j$ reflects the units of measurement of the $j$th explanatory variable $X_j$. For example, in the fitted regression model

$$\hat{y} = 5X_1 + 1000X_2,$$

$y$ is measured in litres, $X_1$ in litres and $X_2$ in millilitres. Although $\hat\beta_2 \gg \hat\beta_1$, the effect of both explanatory variables is identical: a one-litre change in either $X_1$ or $X_2$, when the other variable is held fixed, produces the same change in $\hat{y}$.

Sometimes it is helpful to work with scaled explanatory variables and a scaled study variable that produce dimensionless regression coefficients. These dimensionless regression coefficients are called standardized regression coefficients. There are two popular approaches for scaling which give standardized regression coefficients. We discuss them as follows.

1. Unit normal scaling:
Employ unit normal scaling on each explanatory variable and on the study variable. So define

$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, k,$$
$$y_i^* = \frac{y_i - \bar{y}}{s_y},$$

where $s_j^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$ and $s_y^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ are the sample variances of the $j$th explanatory variable and the study variable, respectively.

All scaled explanatory variables and the scaled study variable have mean zero and sample variance unity, i.e., using these new variables, the regression model becomes

$$y_i^* = \gamma_1 z_{i1} + \gamma_2 z_{i2} + \cdots + \gamma_k z_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

Such centering removes the intercept term from the model. The least-squares estimate of $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_k)'$ is

$$\hat\gamma = (Z'Z)^{-1}Z'y^*.$$

This scaling has a similarity to standardizing a normal random variable, i.e., subtracting its mean and dividing by its standard deviation. So it is called unit normal scaling.

2. Unit length scaling:
In unit length scaling, define

$$\omega_{ij} = \frac{x_{ij} - \bar{x}_j}{S_{jj}^{1/2}}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k,$$
$$y_i^0 = \frac{y_i - \bar{y}}{SS_T^{1/2}},$$

where $S_{jj} = \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$ is the corrected sum of squares for the $j$th explanatory variable $X_j$ and $SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2$ is the total sum of squares. In this scaling, each new explanatory variable $W_j$ has mean $\bar\omega_j = \dfrac{1}{n}\sum_{i=1}^{n}\omega_{ij} = 0$ and length $\sqrt{\sum_{i=1}^{n}(\omega_{ij} - \bar\omega_j)^2} = 1$.

In terms of these variables, the regression model is

$$y_i^0 = \delta_1\omega_{i1} + \delta_2\omega_{i2} + \cdots + \delta_k\omega_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

The least-squares estimate of the regression coefficients $\delta = (\delta_1, \delta_2, \ldots, \delta_k)'$ is

$$\hat\delta = (W'W)^{-1}W'y^0.$$

In such a case, the matrix $W'W$ is in the form of a correlation matrix, i.e.,

$$W'W = \begin{pmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1k} \\ r_{12} & 1 & r_{23} & \cdots & r_{2k} \\ r_{13} & r_{23} & 1 & \cdots & r_{3k} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{1k} & r_{2k} & r_{3k} & \cdots & 1 \end{pmatrix},$$

where

$$r_{ij} = \frac{\sum_{u=1}^{n}(x_{ui} - \bar{x}_i)(x_{uj} - \bar{x}_j)}{(S_{ii}S_{jj})^{1/2}} = \frac{S_{ij}}{(S_{ii}S_{jj})^{1/2}}$$

is the simple correlation coefficient between the explanatory variables $X_i$ and $X_j$. Similarly,

$$W'y^0 = (r_{1y}, r_{2y}, \ldots, r_{ky})',$$

where

$$r_{jy} = \frac{\sum_{u=1}^{n}(x_{uj} - \bar{x}_j)(y_u - \bar{y})}{(S_{jj}SS_T)^{1/2}} = \frac{S_{jy}}{(S_{jj}SS_T)^{1/2}}$$

is the simple correlation coefficient between the $j$th explanatory variable $X_j$ and the study variable $y$.

Note that it is customary to refer to $r_{ij}$ and $r_{jy}$ as correlation coefficients, although the $X_i$'s are not random variables.

If unit normal scaling is used, then

$$Z'Z = (n - 1)W'W.$$

So the estimates of the regression coefficients in unit normal scaling (i.e., $\hat\gamma$) and in unit length scaling (i.e., $\hat\delta$) are identical. So it does not matter which scaling is used, and $\hat\gamma = \hat\delta$.

The regression coefficients obtained after such scaling, viz., $\hat\gamma$ or $\hat\delta$, are usually called standardized regression coefficients.

The relationship between the original and standardized regression coefficients is

$$b_j = \hat\delta_j\left(\frac{SS_T}{S_{jj}}\right)^{1/2}, \quad j = 1, 2, \ldots, k,$$

and

$$b_0 = \bar{y} - \sum_{j=1}^{k} b_j\bar{x}_j,$$

where $b_0$ is the OLSE of the intercept term and the $b_j$ are the OLSEs of the slope parameters.

The model in deviation form:
The multiple linear regression model can also be expressed in deviation form. First, all the data are expressed in terms of deviations from the sample mean. The estimation of the regression parameters is performed in two steps:
First step: estimate the slope parameters.
Second step: estimate the intercept term.

The multiple linear regression model in deviation form is expressed as follows. Let

$$A = I - \frac{1}{n}\ell\ell',$$

where $\ell = (1, 1, \ldots, 1)'$ is an $n \times 1$ vector with each element unity. So

$$A = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} - \frac{1}{n}\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$$
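Before moving to the deviation form, here is a minimal sketch of unit length scaling and the conversion back to the original slopes and intercept; the data and coefficient values are placeholders, and the variable names are chosen only for illustration.

```python
import numpy as np

# Sketch: standardized regression coefficients by unit length scaling and
# conversion back to the slopes and intercept of the original model.
rng = np.random.default_rng(6)
n = 50
X = rng.normal(loc=[10.0, 200.0], scale=[2.0, 40.0], size=(n, 2))
y = 3.0 + 0.8 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(size=n)

Xc = X - X.mean(axis=0)                      # deviations from column means
yc = y - y.mean()
S_jj = (Xc**2).sum(axis=0)                   # corrected sums of squares
SS_T = (yc**2).sum()

W = Xc / np.sqrt(S_jj)                       # unit-length-scaled explanatory variables
y0 = yc / np.sqrt(SS_T)

delta_hat = np.linalg.solve(W.T @ W, W.T @ y0)      # standardized coefficients
b_slopes = delta_hat * np.sqrt(SS_T / S_jj)         # back to original slopes
b0 = y.mean() - b_slopes @ X.mean(axis=0)           # intercept

print("standardized:", np.round(delta_hat, 4))
print("original slopes:", np.round(b_slopes, 4), " intercept:", round(b0, 4))
```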

Then

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}(1, 1, \ldots, 1)\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \frac{1}{n}\ell'y,$$

$$Ay = y - \bar{y}\ell = (y_1 - \bar{y},\ y_2 - \bar{y},\ \ldots,\ y_n - \bar{y})'.$$

Thus pre-multiplication of any column vector by $A$ produces a vector showing those observations in deviation form. Note that

$$A\ell = \ell - \frac{1}{n}\ell\ell'\ell = \ell - \frac{n}{n}\ell = 0,$$

and $A$ is a symmetric and idempotent matrix.

In the model

$$y = X\beta + \varepsilon,$$

the OLSE of $\beta$ is

$$b = (X'X)^{-1}X'y,$$

and the residual vector is

$$e = y - Xb.$$

Note that $Ae = e$.

If the $n \times k$ matrix $X$ is partitioned as

$$X = \left[X_1 \;\; X_2^*\right],$$

where $X_1 = (1, 1, \ldots, 1)'$ is an $n \times 1$ vector with all elements unity and $X_2^*$ is an $n \times (k - 1)$ matrix of observations on the $k - 1$ explanatory variables $X_2, X_3, \ldots, X_k$, then the OLSE $b = (b_1, b_2^{*\prime})'$ is suitably partitioned, with $b_1$ the OLSE of the intercept term $\beta_1$ and $b_2^*$ a $(k - 1) \times 1$ vector of OLSEs associated with $\beta_2, \beta_3, \ldots, \beta_k$.

Then

$$y = X_1b_1 + X_2^*b_2^* + e.$$

Premultiplying by $A$,

$$Ay = AX_1b_1 + AX_2^*b_2^* + Ae = AX_2^*b_2^* + e.$$

Premultiplying by $X_2^{*\prime}$ gives

$$X_2^{*\prime}Ay = X_2^{*\prime}AX_2^*b_2^* + X_2^{*\prime}e = X_2^{*\prime}AX_2^*b_2^*.$$

Since $A$ is symmetric and idempotent,

$$(AX_2^*)'(Ay) = (AX_2^*)'(AX_2^*)\,b_2^*.$$

This equation can be compared with the normal equations $X'y = X'Xb$ in the model $y = X\beta + \varepsilon$. Such a comparison yields the following conclusions:
- $b_2^*$ is the sub-vector of the OLSE.
- $Ay$ is the study variable vector in deviation form.
- $AX_2^*$ is the explanatory variable matrix in deviation form.
- This is the normal equation in terms of deviations. Its solution gives the OLSEs of the slope coefficients as

$$b_2^* = \left[(AX_2^*)'(AX_2^*)\right]^{-1}(AX_2^*)'(Ay).$$

The estimate of the intercept term is obtained in the second step as follows. Premultiplying $y = Xb + e$ by $\frac{1}{n}\ell'$ gives

$$\frac{1}{n}\ell'y = \frac{1}{n}\ell'Xb + \frac{1}{n}\ell'e,$$

i.e.,

$$\bar{y} = (1,\ \bar{X}_2,\ \bar{X}_3,\ \ldots,\ \bar{X}_k)\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix} + 0,$$

so

$$b_1 = \bar{y} - b_2\bar{X}_2 - b_3\bar{X}_3 - \cdots - b_k\bar{X}_k.$$

Now we explain various sums of squares in terms of this model.
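A minimal sketch of this two-step procedure on simulated data: slopes from the deviation-form normal equations, intercept from the sample means, and a comparison with the usual full-design-matrix fit.

```python
import numpy as np

# Sketch: two-step estimation via the deviation form. The centering matrix
# A = I - (1/n) l l' puts data in deviations; slopes come from the deviation
# normal equations and the intercept from the sample means.
rng = np.random.default_rng(7)
n = 40
X2 = rng.normal(size=(n, 2))                           # two explanatory variables
y = 4.0 + 1.5 * X2[:, 0] - 2.0 * X2[:, 1] + rng.normal(scale=0.2, size=n)

ones = np.ones((n, 1))
A = np.eye(n) - ones @ ones.T / n                      # centering matrix

AX2, Ay = A @ X2, A @ y
b2 = np.linalg.solve(AX2.T @ AX2, AX2.T @ Ay)          # step 1: slopes
b1 = y.mean() - b2 @ X2.mean(axis=0)                   # step 2: intercept

# Compare with the full OLS fit on [1, X2].
b_full = np.linalg.lstsq(np.column_stack([ones, X2]), y, rcond=None)[0]
print(np.round(np.concatenate(([b1], b2)), 4), np.round(b_full, 4))
```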

The expression of the total sum of squares (TSS) remains the same as earlier and is given by

$$TSS = y'Ay.$$

Since $Ay = AX_2^*b_2^* + e$,

$$y'Ay = y'AX_2^*b_2^* + y'e$$
$$= (Xb + e)'AX_2^*b_2^* + y'e$$
$$= (X_1b_1 + X_2^*b_2^* + e)'AX_2^*b_2^* + (X_1b_1 + X_2^*b_2^* + e)'e$$
$$= b_2^{*\prime}X_2^{*\prime}AX_2^*b_2^* + e'e,$$

i.e.,

$$TSS = SS_{reg} + SS_{res},$$

where the sum of squares due to regression is

$$SS_{reg} = b_2^{*\prime}X_2^{*\prime}AX_2^*b_2^*$$

and the sum of squares due to residuals is

$$SS_{res} = e'e.$$

Testing of hypotheses:
There are several important questions which can be answered through tests of hypotheses concerning the regression coefficients. For example:
1. What is the overall adequacy of the model?
2. Which specific explanatory variables seem to be important?
etc.

In order to answer such questions, we first develop the test of hypothesis for a general framework, viz., the general linear hypothesis. Then several tests of hypotheses can be derived as its special cases. So first we discuss the test of a general linear hypothesis.

Test of hypothesis for $H_0: R\beta = r$:
We consider a general linear hypothesis that the parameters in $\beta$ are contained in a subspace of the parameter space for which $R\beta = r$, where $R$ is a $J \times k$ matrix of known elements and $r$ is a $J \times 1$ vector of known elements.
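A quick numerical check of the decomposition $TSS = SS_{reg} + SS_{res}$ in deviation form, reusing the centering-matrix construction from the previous sketch on freshly simulated data.

```python
import numpy as np

# Sketch: numerical check of TSS = SS_reg + SS_res in the deviation form.
rng = np.random.default_rng(9)
n = 35
X2 = rng.normal(size=(n, 2))
y = 2.0 + 1.0 * X2[:, 0] + 0.5 * X2[:, 1] + rng.normal(scale=0.3, size=n)

ones = np.ones((n, 1))
A = np.eye(n) - ones @ ones.T / n
AX2, Ay = A @ X2, A @ y

b2 = np.linalg.solve(AX2.T @ AX2, AX2.T @ Ay)
e = Ay - AX2 @ b2

tss = y @ A @ y
ss_reg = b2 @ AX2.T @ AX2 @ b2
ss_res = e @ e
print(np.isclose(tss, ss_reg + ss_res))            # True
```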

In general, the null hypothesis

$$H_0: R\beta = r$$

is termed the general linear hypothesis, and

$$H_1: R\beta \neq r$$

is the alternative hypothesis. We assume that $\operatorname{rank}(R) = J$, i.e., $R$ has full row rank, so that there is no linear dependence in the hypothesis.

Some special cases and interesting examples of $H_0: R\beta = r$ are as follows:

(i) $H_0: \beta_i = 0$.
Choose $J = 1$, $r = 0$, $R = [0, 0, \ldots, 0, 1, 0, \ldots, 0]$, where 1 occurs at the $i$th position in $R$. This particular hypothesis explains whether $X_i$ has any effect on the linear model or not.

(ii) $H_0: \beta_3 = \beta_4$, or $H_0: \beta_3 - \beta_4 = 0$.
Choose $J = 1$, $r = 0$, $R = [0, 0, 1, -1, 0, \ldots, 0]$.

(iii) $H_0: \beta_3 = \beta_4 = \beta_5$, or $H_0: \beta_3 - \beta_4 = 0,\ \beta_3 - \beta_5 = 0$.
Choose $J = 2$, $r = (0, 0)'$,

$$R = \begin{pmatrix} 0 & 0 & 1 & -1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & -1 & 0 & \cdots & 0 \end{pmatrix}.$$

(iv) $H_0: \beta_3 + 5\beta_4 = 2$.
Choose $J = 1$, $r = 2$, $R = [0, 0, 1, 5, 0, \ldots, 0]$.

(v) $H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$.
Choose $J = k - 1$, $r = (0, 0, \ldots, 0)'$,

$$R = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} = \left[\,0 \;\; I_{k-1}\,\right]_{(k-1)\times k}.$$

This particular hypothesis explains the goodness of fit. It tells whether the $\beta_i$'s have a linear effect or not and whether they are of any importance. It also tests that $X_2, X_3, \ldots, X_k$ have no influence in the determination of $y$. Here $\beta_1 = 0$ is excluded because this would involve the additional implication that the mean level of $y$ is zero. Our main concern is to know whether the explanatory variables help to explain the variation in $y$ around its mean value or not.
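A short sketch of constructing $R$ and $r$ for a few of these special cases, assuming $k = 6$ coefficients; the indexing is 1-based in the text and 0-based in the arrays, and the choice of $k$ is arbitrary.

```python
import numpy as np

# Sketch: building R and r for some special cases of H0: R*beta = r,
# assuming k = 6 regression coefficients beta_1, ..., beta_6.
k = 6

# (i) H0: beta_3 = 0
R_i = np.zeros((1, k)); R_i[0, 2] = 1.0
r_i = np.array([0.0])

# (ii) H0: beta_3 - beta_4 = 0
R_ii = np.zeros((1, k)); R_ii[0, 2], R_ii[0, 3] = 1.0, -1.0
r_ii = np.array([0.0])

# (v) H0: beta_2 = ... = beta_k = 0 (goodness of fit), J = k - 1
R_v = np.hstack([np.zeros((k - 1, 1)), np.eye(k - 1)])
r_v = np.zeros(k - 1)

for R, r in [(R_i, r_i), (R_ii, r_ii), (R_v, r_v)]:
    print("J =", R.shape[0], " rank(R) =", np.linalg.matrix_rank(R))
```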

We develop the likelihood ratio test for $H_0: R\beta = r$.

Likelihood ratio test:
The likelihood ratio test statistic is

$$\lambda = \frac{\max L(\beta, \sigma^2 \mid y, X)}{\max L(\beta, \sigma^2 \mid y, X, R\beta = r)} = \frac{\hat{L}(\hat\Omega)}{\hat{L}(\hat\omega)},$$

where $\Omega$ is the whole parametric space and $\omega$ is the subspace of $\Omega$ restricted by $H_0$. If both likelihoods are maximized, one constrained and the other unconstrained, then the value of the unconstrained maximum will not be smaller than the value of the constrained maximum; hence $\lambda \geq 1$. We first work out the straightforward case when $R = I_k$ and $r = \beta_0$, i.e., $H_0: \beta = \beta_0$. This will give us a better and more detailed understanding of the details, and then we generalize it to $R\beta = r$ in general.

Likelihood ratio test for $H_0: \beta = \beta_0$:
Let the null hypothesis related to the $k \times 1$ vector $\beta$ be

$$H_0: \beta = \beta_0,$$

where $\beta_0$ is specified by the investigator. The elements of $\beta_0$ can take on any value, including zero. The corresponding alternative hypothesis is

$$H_1: \beta \neq \beta_0.$$

Since $\varepsilon \sim N(0, \sigma^2 I)$ in $y = X\beta + \varepsilon$, we have $y \sim N(X\beta, \sigma^2 I)$. Thus the whole parametric space $\Omega$ and the restricted space $\omega$ are, respectively, given by

$$\Omega = \left\{(\beta, \sigma^2): -\infty < \beta_i < \infty,\ \sigma^2 > 0,\ i = 1, 2, \ldots, k\right\},$$
$$\omega = \left\{(\beta, \sigma^2): \beta = \beta_0,\ \sigma^2 > 0\right\}.$$

The unconstrained likelihood under $\Omega$ is

$$L(\beta, \sigma^2 \mid y, X) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$$

This is maximized over $\Omega$ at

$$\tilde\beta = (X'X)^{-1}X'y,$$
$$\tilde\sigma^2 = \frac{1}{n}(y - X\tilde\beta)'(y - X\tilde\beta),$$

where $\tilde\beta$ and $\tilde\sigma^2$ are the maximum likelihood estimates of $\beta$ and $\sigma^2$, i.e., the values maximizing the likelihood function. Thus

$$\hat{L}(\hat\Omega) = \max_{\Omega} L(\beta, \sigma^2 \mid y, X) = \frac{1}{\left[\frac{2\pi}{n}(y - X\tilde\beta)'(y - X\tilde\beta)\right]^{n/2}}\exp\left(-\frac{n}{2}\right) = \left(\frac{n}{2\pi}\right)^{n/2}\frac{\exp\left(-\frac{n}{2}\right)}{\left[(y - X\tilde\beta)'(y - X\tilde\beta)\right]^{n/2}}.$$

The constrained likelihood under $\omega$ is

$$L(\beta, \sigma^2 \mid y, X, \beta = \beta_0) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}(y - X\beta_0)'(y - X\beta_0)\right].$$

Since $\beta_0$ is known, the constrained likelihood function has an optimum variance estimator $\frac{1}{n}(y - X\beta_0)'(y - X\beta_0)$, so that

$$\hat{L}(\hat\omega) = \left(\frac{n}{2\pi}\right)^{n/2}\frac{\exp\left(-\frac{n}{2}\right)}{\left[(y - X\beta_0)'(y - X\beta_0)\right]^{n/2}}.$$

The likelihood ratio is

$$\lambda = \frac{\hat{L}(\hat\Omega)}{\hat{L}(\hat\omega)} = \frac{\left(\frac{n}{2\pi}\right)^{n/2}\dfrac{\exp(-n/2)}{\left[(y - X\tilde\beta)'(y - X\tilde\beta)\right]^{n/2}}}{\left(\frac{n}{2\pi}\right)^{n/2}\dfrac{\exp(-n/2)}{\left[(y - X\beta_0)'(y - X\beta_0)\right]^{n/2}}} = \left[\frac{(y - X\beta_0)'(y - X\beta_0)}{(y - X\tilde\beta)'(y - X\tilde\beta)}\right]^{n/2}.$$
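A minimal sketch of computing this ratio numerically; the hypothesized $\beta_0$, the sample size and the data are all illustrative assumptions, and the snippet only evaluates the ratio of constrained to unconstrained residual sums of squares raised to the power $n/2$, as derived above.

```python
import numpy as np

# Sketch: the likelihood ratio lambda for H0: beta = beta0, computed as
# [RSS(constrained) / RSS(unconstrained)]^(n/2). Data and beta0 are illustrative.
rng = np.random.default_rng(8)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.8])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta0 = np.array([1.0, 0.0])                       # hypothesized value under H0

beta_tilde = np.linalg.solve(X.T @ X, X.T @ y)     # unconstrained m.l.e.
rss_unconstrained = (y - X @ beta_tilde) @ (y - X @ beta_tilde)
rss_constrained = (y - X @ beta0) @ (y - X @ beta0)

lam = (rss_constrained / rss_unconstrained) ** (n / 2)
print("lambda =", lam)                             # large values speak against H0
```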
