Introduction To Generalized Linear Models - WU


Introduction to Generalized Linear Models
Heather Turner
ESRC National Centre for Research Methods, UK
and
Department of Statistics, University of Warwick, UK
WU, 2008-04-22 to 2008-04-24
Copyright © Heather Turner, 2008

Introduction

This short course provides an overview of generalized linear models (GLMs). We shall see that these models extend the linear modelling framework to variables that are not Normally distributed. GLMs are most commonly used to model binary or count data, so we will focus on models for these types of data.

Outline

Part I: Introduction to Generalized Linear Models
Part II: Binary Data
Part III: Count Data

Part I: Introduction
- Review of Linear Models
- Generalized Linear Models
- GLMs in R
- Exercises

Part II: Binary Data
- Binary Data
- Models for Binary Data
- Model Selection
- Model Evaluation
- Exercises

Part III: Count Data
- Count Data
- Modelling Rates
- Modelling Contingency Tables
- Exercises

Part I

Introduction to Generalized Linear Models

Review of Linear Models: Structure

The General Linear Model

In a general linear model

  yi = β0 + β1x1i + ... + βpxpi + εi

the response yi, i = 1, ..., n, is modelled by a linear function of explanatory variables xj, j = 1, ..., p, plus an error term.

General and Linear

Here "general" refers to the dependence on potentially more than one explanatory variable, versus the simple linear model:

  yi = β0 + β1xi + εi

The model is linear in the parameters, e.g.

  yi = β0 + β1x1 + β2x1² + εi
  yi = β0 + γ1δ1x1 + exp(β2)x2 + εi

but not, e.g.

  yi = β0 + β1x1^β2 + εi
  yi = β0 + exp(β1x1) + εi

Error Structure

We assume that the errors εi are independent and identically distributed such that

  E[εi] = 0 and var[εi] = σ²

Typically we assume

  εi ~ N(0, σ²)

as a basis for inference, e.g. t-tests on parameters.

Some Examples

[Figure: body fat plotted against biceps and abdomin measurements, with the fitted plane bodyfat = 14.59 + 0.7 * biceps + 0.9 * abdomin.]

[Figure: Length plotted against Age (1-7), comparing the linear fit Length = 75.7 + 17.2 * Age with a quadratic fit, Length = 143.3 + 232.8 * Age + 66.6 * Age².]

[Figure: particle size plotted against resin (1-8), by operator (1-3), with fitted values from the model particle sizeij = operatori + resinj + εij.]

Restrictions of Linear Models

Although a very useful framework, there are some situations where general linear models are not appropriate:
- the range of Y is restricted (e.g. binary, count)
- the variance of Y depends on the mean

Generalized linear models extend the general linear model framework to address both of these issues.

Generalized Linear Models: Structure

Generalized Linear Models (GLMs)

A generalized linear model is made up of a linear predictor

  ηi = β0 + β1x1i + ... + βpxpi

and two functions:
- a link function that describes how the mean, E(Yi) = µi, depends on the linear predictor:

  g(µi) = ηi

- a variance function that describes how the variance, var(Yi), depends on the mean:

  var(Yi) = φV(µi)

where the dispersion parameter φ is a constant.

Normal General Linear Model as a Special Case

For the general linear model with εi ~ N(0, σ²) we have the linear predictor

  ηi = β0 + β1x1i + ... + βpxpi

the link function

  g(µi) = µi

and the variance function

  V(µi) = 1

Modelling Binomial Data

Suppose

  Yi ~ Binomial(ni, pi)

and we wish to model the proportions Yi/ni. Then

  E(Yi/ni) = pi,  var(Yi/ni) = pi(1 − pi)/ni

So our variance function is

  V(µi) = µi(1 − µi)

Our link function must map from (0, 1) to (−∞, ∞). A common choice is

  g(µi) = logit(µi) = log( µi / (1 − µi) )
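As a quick illustration of the logit link and the binomial variance function, here is a minimal Python sketch (added for illustration; it is not part of the original slides):

```python
import math

def logit(mu):
    """Link function: maps a mean in (0, 1) to the whole real line."""
    return math.log(mu / (1 - mu))

def inv_logit(eta):
    """Inverse link: maps a linear predictor in (-inf, inf) back to (0, 1)."""
    return 1 / (1 + math.exp(-eta))

def binomial_variance(mu):
    """Variance function V(mu) = mu * (1 - mu) for the binomial family."""
    return mu * (1 - mu)

# The link and its inverse round-trip, as a link function must:
assert abs(inv_logit(logit(0.25)) - 0.25) < 1e-12
```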

Modelling Poisson Data

Suppose

  Yi ~ Poisson(λi)

Then

  E(Yi) = λi,  var(Yi) = λi

So our variance function is

  V(µi) = µi

Our link function must map from (0, ∞) to (−∞, ∞). A natural choice is

  g(µi) = log(µi)

Transformation vs. GLM

In some situations a response variable can be transformed to improve linearity and homogeneity of variance so that a general linear model can be applied. This approach has some drawbacks:
- the response variable has changed!
- the transformation must simultaneously improve linearity and homogeneity of variance
- the transformation may not be defined on the boundaries of the sample space

For example, a common remedy for the variance increasing with the mean is to apply the log transform, e.g.

  log(yi) = β0 + β1x1 + εi  ⇒  E(log Yi) = β0 + β1x1

This is a linear model for the mean of log Y, which may not always be appropriate. E.g. if Y is income, perhaps we are really interested in the mean income of population subgroups, in which case it would be better to model E(Y) using a GLM:

  log E(Yi) = β0 + β1x1

with V(µ) = µ. This also avoids difficulties with y = 0.

Exponential Family

Most of the commonly used statistical distributions, e.g. Normal, Binomial and Poisson, are members of the exponential family of distributions, whose densities can be written in the form

  f(y; θ, φ) = exp{ (yθ − b(θ))/φ + c(y, φ) }

where φ is the dispersion parameter and θ is the canonical parameter.

It can be shown that

  E(Y) = b′(θ) = µ

and

  var(Y) = φb″(θ) = φV(µ)
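As a worked example of these identities (added here, not on the slide), the Poisson density can be put in this form:

```latex
f(y;\lambda) = \frac{\lambda^y e^{-\lambda}}{y!}
             = \exp\{\, y\log\lambda - \lambda - \log y! \,\}
\quad\Rightarrow\quad
\theta = \log\lambda,\quad b(\theta) = e^{\theta},\quad \phi = 1,\quad c(y,\phi) = -\log y!
```

so that E(Y) = b′(θ) = e^θ = λ and var(Y) = φb″(θ) = e^θ = λ, as required for the Poisson.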

Canonical Links

For a GLM where the response follows an exponential family distribution we have

  g(µi) = g(b′(θi)) = β0 + β1x1i + ... + βpxpi

The canonical link is defined as

  g = (b′)⁻¹  ⇒  g(µi) = θi = β0 + β1x1i + ... + βpxpi

Canonical links lead to desirable statistical properties of the GLM, hence tend to be used by default. However, there is no a priori reason why the systematic effects in the model should be additive on the scale given by this link.
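For reference, the canonical links of the families met so far can be derived from the standard densities (a summary added here, not on the slide):

```latex
\text{Normal:}\quad   b(\theta) = \theta^2/2,\;\; \mu = b'(\theta) = \theta
                      \;\Rightarrow\; \theta = \mu \ \text{(identity link)} \\
\text{Poisson:}\quad  b(\theta) = e^{\theta},\;\; \mu = e^{\theta}
                      \;\Rightarrow\; \theta = \log\mu \ \text{(log link)} \\
\text{Binomial:}\quad b(\theta) = \log(1 + e^{\theta}),\;\; \mu = \frac{e^{\theta}}{1 + e^{\theta}}
                      \;\Rightarrow\; \theta = \log\frac{\mu}{1-\mu} \ \text{(logit link)}
```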

Generalized Linear Models: Estimation

Estimation of the Model Parameters

A single algorithm can be used to estimate the parameters of an exponential family GLM using maximum likelihood. The log-likelihood for the sample y1, ..., yn is

  l = Σi=1..n [ (yiθi − b(θi))/φi + c(yi, φi) ]

The maximum likelihood estimates are obtained by solving the score equations

  s(βj) = ∂l/∂βj = Σi=1..n [ (yi − µi)/(φiV(µi)) ] · [ xij/g′(µi) ] = 0

for parameters βj.

We assume that

  φi = φ/ai

where φ is a single dispersion parameter and the ai are known prior weights; for example, binomial proportions with known index ni have φ = 1 and ai = ni.

The estimating equations are then

  ∂l/∂βj = Σi=1..n [ ai(yi − µi)xij / (V(µi)g′(µi)) ] = 0

which do not depend on φ (which may be unknown).

A general method of solving score equations is the iterative algorithm Fisher's Method of Scoring (derived from a Taylor expansion of s(β)).

In the r-th iteration, the new estimate β(r+1) is obtained from the previous estimate β(r) by

  β(r+1) = β(r) + E[−H(β(r))]⁻¹ s(β(r))

where H is the Hessian matrix: the matrix of second derivatives of the log-likelihood.

It turns out that the updates can be written as

  β(r+1) = (XᵀW(r)X)⁻¹ XᵀW(r)z(r)

i.e. the score equations for a weighted least squares regression of z(r) on X with weights W(r) = diag(wi), where the working responses are

  zi(r) = ηi(r) + (yi − µi(r)) g′(µi(r))

and the working weights are

  wi(r) = ai / [ V(µi(r)) (g′(µi(r)))² ]

Hence the estimates can be found using an Iteratively (Re-)Weighted Least Squares (IWLS) algorithm:

1. Start with initial estimates µi(r)
2. Calculate working responses zi(r) and working weights wi(r)
3. Calculate β(r+1) by weighted least squares
4. Repeat 2 and 3 till convergence

For models with the canonical link, this is simply the Newton-Raphson method.
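The steps above can be sketched for a Poisson model with log link (canonical, so g′(µ) = 1/µ and V(µ) = µ, and the working weight reduces to wi = µi). This is a minimal pure-Python illustration with a single explanatory variable (the function name and the one-predictor setup are my own, not from the slides):

```python
import math

def irls_poisson(x, y, iterations=25):
    """Fit E(Y) = exp(b0 + b1 * x) by iteratively (re-)weighted least squares."""
    # 1. Start with initial estimates mu_i (here the data, bounded away from 0)
    mu = [max(yi, 0.5) for yi in y]
    eta = [math.log(mi) for mi in mu]
    b0 = b1 = 0.0
    for _ in range(iterations):
        # 2. Working responses z_i = eta_i + (y_i - mu_i) * g'(mu_i), g'(mu) = 1/mu,
        #    and working weights w_i = 1 / (V(mu_i) * g'(mu_i)^2) = mu_i
        z = [ei + (yi - mi) / mi for ei, yi, mi in zip(eta, y, mu)]
        w = mu
        # 3. Weighted least squares of z on (1, x): solve the 2x2 normal equations
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
        # 4. Update and repeat (a fixed iteration count stands in for a convergence test)
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(ei) for ei in eta]
    return b0, b1
```

At the maximum likelihood estimates the score equations Σi (yi − µi)xij = 0 hold exactly for the canonical link, which gives a simple check that the iteration has converged.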

Standard Errors

The estimates β̂ have the usual properties of maximum likelihood estimators. In particular, β̂ is asymptotically

  N(β, i⁻¹)

where

  i(β) = φ⁻¹XᵀWX

Standard errors for the βj may therefore be calculated as the square roots of the diagonal elements of

  côv(β̂) = φ(XᵀŴX)⁻¹

in which (XᵀŴX)⁻¹ is a by-product of the final IWLS iteration. If φ is unknown, an estimate is required.

There are practical difficulties in estimating the dispersion φ by maximum likelihood, so it is usually estimated by the method of moments. If β were known, an unbiased estimate of φ = {ai var(Yi)}/V(µi) would be

  (1/n) Σi=1..n ai(yi − µi)² / V(µi)

Allowing for the fact that β must be estimated, we obtain

  (1/(n − p)) Σi=1..n ai(yi − µ̂i)² / V(µ̂i)
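The moment estimator is easy to compute once the fit is available; a small sketch (the function and its argument names are illustrative, not from the slides):

```python
def dispersion_estimate(y, mu, V, a, p):
    """Moment estimator: phi_hat = (1/(n-p)) * sum_i a_i * (y_i - mu_i)^2 / V(mu_i)."""
    n = len(y)
    return sum(ai * (yi - mi) ** 2 / V(mi)
               for yi, mi, ai in zip(y, mu, a)) / (n - p)

# For a Normal model (V(mu) = 1, a_i = 1) this reduces to the usual
# residual variance estimate with denominator n - p.
```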

GLMs in R: The glm Function

Generalized linear models can be fitted in R using the glm function, which is similar to the lm function for fitting linear models. The arguments to a glm call are as follows:

glm(formula, family = gaussian, data, weights, subset,
    na.action, start = NULL, etastart, mustart, offset,
    control = glm.control(...), model = TRUE,
    method = "glm.fit", x = FALSE, y = TRUE,
    contrasts = NULL, ...)

Formula Argument

The formula is specified to glm as, e.g.

  y ~ x1 + x2

where x1, x2 are the names of
- numeric vectors (continuous variables)
- factors (categorical variables)

All specified variables must be in the workspace or in the data frame passed to the data argument.

Other symbols that can be used in the formula include:
- a:b for an interaction between a and b
- a*b which expands to a + b + a:b
- . for first-order terms of all variables in data
- - to exclude a term or terms
- 1 to include an intercept (included by default)
- 0 to exclude an intercept

Family Argument

The family argument takes (the name of) a family function which specifies
- the link function
- the variance function
- various related objects used by glm, e.g. linkinv

The exponential family functions available in R are:
- binomial(link = "logit")
- gaussian(link = "identity")
- Gamma(link = "inverse")
- inverse.gaussian(link = "1/mu^2")
- poisson(link = "log")

Extractor Functions

The glm function returns an object of class c("glm", "lm"). There are several glm or lm methods available for accessing/displaying components of the glm object, including:
- deviance()
- formula()
- summary()

Example: Household Food Expenditure

Griffiths, Hill and Judge (1993) present a dataset on food expenditure for households that have three family members. We consider two variables, the logarithm of expenditure on food and the household income:

dat <- read.table("GHJ food income.txt", header = TRUE)
attach(dat)
plot(Food ~ Income, xlab = "Weekly Household Income ( )",
     ylab = "Weekly Household Expenditure on Food (Log )")

It would seem that a simple linear model would fit the data well.

We will first fit the model using lm, then compare to the results using glm.

foodLM <- lm(Food ~ Income)
summary(foodLM)
foodGLM <- glm(Food ~ Income)
summary(foodGLM)

Summary of Fit Using lm

Call:
lm(formula = Food ~ Income)

Residuals:
      Min        1Q    Median        3Q       Max
-0.508368 -0.157815 -0.005357  0.187894  0.491421

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.409418   0.161976  14.875  < 2e-16 ***
Income      0.009976   0.002234   4.465 6.95e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2766 on 38 degrees of freedom
Multiple R-squared: 0.3441, Adjusted R-squared: 0.3268
F-statistic: 19.94 on 1 and 38 DF, p-value: 6.951e-05

Summary of Fit Using glm

The default family for glm is "gaussian", so the arguments of the call are unchanged. A five-number summary of the deviance residuals is given. Since the response is assumed to be normally distributed, these are the same as the residuals returned from lm.

Call:
glm(formula = Food ~ Income)

Deviance Residuals:
      Min        1Q    Median        3Q       Max
-0.508368 -0.157815 -0.005357  0.187894  0.491421

The estimated coefficients are unchanged:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.409418   0.161976  14.875  < 2e-16 ***
Income      0.009976   0.002234   4.465 6.95e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.07650739)

Partial t-tests test the significance of each coefficient in the presence of the others. The dispersion parameter for the gaussian family is equal to the residual variance.

Wald Tests

For non-Normal data, we can use the fact that asymptotically

  β̂ ~ N(β, φ(XᵀWX)⁻¹)

and use a z-test to test the significance of a coefficient. Specifically, we test

  H0: βj = 0 versus H1: βj ≠ 0

using the test statistic

  zj = β̂j / sqrt( φ(XᵀŴX)⁻¹jj )

which is asymptotically N(0, 1) under H0.
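The z-statistic and its two-sided p-value can be computed directly from an estimate and its standard error; a small sketch using only the Python standard library (the Normal CDF is obtained from the error function; added for illustration, not from the slides):

```python
import math

def wald_z_test(beta_hat, se):
    """Two-sided Wald z-test of H0: beta_j = 0, given an estimate and its standard error."""
    z = beta_hat / se
    # Standard Normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    phi_abs = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi_abs)
    return z, p_value
```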

Different model summaries are reported for GLMs. First we have the deviances of two models:

Null deviance: 4.4325  on 39  degrees of freedom
Residual deviance: 2.9073  on 38  degrees of freedom

The first refers to the null model, in which all of the terms are excluded except the intercept, if present. The degrees of freedom for this model are the number of data points n minus 1 if an intercept is fitted.

The second refers to the fitted model, which has n − p degrees of freedom, where p is the number of parameters, including any intercept.

Deviance

The deviance of a model is defined as

  D = 2φ(lsat − lmod)

where lmod is the log-likelihood of the fitted model and lsat is the log-likelihood of the saturated model. In the saturated model, the number of parameters is equal to the number of observations, so ŷ = y.

For linear regression with Normal data, the deviance is equal to the residual sum of squares.
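For Poisson data (φ = 1), the definition above works out to D = 2 Σi [ yi log(yi/µ̂i) − (yi − µ̂i) ]. A small sketch, using the usual convention that a term with yi = 0 contributes 0·log 0 = 0 (added for illustration, not from the slides):

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y_i * log(y_i / mu_i) - (y_i - mu_i))."""
    d = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d += term - (yi - mi)
    return 2 * d

# The saturated model fits each observation exactly, so its deviance is zero:
assert poisson_deviance([1, 2, 3], [1.0, 2.0, 3.0]) == 0.0
```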

Akaike Information Criterion (AIC)

Finally we have:

AIC: 14.649
Number of Fisher Scoring iterations: 2

The AIC is a measure of fit that penalizes for the number of parameters p:

  AIC = −2lmod + 2p

Smaller values indicate better fit, and thus the AIC can be used to compare models (not necessarily nested).

Residual Analysis

Several kinds of residuals can be defined for GLMs:
- response: yi − µ̂i
- working: from the working response in the IWLS algorithm
- Pearson: riP = (yi − µ̂i)/√V(µ̂i), such that Σi (riP)² equals the generalized Pearson statistic
- deviance: riD, such that Σi (riD)² equals the deviance

These definitions are all equivalent for Normal models.

Deviance residuals are the default used in R, since they reflect the same criterion as used in the fitting. For example, we can plot the deviance residuals against the fitted values (on the response scale) as follows:

plot(residuals(foodGLM) ~ fitted(foodGLM),
     xlab = expression(hat(y)[i]),
     ylab = expression(r[i]))
abline(0, 0, lty = 2)

The plot function gives the usual choice of residual plots, based on the deviance residuals. By default:
- deviance residuals vs. fitted values
- Normal Q-Q plot of deviance residuals standardised to unit variance
- scale-location plot of standardised deviance residuals
- standardised deviance residuals vs. leverage, with Cook's distance contours

Residual Plots

For the food expenditure data the residuals do not indicate any problems with the modelling assumptions:

plot(foodGLM)

Exercises

1. Load the SLID data from the car package and attach the data frame to the search path. Look up the description of the SLID data in the help file. In the following exercises you will investigate models for the wages variable.

2. Produce appropriate plots to examine the bivariate relationships of wages with the other variables in the data set. Which variables appear to be correlated with wages?

3. Use lm to regress wages on the linear effect of the other variables. Look at a summary of the fit. Do the results appear to agree with your exploratory analysis? Use plot to check the residuals from the fit. Which modelling assumptions appear to be invalid?

4. Repeat the analysis of question 3 with log(wages) as the response variable. Confirm that the residuals are more consistent with the modelling assumptions. Can any variables be dropped from the model? Investigate whether two-way and three-way interactions should be added to the model.

In the analysis of question 4, we have estimated a model of the form

  log yi = β0 + Σr=1..p βr xir + εi    (1)

which is equivalent to

  yi = exp( β0 + Σr=1..p βr xir ) ε′i    (2)

where ε′i = exp(εi), so that εi = log(ε′i) and E(log ε′i) = 0.

Assuming εi to be normally distributed in Equation 1 implies that log(Y) is normally distributed. If X = log(Y) ~ N(µ, σ²), then Y has a log-Normal distribution with parameters µ and σ². It can be shown that

  E(Y) = exp( µ + σ²/2 )
  var(Y) = [exp(σ²) − 1] {E(Y)}²

so that

  var(Y) ∝ {E(Y)}²

An alternative approach is to assume that Y has a Gamma distribution, which is the exponential family with this mean-variance relationship. We can then model E(Y) using a GLM. The canonical link for Gamma data is 1/µ, but Equation 2 suggests we should use a log link here.

5. Use gnm to fit a Gamma model for wages with the same predictor variables as your chosen model in question 4. Look at a summary of the fit and compare with the log-Normal model: are the inferences the same? Are the parameter estimates similar? Note that t statistics rather than z statistics are given for the parameters, since the dispersion φ has had to be estimated.

6. (Extra time!) Go back and fit your chosen model in question 4 using glm. How does the deviance compare to the equivalent Gamma model? Note that the AIC values are not comparable here: constants in the likelihood functions are dropped when computing the AIC, so these values are only comparable when fitting models with the same error distribution.
