Multiple Linear Regression (Dummy Variable Treatment)


CIVL 7012/8012

In Today's Class
– Recap
– Single dummy variable
– Multiple dummy variables
– Ordinal dummy variables
– Dummy-dummy interaction
– Dummy-continuous/discrete interaction
– Binary dependent variables

Introducing Dummy Independent Variable
Qualitative information
– Examples: gender, race, industry, region, rating grade
– A way to incorporate qualitative information is to use dummy variables
– They may appear as the dependent or as independent variables
A single dummy independent variable
– The coefficient on the dummy is the wage gain/loss if the person is a woman rather than a man (holding other things fixed)
– Dummy variable: 1 if the person is a woman, 0 if the person is a man
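The single-dummy idea above can be sketched in a few lines. This is a minimal illustration, assuming the usual specification wage = B0 + B1*educ + D0*female; the coefficient values and the function name are hypothetical and are not estimates from the slides.

```python
# Hypothetical coefficients for illustration only (not estimates from the slides).
B0 = 2.0    # intercept: predicted wage for a man with educ = 0
B1 = 0.5    # slope on years of education (same for both groups)
D0 = -1.5   # coefficient on the female dummy (intercept shift)


def predicted_wage(educ: float, female: int) -> float:
    """Predicted wage from wage = B0 + B1*educ + D0*female."""
    if female not in (0, 1):
        raise ValueError("female must be 0 or 1")
    return B0 + B1 * educ + D0 * female


# At any education level, the women-minus-men gap equals D0.
gaps = [predicted_wage(e, 1) - predicted_wage(e, 0) for e in (8, 12, 16)]
print(gaps)
```

Because the dummy enters additively, the gap does not depend on educ: the two groups share a slope and differ only in the intercept.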

Illustrative Example
Graphical illustration (figure: intercept shift)
– Alternative interpretation of the coefficient: it is the difference in mean wage between men and women with the same level of education.

Specification of Dummy Variables
Dummy variable trap
– A model that includes an intercept together with dummies for both men and women cannot be estimated (perfect collinearity)
– When using dummy variables, one category always has to be omitted: either men are the base category or women are the base category
– Alternatively, one could omit the intercept. Disadvantages:
1) More difficult to test for differences between the parameters
2) R-squared formula only valid if the regression contains an intercept
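The dummy variable trap can be checked numerically. The sketch below uses a small hypothetical data set and plain Python: with an intercept, a female dummy, and a male dummy, the matrix X'X is singular, and dropping one dummy makes it invertible. The helper names are assumptions made for this illustration.

```python
# Each row is (intercept, female, male) for one hypothetical person.
rows = [(1, 1, 0), (1, 0, 1), (1, 1, 0), (1, 0, 1), (1, 0, 1)]

# The trap: female + male equals the intercept column in every row,
# so the three columns are perfectly collinear.
collinear = all(female + male == const for const, female, male in rows)


def gram_matrix(data):
    """Return X'X for a list of equal-length rows."""
    k = len(data[0])
    return [[sum(r[i] * r[j] for r in data) for j in range(k)] for i in range(k)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# X'X is singular with all three columns, so OLS has no unique solution.
det_full = det3(gram_matrix(rows))

# Dropping the male column (men become the base category) fixes it.
reduced = [(const, female) for const, female, _ in rows]
g = gram_matrix(reduced)
det_reduced = g[0][0] * g[1][1] - g[0][1] * g[1][0]
print(collinear, det_full, det_reduced)
```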

Interpretation of Dummy Variables
Estimated wage equation with intercept shift (standard errors in parentheses)
– Holding education, experience, and tenure fixed, women earn 1.81 less per hour than men
Does that mean that women are discriminated against?
– Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.

Model with only dummy variables (Example 1)
Comparing means of subpopulations described by dummies
– Not holding other factors constant, women earn 2.51 per hour less than men, i.e. the difference between the mean wage of men and that of women is 2.51.
Discussion
– It can easily be tested whether the difference in means is significant
– The wage difference between men and women is larger if no other things are controlled for; i.e. part of the difference is due to differences in education, experience and tenure between men and women
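The claim that a regression on a single dummy just compares group means can be verified directly. This is a minimal sketch with hypothetical wage data (not the data behind the 2.51 figure): the OLS intercept equals the mean wage of the base group, and the slope equals the difference in means.

```python
from statistics import mean

# Hypothetical hourly wages, for illustration only.
wages_men = [8.0, 7.0, 6.0, 7.4]
wages_women = [4.5, 5.0, 4.3]

y = wages_men + wages_women
female = [0] * len(wages_men) + [1] * len(wages_women)

# Simple OLS of wage on the female dummy: slope = cov(x, y) / var(x).
x_bar, y_bar = mean(female), mean(y)
slope = (sum((x - x_bar) * (v - y_bar) for x, v in zip(female, y))
         / sum((x - x_bar) ** 2 for x in female))
intercept = y_bar - slope * x_bar

print(intercept, slope)
```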

Model with only dummy variables (Example 2)
Further example: effects of training grants on hours of training
– Hours of training per employee
– Dummy indicating whether the firm received a training grant
This is an example of program evaluation
– Treatment group (grant receivers) vs. control group (no grant)
– Is the effect of treatment on the outcome of interest causal?

Dependent log(y) and Dummy Independent
Using dummy explanatory variables in equations for log(y)
– Dummy indicating whether the house is of colonial style
– As the dummy for colonial style changes from 0 to 1, the house price increases by approximately 5.4 percent
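Reading a dummy coefficient in a log(y) equation as a percent change is an approximation; the exact change is 100*(exp(coef) - 1). The sketch below compares the two, assuming that 0.054 is the coefficient behind the 5.4 figure above (the slide text does not show the coefficient itself).

```python
import math


def exact_percent_change(coef: float) -> float:
    """Exact % change in y when a dummy switches 0 -> 1 in a log(y) model."""
    return 100.0 * (math.exp(coef) - 1.0)


# 0.054 is assumed to be the coefficient behind the slide's 5.4 figure.
approx = 100.0 * 0.054
exact = exact_percent_change(0.054)
print(round(approx, 1), round(exact, 2))
```

For small coefficients the two numbers are close; the gap widens as the coefficient grows in absolute value.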

Dummy variables for multiple categories
Using dummy variables for multiple categories
– 1) Define membership in each category by a dummy variable
– 2) Leave out one category (which becomes the base category)
– Holding other things fixed, married women earn 19.8% less than single men (the base category)
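The two steps above can be sketched for the four gender/marital groups, with single men as the base category. The dummy names below are hypothetical labels chosen for this illustration.

```python
def group_dummies(female: int, married: int) -> dict:
    """Dummies for three of the four gender/marital groups.

    Single men are the omitted base category, so they get all zeros.
    The dummy names are hypothetical labels chosen for this sketch.
    """
    return {
        "marrmale": int(female == 0 and married == 1),
        "marrfem": int(female == 1 and married == 1),
        "singfem": int(female == 1 and married == 0),
    }


for female in (0, 1):
    for married in (0, 1):
        print(female, married, group_dummies(female, married))
```

Each coefficient on these dummies is then read relative to single men, which is why the 19.8% figure is a comparison with that group.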

Ordinal Dummy Variables
Incorporating ordinal information using dummy variables
Example: city credit ratings and municipal bond interest rates
– Municipal bond rate
– Credit rating from 0 to 4 (0 = worst, 4 = best)
– Entering the credit rating directly as a regressor would probably not be appropriate, as the credit rating only contains ordinal information. A better way to incorporate this information is to define dummies.
– Dummies indicating whether the particular rating applies, e.g. CR1 = 1 if CR = 1 and CR1 = 0 otherwise. All effects are measured in comparison to the worst rating (base category).
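A minimal sketch of the rating dummies, with rating 0 as the base category. The effect sizes and the intercept are hypothetical; they are chosen with unequal steps to show what the dummy specification allows and a single linear CR term would rule out.

```python
def rating_dummies(cr: int) -> list:
    """Return [CR1, CR2, CR3, CR4] for a credit rating cr in 0..4.

    Rating 0 (the worst) is the base category and maps to all zeros.
    """
    if cr not in range(5):
        raise ValueError("credit rating must be an integer from 0 to 4")
    return [int(cr == level) for level in range(1, 5)]


def predicted_rate(cr: int, intercept: float, deltas: list) -> float:
    """Predicted bond rate when each rating has its own effect."""
    return intercept + sum(d * z for d, z in zip(deltas, rating_dummies(cr)))


# Hypothetical effects relative to rating 0; the steps need not be equal.
deltas = [-0.25, -0.5, -1.5, -1.75]
rates = [predicted_rate(cr, 6.0, deltas) for cr in range(5)]
print(rates)
```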

Interactions among dummy variables
Interactions involving dummy variables
– Interaction term: allowing for different slopes
– The model gives an intercept and a slope for men, and an intercept and a slope for women
Interesting hypotheses
– The return to education is the same for men and women
– The whole wage equation is the same for men and women

Graphical illustration
– Interacting both the intercept and the slope with the female dummy enables one to model completely independent wage equations for men and women
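The separate lines for men and women can be read off the interacted model directly. This is a minimal sketch, assuming the specification y = b0 + d0*female + b1*educ + d1*female*educ; the symbol names and coefficient values are hypothetical.

```python
def group_lines(b0: float, d0: float, b1: float, d1: float) -> dict:
    """Intercept and slope per group for
    y = b0 + d0*female + b1*educ + d1*female*educ.
    """
    return {
        "men": (b0, b1),                # female = 0
        "women": (b0 + d0, b1 + d1),    # female = 1
    }


def predict(b0, d0, b1, d1, female, educ):
    """Prediction from the interacted model."""
    return b0 + d0 * female + b1 * educ + d1 * female * educ


# Hypothetical coefficients, for illustration only.
coefs = dict(b0=1.0, d0=-0.5, b1=0.75, d1=-0.25)
lines = group_lines(**coefs)
print(lines)
```

In this notation, an equal return to education corresponds to d1 = 0, and an identical wage equation for both groups corresponds to d0 = 0 and d1 = 0 jointly.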

Dummy-Continuous/Discrete Interaction (2)
Testing for differences in regression functions across groups
Unrestricted model (contains full set of interactions)
– Variables: college grade point average, standardized aptitude test score, high school rank percentile, total hours spent in college courses
Restricted model (same regression for both groups)

Dummy-Continuous/Discrete Interaction (3)
Null hypothesis
– All interaction effects are zero, i.e. the same regression coefficients apply to men and women
Estimation of the unrestricted model
– Tested individually, the hypothesis that the interaction effects are zero cannot be rejected

Restricted and Unrestricted Models (with Dummy Variables)
Joint test with F-statistic
– Null hypothesis is rejected
– SSRr is the sum of squared residuals from the restricted regression, i.e. the regression where we impose the restriction
– SSRur is the sum of squared residuals from the full (unrestricted) model
– q is the number of restrictions under the null
– k is the number of regressors in the unrestricted regression
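The definitions above plug into the standard F-statistic, F = ((SSRr - SSRur)/q) / (SSRur/(n - k - 1)). The sketch below computes it; n, the sample size, is a symbol not defined in the slide text and is assumed here, and the input numbers are hypothetical.

```python
def f_statistic(ssr_r: float, ssr_ur: float, q: int, n: int, k: int) -> float:
    """F = ((SSR_r - SSR_ur) / q) / (SSR_ur / (n - k - 1))."""
    df_denominator = n - k - 1   # n is the sample size (assumed symbol)
    if q <= 0 or df_denominator <= 0:
        raise ValueError("q and n - k - 1 must be positive")
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denominator)


# Hypothetical numbers, not taken from the slides.
f_value = f_statistic(ssr_r=100.0, ssr_ur=80.0, q=4, n=107, k=6)
print(f_value)
```

The resulting value is compared with a critical value from the F distribution with q and n - k - 1 degrees of freedom; a large value leads to rejecting the null that all restrictions hold.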

Binary dependent variable
A binary dependent variable: the linear probability model
– Linear regression when the dependent variable is binary, i.e. it only takes on the values 1 and 0
– Linear probability model (LPM): the coefficients describe the effect of the explanatory variables on the probability that y = 1

Binary dependent variable: Example 1
Example: labor force participation of married women
– Dependent variable: 1 if in labor force, 0 otherwise
– Non-wife income (in thousand dollars per year)
– If the number of kids under six years increases by one, the probability that the woman works falls by 26.2 percentage points
– Does not look significant (but see below)

Binary dependent variable: Example 2
Example: labor force participation of married women (cont.)
– Graph for nwifeinc = 50, exper = 5, age = 30, kidslt6 = 1, kidsge6 = 0
– The maximum level of education in the sample is educ = 17. For the given case, this leads to a predicted probability of being in the labor force of about 50%.
– Negative predicted probability at low education levels (but no problem because no woman in the sample has educ < 5).
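The out-of-range predictions mentioned above follow from the linearity of the model. This is a minimal sketch of a one-regressor linear probability model, with all other variables treated as held fixed and absorbed into the intercept; the coefficients are hypothetical and are not the estimates behind the graph.

```python
def lpm_predict(intercept: float, slope: float, x: float) -> float:
    """Fitted value of a linear probability model with one regressor."""
    return intercept + slope * x


def in_unit_interval(p: float) -> bool:
    """True if p can be read as a probability."""
    return 0.0 <= p <= 1.0


# Hypothetical coefficients, for illustration only.
INTERCEPT, SLOPE = -0.25, 0.0625

for educ in (0, 4, 12, 16):
    p = lpm_predict(INTERCEPT, SLOPE, educ)
    print(educ, p, in_unit_interval(p))
```

Each extra unit of the regressor changes the fitted probability by the same amount (SLOPE), so for sufficiently small or large values the fitted line leaves the interval from 0 to 1.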
