Creating Dummy Variables In IBM SPSS Statistics


What are Dummy Variables?
Also known as indicator variables.
Used in techniques like regression, where there is an assumption that the predictors' measurement level is scale.
Dummy coding gets around this assumption.
Each dummy variable takes a value of 0 or 1 to indicate the absence (0) or presence (1) of some categorical effect.
k - 1 dummy variables are required for a variable with k categories.

An Example
Suppose you have a nominal variable with more than two categories that you want to use as a predictor in a linear regression analysis, e.g. Job Category, which has three categories.
Then you will need to create 2 dummy variables (i.e. the number of categories - 1) and include these new dummy variables in your regression model.
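Once the two dummy variables exist, including them in the model might look like the sketch below. It assumes the dummies are named jobcat1 and jobcat2 and uses salary as the dependent variable purely for illustration; the slides do not name a dependent variable.

```spss
* Sketch only: salary as the dependent variable is an assumption.
REGRESSION
  /DEPENDENT salary
  /METHOD=ENTER jobcat1 jobcat2.
```

Each dummy's coefficient then estimates the difference in the dependent variable between that category and the category left out of the model.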

Considerations
Number of dummy variables: straightforward, k - 1, where k is the number of categories.
Choose a reference category: this is the category that you will compare all the other categories against.
Often the reference category will be the first or last category.

Doing this in IBM SPSS Statistics
Dummy coding is built into the Logistic Regression procedures, but needs to be done manually for Linear Regression/Discriminant Analysis.
No single function is available.
Best to do this using syntax.

Approach 1
Using “Employee Data.sav”, located in C:\Program
*Version: your SPSS Statistics version, e.g. 20, 21, 22.
For variable jobcat, create two dummy variables: jobcat1 and jobcat2.
Initially set each variable to 0, and then specify that each will take on a value of 1 for job categories 1 and 2 respectively.
In this way category number 3 is set to be the reference category.
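The steps described for Approach 1 can be sketched in syntax roughly as follows. This is a reconstruction from the description, not the original slide's syntax, and it assumes Employee Data.sav is already open as the active dataset.

```spss
* Sketch of Approach 1, assuming Employee Data.sav is the active dataset.
* Start both dummy variables at 0 for every case.
COMPUTE jobcat1 = 0.
COMPUTE jobcat2 = 0.
* Switch each dummy to 1 for its own job category.
IF (jobcat = 1) jobcat1 = 1.
IF (jobcat = 2) jobcat2 = 1.
EXECUTE.
```

Cases in category 3 keep 0 on both dummies, which is what makes category 3 the reference. Note that a case with a missing jobcat value would also end up with 0 on both dummies in this version.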

Approach 2
Using the VECTOR and LOOP - END LOOP commands.
Use the VECTOR command to create the required number of dummy variables, i.e. 2 in this case.
Use the LOOP - END LOOP command to loop through each of the dummy variables that are created using the VECTOR command.
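A minimal sketch of Approach 2 might look like this. It is a reconstruction, not the original slide's syntax: the vector name jobcat is taken from the COMPUTE statement quoted on the slides, and the scratch variable #i serves as the loop index.

```spss
* Sketch of Approach 2, reusing the vector name from the slides' COMPUTE statement.
* VECTOR creates two new variables, jobcat1 and jobcat2.
VECTOR jobcat(2).
* Each dummy is 1 where jobcat equals the loop index, otherwise 0.
LOOP #i = 1 TO 2.
COMPUTE jobcat(#i) = (jobcat = #i).
END LOOP.
EXECUTE.
```

The logical expression on the right-hand side evaluates to 1 when true and 0 when false, so no separate initialisation step is needed.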

Approach 2
This approach will make the last category the reference category, as we are only looping through categories 1 and 2 in COMPUTE jobcat(#i) = (jobcat = #i).
To make the first category the reference category you could modify the COMPUTE statement in the syntax as follows: COMPUTE jobcat(#i) = (jobcat = #i + 1).

Dealing with missing values
Modify the COMPUTE statements in Approach 1 to just:
IF (NOT MISSING(jobcat)) jobcat1 = 0.
IF (NOT MISSING(jobcat)) jobcat2 = 0.
This ensures missing values are still missing in the dummy variables.
Approach 2 will deal with missing values implicitly.

Approach 1 modified to account for missing values
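Putting the modified statements together, the missing-value-aware version of Approach 1 could be sketched as below. This is a reconstruction from the statements quoted on the slides rather than the original slide's syntax.

```spss
* Sketch of Approach 1 with the missing-value guard described on the slides.
* Set the dummies to 0 only where jobcat is not missing.
IF (NOT MISSING(jobcat)) jobcat1 = 0.
IF (NOT MISSING(jobcat)) jobcat2 = 0.
* Switch each dummy to 1 for its own job category.
IF (jobcat = 1) jobcat1 = 1.
IF (jobcat = 2) jobcat2 = 1.
EXECUTE.
```

For a case where jobcat is missing, none of the IF conditions is true, so jobcat1 and jobcat2 are never assigned and stay system-missing.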

For more Tech www.presidion.com
Talk to us: info@presidion.com, +44 (0)208 757 8820 (UK), +353 (0)1 415 0234 (IRL)
