SPSS Step-by-Step Tutorial: Part 2 - DataStep


SPSS Step-by-Step Tutorial: Part 2
For SPSS Version 11.5

DataStep Development 2004

Contents

1 Transformations and recoding revisited
    Introduction
        Value labels
    SPSS Tutorial and Help
        Using online help
        Using the Syntax Guide
        Using the statistics coach
    Moving around the output window
    Sorting Revisited: Sorting by multiple variables
    Utilities: variable and file information
        Utilities Variables
        Utilities File Info
    Data Transformations
        Computing new variables
            Performing calculations with a variable and a function
            Creating expressions with more than one variable
            Conditional expressions
        Creating subsets
    Deeper into crosstabs
        Crosstab Statistics
        Crosstab cells
        Adding layers to crosstabs
    When to include zeros in a mean
    Gender, geography, and exercise: the universal variables
    Summary

2 Statistical procedures
    Introduction
    Measuring association
        Bivariate correlations
        Partial correlation
        Multiple correlation (multiple regression)
        Crosstabs
    Measuring differences
        T-Tests
        ANOVA
        One-Way ANOVA
    Summary

1 Transformations and recoding revisited

Introduction

In the first session, we explored the SPSS interface, some elementary data management and recodes, and some basic charting. In this second session, we’ll work with more complex data transformations, like combining variables and subsetting populations, and with some of the primary statistical functions. We’ll also look more closely at the online help and tutorial provided by SPSS. But first, a small clean-up task from last week: displaying and hiding value labels.

Value labels

1. Open SPSS.
2. Open the data file by selecting File > Open > Data and finding the file Employee data.sav in the folder named SPSSTutorialData.
3. Make sure you’re in the Data View of any data file.
4. From the menu, select View > Value Labels. If Value Labels is checked, the value labels will be displayed for variables for which you have defined value labels. If it is not checked, the actual values will be displayed.

5. Select View from the menu again and make sure that Value Labels is checked. If it isn’t, click it once to select it. You can turn value labels on or off at any time during an SPSS session.

SPSS Tutorial and Help

SPSS provides extensive assistance through its online help, tutorial, syntax guide, and statistics coach.

Using online help

1. From the menu, select Help > Topics.
2. In the keyword field, type: crosstabs
   Notice that SPSS begins matching topics as soon as you begin typing.
3. From the list below the keyword field, double-click assumptions.
4. From the keyword list, double-click formats.
   Notice that the resulting help window informs you that you can “arrange rows in ascending or descending order of the values of the row variable.” What’s wrong with this statement? Hint: In the rest of the computer world, “format” applies to how you display text or numbers. Arranging in ascending or descending order is called sorting. Lesson: If a standard already exists, use it; don’t confuse people by making it up as you go along.
5. From the Related topics list, click once on Related procedures. The help window now displays information about modeling relationships between two or more categorical variables.
6. From the Related topics list, click once on Model Selection Loglinear Analysis Data Considerations. The help window now displays more information about this technique.
7. In the keyword field, type: chi-square
8. From the keyword list, double-click Chi-square test.
9. In the next window, click Display. Notice that the help window now displays general information about the Chi-square test. In addition to the list of related topics, the window also contains a Show Me link.

10. Click Show Me. SPSS now opens the tutorial to the chi-square topic in the form of an Internet page.
11. Click Next. In addition to an example of how to use a chi-square test, the window also identifies the sample data file you can use to follow the example for yourself.
12. Click Next.
13. Read the text on the right side of the screen. Here is where the tutorial explains each step. And yes, that is a typo in “you must first be weight the cases.” Ignore the “be.”
14. Click Next and read the steps.
15. Click Next and read the steps.
16. Click Next.
17. Click Next. SPSS now displays the sample output.
18. Close the tutorial window.
19. Close the Help window.

As with most help systems, you can use links to investigate topics related to the keyword you selected. In SPSS, however, you can also open the online tutorial to get more information about using a specific procedure.

Using the Syntax Guide

If you need information about SPSS syntax, you can open the online Syntax Guide. This guide explains each command and provides examples of its use. The Syntax Guide is in Adobe Acrobat format. In this format, you can search the guide for specific text or use the Bookmarks pane to find a specific command.

1. From the menu, select Help > Syntax Guide > Base.
2. Look at the bottom of the window where Acrobat lists the page count. Yes, that is 1490 pages. In other words, if you want, you can print the entire Syntax Guide.
3. In the bookmarks pane, click the “+” to the left of UNIVERSALS. The topic opens to display its subtopics.
4. Click the “+” to the left of Commands. (The Commands under Universals, that is, not the Commands further down the list.)
5. Click Syntax Diagrams. This topic provides information about the basic structure of SPSS syntax.

6. Now click the “+” to the left of the top-level COMMANDS topic. The window opens to display the subtopics for COMMANDS.
7. Scroll down until you can see the CROSSTABS entry.
8. Click the “+” to the left of CROSSTABS.
9. Click the word CROSSTABS. The Crosstabs page is now displayed and provides information about the complete format of the Crosstabs command.
10. Close the Syntax Guide window.

Using the statistics coach

One of the most useful functions in SPSS is the statistics coach, particularly when you’re just starting to work with the program. The statistics coach provides prompts at which you can select what you want to do, the kind of data you’re using, and the kind of output you want.

1. From the menu, select Help > Statistics Coach. SPSS opens the first Statistics Coach window (Figure 1).

FIGURE 1. Statistics Coach, opening window

2. Take a moment to review the choices offered in this window.
3. Click More Examples a few times and notice the different types of output available to you.
4. For the moment, we’ll use the default task, Summarize, describe, or present data, so click Next.
5. In this case, we want to create a summary of gender by job category. Both variables are categorical, so click Next.
6. This time we’ll change the output from the default, so select Charts and graphs by clicking its radio button.
7. We want a simple two-dimensional chart, so click Next.
8. Click Next.
9. We want a bar chart, so click Finish.
10. SPSS now opens the correct window for creating a bar chart with the type of data we have selected.
11. Drag Employment Category to the horizontal axis.
12. Drag Gender to the Legend Variables Color field.
13. Close the remaining help window.
14. Click OK. The chart appears in the output window. Next, we’ll use the statistics coach for a more complicated task.
15. From the menu, select Window > Employee data.sav - SPSS Data Editor to get back to the Data window.
16. From the menu, select Help > Statistics Coach.
17. Select Compare groups for significant differences.
18. Click Next.
19. We’re using categorical data, so click Finish. The How To window appears to guide you through the steps along with the Crosstabs window. (You might have to move them around a bit so you can see both windows.)
20. Select Gender and move it to the Rows pane.
21. Select Employment Category and move it to the Columns pane.
22. Click Statistics.
23. In the Crosstabs: Statistics window, select Chi-square.
24. Click Continue.
25. In the How To window, click Tell Me More. SPSS now displays the Data Requirements window that applies to using Chi-square.
26. Click Next. Surprise! There is no next topic. Ha, ha, SPSS. Very funny.

27. Click OK.
28. Click Back. Oh look! There is no Back.
29. Close the Data Requirements window.
30. In the Crosstabs window, click OK. The output is now displayed in the Output window.
31. Close any remaining Help windows.

Moving around the output window

Now that you have created some charts, crosstabs, and statistics results, it’s a good time to take a closer look at moving around the output window.

1. From the menu, select Window > Output1 - SPSS Viewer.
2. Move your cursor over the border between the panels, hold down the mouse button, and move the border to the right until you can read all the titles in the icons in the contents pane as in Figure 2.

FIGURE 2. Changing pane width in the output window
(Callout: hold and drag here to change the pane size in the output window.)

3. Notice the icons on the left arranged in outline format.
4. Click the icon named Interactive Graph. The output display moves to the first graph you created in this session.

5. Notice the “-” to the left of the Output icon. The “-” indicates that a topic is fully expanded.
6. Click the “-” next to Output. The “-” changes to a “+” and all the output is now hidden. If you ever “lose” output on the window, check to see if the output is hidden.
7. Click “+” next to Output to expand the items again.
8. Click the icon named “Crosstabs.”
9. Holding down the left mouse button, drag “Crosstabs” up above “Interactive Graph.” You can use the drag function to arrange your output in any order you like.
10. Below Crosstabs, click the icon for Title.
11. Click the icon for Chi-square tests. Notice that the red arrow next to the icon corresponds to the red arrow in the actual output window.
12. Under Interactive Graph, click Bar Chart.
13. Above the actual chart, double-click the title (“Interactive Graph”).
14. Select all the text and type: Distribution of job category by gender
15. Click anywhere outside the title to apply the change.
16. In the navigation pane, click Title under Interactive Graph.
17. Wait about one second, then click Title again to activate the text.
18. With the text selected, type: Distribution of job category by gender
19. Press Enter or click anywhere outside the title to apply the change.

Use the title function in the navigation pane to indicate what each piece of output contains. “Distribution of job category by gender” is a lot more informative than “Title.”

Sorting Revisited: Sorting by multiple variables

SPSS provides sophisticated sorting functions. You can sort by multiple variables, and you can set the sort order for each variable. For example, you could sort in order of increasing income, decreasing birthdate, and increasing expenditures. In the following task, you’ll sort the employee data set by gender (increasing) and current salary (decreasing).
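If it helps to see the idea outside the dialog box, here is a minimal sketch in Python (not SPSS syntax) of a two-level sort: gender ascending, then salary descending within each gender. The four records and the helper name sort_cases are made up for illustration; they are not taken from Employee data.sav.

```python
# Hypothetical records; the field names only mirror the tutorial's
# gender and current salary variables.
employees = [
    {"id": 1, "gender": "m", "salary": 57000},
    {"id": 2, "gender": "f", "salary": 21450},
    {"id": 3, "gender": "f", "salary": 40200},
    {"id": 4, "gender": "m", "salary": 30750},
]


def sort_cases(cases):
    """Sort by gender ascending, then by salary descending within gender."""
    # Negating the numeric key reverses its order while gender stays ascending.
    return sorted(cases, key=lambda c: (c["gender"], -c["salary"]))


sorted_cases = sort_cases(employees)
for case in sorted_cases:
    print(case["id"], case["gender"], case["salary"])
```

The first key decides the broad grouping and the second key only breaks ties within it, which is why the order of variables in the Sort by pane matters.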

1. Switch back to the data view by selecting Window > Employee data.sav - SPSS Data Editor.
2. From the menu, select Data > Sort Cases.
3. Clear any criteria that might already be in the Sort by pane by double-clicking them.
4. Double-click gender and current salary to move them to the Sort by pane.
5. Click once on gender.
6. If it is not already selected, select Ascending.
7. Click once on current salary.
8. Click Descending.
9. Click OK. Notice that all female employees are now listed first, in descending salary order.

Utilities: variable and file information

Tucked quietly under the Utilities menu are two especially useful functions: Variables and File Info. You can use these functions to get a snapshot of each variable in the file (Variables) and all variables together (File Info).

Utilities Variables

The Variables function provides all the information about each variable in your data file, including any categorical codes and their value labels.

1. From the menu, select Utilities > Variables. (Figure 3)

FIGURE 3. Variables window

2. In the variable list, click jobcat. Notice that the Variable Information pane displays the variable name, label, defined missing value, measurement level, values, and value labels.
3. Click salary. This variable is a scale (continuous) variable and so has no value labels.
4. The Go To button takes you to the specific variable within a selected case or to the variable in the first case if no case is selected.
5. Click Close.

Utilities File Info

The File Info window provides information about all variables in the file. This is extremely useful information. We recommend that you print out the file definition regularly and keep it close at hand.

1. From the menu, select Utilities > File Info.
2. Scroll through the output to see how each variable is described. Because this information goes to the output window, be sure that you print only the File Info output.
3. In the navigation pane, click once on the File Information icon.
4. Press Ctrl-P to open a print dialog window. Notice that under Print Range you can select All Visible Output or Selection. When you print the File Info, be sure to select Selection.
5. Click Cancel.

Data Transformations

SPSS provides a number of functions you can use in computing new variables, including:

- arithmetic functions
- statistical functions
- string functions
- date and time functions
- distribution functions
- random variable functions
- missing value functions

In this session, we’ll be looking at only the arithmetic functions.

Computing new variables

Performing calculations with a variable and a function

In some cases, you might want to calculate new variables based on values in existing variables and some arithmetic function like multiplying or dividing. For example, if you have a variable that contains an annual salary, you might want to calculate a monthly salary. To create the new variable, you use the Compute function.

1. In the Data window, select from the menu Transform > Compute. (Figure 4)

FIGURE 4. Compute window

2. In the Target Variable field, type: salmonth
3. Click Type & Label. (Figure 5)

FIGURE 5. Compute: Type and label window

4. In the Label field, type: Average monthly salary
5. Click Continue.
6. In the Compute Variable window, select Current Salary and move it to the Numeric Expression pane by clicking the right arrow.
7. In the Numeric Expression pane, click the cursor after salary and type: /12

Your window should now look like Figure 6.

FIGURE 6. Entering a compute formula

8. Click OK. The Compute Variable window closes and the new variable is displayed in the Data window. You can now use the new variable in procedures such as crosstabs or in further calculations. For example, you could create a new variable for monthly withholding that calculates withholding as a percentage of monthly salary. You could then subtract the new withholding variable from the monthly salary to create still another variable for monthly net.

Creating expressions with more than one variable

Let’s use the previous example of calculating withholding and net to compute variables based on more than one variable. First you’ll compute the withholding variable, then you’ll compute the net variable.

1. From the menu, select Transform > Compute.
2. In the Target Variable field, type: withhold
3. Click Type & Label.
4. In the Label field, type: Monthly withholding
5. Click Continue.
6. Select all the text in the Numeric Expression field and delete it.

7. Move the new variable, Average Monthly Salary, to the Numeric Expression field.
8. Click after salmonth and type: * .05
   Note: If you haven’t worked with computer programs before to make calculations, the asterisk denotes multiplication. A double asterisk (**) denotes exponentiation. In SPSS, a vertical bar (|) denotes “OR”, and the ampersand (&) denotes “AND”.
9. Click OK. The new variable appears in the data view. In the next step, you’ll use two variables to calculate a third.
10. From the menu, select Transform > Compute.
11. In the Target Variable field, type: netmonth
12. Click Type & Label.
13. In the Label field, type: Monthly net
14. Click Continue.
15. Select all the text in the Numeric Expression field and delete it.
16. From the list of variables, select Average Monthly Salary and move it to the Numeric Expression field.
17. Click after salmonth in the Numeric Expression field.
18. Using the keypad in the Compute Variable window, click “-”.
19. From the list of variables, select the new variable Monthly Withholding.
20. Click the right arrow to move it to the Numeric Expression pane. Your Compute Variable window should now look like Figure 7.

FIGURE 7. Complete Compute Variable window for monthly net

21. Click OK. The new variable appears in the data view.

Conditional expressions

In some cases, you might want to look at only a specific subset of your data. Say you want to send a monthly newsletter to only female clerical staff. To identify these staff, you’ll calculate a new binary variable (one that has only two values) using the IF statement to set the condition.

1. From the menu, select Transform > Compute.
2. In the Target Variable field, type: femclerk
3. Click Type & Label.
4. In the Label field, type: Female Clerical
5. Click Continue.
6. Select all the text in the Numeric Expression field and delete it.
7. In the Numeric Expression field, type: 1
8. Click If to open the Compute Variable: If Cases window (Figure 8).

FIGURE 8. Compute Variable: If Cases window

9. Select Include if case satisfies condition.
10. Double-click Gender to move it to the conditions field.
11. Click after Gender in the conditions field and type: = “f”
    Note: Whenever you create a condition, you must use the actual values in the variable, not their labels. Thus, setting a condition to gender = “Female” would not select any cases.
12. Click after “f” and type a space.
13. Using the keypad in the Compute Variable window, click &. You use the ampersand to add a second condition.
14. From the field list, double-click Employment category to move it to the calculation pane.
15. In the calculation pane, type: = 1
    Note that you don’t use quotation marks this time because jobcat is a numeric variable.
16. Click Continue.

17. Click OK. The new variable appears in the Data window. Scroll through the records to see the values in the new variable. Notice that cases where gender is not female or job category is not clerical have only a period, indicating a missing value. Only those cases where gender is female and jobcat is 1 (clerical) contain a 1 in the new variable.

Creating subsets

In some instances, you might want to use only part of the file in an analysis. For example, you might want to look at changes in income among single working mothers. Or you might want to consider only staff born before a specific date.

To select a subset of the cases in your file:

1. From the menu, select Data > Select Cases (Figure 9).

FIGURE 9. Select cases window

2. Select If condition is satisfied by clicking its radio button.
3. Click If. Notice that the Select Cases: If window looks exactly like the If window you used in the earlier compute procedures.
4. From the variable list, double-click Date of Birth.
5. Click the cursor anywhere after bdate in the calculation pane.

6. Type (or select from the keypad): <
7. Scroll through the Function menu and double-click DATE.MDY(month,day,year). (Figure 10)

FIGURE 10. Selecting a function
(Callout: scroll here to select a function.)

SPSS adds the function to the calculation pane, substituting question marks to indicate that you need to specify the values. In the next steps, you’ll set the date criterion.

8. Select the first question mark and type: 1
9. Select the second question mark and type: 1
10. Select the third question mark and type: 1940

Your completed window should look like Figure 11.

FIGURE 11. Completed Select Cases: If window

11. Click Continue.
12. Click OK. Notice that many of the records are marked with a diagonal line through the record number. These cases are excluded from any further calculations until you specifically include them again.
13. To see the effect of the subset selection, right-click the heading for bdate.
14. From the pop-up menu, select Sort Ascending. Notice that all employees born before 1940 are selected, except for the person with the missing date of birth. In the next step, you’ll instruct SPSS to include all cases until otherwise instructed.
15. From the menu, select Data > Select Cases.
16. Select All Cases by clicking its radio button (Figure 12).

FIGURE 12. Selecting all cases
(Callout: click here to include all cases.)

17. Click OK. The diagonal lines appear to be gone, but to be sure, right-click the heading of bdate again and select Sort Descending, so that the youngest employees are listed first. Notice that they are no longer excluded.

Deeper into crosstabs

Crosstab Statistics

When you use the various statistical techniques, SPSS will frequently tell you what kind of statistical tests are available for that procedure. For example, if you ask for crosstabs, SPSS offers a number of statistics based on the type of data you’re using. (Figure 13)

FIGURE 13. Statistics indicated by data type in Crosstabs: Statistics window
(Callouts: data type; available tests.)

For example, when you select a statistic like Chi-square, SPSS indicates the particular Chi-square technique that should be used based on the type of data. If you are using nominal or ordinal data, SPSS provides a number of methods you can include in your output. To see how SPSS makes the selection of techniques and to see the description of the data types and corresponding statistics, try the following.

1. From the menu, select Help > Topics.
2. Click the Index tab and enter: crosstabs
3. From the list below your entry, select Statistics. SPSS Help displays a description of the types of statistics to select based on the type of data you’re using.
4. Close the Help window.

Crosstab cells

Using the Crosstab cell display window (Figure 14), you can determine what data will be displayed in each cell of the crosstab.

FIGURE 14. Crosstab cell display

Try it:

1. From the menu, select Analyze > Descriptive Statistics > Crosstabs.
2. Move Gender to the Rows list.
3. Move Employment Category to the Columns list.
4. Click Cells.
5. Under Percentages, select Row, Column, and Total.
6. Click Continue.
7. Click OK. SPSS displays the completed crosstab in the output window.

[Output table: Gender * Employment Category Crosstabulation. Rows: Female, Male, Total; each cell shows Count, Row %, Column %, and Total %. Most cell values did not survive in this transcription. The surviving totals are Male: count 258, row 100.0%, column 54.4%, total 54.4%; Total: count 474, 100.0% throughout.]
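To make the four numbers in each cell concrete, here is a minimal Python sketch (not SPSS syntax) that computes the count, row percent, column percent, and total percent for every cell of a small two-way table. The eight cases and the helper name crosstab_cells are invented for illustration and are not the Employee data file.

```python
from collections import Counter

# Hypothetical cases as (gender, job category) pairs.
cases = [
    ("f", "clerical"), ("f", "clerical"), ("f", "manager"),
    ("m", "clerical"), ("m", "custodial"), ("m", "manager"),
    ("m", "manager"), ("m", "manager"),
]


def crosstab_cells(pairs):
    """Return count, row %, column %, and total % for each non-empty cell."""
    counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    n = len(pairs)
    cells = {}
    for (row, col), k in counts.items():
        cells[(row, col)] = {
            "count": k,
            "row_pct": 100.0 * k / row_totals[row],  # share of this gender
            "col_pct": 100.0 * k / col_totals[col],  # share of this job category
            "total_pct": 100.0 * k / n,              # share of all cases
        }
    return cells


cells = crosstab_cells(cases)
for key in sorted(cells):
    print(key, cells[key])
```

The same count is divided by three different denominators, which is why the three percentages in one cell can look so different from each other.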

Notice that each cell contains the count, the percent of the row, the percent of the column, and the percent of the total. Crosstabs like this are useful for both a general overview and closer study of your data. For publication, however, you may want to simplify the output by including only a column or row percentage, depending on the issue you’re addressing.

Suppose you only want to know the distribution of job categories within gender. In the next task, you’ll create a crosstab that includes only row percentages.

8. From the menu, select Analyze > Descriptive Statistics > Crosstabs.
9. Move Gender to the Rows pane.
10. Move Employment Category to the Columns pane.
11. Click Cells.
12. Under Percentages, make sure only Row is selected. Clear any others that are already selected.
13. Click Continue.
14. Click OK. Your new output will look like Figure 15. If you wanted to know percentages of each job category across gender, you would select only column percentages.

FIGURE 15. Crosstab showing only row percentages

[Output table: Gender * Employment Category Crosstabulation. Rows: Female, Male, Total; each cell shows Count and Row %. Most cell values did not survive in this transcription. The surviving totals are Male: count 258, row 100.0%; Total: count 474, row 100.0%.]

Adding layers to crosstabs

So far, we have worked with just a single variable for rows. You can make your crosstabs much more specific, however, by adding multiple row variables or by adding layers. In this exercise, you’ll create layered crosstabs that provide a more detailed breakdown of the data.

1. From the menu, select Analyze > Descriptive Statistics > Crosstabs. Notice that the variables from the previous crosstab are still selected.

2. From the variable list, move Minority Classification to the Layer pane.
3. Click OK. Notice that you’re still getting row percentages only because we did not reset the cells information. Notice also that the Layers variable becomes the uppermost level, followed by gender. In the next task, you’ll reverse that order.
4. From the menu, select Analyze > Descriptive Statistics > Crosstabs. Notice that the variables from the previous crosstab are still selected.
5. In the Rows pane, double-click Gender to move it back to the variable list.
6. In the Layers pane, double-click Minority Classification to move it back to the variable list.
7. Move Minority Classification to the Rows pane.
8. Move Gender to the Layers pane.
9. Click Cells.
10. Under Percentages, select Row, Column, and Total.
11. Click Continue.
12. Click OK. In the new output, the crosstab displays Minority Classification within Gender.

You can continue to add layers and multiple rows to your crosstabs. Remember Robin’s rule, however: ALWAYS GET THE UNIVERSE FIRST. That is, print out the most general crosstabs before getting into the detail.

When to include zeros in a mean

We were once working on a project looking into some of the economic issues of child care, and one of the researchers asked if we should include in the calculation of the average price the number of people who didn’t use child care. The answer is: it depends. If you want to look at, say, an average price paid, then, no, you would not include people who didn’t buy the product. If, on the other hand, you want to know the per capita cost, then, yes, you would include everyone.

The question involves the issue of whether to include zero values in calculating statistics such as means or standard deviations or conducting statistical tests such as t-tests and ANOVAs. And to some degree the answer lies in the question itself.
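The difference between the two averages is easy to see with a handful of numbers. The following is a minimal Python sketch with made-up figures (six hypothetical households and their weekly child-care spending); it is only an illustration of the arithmetic, not data from the project mentioned above.

```python
from statistics import mean

# Hypothetical weekly child-care spending per household.
# A 0 means the household did not buy any child care.
spending = [0, 0, 120, 0, 80, 100]

# Average price paid: only households that actually bought child care.
buyers_only = [amount for amount in spending if amount > 0]
average_price_paid = mean(buyers_only)

# Per capita cost: every household counts, zeros included.
per_capita_cost = mean(spending)

print("Average price paid:", average_price_paid)
print("Per capita cost:", per_capita_cost)
```

With these figures the average price paid is 100 and the per capita cost is 50. Neither number is wrong; they answer different questions.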
If you say, “How much did people pay for child care?” (or gasoline or televisions or clothes) then you want to look at the actual purchase price among those who actually purchased the product. If, for example, I tell you that the average per capita cost of gasoline is thirty cents a gallon, that’s not going to tell you what to expect the next time you drive up to the pump. What you really want to know is, what’s the average price today in this particular area. If, on the other hand, you’re working for the Council of Economic Advisers and you want to know how the cost of gasoline factors into the overall expenditures of an average family, you do want to use a per capita cost.

The other question to consider is whether zero is a valid value in your data set. In medical research, for example, particularly in dose-response research or lab values, zero is obviously a valid value. In other cases, such as a five-point Likert scale beginning with 1, zero is not a valid value and should be treated as missing or an error. In other words, the decision about whether to include zero in a particular test depends on whether it’s a valid value and on the particular question you’re asking.

Gender, geography, and exercise: the universal variables

There are certain variables that will affect nearly any statistic or test you use. Our particular favorites are gender, geography, and exercise. These are variables whose effect is so pervasive that failing to take them into account can seriously affect the validity of your research. There are others, of course, that will affect whatever data you work with to varying degrees. Age is certainly one, along with diet and ethnicity. If you’re conducting medical research, for example, you must always take into account age, gender, and ethnicity. More and more, however, the level of exercise is being included as a concomitant variable. We once had a research psychologist tell us that “Exercise is implicated in every variable we look at.” In social science research, age, gender, ethnicity, and education are critical factors. The moral: when you are designing a research project, make sure you have accounted for all the variables that might affect the outcomes, not just the ones of immediate interest.

Summary

If you think we have spent an awful lot of time dealing with data management and crosstabs, you’re right. You’ll spend about eighty percent (a very rough guess) of your time on these two tasks. Then, finally, when you have the data exactly the way you need it, you run a couple of quick statistics and then . . . you start all over again. Remember that data analysis is iterative, so get ready now to do the same types of tasks over and over and over.

And over.

2 Statistical procedures

Introduction

This tutorial is not a replacement for a course in basic statistics or for a textbook on that subject. The primary emphasis here is on the use of SPSS to explore data and to answer some of the statistical questions you might ask about your data.

In this chapter we will review some of the SPSS procedures f
