SPSS: Expected Frequencies, Chi-squared Test. In-depth .

Transcription

- SPSS: Expected frequencies, chi-squared test.
- In-depth example: Age groups and radio choices.
- Dealing with small frequencies.
- Quick example: Handedness and careers.

Last time we tested whether one nominal variable was independent of another. We did this by looking at the cross tabs and seeing how far the observed frequencies were from the frequencies we would expect if the two variables were independent.

For nominal variables that only had 2 possible responses each (yes/no, male/female, insane/sane), we could use the odds ratio. When one or both of the variables has more than 2 responses, the odds ratio is no longer useful, so we use the chi-squared test instead. The tradeoff: the odds ratio can be used for one-tailed tests; chi-squared can’t. Chi-squared can handle any number of rows and columns.

Getting chi-squared is also heavy in math, so in the real world, SPSS and other software can handle most of it for us.
Most important things to know:
- How to get the expected frequency from a particular cell.
- Chi-squared is a measure of how far the observed frequencies are from the expected frequencies.
- Large chi-squared values mean large deviations from the expected frequencies.
- The df for chi-squared is (rows – 1) x (columns – 1).
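For reference, the three quantities in the list above can be written out in the usual textbook form. The notation is introduced here for illustration and is not taken from the slides: O_ij and E_ij are the observed and expected counts in row i, column j; R_i and C_j are the row and column totals; n is the sample size; r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
```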

SPSS: Expected frequencies. Start with a crosstab: Analyze → Descriptive Statistics → Crosstabs.

In the pop-up, choose your row and column variables and click the Cells button in the upper right of the pop-up.

The Cells button brings up the menu of what you want the cells to show. Uncheck Observed and check Expected.

Then click Continue, then OK. This will produce a crosstab of the expected values. If father figure type and parenting style were independent, there would be 9.4 moderate style stepfathers in our sample on average.

Leaving Observed checked and leaving Expected unchecked produces the observed values. In our sample, we found 10 moderate style stepfathers, very near the 9.4 in the independent ideal.

Checking both Observed and Expected produces a table that has both the observed and expected values in the same table. It allows cell-to-cell comparison, but it’s more cluttered.

The null hypothesis of independence fits the moderates and stepfathers. But live-in partners appear to be more permissive and less authoritarian than other types of father figure.

Note the vague language about the trends in the data. That’s because we can’t say whether these trends are significant or not. We don’t have the tools to say anything definitive about specific categories.

Bats: Observing frequencies you wouldn’t expect.

SPSS: Full crosstab analysis. Consider the following data on a sample of people’s ages and radio preference. We want to know if a person’s radio preference depends on what generation they belong to. We have the data from 72 people in total in three nominal categories of radio choice and three ordinal categories of age.

Should we do an odds ratio or a chi-squared? Chi-squared, because we have a 3x3 table. The odds ratio only works for 2x2 tables.

SPSS: Chi-squared is also in the crosstabs section. Analyze → Descriptive Statistics → Crosstabs. Click on the Statistics button.

Put a check next to Chi-Square in the upper left. It doesn’t matter if Risk is checked or unchecked. Then click Continue, then OK.

Checking Chi-Square produces the following table. We want the Pearson Chi-Square. (Yeah, Pearson is a big deal.) χ² = 10.268, df = 4. We could have got the df from (rows – 1) x (cols. – 1) = 2 x 2 = 4.

We also know that the p-value = .036. So if we were testing for independence at alpha = 0.05, we would reject the null hypothesis of independence. For interest: Asymp. Sig. stands for Asymptotic Significance. Asymptotic in statistics means “As n → infinity.”
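As a quick check on the reported p-value, the upper tail of the chi-squared distribution has a simple closed form when df is even. The sketch below uses only the Python standard library; the function name `chi2_sf_even_df` is made up for this example and is not part of SPSS or any library.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with even df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

# Statistic and df as reported on the slide.
p_value = chi2_sf_even_df(10.268, 4)
print(f"p = {p_value:.3f}")  # prints "p = 0.036"
```

Since .036 is below alpha = 0.05, this agrees with the decision to reject independence.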

The Chi-Square test also tells us of potential problems. The test assumes there is a large number of respondents in each cell. The standard rule is that every cell should have a frequency of at least 5.

Having small cells (cells with fewer than 5 respondents) makes the p-value of the chi-squared test inaccurate. The more small cells there are, the worse the problem. There are ways to deal with cells with small n. The easiest one is to find a logical way to group categories together.

Here, there are substantially fewer older adults than any other group. We could merge the middle age and older adult categories into a “not young” category. Then we would have a 2x3 cross tab with larger n values.

For a table of this size, it’s simple enough to do by hand.
              Music   News   Sports
Young            14      4        7
Middle Age       10     15        9
Older Adult       2      8        3
The frequencies in the new categories are the frequencies in both the old categories added together.
              Music          News           Sports
Young         14             4              7
Not Young     10 + 2 = 12    15 + 8 = 23    9 + 3 = 12

              Music   News   Sports
Young            14      4        7
Not Young        12     23       12
We still have one cell below 5, but that’s better than having three cells below 5. This won’t distort our answer by much. But if we do this by hand, then we can’t analyze the new dataset with SPSS. We need some way to make new variables from old ones.
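The hand merge above can also be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how SPSS does it; the helper names `merge_rows` and `count_small_cells` are invented for this example.

```python
# Observed counts from the slides; columns are Music, News, Sports.
observed = {
    "Young": [14, 4, 7],
    "Middle Age": [10, 15, 9],
    "Older Adult": [2, 8, 3],
}

def merge_rows(table, rows_to_merge, new_name):
    """Return a new table where the listed rows are replaced by their column-wise sum."""
    merged = {name: counts for name, counts in table.items() if name not in rows_to_merge}
    merged[new_name] = [sum(col) for col in zip(*(table[r] for r in rows_to_merge))]
    return merged

def count_small_cells(table, threshold=5):
    """Count cells whose observed frequency is below the threshold."""
    return sum(count < threshold for counts in table.values() for count in counts)

merged = merge_rows(observed, ["Middle Age", "Older Adult"], "Not Young")
print(merged)                       # {'Young': [14, 4, 7], 'Not Young': [12, 23, 12]}
print(count_small_cells(observed))  # 3
print(count_small_cells(merged))    # 1
```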

This slide for interest: For 2x2 crosstabs, there is no way to merge to improve the frequencies in cells, but we can use a modification to the chi-squared test called the Yates’ Adjustment. The textbook talks about dealing with cells with few respondents on pages 326-331. Also, it’s technically the small expected frequencies that cause trouble, but the best indicator of these is small observed counts.

We need some way to transform old variables into new ones. SPSS: Recoding variables. Goal: To take the three category variable Young/Middle/Old and make a two category variable Young/Not Young. Transform → Recode into Different Variables.

Select the variable you want to change. In our case it’s age. Give the new variable a name in Output Variable: Name, then click on Change.

Then, click on Old and New Values. This brings up the menu to define the old categories you have and the new categories you want.

In the new popup, check “Output variables are strings” first. Then enter the old category name in Old Value: Value and enter the new category name in New Value: Value.

Click Add and repeat the last slide for each category. “1Young → Young”, “2MiddleAge → NotYoung”, and “3OlderAdult → NotYoung” are the recodings we’re doing.
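The same recoding can be pictured as a lookup table from old category to new category. The sketch below is a plain Python analogue, not SPSS syntax; the function `recode` and the sample responses are hypothetical and only there to show the idea.

```python
# Old category -> new category, using the labels from the slide.
RECODE = {
    "1Young": "Young",
    "2MiddleAge": "NotYoung",
    "3OlderAdult": "NotYoung",
}

def recode(values, mapping):
    """Return a new list with every old category replaced by its new category.

    Raises KeyError for a value with no mapping, so an unexpected
    category is not silently dropped.
    """
    return [mapping[v] for v in values]

# Hypothetical responses, only to show the call; not data from the survey.
age = ["1Young", "3OlderAdult", "2MiddleAge", "1Young"]
age_merged = recode(age, RECODE)
print(age_merged)  # ['Young', 'NotYoung', 'NotYoung', 'Young']
```

Like Recode into Different Variables, this leaves the original values untouched and builds a new variable next to them.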

Now we can run a crosstab in SPSS using the new variable with the merged categories. (Analyze → Descriptive Statistics → Crosstabs)

We can look at the expected frequencies. (Crosstabs menu, Cells button, check “Expected”) Even though one cell has an observed frequency less than 5, its expected frequency is more than 5, so the potential problem is lessened.
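To see where the expected frequencies come from, here is a minimal sketch that applies row total x column total / n to the merged counts. The function name `expected_counts` is an assumption made for this example.

```python
# Merged observed counts from the slides; columns are Music, News, Sports.
observed = [
    [14, 4, 7],    # Young
    [12, 23, 12],  # Not Young
]

def expected_counts(table):
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for label, row in zip(["Young", "Not Young"], expected):
    print(label, [round(e, 2) for e in row])
```

The Young/News cell, with an observed count of 4, has an expected count of 25 x 27 / 72, or about 9.4, and every expected count in the merged table is above 5.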

We can also do the chi-squared test again and see if there’s a problem or a change in the p-value. 0/6 cells are too small instead of 3/9. We went from 4 df to 2 because we now have a 2x3 crosstab: (2 – 1) x (3 – 1) = 2.
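The comparison between the two tables can be sketched with the standard library alone. This is an illustrative re-computation, not SPSS output; `chi_squared_test` and `p_value_even_df` are names made up here, and the p-value shortcut is only valid because both tables happen to have even df.

```python
import math

def chi_squared_test(table):
    """Pearson chi-squared for a table of counts.

    Returns (statistic, df, number of cells with expected count below 5).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    small_cells = 0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            statistic += (obs - exp) ** 2 / exp
            small_cells += exp < 5
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, small_cells

def p_value_even_df(statistic, df):
    """Upper-tail chi-squared probability; closed form valid for even df only."""
    half = statistic / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Columns are Music, News, Sports.
original = [[14, 4, 7], [10, 15, 9], [2, 8, 3]]  # Young, Middle Age, Older Adult
merged = [[14, 4, 7], [12, 23, 12]]              # Young, Not Young

for name, table in [("3x3", original), ("2x3", merged)]:
    stat, df, small = chi_squared_test(table)
    p = p_value_even_df(stat, df)
    print(f"{name}: chi2 = {stat:.3f}, df = {df}, p = {p:.3f}, small expected cells = {small}")
```

For the 3x3 table this reproduces the df of 4, the p-value of .036, and the three small expected cells; for the merged table it gives 2 df, no small expected cells, and a p-value still below .05.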

Also, the most important part, the p-value, hasn’t changed dramatically. (In the 3x3 table it was .036.) This implies that merging middle age and older didn’t change anything major. We reject the null; radio choice depends on age.

It’s easier to detect differences in larger groups, so we would expect the p-value to go down a little, but not something dramatic like .001 or .000. If the p-value had increased much, we would have lost the ability to reject the null. (A bad merge can do this.)

Pacing parrot asks: Do we have time for another?

We took a survey of people in four career fields and recorded whether they were left or right handed. These are the observed counts.

Most of the respondents are right handed except in the athletics field, where a few more than half are left handed. We want to know if this difference is a fluke or if career and handedness are somehow dependent.

We have a 2x4 crosstab, so we should use a chi-squared test. These are the results: Degrees of freedom = 3, χ² = 50.434. There is very significant evidence against independence.
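How strong this evidence is can be checked from the two reported numbers alone. With 3 df the chi-squared upper tail has a closed form in terms of the complementary error function; the sketch below uses it, with a function name (`chi2_sf_df3`) invented for this example.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with 3 df.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

df = (2 - 1) * (4 - 1)       # a 2x4 table
p_value = chi2_sf_df3(50.434)
print(df, p_value < 0.001)   # prints "3 True"
```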

The chi-squared test has a very small p-value (less than .001). Do the results of this test tell us that there are more left handed people in athletics in general? No. Chi-squared only checks whether two variables are independent, not specific trends within them.

By comparing the expected and observed counts, we can see that the athletic field is much different from the others. We can use this information to guide a next step even if we’re not getting definite answers from just the expected counts.

We could try merging the other three fields into “non-athletic” and comparing that group with “athletic”, as long as those three fields together fairly represented everything non-athletic.

In that case, the odds ratio shows that someone in the athletic field has 7.371 times the odds of being left handed as someone in a non-athletic profession. The confidence interval shows that this odds ratio is significantly more than 1 at the alpha = 0.025 level.
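As a reminder of what is being computed, label the cells of the merged 2x2 table a (athletic, left handed), b (athletic, right handed), c (non-athletic, left handed), and d (non-athletic, right handed). These labels are introduced here for illustration; the counts themselves are not reproduced in this transcript, and whether SPSS uses exactly this interval is an assumption. The odds ratio and the usual large-sample confidence interval are then:

```latex
\widehat{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}, \qquad
\text{CI} = \exp\!\left( \ln \widehat{OR} \pm z \sqrt{\tfrac{1}{a} + \tfrac{1}{b} + \tfrac{1}{c} + \tfrac{1}{d}} \right)
```

A two-sided 95% interval (z = 1.96) that lies entirely above 1 corresponds to a one-tailed test at alpha = 0.025, which is where the 0.025 comes from.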

Next time: More on cross tabs. If time permits: Intro to Analysis of Variance. (Ch. 8)
