More Accurate Racial and Ethnic Codes for Medicare Administrative Data


Celia Eicheldinger, M.S. and Arthur Bonito, Ph.D.

Analyses of health care disparities in Medicare using administrative race and ethnicity data have typically been limited to Black and White beneficiaries. This is in part due to the small size of the other categories, inaccuracies in the race and ethnicity codes, and caveats that more extensive analyses would produce biased results. While previous Medicare efforts certainly improved the accuracy of race and ethnicity coding, we have developed an imputation algorithm that dramatically improves the accuracy of coding for Hispanic and Asian or Pacific Islander beneficiaries. When compared with self-reported race and ethnicity, sensitivity increased from 29.5 to 76.6 percent for Hispanic and from 54.7 to 79.2 percent for Asian and Pacific Islander beneficiaries, with no loss of specificity, and Kappa coefficients reaching 0.80. As a result, 2,245,792 beneficiaries were recoded to Hispanic and 336,363 to Asian or Pacific Islander.

INTRODUCTION

Medicare administrative data should be an ideal resource for examining the extent of racial and ethnic disparities in the program. However, small population sizes and recognized inaccuracies in the coding of race/ethnicity in the Medicare enrollment database (EDB) have led health policy analysts to be wary of making comparisons that go beyond White and Black beneficiaries.

The authors are with RTI International.
The research in this article was supported by the Centers for Medicare & Medicaid Services (CMS) under Contract Number 500-00-0024 (TO8). The statements expressed in this article are those of the authors and do not necessarily reflect the views or policies of RTI International or CMS.

Some have advised against analyzing data for Hispanic, Asian/Pacific Islander, and American Indian/Alaska Native beneficiaries because analyses may be biased when large proportions of these relatively small racial/ethnic groups are not correctly identified and those who are missed differ in important ways from those who are identified (Lauderdale and Goldberg, 1996; Arday et al., 2000).

Historically, the Medicare Program has received its race/ethnicity code for beneficiaries from the Social Security Administration's (SSA's) master beneficiary record (MBR). From 1935 to 1980, the Social Security application form (SS-5), whose data were incorporated into the MBR, allowed an applicant's race to be classified only as White, Black, or Other. "Unknown" was used to classify persons who did not report any race. In 1980, in response to Office of Management and Budget (OMB) Directive 15, the number of race/ethnicity categories on the form was expanded to six: (1) White (non-Hispanic); (2) Black (non-Hispanic); (3) Hispanic; (4) Asian, Asian American, or Pacific Islander; (5) American Indian or Alaska Native; and (6) Unknown. In 1989, SSA began to enroll new participants at birth, extracting data from birth certificates rather than requiring applicants to file Form SS-5; however, the race/ethnicity information on the birth certificate was not included in the data extraction because it was considered unnecessary for administration of the SSA program. Since 1989, the only persons filing an SS-5 form have been those requesting a new number or a name change (Scott, 1999).

Health Care Financing Review/Spring 2008/Volume 29, Number 3

In 1994, race data from SS-5 forms with the expanded race/ethnicity codes were integrated directly into the EDB to correct erroneous and missing codes. This changed the race/ethnicity coding for more than 2.5 million beneficiaries (Lauderdale and Goldberg, 1996). This update using the SS-5 form was repeated in 1997 and 2000 and is now conducted annually. The Medicare Program has also worked with the Indian Health Service to improve the coding of American Indians/Alaska Natives.

In 1997, to correct miscoded data and reduce the amount of missing race/ethnicity information, the Health Care Financing Administration (now CMS) conducted a postcard survey of nearly 2.2 million beneficiaries. The survey included beneficiaries with Hispanic surnames or Hispanic countries of birth and beneficiaries whose race/ethnicity was coded as "Other" or was missing. The survey resulted in changes for approximately 858,000 beneficiaries (Arday et al., 2000). These efforts clearly improved the EDB's race/ethnicity data. Nonetheless, comparisons of the EDB race/ethnicity codes with self-reported race/ethnicity data from the Medicare Current Beneficiary Survey (MCBS) indicated that identification of Hispanics, Asians/Pacific Islanders, and American Indians/Alaska Natives was still quite incomplete and might result in biased analyses (Arday et al., 2000). An analysis comparing the distribution of race/ethnicity for Medicare beneficiaries age 65 or over in the EDB with U.S. Census estimates for persons of similar age produced similar results (Eggers and Greenberg, 2000). A recent analysis comparing EDB with MCBS race/ethnicity codes continues to find large proportions of these same groups misclassified in the EDB (Waldo, 2004-2005).

METHODS

This work was conducted to identify health care disparities among Medicare beneficiaries, including Hispanics and Asians/Pacific Islanders.
We first assessed the accuracy of the race/ethnicity coding on the EDB, then developed and validated an imputation algorithm to improve the accuracy of the EDB race/ethnicity code, and finally applied the algorithm to the EDB.

Data

We conducted multiple analyses in the process of assessing and improving the race/ethnicity coding on the EDB. The data we used included:

- Separate Hispanic/Latino and Asian/Pacific Islander surname lists from the 1990 and 2000 U.S. Census.
- Separate Hispanic/Latino and Asian/Pacific Islander first-name lists compiled from multiple Web sites.
- Self-reported race/ethnicity of 830,728 Medicare beneficiary respondents from three different Consumer Assessment of Health Care Providers Surveys (CAHPS®) conducted from 2000 to 2002: Medicare fee-for-service, Medicare managed care enrollee, and Medicare managed care disenrollee. We henceforth refer to these as the CAHPS data. The self-reported race/ethnicity codes from these data form the SELFRACE variable and constitute the gold standard.
- Several variables found on the Medicare EDB, including:
  - Race/ethnicity¹, henceforth referred to as EDBRACE, which has eight values and allows each beneficiary only one value. The eight values are: (1) 0 Unknown, (2) 1 White (non-Hispanic), (3) 2 Black (non-Hispanic), (4) 3 Other, (5) 4 Asian/Pacific Islander, (6) 5 Hispanic/Latino, (7) 6 American Indian/Alaska Native, and (8) Blank, Temporary record.
  - Other variables that identified language, the source of the beneficiary's race/ethnicity code, and the State from the beneficiary's mailing address.

¹ The definitions of the values we have listed for EDBRACE are what we believe to have been intended by the codes.

Variable Creation

Prior to making comparisons, we created a self-reported race variable, SELFRACE, from the following two CAHPS questions on race and ethnicity:

- Are you of Hispanic or Latino origin or descent?
  - Yes, Hispanic or Latino
  - No, not Hispanic or Latino
- What is your race²? Please mark one or more.
  - White
  - Black or African-American
  - Asian
  - Native Hawaiian or other Pacific Islander
  - American Indian or Alaska Native

² In 2000, CAHPS included an option for beneficiaries to select "Other" as a race.

To make meaningful comparisons, SELFRACE was created with similar logic and the same codes as EDBRACE. We did the following to make SELFRACE comparable with EDBRACE:

- If a CAHPS respondent reported being Hispanic/Latino, SELFRACE was set to Hispanic/Latino.
- Otherwise, if a CAHPS respondent reported not being Hispanic/Latino (or the response was missing) and chose only one race, SELFRACE was set to the value of the race chosen. For example, if a respondent chose Asian or Native Hawaiian or other Pacific Islander, SELFRACE was set to Asian/Pacific Islander.
- If a CAHPS respondent reported not being Hispanic/Latino (or the response was missing) and reported more than one race, SELFRACE was set to two or more.³
- If a respondent's answer was missing for both questions, SELFRACE was set to unknown.
- If the respondent reported not being Hispanic/Latino (or the answer was missing) and did not indicate a race, SELFRACE was set to unknown.

We then compared SELFRACE with EDBRACE for all of the CAHPS respondents.

Statistical Methods

Using SELFRACE, we assessed EDBRACE using accuracy and agreement statistics (i.e., sensitivity, specificity, positive predictive value, negative predictive value, and the Kappa coefficient).
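The SELFRACE coding rules listed under Variable Creation can be sketched in Python as follows. This is a minimal illustration, not the authors' program, which the article does not show. The function name `derive_selfrace`, the string labels, and the mapping of the 2000 "Other" option to an Other category are assumptions. We also assume that a respondent who marks both Asian and Native Hawaiian or other Pacific Islander counts as choosing one category, since both map to Asian/Pacific Islander.

```python
from typing import Iterable, Optional

# Map each CAHPS race checkbox to an EDBRACE-style category.
# Assumption: Asian and Native Hawaiian/other Pacific Islander share one
# Asian/Pacific Islander category; the 2000 "Other" option maps to Other.
RACE_TO_CATEGORY = {
    "White": "White",
    "Black or African-American": "Black",
    "Asian": "Asian/Pacific Islander",
    "Native Hawaiian or other Pacific Islander": "Asian/Pacific Islander",
    "American Indian or Alaska Native": "American Indian/Alaska Native",
    "Other": "Other",
}


def derive_selfrace(hispanic: Optional[bool], races: Iterable[str]) -> str:
    """Return a SELFRACE label from the two CAHPS answers.

    hispanic is True (yes), False (no), or None (missing);
    races holds the race boxes the respondent marked (possibly none).
    """
    if hispanic is True:
        return "Hispanic/Latino"
    # Not Hispanic, or the Hispanic answer is missing: use the race question.
    categories = {RACE_TO_CATEGORY[r] for r in races}
    if not categories:
        return "Unknown"  # no race indicated (covers both answers missing)
    if len(categories) > 1:
        return "Two or more"  # no EDB equivalent; excluded from comparisons
    return categories.pop()
```

For example, `derive_selfrace(None, ["White", "Black or African-American"])` yields "Two or more", and such respondents would be dropped before the comparison with EDBRACE.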
Table 1 shows how EDBRACE and SELFRACE are cross-classified for a given racial/ethnic group: true positives (a), where EDBRACE and SELFRACE both place the beneficiary in the group; false negatives (b), where SELFRACE places the beneficiary in the group but EDBRACE does not; false positives (c), where EDBRACE places the beneficiary in the group but SELFRACE does not; and true negatives (d), where neither places the beneficiary in the group.

Sensitivity represents how successful EDBRACE was at correctly identifying members of a racial/ethnic group and is calculated as $\frac{a}{a+b} \times 100$. Specificity indicates how often EDBRACE correctly identified persons who are not in a given racial/ethnic group and is calculated as $\frac{d}{c+d} \times 100$. Positive predictive value is calculated as $\frac{a}{a+c} \times 100$, and negative predictive value as $\frac{d}{b+d} \times 100$. (All calculations are derived from Table 1.)

³ Since the EDB did not have an equivalent category, we did not include the small number of beneficiaries coded this way in our analyses.

Although the goal is for both sensitivity and specificity to be high, there is a tradeoff between them. A similar relationship exists between positive and negative predictive values: the goal is for both to be high, but improving one often comes at the expense of the other. We set a target of increasing sensitivity to 75 percent with negligible impact on specificity.

Finally, we calculated the Kappa coefficient (Cohen, 1960), which is widely used as a measure of inter-rater reliability. The Kappa coefficient ranges from 1 (complete agreement), through 0 (agreement no better than chance), to −1 (complete disagreement). We set a goal of achieving a Kappa coefficient of at least 0.81. Landis and Koch (1977) suggested the following interpretations for the Kappa coefficient:

Kappa Statistic    Strength of Agreement
< 0.00             Poor
0.00–0.20          Slight
0.21–0.40          Fair
0.41–0.60          Moderate
0.61–0.80          Substantial
0.81–1.00          Almost Perfect

RESULTS

Assessing the EDB

Table 2 illustrates the agreement between SELFRACE and EDBRACE with respect to the classification of beneficiaries as White or non-White and repeats the same analysis for Black, Hispanic, Asian/Pacific Islander, and American Indian/Alaska Native beneficiaries.

The table reveals low levels of accuracy and agreement between EDBRACE and SELFRACE in identifying Hispanic, Asian/Pacific Islander, and American Indian/Alaska Native Medicare beneficiaries. For example, there are 43,927 self-reported Hispanics in the CAHPS data, but the EDB has correctly classified only 12,953. In other words, as reflected by the sensitivity statistic, the EDB captures only 29.5 percent of Hispanic beneficiaries. There is somewhat better agreement for Asians/Pacific Islanders, with a sensitivity of 54.7 percent. But only 35.7 percent of American Indians/Alaska Natives are identified in the EDB.
The sensitivity of the EDB for correctly identifying Black and White beneficiaries is excellent. The EDB also does an excellent job of not misclassifying non-Hispanic, non-Asian/Pacific Islander, non-Black, and non-American Indian/Alaska Native beneficiaries, as shown by specificities of 98.8 percent or higher for these groups.

Table 1
Race/Ethnicity Agreement for a Given Beneficiary and Group According to Placement, by CAHPS and EDB

                                   Where the CAHPS¹ Race/Ethnicity Measure Puts the Beneficiary
Where the EDB² Race/Ethnicity
Measure Puts the Beneficiary       In the Group          Not in the Group
In the Group                       a True Positive       c False Positive
Not in the Group                   b False Negative      d True Negative

¹ CAHPS (SELFRACE) is considered the gold standard.
² EDB (EDBRACE) is considered the test measure.
NOTES: CAHPS is Consumer Assessment of Health Plans Study. EDB is Medicare enrollment database.
SOURCE: Eicheldinger, C. and Bonito, A., RTI International, 2007.
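Using the cell labels of Table 1, the accuracy measures defined under Statistical Methods, together with Cohen's Kappa and the Landis and Koch bands, can be computed as in the Python sketch below. The function names are hypothetical, and the Kappa formula is the standard two-rater form for a 2 × 2 table, which the article does not spell out. The last two lines reproduce figures reported in the text: Hispanic sensitivity (12,953 correctly classified of 43,927 self-reported Hispanics) and White specificity (60,794 of 158,735 non-White beneficiaries coded White in the EDB).

```python
def sensitivity(a: int, b: int) -> float:
    """Percent of self-reported group members that EDBRACE places in the group."""
    return 100.0 * a / (a + b)


def specificity(c: int, d: int) -> float:
    """Percent of self-reported non-members that EDBRACE keeps out of the group."""
    return 100.0 * d / (c + d)


def positive_predictive_value(a: int, c: int) -> float:
    return 100.0 * a / (a + c)


def negative_predictive_value(b: int, d: int) -> float:
    return 100.0 * d / (b + d)


def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's Kappa for a 2 x 2 table with cells labeled as in Table 1."""
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance, from the CAHPS and EDB marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa: float) -> str:
    """Map a Kappa value to the Landis and Koch (1977) strength of agreement."""
    if kappa < 0:
        return "Poor"
    bands = [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"), (0.80, "Substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost Perfect"


print(round(sensitivity(12_953, 43_927 - 12_953), 1))   # 29.5 (Hispanic)
print(round(specificity(60_794, 158_735 - 60_794), 1))  # 61.7 (White)
```

Passing the reported Kappa values through `landis_koch_label` shows, for instance, that 0.43 falls in the Moderate band and 0.71 in the Substantial band.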

Table 2
Accuracy and Agreement Between SELFRACE and EDBRACE
[Table body not recoverable from the source: SELFRACE and EDBRACE assignments (Yes/No) and EDBRACE accuracy and agreement measures, by reference group.]
NOTES: EDBRACE is the unadjusted variable from the mid-July 2003 Medicare EDB for beneficiaries responding to the CAHPS fee-for-service, managed care enrollee, and disenrollee surveys for 2000-2002. SELFRACE is the variable for respondents from the CAHPS fee-for-service, managed care enrollee, and disenrollee surveys for 2000-2002.
SOURCE: Eicheldinger, C. and Bonito, A., RTI International, 2007.

However, the specificity is considerably lower for White beneficiaries, only 61.7 percent, indicating that 60,794 of the 158,735 non-White beneficiaries are mistakenly identified as White in the EDB. This supports the suggestion that many beneficiaries classified as White in the EDB actually belong in another category.

The overall level of agreement, reflected in the Kappa coefficients, is only moderate for Hispanics (0.43) and American Indians/Alaska Natives (0.45) and at the low end of the substantial range for Asians/Pacific Islanders (0.66). We speculate that many Hispanic, Asian/Pacific Islander, and American Indian/Alaska Native beneficiaries were coded as White because the appropriate categories were unavailable until relatively recently. While the Kappa for White beneficiaries is substantial (0.71), it is not as high as we would like, most likely reflecting their rather low specificity.

Improving the Coding on the EDB

In light of the low sensitivity for Hispanics and Asians/Pacific Islanders in the EDB, we developed separate Hispanic and Asian/Pacific Islander imputation algorithms. These algorithms used the following pieces of EDB information:

- LANGPREF, the language in which a beneficiary prefers CMS to send the Medicare Handbook. Allowed values are English, Spanish, and blank (no preference specified).
- LANGCD, the language in which a beneficiary has asked SSA to send beneficiary notices. CMS uses this variable for Medicare premium bills. English, Spanish, and blank are the allowed values.
- RACESRC, the source of a beneficiary's EDB race/ethnicity code. Three values are allowed:
  - A: Response from a one-time survey that was mailed to 2.2 million beneficiaries in 1997.
  - B: Data from the Indian Health Service.
  - Blank: Data from the SSA Master Beneficiary Record (SSA-MBR), the SS-5 form (NUMIDENT), or the Railroad Retirement Board (RRB).

- The State in which a beneficiary resides, so that we could identify beneficiaries living in Hawaii and Puerto Rico.

At the core of the algorithm were Hispanic (Word and Perkins, 1996) and Asian/Pacific Islander (Falkenstein and Word, 2002) surname lists developed at the U.S. Census Bureau. Associated with each name on the list was the proportion of times a household headed by a person with that surname was indeed a Hispanic (or Asian/Pacific Islander) household, as reported to the U.S. Census. In addition to the surname lists, we included in the algorithm a list of common Hispanic and Asian/Pacific Islander first names.

We incorporated these pieces of information into a SAS program that, through an iterative process, created two new variables for every beneficiary. The first, NEWHISPANIC, identified each beneficiary as Hispanic or not. The second, NEWAPI, identified each beneficiary as Asian/Pacific Islander or not. The logic of the algorithm used to create NEWHISPANIC follows, along with a description of how NEWAPI was created and how the two were combined to create NEWRACE.

NEWHISPANIC was turned on if any of the following criteria were met:

- The beneficiary's surname matched the Hispanic surname list and the assigned percentage from the list was at least 70 percent.
- The EDB coded the beneficiary as Hispanic.
- The beneficiary was living in Puerto Rico.
- The variable LANGCD indicated Spanish.
- The beneficiary's first name had Hispanic origins, and the beneficiary's surname matched the Hispanic surname list with an assigned percentage of at least 50 percent.

NEWHISPANIC was turned off if any of the following criteria were met⁴:

- The beneficiary was not identified as Hispanic in the preceding steps.
- LANGPREF indicated English.
- RACESRC indicated the race code came from the 1997 survey, and that race code was not Hispanic.
- RACESRC indicated the beneficiary's race code came from the Indian Health Service.

⁴ The last three criteria listed for identifying whether a beneficiary was non-Hispanic had the effect of changing some beneficiaries identified by the first half of the algorithm as Hispanic back to non-Hispanic.

Similar logic was used to set the value of NEWAPI, except that the EDB variables LANGCD and LANGPREF were not used because they did not contain an Asian/Pacific Islander language indicator.

Using the self-reported race/ethnicity data from the CAHPS survey as the gold standard, we assessed the results of applying the algorithm to create the NEWHISPANIC and NEWAPI variables for the CAHPS respondents. We found that the algorithms substantially improved the EDB race/ethnicity categorization of Hispanic and Asian/Pacific Islander beneficiaries. Among Hispanic beneficiaries, sensitivity improved from 29.5 to 76.6 percent, the Kappa coefficient rose from 0.43 to 0.79, and the other measures (specificity and predictive values) remained virtually unchanged. The improvement for Asian/Pacific Islander beneficiaries was not as dramatic but still impressive: sensitivity rose from 54.7 to 79.2 percent, Kappa increased from 0.66 to 0.80, and the other measures were not materially changed. Analysis of the improvements indicated that among both groups somewhat more males than females were correctly identified (possibly because of intermarriage and surname changes for ethnic females), and more beneficiaries age 65 to 74 than age 75 or over were correctly identified (probably because there are more beneficiaries in the younger age group).

Before merging the NEWHISPANIC and NEWAPI variables together we used t

Table 3
Comparison of EDBRACE, NEWRACE, and SELFRACE (CAHPS®) Distributions of Race/Ethnicity
[Table body not recoverable from the source: distributions of persons by race/ethnicity under EDBRACE, NEWRACE, and SELFRACE.]
NOTES: EDBRACE is the unadjusted variable from the mid-July 2003 Medicare EDB for beneficiaries responding to the CAHPS fee-for-service, managed care enrollee, and disenrollee surveys for 2000-2002. SELFRACE is the variable for respondents from the CAHPS fee-for-service, managed care enrollee, and disenrollee surveys for 2000-2002. NEWRACE is the result of applying the race/ethnicity recoding algorithm to the Medicare EDB variable from mid-July 2003.
SOURCE: Eicheldinger, C. and Bonito, A., RTI International, 2007.
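The NEWHISPANIC turn-on and turn-off rules listed under Improving the Coding on the EDB can be expressed as a single-pass Python function. This is a hedged sketch, not the authors' SAS program, which the article describes as iterative. The `Beneficiary` record, the field values ("English", "Spanish", and "" for blank; "A" and "B" for RACESRC; "PR" for Puerto Rico), the uppercase keys of the name lists, and the function name `new_hispanic` are all assumptions. Any surname percentages passed in for testing should be treated as illustrative, not as Census Bureau values.

```python
from dataclasses import dataclass
from typing import AbstractSet, Mapping

HISPANIC = 5  # EDBRACE code for Hispanic/Latino


@dataclass
class Beneficiary:
    first_name: str
    surname: str
    edbrace: int   # EDBRACE code (0-6)
    state: str     # assumed postal-style code, e.g. "PR" for Puerto Rico
    langcd: str    # "English", "Spanish", or "" (blank)
    langpref: str  # "English", "Spanish", or "" (blank)
    racesrc: str   # "A" (1997 survey), "B" (Indian Health Service), or ""


def new_hispanic(b: Beneficiary,
                 surname_pct: Mapping[str, float],
                 hispanic_first_names: AbstractSet[str]) -> bool:
    """Return NEWHISPANIC for one beneficiary under the listed rules.

    surname_pct maps an uppercase surname to its percent-Hispanic value;
    hispanic_first_names holds uppercase first names.
    """
    pct = surname_pct.get(b.surname.upper())  # None if the surname is unlisted
    turned_on = (
        (pct is not None and pct >= 70)
        or b.edbrace == HISPANIC
        or b.state == "PR"
        or b.langcd == "Spanish"
        or (b.first_name.upper() in hispanic_first_names
            and pct is not None and pct >= 50)
    )
    if not turned_on:
        return False  # first turn-off criterion: never identified as Hispanic
    # The remaining turn-off criteria send a beneficiary back to non-Hispanic.
    turned_off = (
        b.langpref == "English"
        or (b.racesrc == "A" and b.edbrace != HISPANIC)
        or b.racesrc == "B"
    )
    return not turned_off
```

Note that the LANGPREF rule overrides every turn-on criterion, including an existing Hispanic EDB code, which is what the listed criteria imply. The article does not list the NEWAPI criteria in full, so that variable is not sketched here.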
