The Predictive Validity Of The ABFM’s In-Training Examination


ORIGINAL ARTICLES

Thomas R. O’Neill, PhD; Zijia Li, MS; Michael R. Peabody, PhD; Melanie Lybarger, MS; Kenneth Royal, PhD; James C. Puffer, MD

BACKGROUND AND OBJECTIVES: Our objective was to examine the predictive validity of the American Board of Family Medicine’s (ABFM) In-Training Examination (ITE) with regard to predicting outcomes on the ABFM certification examination.

METHODS: This study used a repeated measures design across three levels of medical training (PGY1–PGY2, PGY2–PGY3, and PGY3–initial certification) with three different cohorts (2010–2011, 2011–2012, and 2012–2013) to examine: (1) how well the residents’ ITE scores correlated with their test scores in the following year, (2) what the typical score increase was across training years, and (3) what was the sensitivity, specificity, positive predictive value, and negative predictive value of the PGY3 scores with regard to predicting future results on the MC-FP Examination.

RESULTS: ITE scores generally correlate at about .7 with the following year’s ITE or with the following year’s certification examination. The mean growth from PGY1 to PGY2 was 52 points, from PGY2 to PGY3 was 34 points, and from PGY3 to initial certification was 27 points. The sensitivity, specificity, positive predictive value, and negative predictive value were .91, .47, .96, and .27, respectively.

CONCLUSION: The ITE is a useful predictor of future ITE and initial certification examination performance.

(Fam Med 2015;47(5):349-56.)

The American Board of Family Medicine (ABFM) offers residents enrolled in Accreditation Council for Graduate Medical Education (ACGME)-accredited residency programs the opportunity to take the ABFM In-Training Examination (ITE).
The purposes of the ABFM’s ITE are (1) to provide each resident with a low-cost, low-stakes opportunity to become familiar with the general format and item writing style that will be used on the Maintenance of Certification for Family Physicians (MC-FP) Examination and (2) to provide each resident and his or her program director with an opportunity to assess how well the resident is progressing toward eventually passing the MC-FP Examination. Given these purposes, it is important that (1) the ITE content be similar to that of the MC-FP Examination, (2) the ITE results be a good approximation of how a resident would perform on the MC-FP Examination at that point in time, and (3) the ITE be predictive of an examinee’s future results on the MC-FP Examination. The ABFM’s ITE is designed as a low-stakes examination and, accordingly, the ABFM advises program directors that the results should not be used to make important decisions related to the promotion or advancement of the residents taking the exam.

The ABFM asserts that the ITE is a good predictor of a resident’s performance on the MC-FP Examination. The purpose of this study is to describe the extent to which ITE results can be used to predict future examination performance, either ITE or MC-FP, and how confident one can be in those predictions. It also examines the average score growth across each year of residency as a factor in that prediction.

Background
The ABFM’s ITE was specifically designed to have a high degree of concurrent and predictive validity, both of which are forms of criterion-related validity. The important criterion in both of these cases is performance on the MC-FP Examination.
The concurrent validity claim for the ABFM’s ITE is that it is intended to produce scores that would be predictions of how an examinee would perform on the MC-FP Examination if he or she had taken it instead of the ITE at that point in time. To achieve this, each form of the ITE is built to the same specifications as the core questions portion of the MC-FP Examination, and ITE scores are equated onto the MC-FP scale. Concurrent validity with the MC-FP Examination is partially established through the regular quality checks that ensure the ITE is developed with the correct content specifications and that the equating was successful. Because the concurrent validity seems quite high, the ABFM has never conducted an experiment in which both tests were administered to examinees on consecutive days; however, the ABFM does administer the MC-FP Examination a few months after PGY3 residents take the ITE, which is how the ABFM usually establishes the predictive validity of the ITE.1

From the American Board of Family Medicine, Lexington, KY (Dr O’Neill, Ms Li, Dr Peabody, Ms Lybarger, and Dr Puffer) and North Carolina State University (Dr Royal).
FAMILY MEDICINE, VOL. 47, NO. 5, MAY 2015

In general, the literature on how well ITEs predict success on the corresponding certification examination is positive; however, the methods used and the level of detail with which the results were reported varied noticeably across studies. A number of studies examining the predictive power of ITEs with respect to the outcome on their corresponding certification examinations have been performed by numerous American Board of Medical Specialties (ABMS) member boards, including the American Board of Neurological Surgery,2,3 the American Board of Surgery,4-8 the American Board of Internal Medicine,9-12 the American Board of Psychiatry and Neurology,13-15 the American Board of Radiology,16,17 the American Board of Pediatrics,18,19 the American Board of Obstetrics and Gynecology,20,21 the American Board of Anesthesiology,22 the American Board of Orthopedic Surgery,23 and the American Board of Pathology.24

The literature related specifically to predicting success on the ABFM’s certification examination is rather sparse. In 1990, Leigh et al1 used a repeated measures data collection design with regression on a national sample of ABFM ITE scores and ABFM certification examination scores to demonstrate that the ITE was a reasonably good predictor of performance on the certification examination. The correlations between the ITE and the certification examination ranged from .69 to .75. In 2004, Replogle and Johnson25 used a Monte Carlo study to look at the positive predictive value (PPV) of the ABFM ITE with regard to predicting successful performance on the ABFM certification examination. They concluded that the overall ITE score had a sufficiently high PPV to use it as part of a comprehensive resident evaluation system; however, the PPV for the subtests was too low to warrant their use as performance indicators.

Methods

Participants
The ABFM’s ITE is administered to nearly all family medicine residents in ACGME-accredited programs. Each year, approximately 10,000 residents from roughly 450 residency programs take the ITE. Residents are fairly evenly distributed across the years of residency.26 The number of participants reported for the different comparisons in this study is slightly lower because the inclusion criteria required that each physician have test scores from consecutive test administrations. To illustrate, if a physician only had test scores for PGY1, PGY3, and the MC-FP Examination, then only the PGY3 to MC-FP comparison would be included because the PGY1 to PGY2 and the PGY2 to PGY3 comparisons would not be available.

Of the possible 10,377 pairs going from PGY1 to PGY2, there were 9,630 matches (93%). Of the possible 9,921 pairs going from PGY2 to PGY3, there were 9,379 matches (95%). Of the possible 9,523 pairs going from PGY3 to the next administration of the MC-FP Examination, there were 6,152 matches (65%). It is important to note that some PGY3 residents do not take the next available MC-FP Examination, probably for a variety of reasons.

Instrumentation
The ABFM’s MC-FP Examination measures physicians’ clinical decision-making ability as it relates to family medicine.
Passing this examination is one of the requirements for ABFM certification. The exam is administered in examination windows during the months of April and November of each year. The exam consists of a common core of 260 multiple-choice questions plus two examinee-selected modules of 45 questions each from a menu of eight modules. These 350 items are scored as right or wrong, and the raw scores are converted to scaled scores that range from 200 to 800. The MC-FP Examination is scored using the dichotomous Rasch27 model. In conjunction with a common item equating design, this model is also used to equate examinations across test forms and years of administration. The use of a common scale with a passing standard that is held constant for useful periods of time has the advantage of providing a more stable target for making predictions related to whether a particular candidate will pass or fail. During the timeframe from which the data were gathered, the minimum passing score was 390. The process used to develop the content specifications for this examination is described in greater detail by Norris et al.28

The ABFM’s ITE contains 240 multiple-choice items and is built to the same specifications as the core element (non-module portion) of the MC-FP Examination. Each year, there is a different form of the ITE with no items in common with the previous form. In order to equate the ITE across administrations and to make the ITE score represent the examinee’s predicted performance on the MC-FP Examination, the ABFM includes a small number of ITE questions as unscored pretest questions on the MC-FP Examination, which are calibrated onto the MC-FP scale. These questions and their associated calibrations on the MC-FP scale are used to connect each administration of the ITE to the continuously maintained MC-FP scale. Because the ITE has been equated onto the MC-FP scale and built to similar specifications, ITE scores should be highly correlated with the MC-FP scores examinees would have earned had they taken it instead of the ITE.

Procedures
This study used a repeated measures design across three levels of medical training (PGY1 to PGY2, PGY2 to PGY3, and PGY3 to initial certification) with three different cohorts (2010–2011, 2011–2012, and 2012–2013) to examine: (1) how well the residents’ ITE scores on PGY1, PGY2, and PGY3 are correlated with their test scores in the following year (PGY2, PGY3, and MC-FP, respectively), (2) what the typical score increase was from PGY1 to PGY2, PGY2 to PGY3, and PGY3 to initial certification, and (3) what was the sensitivity, specificity,12,29 positive predictive value (PPV), and negative predictive value (NPV) of the ITE with regard to predicting results on the MC-FP Examination. This study was deemed exempt by the American Academy of Family Physicians Institutional Review Board.

Results
Across years of training, the ITE correlated at .69 for PGY1 to PGY2, .70 for PGY2 to PGY3, and .71 for PGY3 to MC-FP (Table 1, Figure 1). These correlations were all positive and statistically significant.
The correlations were very similar across cohorts and years of medical training. After disattenuating for the unreliability of the examination, the correlations ranged from .81 to .85.

With regard to resident performance over time, the results indicate that exam scores tend to increase with each successive year of residency; however, the average increase was smaller in each successive year. The average increase from PGY1 to PGY2 was the largest at 52 points, followed by PGY2 to PGY3 with 34 points, and finally 27 points from PGY3 to MC-FP (Table 1, Figure 2).

Using a minimum passing score (MPS) of 390 for both the ITE (PGY3s only) and the MC-FP Examination, the sensitivity, specificity, PPV, and NPV were computed (Table 2). The sensitivity, the proportion of actual MC-FP passers who were also predicted to pass, was .91. The specificity, the proportion of actual MC-FP failers who were also predicted to fail, was .47. The PPV, the proportion of people who were predicted to pass the MC-FP Examination based on their ITE score and actually passed, was .96. The NPV, the proportion of people who were predicted to fail the MC-FP Examination based on their ITE score and actually failed, was .27. Additionally, Figure 3 shows the trade-off between sensitivity and specificity using different ITE prediction thresholds.

Table 1: Summary Statistics of Comparisons
Column groups: Gains by Year (n, Mean, SD, Min, Max, SE); Correlations (Pearson)
Rows: PGY1 to PGY2; PGY2 to PGY3; PGY3 to MC-FP
Surviving cell values: 58-3202900.7.71**n/a
* P<.05, ** P<.01
Note: Disattenuated correlations could not be calculated for the overall results because the disattenuation process removes the degree of unreliability from the pair of test forms. The degree of unreliability of each test could not be easily combined.

Figure 1: Scatterplot of ITE Performance and Subsequent Exam Performance by Year of Training and Cohort

Figure 2: Histogram of Score Increase Across Exam Administrations by Year of Training and Cohort

Discussion

Correlation of Exam Scores
The correlation of ITE scores with ITE scores 1 year later or with MC-FP scores 6 months later is typically about 0.7 (Table 1). This indicates that ITE scores can be used as reasonably good predictors of future performance on the ITE and MC-FP Examinations. This observed (uncorrected) correlation is the appropriate one for making predictions because it reflects both differences in the dimensionality across test forms and the degree of unreliability associated with each test form. The Rasch reliability of the ITE is typically about .81 to .83. The Rasch reliability of the MC-FP Examination is typically .92 or .93. To assess the extent to which two test forms are measuring the same dimension, the correlation must be disattenuated for the degree of unreliability associated with both t
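
The disattenuation step and the prediction statistics described above can be illustrated with a minimal Python sketch. It assumes the standard Spearman correction for attenuation (the observed correlation divided by the square root of the product of the two reliabilities); the article does not state which formula the authors used. The function names are hypothetical. The correlations and reliabilities passed in are the values quoted in the text, while the pass/fail counts are made up for illustration and are not the study's Table 2.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Spearman correction for attenuation (assumed formula, not confirmed by the article)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


def prediction_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, and NPV, treating 'pass' as the positive class.

    tp: predicted to pass and passed     fn: predicted to fail but passed
    tn: predicted to fail and failed     fp: predicted to pass but failed
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of actual passers predicted to pass
        "specificity": tn / (tn + fp),  # share of actual failers predicted to fail
        "ppv": tp / (tp + fp),          # share of predicted passers who passed
        "npv": tn / (tn + fn),          # share of predicted failers who failed
    }


# PGY3 ITE to MC-FP: observed r = .71; ITE reliability .81 to .83, MC-FP .92 to .93.
pgy3_low = disattenuate(0.71, 0.83, 0.93)
pgy3_high = disattenuate(0.71, 0.81, 0.92)

# PGY1 to PGY2: observed r = .69; both forms are ITEs (reliability .81 assumed for each).
ite_pair = disattenuate(0.69, 0.81, 0.81)

# Hypothetical counts, for illustration only (NOT taken from the study).
metrics = prediction_metrics(tp=910, fn=90, tn=47, fp=53)

print(round(pgy3_low, 2), round(pgy3_high, 2), round(ite_pair, 2))
print({name: round(value, 2) for name, value in metrics.items()})
```

With these inputs the corrected correlations come out at roughly .81 to .82 for PGY3 to MC-FP and about .85 for a pair of ITE forms, which is in line with the reported range of .81 to .85. The low specificity and NPV in the Results follow the same arithmetic: when few examinees fail, even a modest number of passers predicted to fail outweighs the correctly predicted failers.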
