Evaluating Teacher Effectiveness - Hawaii State Legislature

Evaluating Teacher Effectiveness
Laura Goe, Ph.D.
Presentation to the Hawaii Department of Education
July 20, 2011, Honolulu, HI

Today’s presentation available online
To download a copy of this presentation or look at it on your iPad, smart phone, or laptop, go to www.lauragoe.com
Go to the Publications and Presentations page.
Today’s presentation is at the bottom of the page.

Laura Goe, Ph.D.
Former teacher in rural & urban schools
  Special education (7th & 8th grade, Tunica, MS)
  Language arts (7th grade, Memphis, TN)
Graduate of UC Berkeley’s Policy, Organizations, Measurement & Evaluation doctoral program
Principal Investigator for the National Comprehensive Center for Teacher Quality
Research Scientist in the Performance Research Group at ETS

The National Comprehensive Center for Teacher Quality
A federally-funded partnership whose mission is to help states carry out the teacher quality mandates of ESEA
  Vanderbilt University
  Learning Point Associates, an affiliate of American Institutes for Research
  Educational Testing Service

The goal of teacher evaluation
The ultimate goal of all teacher evaluation should be TO IMPROVE TEACHING AND LEARNING

Trends in teacher evaluation
Policy is way ahead of the research in teacher evaluation measures and models
  Though we don’t yet know which model and combination of measures will identify effective teachers, many states and districts are compelled to move forward at a rapid pace
Inclusion of student achievement growth data represents a huge “culture shift” in evaluation
  Communication and teacher/administrator participation and buy-in are crucial to ensure change
The implementation challenges are enormous
  Few models exist for states and districts to adopt or adapt
  Many districts have limited capacity to implement comprehensive systems, and states have limited resources to help them

The focus on teacher effectiveness is changing policy
Impacting seniority and tenure rules
  New legislation is changing “Last hired, first fired” policies in many states and cities, including Los Angeles, New York City, Washington, DC, Illinois, Florida, Colorado, Tennessee
Impacting privacy and confidentiality
  Los Angeles has already published teachers’ value-added scores and New York City will likely follow suit

The stakes have changed
Many of the current evaluation measures and models being used or considered have been around for years, but the consequences are changing
  Austin’s student learning objectives model could earn a teacher a monetary reward but could not get her fired
  Tennessee’s value-added results could be considered in teacher evaluation but poor TVAAS results did not necessarily lead to dismissal

How did we get here?
Value-added research shows that teachers vary greatly in their contributions to student achievement (Rivkin, Hanushek, & Kain, 2005).
The Widget Effect report (Weisberg et al., 2009) “examines our pervasive and longstanding failure to recognize and respond to variations in the effectiveness of our teachers.” (from Executive Summary)

Definitions in the research & policy worlds
Anderson (1991) stated that “an effective teacher is one who quite consistently achieves goals which either directly or indirectly focus on the learning of their students” (p. 18).

Race to the Top definition of effective & highly effective teacher
Effective teacher: students achieve acceptable rates (e.g., at least one grade level in an academic year) of student growth (as defined in this notice). States, LEAs, or schools must include multiple measures, provided that teacher effectiveness is evaluated, in significant part, by student growth (as defined in this notice). Supplemental measures may include, for example, multiple observation-based assessments of teacher performance. (pg 7)
Highly effective teacher: students achieve high rates (e.g., one and one-half grade levels in an academic year) of student growth (as defined in this notice).

Measures and models: Definitions
Measures are the instruments, assessments, protocols, rubrics, and tools that are used in determining teacher effectiveness
Models are the state or district systems of teacher evaluation including all of the inputs and decision points (measures, instruments, processes, training, and scoring, etc.) that result in determinations about individual teachers’ effectiveness

Multiple measures of teacher effectiveness
Evidence of growth in student learning and competency
  Standardized tests, pre/post tests in untested subjects
  Student performance (art, music, etc.)
  Curriculum-based tests given in a standardized manner
  Classroom-based tests such as DIBELS
Evidence of instructional quality
  Classroom observations
  Lesson plans, assignments, and student work
  Student surveys such as Harvard’s Tripod
  Evidence binder (next generation of portfolio)
Evidence of professional responsibility
  Administrator/supervisor reports, parent surveys
  Teacher reflection and self-reports, records of contributions

Using multiple measures
Lots of questions about multiple measures
  What is the right combination of measures?
  How do we “weight” measures?
  Are student growth measures fair and valid for measuring teacher performance?
Need more thinking around how to create systems that turn evidence from multiple measures into strategies for continuous improvement

Measures that help teachers grow
Measures that motivate teachers to examine their own practice against specific standards
Measures that allow teachers to participate in or co-construct the evaluation (such as “evidence binders”)
Measures that give teachers opportunities to discuss the results with evaluators, administrators, colleagues, teacher learning communities, mentors, coaches, etc.
Measures that are directly and explicitly aligned with teaching standards
Measures that are aligned with professional development offerings
Measures which include protocols and processes that teachers can examine and comprehend

Keep in mind
All teachers want to be effective, and supporting them to be effective is perhaps the most powerful talent management strategy we have

Considerations
Consider whether human resources and capacity are sufficient to ensure fidelity of implementation
  Poor implementation threatens validity of results
Establish a plan to evaluate measures to determine if they can effectively differentiate among teacher performance
  Need to identify potential “widget effects” in measures
  If measure is not differentiating among teachers, may be faulty training or poor implementation, not the measure itself
Examine correlations among results from measures
Evaluate processes and data each year and make needed adjustments
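
The two checks named above (whether a measure differentiates among teachers, and how results from different measures correlate) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and all of the numbers are hypothetical, not data from any district, and a real review would use a statistical package and far more teachers.

```python
from collections import Counter
from math import sqrt

def rating_distribution(ratings):
    """Share of teachers at each rating level (a crude check for a "widget effect")."""
    counts = Counter(ratings)
    total = len(ratings)
    return {level: counts[level] / total for level in sorted(counts)}

def pearson(xs, ys):
    """Pearson correlation between two measures taken on the same teachers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: observation ratings (1-4 scale) and growth scores for 8 teachers.
observation = [4, 4, 4, 3, 4, 4, 4, 4]
growth = [0.3, -0.2, 0.5, -0.6, 0.1, 0.0, 0.4, -0.1]

print(rating_distribution(observation))  # nearly everyone at the top level
print(pearson(observation, growth))
```

If almost all teachers land in one rating level, the measure (or its implementation) is not differentiating, which is the situation the slide warns about.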

Validity of classroom observations is highly dependent on training
Even with a terrific observation instrument, the results are meaningless if observers are not trained to agree on evidence and scoring
A teacher should get the same score no matter who observes him
  This requires that all observers be trained on the instruments and processes
  Occasional “calibrating” should be done; more often if there are discrepancies or new observers
  Who the evaluators are matters less than that they are adequately trained and calibrated
Teachers should also be trained on the observation forms and processes to improve validity of results

Most popular growth models: Value-added and Colorado Growth Model
EVAAS uses prior test scores to predict the next score for a student
  Teachers’ value-added is the difference between actual and predicted scores for a set of students
  http://www.sas.com/govedu/edu/k12/evaas/index.html
Colorado Growth Model
  Betebenner 2008: Focus on “growth to proficiency”
  Measures students against “academic peers”
  www.nciea.org
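
The "actual minus predicted" idea can be illustrated with a deliberately simplified sketch. It is not the EVAAS methodology, which uses far more elaborate statistical models; here a single straight line fitted on one prior score stands in for the prediction step, and the teacher names, scores, and function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def value_added(records):
    """records: list of (teacher, prior_score, actual_score) tuples.

    Returns each teacher's mean (actual - predicted) score, where the
    prediction comes from one line fitted to all students together.
    """
    intercept, slope = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, actual in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(actual - predicted)
    return {teacher: mean(values) for teacher, values in residuals.items()}

# Hypothetical scores for six students taught by two teachers.
records = [
    ("Teacher A", 40, 52), ("Teacher A", 60, 71), ("Teacher A", 80, 90),
    ("Teacher B", 40, 44), ("Teacher B", 60, 63), ("Teacher B", 80, 84),
]
scores = value_added(records)
print(scores)
```

In this toy data Teacher A's students score above their predictions and Teacher B's below, so A's estimate is positive and B's is negative. With so few students per teacher, such estimates would be far too noisy to act on.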

What nearly all state and district models have in common
Value-added or Colorado Growth Model will be used for those teachers in tested grades and subjects (4-8 ELA & Math in most states)
States want to increase the number of tested subjects and grades so that more teachers can be evaluated with growth models
States are generally at a loss when it comes to measuring teachers’ contribution to student growth in non-tested subjects and grades

Measuring teachers’ contributions to student learning growth: A summary of current models
Model | Description
Student learning objectives | Teachers assess students at beginning of year and set objectives, then assess again at end of year; principal or designee works with teacher, determines success
Subject & grade alike team models (“Ask a Teacher”) | Teachers meet in grade-specific and/or subject-specific teams to consider and agree on appropriate measures that they will all use to determine their individual contributions to student learning growth
Pre- and post-tests model | Identify or create pre- and post-tests for every grade and subject
School-wide value-added | Teachers in tested subjects & grades receive their own value-added score; all other teachers get the school-wide average
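
The school-wide value-added row can be read as a simple assignment rule, sketched below. The function name, teacher labels, and scores are hypothetical. The sketch also assumes the "school-wide average" is the unweighted mean of the individual teacher scores, which the slide does not specify.

```python
from statistics import mean

def assign_schoolwide(own_scores, all_teachers):
    """Give each teacher their own value-added score if they have one,
    otherwise the average of the school's individual scores."""
    school_average = mean(own_scores.values())
    return {t: own_scores.get(t, school_average) for t in all_teachers}

# Hypothetical school: two teachers in tested subjects, two in non-tested subjects.
own_scores = {"Grade 6 math": 2.0, "Grade 7 ELA": -1.0}
teachers = ["Grade 6 math", "Grade 7 ELA", "Art", "PE"]
assigned = assign_schoolwide(own_scores, teachers)
print(assigned)
```

Here the art and PE teachers both receive 0.5, a score driven entirely by colleagues' results, which is one reason this model draws fairness concerns.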

SLOs “Ask a Teacher” (Hybrid model)
Concerns about SLOs are 1) rigor, 2) comparability, and 3) administrator burden
A “rigor rubric” helps with first concern
Combining SLOs with aspects of the “Ask A Teacher” model will help with all 3 concerns
  Teachers discuss and agree to use particular assessments and measures of student learning growth, ensuring great rigor and comparability
  Teachers work together on aspects of scoring which improves validity and comparability and lightens the administrator burden

What’s next for Hawaii?

Next steps
Ensure that evaluation systems allow you to differentiate between effective and less effective teachers
Focus on improving effectiveness of teachers you already have
Develop strategies for retaining effective and potentially effective teachers
Recruit effective teachers through multiple, coordinated strategies (not one time bonuses)

Final thoughts
The limitations:
  There are no perfect measures
  There are no perfect models
  Changing the culture of evaluation is hard work
The opportunities:
  Evidence can be used to trigger support for struggling teachers and acknowledge effective ones
  Multiple sources of evidence can provide powerful information to improve teaching and learning
  Evidence is more valid than “judgment” and provides better information for teachers to improve practice

Evaluation System Models
Austin (Student learning objectives with pay-for-performance, group and individual SLOs assess with comprehensive ives/compensation/slos.phtml
Delaware Model (Teacher participation in identifying grade/subject measures which then must be approved by state)
http://www.doe.k12.de.us/csa/dpasii/student growth/default.shtml
Georgia CLASS Keys (Comprehensive rubric, includes student achievement—see last few pages)
System: http://www.gadoe.org/tss ent.aspx/CK%20Standards%2010-182010.pdf?p 8BFA2A0AB27E3E&Type D
Hillsborough, Florida (Creating assessments/tests for all eringteachers/

Evaluation System Models (cont’d)
New Haven, CT (SLO model with strong teacher development component and matrix scoring; see Teacher Evaluation & Development System)
http://www.nhps.net/scc/index
Rhode Island DOE Model (Student learning objectives combined with teacher observations and DOCS/Asst.Sups CurriculumDir.Network/Assnt Sup August 24 rev.ppt
Teacher Advancement Program (TAP) (Value-added for tested grades only, no info on other subjects/grades, multiple observations for all teachers)
http://www.tapsystem.org/
Washington DC IMPACT Guidebooks (Variation in how groups of teachers are measured—50% standardized tests for some groups, 10% other assessments for non-tested subjects and grades)
http://www.dc.gov/DCPS/In the Classroom/Ensuring Teacher Success/IMPACT (Performance Assessment)/IMPACT Guidebooks

References
Betebenner, D. W. (2008). A primer on student growth percentiles. Dover, NH: National Center for the Improvement of Educational Assessment h/PDF/Aprimeronstudentgrowthpercentiles.pdf
Braun, H., Chudowsky, N., & Koenig, J. A. (2010). Getting value out of value-added: Report of a workshop. Washington, DC: National Academies Press. http://www.nap.edu/catalog.php?record id 12820
Finn, Chester. (July 12, 2010). Blog response to topic “Defining Effective Teachers.” National Journal Expert Blogs: 0/07/defining-effective-teachers.php
Glazerman, S., Goldhaber, D., Loeb, S., Raudenbush, S., Staiger, D. O., & Whitehurst, G. J. (2011). Passing muster: Evaluating evaluation systems. Washington, DC: Brown Center on Education Policy at 26 evaluating teachers.aspx#
Glazerman, S., Goldhaber, D., Loeb, S., Raudenbush, S., Staiger, D. O., & Whitehurst, G. J. (2010). Evaluating teachers: The important role of value-added. Washington, DC: Brown Center on Education Policy at 17 evaluating teachers.aspx

References (continued)
Goe, L. (2007). The link between teacher quality and student outcomes: A research synthesis. Washington, DC: National Comprehensive Center for Teacher etweenTQandStudentOutcomes.pdf
Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. Washington, DC: National Comprehensive Center for Teacher atingTeachEffectiveness.pdf
Hassel, B. (Oct 30, 2009). How should states define teacher effectiveness? Presentation at the Center for American Progress, Washington, , C., Burchinal, M., Pianta, R., Bryant, D., Early, D., Clifford, R., et al. (2008). Ready to learn? Children's pre-academic achievement in pre-kindergarten programs. Early Childhood Research Quarterly, 23(1), accno EJ783140
Kane, T. J., Taylor, E. S., Tyler, J. H., & Wooten, A. L. (2010). Identifying effective classroom practices using student achievement data. Cambridge, MA: National Bureau of Economic Research. http://www.nber.org/papers/w15803

References (continued)
Koedel, C., & Betts, J. R. (2009). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Cambridge, MA: National Bureau of Economic ers/2009/WP0902 koedel.pdf
McCaffrey, D., Sass, T. R., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal stability of teacher effect estimates. Education Finance and Policy, 4(4), .1162/edfp.2009.4.4.572
Pianta, R. C., Belsky, J., Houts, R., & Morrison, F. (2007). Opportunities to learn in America’s elementary classrooms. [Education Forum]. Science, 315, mmary/315/5820/1795
Prince, C. D., Schuermann, P. J., Guthrie, J. W., Witham, P. J., Milanowski, A. T., & Thorn, C. A. (2006). The other 69 percent: Fairly rewarding the performance of teachers of non-tested subjects and grades. Washington, DC: U.S. Department of Education, Office of Elementary and ther69Percent.pdf
Race to the Top p/resources.html
Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2), 417-458. http://www.econ.ucsb.edu/ jon/Econ230C/HanushekRivkin.pdf

References (continued)
Sartain, L., Stoelinga, S. R., & Krone, E. (2010). Rethinking teacher evaluation: Findings from the first year of the Excellence in Teacher Project in Chicago public schools. Chicago, IL: Consortium on Chicago Public Schools Research at the University of her%20Eval%20Final.pdf
Schochet, P. Z., & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains. Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of /20104004.pdf
Redding, S., Langdon, J., Meyer, J., & Sheley, P. (2004). The effects of comprehensive parent engagement on student learning outcomes. Paper presented at the American Educational Research urces/Harvard.pdf
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. Brooklyn, NY: The New heWidgetEffect.pdf

References (continued)
Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. L. (2007). Reviewing the evidence on how teacher professional development affects student achievement (No. REL 2007-No. 033). Washington, D.C.: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory uthwest/pdf/REL 2007033.pdf

Questions?

Laura Goe, Ph.D.
609-734-1076
lgoe@ets.org
National Comprehensive Center for Teacher Quality
1100 17th Street NW, Suite 500
Washington, DC 20036-4632
877-322-8700
www.tqsource.org

A Practical Guide to Designing Comprehensive Teacher Evaluation Systems
A Tool to Assist in the Development of Teacher Evaluation Systems
May 2011

A Practical Guide to Designing Comprehensive Teacher Evaluation Systems
Laura Goe, Ph.D., ETS
Lynn Holdheide, Vanderbilt University
Tricia Miller, Ph.D., American Institutes for Research

Contents
Rationale and Structure
Introduction
State Accountability and District Responsibility in Teacher Evaluation Systems
  Key State Roles
  Models for State and District Evaluation Systems
  Factors for Stakeholder Consideration
Development and Implementation of Comprehensive Teacher Evaluation Systems
  Component 1: Specifying Evaluation System Goals
  Component 2: Securing and Sustaining Stakeholder Investment and Cultivating a Strategic Communication Plan
  Component 3: Selecting Measures
  Component 4: Determining the Structure of the Evaluation System
  Component 5: Selecting and Training Evaluators
  Component 6: Ensuring Data Integrity and Transparency
  Component 7: Using Teacher Evaluation Results
  Component 8: Evaluating the System
Conclusion and Recommendations
References
Appendix A. Summary of Measures

Rationale and Structure
Across the nation, states and districts are in the process of building better teacher evaluation systems that not only identify highly effective teachers but also systematically provide data and feedback that can be used to improve teacher practice. A Practical Guide to Designing Comprehensive Teacher Evaluation Systems is a tool designed to assist states and districts in constructing high-quality teacher evaluation systems in an effort to improve teaching and learning.
This tool is not a step-by-step guide to devising a teacher evaluation system. Rather, it is intended to facilitate discussion and promote coherence in the development process. The following assumptions have guided its construction:
  In response to federal initiatives and priorities as well as state legislation, states are motivated to improve their current evaluation systems to better identify successful teachers, assist less successful teachers, and help all teachers improve their practice.
  Most current definitions of teacher effectiveness (e.g., the Race to the Top definition) include teachers’ contributions to student learning growth, and states need to consider measuring these contributions for all teachers.
  States are interested in systems that use multiple measures to assess various aspects of teachers’ performance and instructional practice.
  States may be in various stages in terms of creating or revising teacher evaluation systems. This tool allows states to focus on the specific components of the system that are most relevant for them.
  In states where districts have substantial control over teacher evaluation systems, this tool may be used by districts or consortiums of districts for discussion and guidance.
  Teachers play a critical role in ensuring that the evaluation system is fair, valid, and successful, and they should be active participants in designing, developing, implementing, and evaluating the system.
The guide begins with an overview of the factors influencing teacher evaluation reform today and continues with a discussion of approaches to balancing state accountability and district autonomy. The next section of the guide is structured around the following essential components of the design process as supported through research:
  Component 1: Specifying Evaluation System Goals
  Component 2: Securing and Sustaining Stakeholder Investment and Cultivating a Strategic Communication Plan
  Component 3: Selecting Measures
  Component 4: Determining the Structure of the Evaluation System
  Component 5: Selecting and Training Evaluators
  Component 6: Ensuring Data Integrity and Transparency
  Component 7: Using Teacher Evaluation Results
  Component 8: Evaluating the System
Each subsection includes an overview of the component, resources and practical examples, and a series of guiding questions designed to help states organize their work and move strategically toward an evaluation system that functions to improve student learning and teacher performance.

Introduction
The research community has long recognized the importance of teachers to student achievement. Although research has shown that teachers are the most significant school-based factor in student achievement, traditional methods of evaluating teachers have not been able to capture or explain differences between effective and ineffective teachers.
Initial efforts to ensure quality education focused on teacher qualifications and degrees; however, research does not indicate that these factors significantly influence teacher effectiveness. For example, Rivkin, Hanushek, and Kain (2005) analyzed results from thousands of teachers and their students in Texas and determined that there were strong teacher effects on academic achievement, but variation in these effects could not be explained by education or experience.
Further, mounting evidence indicates that the United States is losing ground in comparison to other countries in terms of educational outcomes. One international study showed that U.S. students were outperformed in mathematics by students in 20 of the other 28 industrialized countries studied (Lemke et al., 2004). In addition, a recent Program for International Student Assessment study found that only 5 of the other 33 participating countries had lower scores in mathematics literacy than the United States (Fleischman, Hopstock, Pelczar, & Shelley, 2010). These types of findings resulted in increased concern about determining the best way to improve student learning through teacher performance and a shift in focus from analyzing teacher inputs (e.g., education, certification, and experience) to measuring teacher effects (e.g., student achievement and classroom practice).
Improving teacher quality and effectiveness is a complex issue, and the ability to identify high-performing and low-performing teachers is a necessary step toward pinpointing instructional strategies and pedagogy that result in improved student growth (e.g., evidence-based instructional strategies, strong student-teacher relationships). Unfortunately, traditional evaluation methods have not proven to be useful in meeting this challenge. In the past, teacher evaluation systems have varied widely in their rigor and utility. Most systems were based on classroom observations, usually conducted by principals but sometimes conducted by trained evaluators (see Practical Example: “Cincinnati Public Schools Evaluation System”). The steps taken after the observations differed considerably across states, districts, and even schools, with some schools linking results to professional growth plans for teachers and others filing the results away with little or no follow-up. The perfunctory, compliance-oriented approach to teacher evaluation in some districts likely did not contribute to tangible improvement in teaching and learning. Unfortunately, there has been little research on how these different approaches to classroom observation influenced teacher performance.

PRACTICAL EXAMPLE
Cincinnati Public Schools Evaluation System
Cincinnati teachers participate in a “comprehensive evaluation” during their first and fourth years of teaching, after which they are evaluated every five years. Teachers are observed four times by teacher evaluators and once by a school administrator. Before they can become teacher evaluators, teachers must complete a three-step application process to become lead teachers. Lead teachers may then apply for positions such as teacher evaluators, consulting teachers, and program facilitators. Those selected for teacher evaluator positions are required to undergo extensive training in collecting and scoring evidence. Using videos, they are certified through a process of verifying the agreement of their scores with those of “master raters.” Through this process, a high degree of reliability is ensured, meaning that a teacher’s observation score is likely to be the same or nearly the same, regardless of which trained evaluator conducts the observation.
Source: Cincinnati Public Schools (n.d.)
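
The certification step described in the example, checking a trainee's scores against those of a master rater, could be summarized with exact and adjacent agreement rates, as in the rough sketch below. The function name and scores are hypothetical. The source does not describe the statistic or threshold Cincinnati actually uses, so this is only one plausible way to quantify agreement.

```python
def agreement_rates(trainee, master):
    """Return (exact, adjacent) agreement between two raters' scores.

    exact: share of items scored identically.
    adjacent: share of items scored within one point of each other.
    """
    pairs = list(zip(trainee, master))
    exact = sum(t == m for t, m in pairs) / len(pairs)
    adjacent = sum(abs(t - m) <= 1 for t, m in pairs) / len(pairs)
    return exact, adjacent

# Hypothetical rubric scores (1-4) on five video-recorded lessons.
trainee_scores = [3, 2, 4, 1, 3]
master_scores = [3, 3, 4, 1, 1]
exact, adjacent = agreement_rates(trainee_scores, master_scores)
print(exact, adjacent)
```

A program might require both rates to clear some minimum before an evaluator is certified; any such cutoff would be a local policy choice rather than something stated in the guide.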

In 2009, an investigation into the compliance-oriented approaches of evaluation systems conducted by The New Teacher Project sent shockwaves through the policy world. The study, titled The Widget Effect: Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness, examined large and small districts across several states where evaluation consisted primarily of classroom observations (Weisberg, Sexton, Mulhern, & Keeling, 2009). The following conclusions emerged from the study:
  Nearly all teachers received high ratings (good or great).
  Districts failed to recognize and reward excellence.
  Professional development was rarely tied to results and when it was, little support was offered to teachers.
  New teachers generally were rated above satisfactory, and tenure was seldom denied to teachers based on observation results.
  Poor performance rarely led to teacher dismissal.
The inability of evaluation systems to differentiate factors contributing to teacher effectiveness suggests that classroom observations, at least as they were used in most districts in the study, are of little use for improving and rewarding performance or identifying teachers who need support and training and those who should be dismissed.
Through funding opportunities including the American Recovery and Reinvestment Act (ARRA) and the Race to the Top competition, the federal government has encouraged states and districts to develop rigorous evaluation systems for use in high-stakes decisions including teacher advancement, compensation, distribution, and retention. These opportunities, coupled with the evidence of poorly functioning teacher evaluation systems, have resulted in a national urgency to create and implement comprehensive, strategic systems for evaluating teacher performance that identify, support, and develop teacher effectiveness and student growth.
In response to this urgency, many states have passed legislation mandating the development of rigorous, high-quality evaluation systems for use in high-stakes situations related to teacher employment and advancement. Advisory boards, committees, and multistate consortia are meeting to gather information on research and best practices related to the development, implementation, and use of these evaluation systems. This Practical Guide provide
