The challenge of designing effective performance measurement and incentives arises generally in economic settings where behavior and outcomes are not easily observable. These issues are particularly prominent in education, where test-based accountability systems for schools and students have proliferated over the last two decades. In this study, we present evidence that the design and decentralized, school-based scoring of New York's high-stakes Regents Examinations have led to pervasive manipulation of student test scores that fall just below performance thresholds. Specifically, we document statistically significant discontinuities in the distributions of subject-specific Regents exam scores at the cut scores used to determine both student eligibility to graduate and school accountability. Our results suggest that roughly 3 to 5 percent of the exam scores that qualified for a high school diploma actually reflected performance below state requirements. Using multiple sources of data, we present evidence that score manipulation is driven by local teachers' desire to help their students avoid the sanctions associated with failing to meet exam standards, not by the recent creation of school accountability systems. We also provide some evidence that variation in the extent of manipulation across schools tends to favor traditionally disadvantaged student groups.
We would like to thank Tom McGinty and Barbara Martinez of the Wall Street Journal for bringing this issue to our attention and providing us with some of the data used in this analysis. We would also like to thank Don Boyd, Jim Wyckoff, personnel at the New York City Department of Education and New York State Education Department, and seminar participants at Northwestern University’s Institute for Policy Research and the Massachusetts Institute of Technology for helpful discussions and comments. Sean Tom provided outstanding research assistance. All errors are our own.