Educational Vouchers: A Review of the Research
by
Alex Molnar
Center for Education Research, Analysis, and Innovation
School of Education
University of Wisconsin-Milwaukee
PO Box 413
Milwaukee WI 53201
414-229-2716
October, 1999
CERAI-99-21
Alex Molnar
Professor, Department of Curriculum and Instruction, University of Wisconsin-Milwaukee
This document combines excerpts from two reports: "Smaller Classes -- Not Vouchers -- Increase Student Achievement" (Harrisburg, Pa.: Keystone Research Center, March 1998); and "Smaller Classes and Educational Vouchers: A Research Update" (Harrisburg, Pa.: Keystone Research Center, June 1999). Both documents are available on the website of the Center for Education Research, Analysis, and Innovation at http://www.uwm.edu/Dept/CERAI
In statistical analysis, social scientists need to distinguish findings that could be the result of random chance from findings that strongly confirm a hypothesis, such as the hypothesis that Choice schools improve student performance. By convention, social scientists most commonly consider a result "statistically significant" when the probability of it occurring by chance is .05 (i.e., 5 chances out of 100) or less. In their March 1997 paper, however, Greene, Peterson, and Du (GPD) report a result as significant when there is a 1 in 10 (or .10) probability or less of it occurring by chance.
GPD further increase the number of "significant" findings that they report by evaluating results using a "one-tailed" test of significance rather than a more common "two-tailed" test.
One-tailed tests are usually used when there are strong theoretical reasons for believing that change in the independent variable (in this case attendance at Choice schools) is likely to produce a change in the dependent variable (test scores) in only one direction. GPD’s theory is that Choice students could not perform worse on tests than those who applied to the program but were rejected. GPD justify this by reference to the literature suggesting that private schools perform better than public schools. It is a questionable assumption because, as we saw in Box 3, the literature on private vs. public school achievement is drawn primarily from secondary school data, shows mixed results, and is very controversial. Rouse’s finding that students in a sub-group of Milwaukee public schools outperform those in Choice schools raises further questions about the one-tailed assumption.
The important point here is that by using both a .10 standard of significance and a one-tailed test in their March 1997 paper, GPD are four times more likely to find significant results than if they had applied a .05 standard using a two-tailed test. This allows them to report almost eight times as many statistically significant findings in Tables 3, 4, 5, and 7 of their March paper as they would have been able to report using a .05 level with a two-tailed test. In other words, they report 23 significant findings instead of 3.
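The arithmetic behind these ratios can be checked with a short calculation. The sketch below is an illustration only and is not taken from GPD's paper: it assumes a test statistic that is standard normal under the null hypothesis (so `statistics.NormalDist` stands in for whatever distribution the original tests used), and the function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: the test statistic is ~N(0, 1) when there is no true effect.
STANDARD_NORMAL = NormalDist()

def critical_value(alpha, tails):
    """Cutoff the statistic must exceed, in the predicted direction, to count as significant."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / tails)

def favorable_tail_probability(alpha, tails):
    """Chance, with no true effect, of a 'significant' result in the predicted direction."""
    return 1 - STANDARD_NORMAL.cdf(critical_value(alpha, tails))

lenient = favorable_tail_probability(alpha=0.10, tails=1)  # .10 level, one-tailed
strict = favorable_tail_probability(alpha=0.05, tails=2)   # .05 level, two-tailed
likelihood_ratio = lenient / strict

# Counts quoted in the text: 23 findings reported versus 3 under the stricter standard.
reported_ratio = 23 / 3

print(f"one-tailed .10 cutoff: z > {critical_value(0.10, 1):.2f}")
print(f"two-tailed .05 cutoff: z > {critical_value(0.05, 2):.2f}")
print(f"ratio of chance 'significant' results: {likelihood_ratio:.1f}")
print(f"ratio of reported findings: {reported_ratio:.1f}")
```

Under these assumptions the lenient standard yields a chance "significant" result in the predicted direction about four times as often as the conventional one, and 23 findings against 3 is a ratio of roughly 7.7, which matches "almost eight times."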
GPD might respond that "statistical significance is not a cliff" and that results slightly below a customary threshold for significance are still unlikely to occur by chance and are therefore worthy of note. GPD, however, are not consistent in this view. In one important case, they fail to point out some significant findings (at the .10 level) that reduce confidence in their main finding about the performance advantage of voucher students. This case comes up when GPD respond to the claim that lower-performing students more often leave the voucher program, making their sample of students still in the Choice schools unrepresentative. In their August 29, 1996 paper, GPD directly test for such attrition bias by comparing (a) the scores of students who continued in the voucher program with (b) the scores of students who withdrew from the program (i.e., the last score of these students before they left the voucher program). GPD summarize their findings as follows:
In only two comparisons were differences statistically significant. In one the students leaving the study had the higher test scores; in the other, continuing students had higher test scores. In the other six cases, the two groups did not differ significantly.
When you look in their table reporting these results (Table 7 in their paper), you find that two of the "insignificant" differences between Choice stayers and leavers are nearly significant (they could have occurred by chance with only a .06 and a .09 probability). These differences meet the .10 standard that GPD earlier used as a threshold for significance. In both these cases, the math scores of continuing Choice students exceed the math scores of those who drop out of the program. Perhaps adding to the inconsistency, GPD may have used a two-tailed test in their examination of Choice student attrition bias. If one accepts the theory that more successful students in Choice schools would not leave the voucher program, then a one-tailed test would be more appropriate. Under a one-tailed test, the math advantage of continuing Choice students over those who quit in 1993 and 1994 would be significant at a .05 level.
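The one-tailed claim follows from simple arithmetic, sketched here under two assumptions: that the .06 and .09 values are two-tailed probabilities from a symmetric test statistic, and that the observed differences lie in the predicted direction (continuing students scoring higher). The helper name `one_tailed_from_two_tailed` is hypothetical.

```python
def one_tailed_from_two_tailed(p_two_tailed):
    """Halve a two-tailed p-value. Valid only for a symmetric statistic when the
    observed effect is in the predicted direction (an assumption here)."""
    return p_two_tailed / 2

# The two probabilities quoted in the text, assumed to be two-tailed.
table7_math_p_values = [0.06, 0.09]

for p in table7_math_p_values:
    p_one = one_tailed_from_two_tailed(p)
    print(
        f"two-tailed p = {p:.2f}: meets the .10 standard: {p <= 0.10}; "
        f"one-tailed p = {p_one:.3f}, meets the .05 standard: {p_one <= 0.05}"
    )
```

Both values pass either way: they fall under the .10 threshold as reported, and halving them gives .030 and .045, each below .05.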
The most recent analysis of the Milwaukee Parental Choice Program (MPCP) data has been done by Professor Cecilia Rouse of Princeton. Rouse analyzes the performance of all students selected to attend Choice schools (including those who never attended -- a small group -- and those who subsequently left). She compares this group’s performance to that of applicants not admitted to the Choice program and to a random sample of Milwaukee Public Schools (MPS) students. By comparison with GPD’s main method, this approach has the advantage of avoiding non-random attrition from the Choice sample. It also increases the number of students in the "Choice" sample. Rouse sees including all those awarded vouchers in the Choice group as a better way of assessing the overall impact of the MPCP than restricting the sample to those currently receiving vouchers. According to GPD, who use the same method in part of their March 1997 paper, Rouse’s approach better captures what would happen if the Choice experiment were generalized and students migrated back and forth between private and public schools.
In addition to her analysis of successful applicants for vouchers, Rouse does a more familiar comparison between students who actually attended Choice schools and her MPS sample. Whichever way she defines program participants -- as those selected or those actually attending -- Rouse’s estimate of their test scores relative to those of Milwaukee Public School students turns out to be similar.
Like Witte, Rouse finds no significant advantage for the Choice groups in reading. She describes the Greene, Peterson, and Du results for reading as "fragile."56 In math, Rouse finds that students admitted to the voucher program, and the sub-sample still participating in it, both had faster math gains than her random sample of MPS students. She estimates that the math scores of successful applicants and of program participants rise each year by 1.5–2.4 percentile points more than MPS student test scores. This amounts to an effect size of 0.32–0.48 over four years (see Box 2 for a definition of effect size).
Rouse argues that the difference between her and Witte’s comparison of the math scores of MPS and voucher students results from a highly technical difference in the statistical models used. (She supports this claim by making her model similar to Witte’s and showing that she gets results comparable to his.) While Witte’s model includes prior test scores (and other individual characteristics) as controls, Rouse uses an individual "fixed effects" model that controls for all student characteristics that do not change over time (e.g., parental education and "innate" ability).57 Rouse’s approach enables her to include in her sample individuals that Witte excludes because they are missing some prior-year test scores.
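To illustrate why a fixed-effects comparison can differ from a comparison of score levels, here is a minimal sketch on made-up, noise-free data. It is not Rouse's specification: the students, the linear growth rule, and names such as `simulate_scores` are hypothetical, and the sketch uses year-to-year gains (first differences), which is one simple way of removing characteristics that do not change over time.

```python
from statistics import mean

COMMON_GAIN = 1.0  # hypothetical yearly gain shared by all students
CHOICE_GAIN = 2.0  # hypothetical extra yearly gain for Choice students

def simulate_scores(fixed_level, in_choice, years=4):
    """Score in year t = fixed student level + yearly gain * t (no noise)."""
    yearly_gain = COMMON_GAIN + (CHOICE_GAIN if in_choice else 0.0)
    return [fixed_level + yearly_gain * t for t in range(years + 1)]

def yearly_gains(scores):
    """First differences: the fixed student level cancels out of each gain."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]

# Made-up students: (fixed level, attends a Choice school). Choice students
# are given lower fixed levels so that a comparison of levels misleads.
students = [(30.0, True), (35.0, True), (40.0, True),
            (50.0, False), (55.0, False), (60.0, False)]

final_levels = {True: [], False: []}
gains = {True: [], False: []}
for fixed_level, in_choice in students:
    scores = simulate_scores(fixed_level, in_choice)
    final_levels[in_choice].append(scores[-1])
    gains[in_choice].extend(yearly_gains(scores))

level_gap = mean(final_levels[True]) - mean(final_levels[False])
gain_gap = mean(gains[True]) - mean(gains[False])

print(f"gap in final-year levels: {level_gap:+.1f}")
print(f"gap in yearly gains:      {gain_gap:+.1f}")
```

In this toy data the Choice group ends 12 points lower in levels even though its built-in yearly advantage is 2 points, and the comparison of gains recovers that advantage because each student's fixed level drops out. The sketch gives both groups the same `COMMON_GAIN`, so it builds in the assumption that the groups would otherwise improve at the same rate.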
Rouse cautions that there are several caveats to bear in mind when considering her results.58
First, a large number of students in the data set do not have total math scores. (This is a problem for all three research teams.) For 1993, Rouse had to impute the total math score (from scores on the components of the test) for 40 percent of the unsuccessful Choice applicants and 34 percent of the students in her Milwaukee Public Schools sample. For 1994, she had to impute 69 percent of the total math scores for the unsuccessful Choice applicants and 67 percent of the Milwaukee Public Schools sample.
Second, Rouse’s method assumes that, in the absence of the voucher program, the two comparison groups would have improved their scores over time at the same rate. If, however, the test scores of children with high-voice parents tend to improve faster than the test scores of other students -- even when the high-voice offspring start off poorly -- then Rouse’s model would wrongly attribute this improvement to the voucher program.
Third, the data sets on the Milwaukee voucher experiment include no school variables, such as social and economic profile of the school, class size, school size, or spending per student. Therefore, neither Rouse nor the other analysts have any way of knowing whether differences between the achievement of Choice students and that of Milwaukee Public Schools students are attributable to these variables. Since there is clear evidence that class size, for example, has a significant effect on student achievement, Rouse’s results may have nothing to do with participation in the Choice program per se. In her most recent paper, analyzed at length in the class-size section of this report, Rouse takes a first step towards addressing the lack of school variables. She presents evidence that class size in public schools exceeds that in Choice schools. Moreover, she finds that students in the one sub-group of the Milwaukee Public Schools with class sizes comparable to those of Choice schools have better overall test scores than students in Choice schools.
Finally, Rouse points out that the average effects she reports say nothing about the performance of individual Choice schools, i.e., they do not suggest that all Choice schools are "better" than the Milwaukee Public Schools.
Rouse’s overall conclusion: allowing low-income children to attend private schools might raise the math achievement of those who participate. However, the Milwaukee data do not answer the question of whether vouchers give public schools an incentive to improve, nor do these data provide an adequate basis for making decisions about the widespread implementation of voucher programs.
Rouse ends her December 1997 paper by noting:
If we really want to "fix" our educational system, then we need a better understanding of what makes a school successful, and not simply assume that market forces explain sectoral differences and are therefore the magic solution for public education.59