School Reform Proposals: The Research



Editor: Alex Molnar


W. Steven Barnett, Gerald W. Bracey, Robert M. Carini

Douglas Downey, Jeremy D. Finn, Craig Howley

Gene V Glass, Haggai Kupermintz, Catherine Lugg,

Ulrich C. Reitzug, Barak Rosenshine



Education Policy Research Unit (EPRU)

Education Policy Studies Laboratory

College of Education

Division of Educational Leadership and Policy Studies

Box 872411

Arizona State University

Tempe, AZ 85287-2411


January 2002







Introduction and Executive Summary


By Alex Molnar

Arizona State University


For nearly two decades numerous prominent critics have pronounced American public education broken. The debate over the state of the nation’s public schools has been joined by business leaders, teachers’ unions, and think tanks from all points on the political spectrum. The chorus of criticism has produced a curious disconnect between Americans’ perception of their local schools and their assessment of the nation’s schools as a whole. Of more than 1,100 respondents to a new Phi Delta Kappa/Gallup poll, for instance, just over half told surveyors that they would give the nation’s schools a grade of ‘C’, even though an identical number – 51% – gave their own local schools A’s and B’s.[1]

The contention that the public education system has failed in turn has prompted a wide variety of reform proposals. Some, such as educational vouchers, would radically reorganize the system’s governance. Others, such as universal early childhood education, may demand equal or even greater changes in the educational system’s design and structure, while retaining the central feature of the system that has served public education for more than a century: the common school. Still other reform proposals are far more measured in their scope. Amid the welter of ideas put forth it is often far from clear how to best improve what is not working well without subverting the many successes of American public education.

In the last decade especially, reforms have tended to be justified as necessary to improve the academic achievement of children living in poverty. Moreover, the widespread and intense scrutiny of public school performance has increased the pressure on legislatures to act quickly, even as it has made it more important than ever for them to carefully weigh the benefits and costs of competing reform proposals. Unfortunately, the research evidence available to policy makers is often non-existent, incomplete, or seemingly contradictory. School reform is, therefore, frequently debated in an environment that is long on emotion and short on hard data. Furthermore, the data supporting proposals to reform public education vary enormously in quality, ranging from carefully conducted and rigorously reviewed research to ideologically oriented commentary. Often there are few indicators to help policy makers distinguish one from the other.

In order to clarify what we know about effective public schools, the Education Policy Studies Laboratory (EPSL) at Arizona State University invited a group of distinguished education scholars to review the research on a series of education reform topics.

The following literature reviews are the result. Some reviews focus on specific proposals that are proffered for making public schools more effective. Others examine core components or practices in our public schools in order to evaluate the impact of those components and practices on student achievement. In each case, the reviewers examined the research on the topic at hand with a particular eye toward its findings with regard to student achievement, especially that of children living in poverty.

Institutions, People, and Money


These 13 reviews can be grouped into three broad clusters. The first group examines schools as institutions and their structures. It includes reports on the efficacy of early education programs; the movements to reduce class size and to create smaller schools; alternatives in structuring the school day and year; variations in how students may be grouped; and the role that schools have played in the past and might play in the future in their larger communities and in involving parents in their children’s education.

The second group of reviews focuses on the teachers who deliver public education. Reports examine research on teacher characteristics and instructional behaviors; the role of teacher unions as obstacles or assets to educational improvement; proposals to quantify the value that teachers add to the educational process and thereby assess teacher performance; and the effectiveness of current approaches to the professional development of teachers, along with how those approaches might be improved.

The final review is a comprehensive look at various proposals to supplant all or part of the traditional public education system with institutions from outside that system. Those include vouchers that citizens might use to gain entry to private schools instead of their local public schools; charter schools, which present themselves as alternatives to public schools that have been released from some of the requirements and regulations under which public schools operate; and proposals to contract the management of public schools out to private, for-profit companies.

The Varying Quality of Research


It should be no surprise to the informed reader that, from one topic to the next, the quality of available research varies greatly. Some topics, such as class-size reduction, have been the subject of rigorous and well-controlled experiments that have undergone the intense scrutiny of peer review and stood up to the test. Other topics, such as private school vouchers, have produced much in the way of strong opinions but very little well-founded research to support the conclusions drawn by their staunchest advocates.

Notwithstanding such limitations, each review represents the best information available to us on the topic at hand. Each presents evidence for the effectiveness – or the ineffectiveness – of certain reform proposals. Several include calls for additional research where our knowledge is too scanty to draw well-reasoned conclusions. And where the reviewers are able to uncover reforms that do work, they have presented evidence as well for how we can make each reform as effective as possible.

Many of these reviews point to reforms that can be achieved with only modest investments, or indeed simply a reallocation of additional resources. Others offer a warning worth heeding about proposals that seem certain to waste funds. For the most promising reforms, however, a common theme emerges: Money does matter. The reforms that offer the greatest promise, reforms supported by solid scientific research, cannot be advanced merely by working smarter. They require a deeper investment than we are currently making in the nation’s school systems.

Spending: Essential, but Not Sufficient


Yet, as one review after another suggests, simply spending more money is not sufficient, and is no guarantee of success. Rather, any enrichment of resources must be husbanded carefully and spent thoughtfully, with due consideration given to what works and what doesn’t in the pursuit of each reform strategy.

There is good news here, however: the research evidence strongly identifies those investments that promise the highest return. There is more good news as well. Support for committing additional resources, when that commitment is made thoughtfully and with a sound basis in action, may run deeper than many might assume. Fully 65 percent of Americans polled for the Washington Post and ABC News in the spring of 2000 advocated increased federal spending on schools.[2] Two-thirds of those responding to a 1998 Gallup Poll for Phi Delta Kappa said they would be willing to pay more in taxes to improve the quality of the nation’s inner-city schools.[3] The most recent Gallup/Phi Delta Kappa poll shows support for reforming the existing public education system increasing and interest in radical reforms such as vouchers fading. Seventy-one percent of respondents, for example, said they favored public school reforms over measures, such as private school vouchers, that would seek to supplant the public schools.[4]

Taken together, the chapters that follow constitute a comprehensive resource guide on the state of education reform and research into reform. Among the many reforms examined are those that have demonstrated their effectiveness beyond all reasonable doubt. Others, although they may have won widespread attention and praise, have already been shown through research to be at best far more limited than their promoters have warranted, or at worst completely ineffectual. Still others have not lived up to claims made on their behalf, and require much more research before they can be considered worthy of endorsement.

These reports, then, offer to policy-makers and citizens a road map for making public schools more effective, and to scholars an agenda for further research. It is our hope that they will sort out for all of us a clearer understanding of what works, and what we still need to know.

1. Early Childhood Education


Executive Summary


Summary of Research Findings


Pre-kindergarten education for disadvantaged children can greatly increase their cognitive abilities, leading to long-term increases in achievement and school success. Although general cognitive abilities as measured by IQ may only temporarily increase, persistent increases can be produced in the specific abilities measured by standardized achievement tests in reading and math. In addition, programs can have positive effects on children’s long-term social and emotional development, reducing crime and delinquency. To reap all of their potential benefits, pre-kindergarten programs for disadvantaged children must be intensive and high in quality, and must emphasize both cognitive and social development.




· Class sizes and child-teacher ratios must be kept low.

· Teachers must be highly qualified, with at least a bachelor’s degree and with specialized training in early education, and must be paid well.

· Curricula must be intellectually rich and sufficiently broad to address children’s developmental needs in all domains.

· Programs must have an infrastructure adequate to support best practices, professional development, and ongoing evaluation and accountability.

· Programs must engage in an active partnership with parents and accommodate their needs, including their needs for child care.

· Programs should start no later than age three.

· Resources should be focused primarily on disadvantaged children.

· The existing array of public school, Head Start, and private programs all can be used, but both standards and resources must be substantially increased to produce the desired results.



1: Early Childhood Education


By W. Steven Barnett

Rutgers University Center for Early Education


A number of long-term social and economic trends have contributed to increasing interest in the education of children under five over the past several decades.[5]  Before 1960, the education of young children was regarded as primarily a matter of parenting in the home.  Since that time the percentage of young children cared for by someone other than a parent has risen steadily.  Today, most young children in the United States spend much of their day away from their parents, and most attend a center-based program prior to kindergarten.  Attendance at a center-based program is becoming the norm at ages three and four.  In 1999, center-based program participation was 70% at age four and 45% at age three.[6] 

The center-based programs attended by children at ages three and four go by a variety of names – child care, preschool, day care, nursery school.  They provide different numbers of hours, from a couple of hours one or two days per week to 10 hours per day 250 days per year.  They also operate under a variety of auspices – churches, independent non-profits, for-profits, public schools, Head Start.  Parents regard virtually all of these programs as educational regardless of the nomenclature used to describe the program, the hours of operation, or the auspices under which they operate.[7] Participation rates increase with income and parental education, despite greater government support for programs targeting children in low-income families.  Children under three are much less likely to attend center-based programs, and parents seem to view infant and toddler care as of less educational consequence.[8]

As non-parental education of young children becomes the norm, the extent to which such programs affect children’s learning and development has become a vital question for families and governments. Inequalities in early care and education may be responsible for much of the inequality in later educational outcomes in the United States.[9]  Moreover, there are concerns that parents may be unaware of the potential for their decisions about early care and education to have either adverse or positive impacts on their children’s development.  Some have raised hopes that public support for early education might provide a means for improving the productivity of our educational system and reducing educational and social inequalities.[10]

This report seeks to clarify the potential benefits and possible adverse effects of early care and education, with particular emphasis on the effects for children disadvantaged by social and economic circumstances.  In addition, it seeks to summarize what is known about the extent to which variations in child characteristics, program characteristics, and the social environment alter the magnitude of the educational benefits from early education. Key issues in the review are the nature and duration of program effects.  Often there is no dispute about whether programs have immediate or short-term effects on children, but there are disputes about the meaning or importance of the observed effects and whether they persist or result in other long-term effects that are more consequential.[11]



Early Childhood Education Research


Short-term Studies


A great deal of research has been conducted on the immediate and short-term effects of early education and child care.  Much of this research is found in two largely separate but related sets of literature: one on child care and the other on educational interventions.  Traditionally, these two bodies of research have focused on different questions and had different theoretical and methodological orientations.  In recent years, there has been some convergence, but differences remain. 

Early Intervention Program Studies


In many cases, but not all, the educational interventions have been half-day or school-day programs that operate over a school year.  Some have been home-based programs seeking to improve parent-child interactions in ways that are hypothesized to contribute to improvements in child development.  A few home-based programs have provided educational services directly to the child.  Some programs have delivered both center-based and home-based services and some have worked fairly extensively with both parents and children.  Virtually all center-based programs have made efforts to involve parents in some way.  These programs typically target children who are expected to have greater difficulty with school and high rates of grade repetition, special education, and other problems. 

Children have been identified for intervention based on social and economic factors that are taken as indicators of risk of school failure, or based on individual assessments of developmental delay or disability. Poverty is the most frequently used criterion for disadvantage or risk, but other factors that might be employed include low levels of parental education or IQ, poor health or nutrition, poor housing, maternal depression, and family and neighborhood violence.[12] Targeting based on socioeconomic disadvantage and targeting based on developmental delay are clearly different conceptually. Because socioeconomic disadvantage can lead to developmental delay, however, there is some overlap.

The early intervention literature has focused on looking for positive effects on children’s development, most often looking at cognitive development, but assessing effects in other domains as well.  There are hundreds of studies of immediate and short-term effects, and their findings have been conveniently summarized in both quantitative meta-analyses and traditional literature reviews.[13]  Across these studies, the average initial effect on cognitive abilities is about 0.50 standard deviations, 7 or 8 points on an IQ test.  Average effects on such socio-emotional outcomes as self-esteem, self-efficacy, motivation, and social behavior also were positive, though somewhat smaller, 0.25 to 0.40 standard deviations.  No evidence of consistent negative effects appears in these studies.  A strength of this literature is that similar results are found across studies employing a wide variety of research designs, including randomized trials and single-subject designs in which the “treatment” was experimentally manipulated.  Effects are similar in size for disadvantaged populations and for children with disabilities or developmental delays.
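The correspondence between 0.50 standard deviations and "7 or 8 points" can be checked with simple arithmetic. The sketch below is illustrative only: it assumes an IQ scale with a standard deviation of 15 or 16 points, which are common conventions for such tests, and the function name is hypothetical. The source does not state which scale it has in mind.

```python
# Convert a standardized effect size (in standard-deviation units) to test points.
# Assumption: the IQ scale has a standard deviation of 15 or 16 points,
# depending on the test; the report does not say which scale is meant.

def effect_size_to_points(effect_size, scale_sd):
    """Return the effect expressed in scale points rather than SD units."""
    return effect_size * scale_sd

for scale_sd in (15, 16):
    points = effect_size_to_points(0.50, scale_sd)
    print(f"SD = {scale_sd}: 0.50 SD is {points} points")
```

Under these assumptions an effect of 0.50 standard deviations works out to 7.5 or 8 points, in line with the range given in the text.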

Recent years have produced important advances in research as randomized trials, sometimes on a quite large scale, have been employed to examine the effects of specific approaches to early educational intervention at specific ages.  The findings of these studies add substantially to the knowledge provided by the studies summarized in previous reviews of the literature.  In particular, these randomized trials have tested the effects of home visitation and other approaches that focus on parents and the improvement of parenting as means to improve the development of young children.  These include models emphasizing case management to coordinate and increase the use of existing services for children beginning in the first year of life.  Randomized trials may be especially important for studies of these types of programs; unmeasured differences among parents might play a large role in who chooses to enroll in such programs, leading to substantial biases when researchers attempt to estimate program effects simply by comparing program families and children to others who did not choose to enroll.
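The selection problem described above can be illustrated with a small simulation. This is a hypothetical sketch, not a model of any study cited here: it assumes a single unmeasured parent trait (called `motivation` below) that raises both the chance of enrolling and the child's outcome, and it sets the true program effect to zero. All names and parameter values are invented for illustration.

```python
import random
from statistics import mean

def simulate(n=20000, true_effect=0.0, seed=0):
    """Compare a self-selected comparison with a randomized one.

    Hypothetical setup: 'motivation' is an unmeasured parent trait that
    raises child outcomes and also makes enrollment more likely.
    """
    rng = random.Random(seed)
    self_selected = {True: [], False: []}
    randomized = {True: [], False: []}
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        outcome = motivation + rng.gauss(0, 1)  # outcome with no program
        # Self-selection: more motivated parents are more likely to enroll.
        enrolls = motivation + rng.gauss(0, 1) > 0
        self_selected[enrolls].append(outcome + (true_effect if enrolls else 0.0))
        # Random assignment: a coin flip, unrelated to motivation.
        assigned = rng.random() < 0.5
        randomized[assigned].append(outcome + (true_effect if assigned else 0.0))
    naive = mean(self_selected[True]) - mean(self_selected[False])
    trial = mean(randomized[True]) - mean(randomized[False])
    return naive, trial

naive_estimate, trial_estimate = simulate()
print(f"comparison of self-selected groups: {naive_estimate:.2f}")
print(f"randomized comparison: {trial_estimate:.2f}")
```

Even though the program does nothing in this sketch, the comparison of enrollees with non-enrollees shows a clearly positive difference, while the randomized comparison stays close to zero. That gap is the kind of bias random assignment is meant to remove.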

Results of these studies indicate that home visit programs frequently fail to influence parenting or to improve children’s cognitive development.  Two randomized trials have been conducted in California on Parents as Teachers (PAT).[14] Both found small and inconsistent effects on parenting knowledge, attitudes, and behavior and no effects on child development.  A randomized trial of the Home Instruction Program for Preschool Youngsters (HIPPY) serving children ages four and five found significant effects on cognitive development for one cohort, but not another, and found no explanation for the inconsistent findings.[15] 

The Carolina Approach to Responsive Education (CARE) study randomly assigned children to three conditions: full-day year-round educational child care and home visits for parent education, parent education alone, and control.[16]  Treatment began shortly after birth and continued to age five.  The home-visit group of children had no better outcomes than the no-treatment controls.  A randomized trial of home visits in Head Start similarly found no effects of home visits on home environment or child development.[17]

A test of Levenstein’s Verbal Interaction Program (VIP) in Bermuda failed to find positive effects, replicating Levenstein’s own earlier experimental results but contradicting findings from quasi-experimental studies.[18] One potential explanation for the lack of consistent effects comes from a randomized trial that varied the frequency of visitation and found that three visits per week were necessary to produce significant cognitive benefits.[19] Most programs have provided home visits much less frequently.[20]

Several studies of attempts to provide comprehensive services in “two generation” models also have produced disappointing results.  A multi-site randomized trial of the Comprehensive Child Development Program (CCDP) found that CCDP substantially increased maternal participation in parenting education, mental health services, and their own schooling while producing modest increases in children’s participation in health services and early care and education over the first five years of life.[21]  At age two, small effects were found on some parent behaviors and child development (2 points on the Bayley Scales of Mental Development, an effect size of 0.10 standard deviations[22]).  No meaningful effects were found at the age five follow-up, however.[23]  Similarly, studies of the Avance family support program, Child and Family Resource Program, and New Chance all failed to find significant effects on child development.[24]  Research on Even Start found small effects, at best, on child development.[25] The recent large-scale multi-site randomized trial of Early Head Start found very small effects on child development and parent outcomes at age two, replicating the early findings of the CCDP study with 2 points on the Bayley and 0.10 effect sizes generally.[26]

The results of research on home visitation and two-generation approaches that do not provide substantial direct services to children in centers strongly suggest two conclusions. First, attempts to influence child development indirectly through parents are relatively weak.  Second, the size of the effect on child development varies with the amount, in frequency and in duration, of intervention provided.  These conclusions are consistent with conclusions from earlier reviews of the literature.[27] A fairly intensive level of direct service may be required to consistently produce effects on child development of the average size observed in the literature generally.

A few seeming exceptions in the literature suggest that further research is warranted on the circumstances under which parent-directed programs might be highly effective.[28] Recent studies, however, also document the high costs of parent-focused programs, which are so substantial that even programs that demonstrate positive effects are unlikely to be deemed cost-effective.[29]

Although the evidence presented above is not encouraging regarding the effects of home visitation on children’s cognitive development, there is evidence that some home visitation programs can improve the lives and development of young children in other ways. Over 20 years, David Olds and colleagues have found that a program in which nurses conducted home visits to economically disadvantaged new mothers produced significant positive effects: reducing the number and improving the timing of pregnancies and births after the first child, and reducing children’s need for health care for injuries or ingestions.[30]

The Infant Health and Development Program (IHDP) study was a multi-site randomized trial to investigate the effects of weekly home visits starting just after birth, with the addition of full-day educational child care from ages one to three for low-birth weight children.[31]  The IHDP substantially increased IQ (by more than 0.50 standard deviations) and decreased parent-reported problem behaviors through age three.  Effects were found to be larger for children with less educated mothers and for children with heavier birth weights.[32] At the age five and age eight follow-ups, significant effects were no longer found for the total sample.  Significant (though reduced) effects, however, were found for the heavier birth weight stratum on IQ at ages five and eight and on mathematics achievement at age eight. No differences in treatment effects were found for any of the parental education subgroups.[33]

Why effects for the total group in the IHDP study disappeared is not clear. It is possible that lower-birth-weight children in the control group had access to additional services – such as early intervention services or preschool special education programs – before the age of three and between the ages of three and five, which could lead to the disappearance of differential findings. Alternatively, the lighter birth weight stratum might have had a greater incidence of neurological damage that limited the effectiveness of the program. Some researchers have disputed the follow-up findings of effects for the heavier birth weight group.[34] It is worth noting, however, that the birth weight strata were defined prior to the analysis, that differential effects for the two birth weight strata were found at three different points over five years, and that plausible explanations have been offered.


Child Care Studies


Research on child care has tended to study the effects of typical programs on the general population, though some studies have focused on children in low-income families, with an emphasis on social and emotional development.  In particular, child care researchers have been concerned with the potential for separation from the mother to harm social and emotional development.  More recently, the field has broadened its attention to cognitive development and the potential for positive effects, just as educational research has increased its concerns with social and emotional development and potential negative effects.  Most child care studies have relied on statistical analysis of natural variation rather than experiments or even quasi-experiments with specific “treatments.”  Over time child care research has evolved from asking about the average effects of care to asking how the effects of care vary depending on interactions among the characteristics of care, children and families.[35]

Although programs for young children under a wide variety of names provide both care and educational experiences, child care is distinguished from preschool education by having as a primary goal enabling parents to work or pursue other activities. Child care centers are open for the hours parents work – typically 10 hours a day, 5 days a week – and children often attend more than 30 hours per week. Of course, child care centers are not the only providers of child care – family day care homes, nannies, and others, including relatives and neighbors, provide care outside or inside the child’s home. However, the focus here is on child care centers and their influences on learning and development.

Looking across many studies, child care for young children, especially care for infants and toddlers, appears to produce small negative effects in the short term on child-mother attachment and on social behavior, particularly aggression.[36] The effects on aggression may be contemporaneous or at entry to school.  Although there is much agreement about these findings, some researchers have questioned the conceptualization and measurement of attachment, and it is essential to recognize that the social behaviors of the vast majority of children in care are in the normal range.[37]  In addition, there is no evidence that negative effects on social behavior persist past the first few years of school or result in other later problems.[38] Some studies have failed to find negative effects on aggression and have found positive effects on other social behaviors.[39]

Recently, new evidence on the short-term effects of child care on social behavior has come from the NICHD study of early child care, which had a sample of over 1300 children across 10 sites.[40] Media reports based on a conference paper indicated that new findings contradicted previous work and the views of most “experts” that child care was not harmful for children’s social and emotional development.[41]  In fact, the NICHD results reveal nothing new.  Child care (of all types, including father care) for 30 hours or more per week was associated with more reported behavior problems at age two, but not at age three, and then again at ages four and five.  At age five, children who received child care for 30 or more hours per week during the first four years of life had higher rates of reported problem behavior than those who had attended less than 10 hours per week.  However, as in other studies, the effects were small. Behavior problems for children with 30 or more hours of care were not more common than would be expected for the general population.  In addition, the negative effects on problem behavior were somewhat reduced for higher quality child care.[42]

Child care also has been found to produce modest positive effects (effect sizes in the neighborhood of 0.10-0.15) on cognitive and language development.[43]  Some studies find that effects are larger for children who enter care earlier.[44]  Some studies find larger effects for children from economically and educationally disadvantaged families.  In addition, some studies have found that there may even be small negative effects of child care (in the first three years) for children from homes offering the richest environments.[45] There is an implication that the difference between the resources provided to the child through parental and non-parental care is the active factor. This is consistent with evidence that the magnitude of effects increases with the quality of child care as well as evidence on the effects of parental education and other home resources.[46]

Recent large-scale longitudinal studies provide additional evidence regarding the effects of child care on the development of language and cognitive abilities. The NICHD study of early child care found associations between quality and child’s language and cognitive development throughout the first three years of life.[47]  At age four, higher child care quality was associated with greater language abilities and better short-term memory and attention. Child care centers were associated with better language and cognitive test scores at age four than other forms of care.  In addition to associations with observed quality, it was found that children enrolled in child care centers meeting a greater number of professional guidelines for child-staff ratio, group size, teacher training, and teacher education had higher cognitive and language ability, and higher school readiness. All of these associations were modest in size, controlling for family background and home environment.[48]  Variations in effects with family background have not been found consistently.[49]

A follow-up of the Cost, Quality, and Outcomes study investigated the effects of child care classroom quality on over 800 children in four states from ages four through eight, statistically controlling for family background.[50]  This study found that children who attended higher quality child care classrooms had higher scores on the Peabody Picture Vocabulary Test-Revised (PPVT-R) and on achievement tests for pre-reading and math abilities at age four.  The PPVT-R is a test of receptive language, but it often is used as a “quick” IQ test. Continued follow-up found significant effects on PPVT-R scores through kindergarten, but effects declined as children moved toward age eight (controlling for quality of later schooling).  Effects on math scores persisted through age eight.  Depending upon the specifics of the analysis, effects on pre-reading and math achievement are found for children with less well-educated mothers, but not for children with highly educated mothers.[51]


Long-Term Effects


Reviews that simply summarize the results of studies of early care and education have found that cognitive effects frequently decline over time and are negligible several years after children leave the programs.[52] This pattern has led some to conclude that even intensive preschool programs produce no lasting effects on cognitive development.  In this view, initial effects are either artificial (children learn to answer test questions better, but are not really smarter) or do not lead to long-term gains in cognitive ability.  Others have called attention to differences among programs and concluded that large-scale public programs for children in poverty produce no meaningful improvements in cognitive abilities, while more intensive, small-scale (and impractical) programs may produce small gains in cognitive development.  For example, Herrnstein and Murray conclude: “Head Start, the largest program, does not improve cognitive functioning.  More intensive, hence more costly, preschool programs may raise intelligence, but both the size and the reality of the improvements are in dispute.”[53] They and others contend that to the extent more intensive programs have substantive long-term benefits these are more likely due to socialization than to effects on cognitive abilities.[54]

Barnett challenged this view through a review of the literature with a specific focus on the long-term effects of programs on achievement and school success, selecting studies for inclusion if they met four criteria: (1) children entered the program as preschoolers (in Head Start this could include some five-year-olds prior to the availability of kindergarten); (2) the program served economically disadvantaged children; (3) at least one measure of achievement or school success was collected at or beyond age eight (Grade 3); and, (4) the research design identified treatment and no-treatment groups from program records.[55]  The requirement for follow-up through third grade allowed sufficient time to observe the fade-out in effects that is widely believed to occur.[56]

Thirty-seven studies were found that met these criteria, a larger number of long-term studies than had been included in previous research reviews and syntheses.  All are studies of educational interventions, although five of the model programs provided services through full-day child care.  The studies can be divided into two categories: one for small-scale research models, the other for large-scale public programs.  In 15 studies, researchers developed model programs to study the effects of controlled treatments.  In 22 other studies, researchers investigated the effects of on-going, large-scale public programs: 10 studied Head Start programs, eight examined public school programs, and four studied a mix of Head Start and public school programs.[57]

Model program studies


The model program studies varied in entrance age, duration, services provided, and historical context (1962 to 1980).  In later years, significant percentages of the comparison groups are likely to have attended a preschool or child care program, leading to underestimation of program effects. All focused on highly disadvantaged populations. The average level of mother's education was under 12 years in all studies, and under 10 years in five studies. The majority of children were African-American in every study except for one, in which they were Hispanic. From program descriptions of teacher qualifications, class size, student-teacher ratio, and other information, it is apparent that model programs were much more resource-intensive, and therefore more expensive, than typical public programs for young children. Two studies limited their samples in additional ways that could have affected their results.  The Perry Preschool study selected children based on low IQ scores, and its sample had substantially lower IQs at age three than children in other studies.[58]  The Milwaukee study selected children whose mothers had IQs below 75.[59]

Seven of the model program studies were randomized trials.  Two stand out because they began with sample sizes larger than 30 in each group, and had low attrition throughout follow-up: the Abecedarian and Perry Preschool studies.[60]  The others suffered from extremely small initial samples or serious attrition.  The remaining eight model program studies constructed comparison groups, and it is possible that the groups differ in ways that may have biased the comparisons either for or against the program.  When randomized trials are not used, it is difficult to distinguish program effects from the effects of pre-existing differences (which may be unmeasured) between children and families in the preschool group and the comparison group, a problem sometimes referred to as “selection bias.”

Large-scale public school programs


The 22 large-scale public program studies generally represent public preschool programs targeting children in poverty. Most programs served children part-day for one school year at age four.  Four programs served children from age three.  In nearly all of the studies children moved on to regular public elementary schools.  In the Child Parent Center (CPC) studies, intervention continued through third grade, and the effects of the preschool and school-age programs have been estimated separately.  All of the large-scale public program studies used quasi-experimental designs.  In most studies, comparison groups were identified later, and there are no pre-program measures of children's cognitive abilities to verify that the two groups began with the same abilities. Many studies employ family background measures to assess comparability and adjust for initial group differences, but the family background measures tend to be crude, increasing the risk that unmeasured differences between groups bias the results.

Study Findings



All of the model program studies found positive initial effects on IQ.  In most cases IQ effects were sustained at least until school entry.  Estimated effects for 12 model program studies with IQ data at age five ranged from 4 to 11 IQ points (effect sizes of 0.25 to 0.75), with the exception of two studies, one quasi-experimental reporting no effect and one randomized trial of a  highly intensive program reporting an estimated effect of 25 points.  None of the large-scale program studies provided IQ test data, but a few administered the PPVT; these reported no significant effects on the PPVT after school entry.  In all but two studies, the effects on IQ clearly are transitory. 
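The correspondence between IQ points and effect sizes quoted above comes from dividing a point difference by the test's standard deviation. The following is a minimal sketch of that conversion, assuming a standard deviation of 15 or 16 points (common IQ scalings; the review does not state which scaling the studies used).

```python
# Convert an IQ-point difference into a standardized effect size.
# The standard deviations used here are assumptions, not values from the review.

def effect_size(point_difference, standard_deviation):
    """Return the difference expressed in standard-deviation units."""
    return point_difference / standard_deviation

for sd in (15, 16):
    low = effect_size(4, sd)
    high = effect_size(11, sd)
    print(f"SD={sd}: 4 points -> {low:.2f}, 11 points -> {high:.2f}")
```

Under either assumed scaling, the results fall close to the reported range of 0.25 to 0.75.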

Randomized trials of two model programs (the Milwaukee and Abecedarian interventions) that provided full-day intensive interventions over the first five years of life provide evidence that such programs may produce very long-term, possibly permanent, increases in IQ.  The long-term effect is about 5 IQ points, which is substantially smaller than the initial effects of the programs. Their findings contrast sharply with the apparent failure of later, less intensive interventions to produce lasting IQ gains.  This suggests that very early intensive interventions may have more fundamental or general effects on the cognitive development of children in poverty. 

The IQ findings of both studies have been discounted by scholars advocating the importance of heredity as an explanation for the low cognitive abilities of children in poverty.[61]  Even the strongest claims for heredity leave sufficient room for the estimated effects, however. Moreover, their arguments that the study results are questionable because IQ effects appear early (in the Abecedarian study) or are inconsistent with insignificant effect estimates for school outcomes (Milwaukee) do not hold up to scrutiny.  The Abecedarian study finds persistent IQ effects after controlling for maternal IQ and infant home environment, presumably sources of pre-existing differences in IQ between groups.[62] Estimated effect sizes for special education, grade repetition, and academic achievement are large in the Milwaukee study. With the limited statistical power provided by a very small sample size, it is inappropriate to construe lack of statistical significance as evidence that IQ effects occurred without effects on academic success.[63]
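The point about statistical power can be illustrated with a rough calculation. The sketch below uses a normal approximation for a two-group comparison of means; the effect size and the sample sizes are hypothetical values chosen for illustration and are not taken from the Milwaukee study.

```python
from statistics import NormalDist

def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    # Probability of rejecting in the direction of the true effect;
    # the negligible opposite-tail probability is ignored.
    return 1 - z.cdf(critical - noncentrality)

# Hypothetical scenarios: the same true effect, very different sample sizes.
small_sample_power = approximate_power(effect_size=0.5, n_per_group=17)
large_sample_power = approximate_power(effect_size=0.5, n_per_group=500)

print(f"Power with 17 per group:  {small_sample_power:.2f}")
print(f"Power with 500 per group: {large_sample_power:.2f}")
```

With the small hypothetical sample, a real effect of this size would go undetected more often than not, which is why a non-significant result from such a sample says little about whether the effect exists.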


In contrast to the IQ findings, results regarding long-term effects on achievement varied considerably across studies. Five of 11 model program studies with achievement data found statistically significant positive effects on achievement test scores beyond Grade 3.  Evidence of achievement effects was strongest in the seven randomized trials, as all found statistically significant effects on achievement at some point. The two randomized trials with low attrition rates, the Abecedarian and Perry Preschool studies, found effects on test scores persisting into high school. Nine studies of large-scale programs never found statistically significant effects or lost statistical significance by Grade 3. Twelve studies of large-scale programs found significant positive effects on achievement at least through Grade 3.

Much of the variation in findings regarding long-term effects on achievement across programs can be explained by differences in research methods and procedures. Detailed analyses indicate that in many studies the apparent fade-out in effects on achievement can be attributed to flawed research methods, which bias estimated effects toward zero, and high rates of attrition, which decrease statistical power over time. Reliance on achievement test data from schools' routine testing programs is a major source of potential problems.  As testing typically is conducted by grade level for children in regular education, studies systematically lose the more poorly performing students from year to year as the cumulative percentage of children retained in grade, placed in special education, or otherwise omitted from testing grows.  Program and comparison group children with valid test scores become more similar over time (essentially equated on grade level), gradually hiding the true differences between the groups.[64]
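The mechanism described above, in which the loss of low-scoring children from routine testing shrinks the observed difference between groups, can be sketched numerically. The example assumes normally distributed scores and a single common cutoff below which children drop out of the tested sample; the means, standard deviation, and cutoff are hypothetical and do not come from any study in the review.

```python
from statistics import NormalDist

def mean_above_cutoff(mean, sd, cutoff):
    """Mean of a normal distribution restricted to scores above `cutoff`."""
    std = NormalDist()
    a = (cutoff - mean) / sd
    return mean + sd * std.pdf(a) / (1 - std.cdf(a))

# Hypothetical values for illustration only.
TRUE_GAP = 5.0                      # true program advantage in score points
COMPARISON_MEAN = 100.0
PROGRAM_MEAN = COMPARISON_MEAN + TRUE_GAP
SD = 15.0
CUTOFF = 90.0                       # children scoring below this are not tested

observed_gap = (
    mean_above_cutoff(PROGRAM_MEAN, SD, CUTOFF)
    - mean_above_cutoff(COMPARISON_MEAN, SD, CUTOFF)
)

print(f"True gap:     {TRUE_GAP:.2f}")
print(f"Observed gap: {observed_gap:.2f}")
```

Because more comparison-group children fall below the cutoff, removing them raises the comparison group's tested mean by more than the program group's, and the observed gap understates the true one.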


School Progress and Placement

School progress and placement were primarily measured by the percentage of children repeating grades, given special education services, and graduating from high school. Cumulative school records data on these outcomes are not subject to the attrition bias introduced by the use of school test data.  Estimated effects on school progress and placement are uniformly positive and overwhelmingly statistically significant.  The evidence regarding high school graduation is highly consistent as well.  All six studies (including model, Head Start, and public school programs) produced large estimates of effects on the graduation rate, although only in the four with larger sample sizes were these statistically significant.

Estimated effects on grade repetition and special education placements can be combined across studies to estimate average effects and to compare the effects of model and large-scale programs. Model programs were associated with 20 percentage point lower rates of special education placement and 15 percentage point lower rates of grade repetition.  The comparable figures for large-scale public programs are 5 percentage points and 8 percentage points, which are significantly less than the model program estimates in both cases.[65]


Social Development

Most long-term studies of educational interventions for disadvantaged children have emphasized research on cognitive and academic outcomes.  However, most studies that assessed effects on social behavior have found positive effects (though a few have found no significant effects), and no study reported elevated aggression beyond Grade 1.[66]   Five studies of educational interventions that investigated long-term effects on social behavior found positive effects on classroom behavior, social adjustment, and crime and delinquency reports.[67]  This includes two of the three studies that found elevated aggression associated with full-time child care that began in infancy.[68]  The third found no long-term effect on crime and delinquency, but rates were low for both groups.[69]

New Long-Term Research


Recent research on the long-term effects of Child Parent Centers (CPC) in Chicago provides an extremely valuable addition to knowledge regarding early education for disadvantaged children.[70] This longitudinal study with a sample of over 1,500 children estimated the effects of a Title I-funded half-day preschool and extended elementary program from ages three to nine operated by the Chicago public schools.  Separate estimates are provided for the preschool and elementary components and effects are estimated through age 21.  Controlling for family economic disadvantage, CPC preschool participants had significantly lower rates of special education placement, grade retention, juvenile arrest, and arrest for a violent offense.  They also had significantly higher achievement test scores in reading and math through age 15 and a higher rate of high school completion.  Effect sizes are in the 0.20 to 0.50 range, perhaps on the high side for large-scale programs generally. Effects are somewhat larger for children in the highest poverty neighborhoods.

In addition, the CPC study data were used to estimate structural models to investigate the chain of effects from preschool program to long-term outcomes. These analyses support the view that early education’s long-term effects on achievement and school success primarily result from initial effects on cognitive abilities.  These results replicate findings of structural equation modeling with the much smaller Perry Preschool data set, and the estimated chain of effects is remarkably similar to that for the Perry Preschool program.[71]

Costs and Benefits


While skeptics of making early education more broadly available through public funds frequently cite cost as the basis of their objections, some research has shown that a high quality early education program yields quantifiable benefits that can make it cost-effective when properly accounted for. Barnett has estimated the costs and benefits of a high quality early education program based on the findings of the Perry Preschool study.[72]  The cost savings to society from avoiding crime and delinquency contribute a great deal to benefits.  However, there also are important economic benefits from reducing the direct costs of educational failure and from increasing adult economic success by preventing educational failure.  These benefits are not hypothetical, but are based on demonstrated increases in earnings and employment and decreases in reliance on public assistance.  His estimates reveal a high rate of return, comparable to or better than what one could expect to earn from investing in the stock market.  Even after discounting to calculate present value (a financial technique for making present costs and future benefits comparable), the estimated benefits are roughly ten times the costs.[73] It is important to note that this includes none of the economic benefits that a full-day, year-round program might generate by enabling parents to work more or participate in education and training.  Barnett’s results have been confirmed by a recent RAND report[74] that scrutinized his estimates and by similar estimates finding that the benefits of the Chicago Child Parent Centers far exceeded costs.[75]
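Discounting to present value, as mentioned above, divides each future amount by (1 + r) raised to the number of years until it is received, where r is the annual discount rate. The sketch below shows the calculation with entirely hypothetical figures; the cost, the benefit stream, and the 3% rate are assumptions for illustration and are not Barnett's estimates.

```python
def present_value(amounts_by_year, discount_rate):
    """Sum of yearly amounts, each discounted back to year 0."""
    return sum(
        amount / (1 + discount_rate) ** year
        for year, amount in enumerate(amounts_by_year)
    )

# Hypothetical program: one up-front cost, then benefits that
# begin five years later and continue for twenty years.
costs = [10_000]
benefits = [0] * 5 + [3_000] * 20
RATE = 0.03  # assumed annual discount rate

undiscounted_ratio = sum(benefits) / sum(costs)
discounted_ratio = present_value(benefits, RATE) / present_value(costs, RATE)

print(f"Benefit-cost ratio without discounting: {undiscounted_ratio:.2f}")
print(f"Benefit-cost ratio with discounting:    {discounted_ratio:.2f}")
```

Because the costs come first and the benefits arrive later, discounting always lowers the ratio in a case like this one, so a ratio that remains well above one after discounting is the more conservative figure.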


Program Design and Effectiveness


From the evidence reviewed so far, it should be clear that some programs are more effective than others.  Educational interventions for disadvantaged children, including Head Start and public school programs, have larger estimated effects than child care programs.  This is true whether child care program effects are estimated for the general population or for disadvantaged children.  Model programs have larger estimated effects than Head Start and public school programs. However, some caution is required in drawing conclusions because programs vary with respect to the disadvantage of the children served and their social, political, and economic contexts, as well as in their design. 

Nevertheless, it seems clear that a dose-response relationship is observed with respect to quality, or intensity of resources provided.  Studies of the effects of child care quality find that higher quality is associated with greater effects, and the quality of child care generally is lower than the quality of large-scale public programs, which in turn are of lower educational quality than model programs.[76]  Child care programs typically produce smaller effects even with disadvantaged children, compared with Head Start and public school programs. Studies that compare model programs with large-scale public programs (including child care) serving the same population find model programs to be more effective, confirming the cross-study inference.[77]

Additional guidance regarding program design can be gleaned from analyses of the model programs, cross-study comparisons of programs and their outcomes, research on variations in the quality and effects of child care programs, and research on the effectiveness of elementary school education.  Conclusions drawn from all of these sources are remarkably consistent.  More highly educated, better prepared, and better compensated teachers are more effective.[78] Smaller class sizes and better teacher-student ratios result in better teaching, more individual attention, and larger cognitive gains that improve achievement and school success, especially for disadvantaged students.[79] Other characteristics of programs that have generated the largest achievement and other gains for disadvantaged children include: a strong focus on language, strengthening children’s cognitive abilities generally, interactions that prepare children for the discourse patterns and other demands of school without pushing down the elementary school curriculum, individualized support for learning, regular opportunities for teachers to reflect with highly knowledgeable leaders or others, and collaborative relationships with parents to support the child’s learning and development.[80]

Research provides less guidance than policy makers and administrators might like regarding two key aspects of program design that have significant implications for cost: age of start and hours per year (length of day and days per year).  Highly intensive programs beginning earlier have had larger effects than those in which children start later, but the optimal entry age is unclear as each additional year adds to cost.  Two longitudinal studies indicate that programs beginning at age three produce substantial long-term benefits for disadvantaged children and that the benefits substantially exceed the costs.[81]  Even intensive programs beginning at age four might bring significantly fewer disadvantaged children up to the thresholds of learning and development required for early school success.  With respect to length of day and number of days per year, the research on the relative lack of progress for disadvantaged elementary school children during the summer is suggestive, and many parents may choose not to send their children to programs that do not address their needs for child care.[82] In addition, benefits from effects on parental employment associated with child care should be incorporated into any assessment of costs and benefits.


Summary and Recommendations


Pre-kindergarten education for disadvantaged children can greatly increase their cognitive abilities, leading to long-term increases in achievement and school success. Although general cognitive abilities as measured by IQ may only temporarily increase, persistent increases can be produced in the specific abilities measured by standardized achievement tests in reading and math.  In addition, programs can have positive effects on children’s long-term social and emotional development, reducing crime and delinquency.  To reap all of their potential benefits, pre-kindergarten programs for disadvantaged children must be intensive and high in quality, and must emphasize both cognitive and social development.

Pre-kindergarten programs for disadvantaged children are among the most strongly evidence-based of those approaches to improving academic achievement and educational attainment that have been tested. However, they will produce the desired results only if implemented in accord with the principles for effective programs that emerged in this review.  These include:

·          Class sizes and child-teacher ratios must be kept low. The research literature suggests that the best practice is probably a class size of 15 with a teacher and an aide.

·          Teachers must be highly qualified, with at least a bachelor’s degree and with specialized training in early education, and must be paid well.

·          Curricula must be intellectually rich and sufficiently broad to address children’s developmental needs in all domains.

·          Programs must have an infrastructure adequate to support best practices, professional development, and ongoing evaluation and accountability.

·          Programs must engage in an active partnership with parents and accommodate their needs, including their needs for child care.

·          Programs should start no later than age three.   Beginning prior to age three might produce substantially better results, but only if a highly intensive center-based program is provided up to school entry.

·          Resources should be focused primarily on disadvantaged children, recognizing that income is not the only risk factor for poor achievement and that the poverty line is an arbitrary cut-off for educational purposes.  Universal pre-kindergarten programs can target resources on disadvantaged children by providing them with smaller classes, better teachers, more hours, and a sliding fee scale so that higher-income families share the cost.

·          The existing array of public school, Head Start, and private programs all can be used, but both standards and resources must be substantially increased to produce the desired results. There are many advantages to such a strategy, but the time and costs of increasing quality to the necessary level should not be underestimated. 

Given the way that educational costs are conventionally calculated, the foregoing recommendations will be seen as expensive.  However, they are not as expensive as the costs of failing to implement them: poor achievement, high rates of school failure and special education, low productivity, and high crime and delinquency.  Also, because disadvantaged children are highly concentrated geographically, these costs contribute to problems of segregation, urban decay, and suburban sprawl that add to the costs of current policy.[83]  From this perspective, it is difficult to see how society can afford not to implement high-quality pre-kindergarten education for disadvantaged children.


2. Class-Size Reduction in Grades K-3


Executive Summary


Summary of research findings


Reducing class size in Grades K-3 has been found to have academic benefits in all subject areas, especially for children living in poverty. Studies published since the mid-1980s show that classroom behavior and test scores improve while students are in small classes. Further, the improvement persists through the middle school and high school years, even though students return to full-size classes. To reap the full range of benefits, it is important that pupils enter small classes in the early years (Grades K or 1) and continue in small classes for three or more years. Students who attend small classes are also more likely to take college-entrance examinations; this is especially true for minority students.




Recommendations


·          Resources should be provided to schools and districts serving low-income pupils to restrict class sizes in the primary grades to no more than 18 pupils.

·          To ensure that the research-documented benefits of small classes are realized, policies for implementing small classes should include the following provisions:

·          Begin class-size reduction in K-1 and add additional grades in each subsequent year.

·          Use the reduced-class model supported by the research: one teacher in a classroom with 18 or fewer pupils. Pupils assigned to small classes should represent a cross-section of students in the school, not just difficult-to-manage students.

·          Plan for class-size reduction in advance, hiring fully-qualified teachers. Additionally, some programs of professional support and development are likely to be helpful.

·          Systems should be established to monitor class-size reduction initiatives continually and closely, providing feedback to administrators, policy makers, and parents about the successes of the program. Teachers should be afforded opportunities to discuss problems as they arise, and to have them addressed by the school administration.



2: Class-Size Reduction in Grades K-3


By Jeremy D. Finn


State University of New York at Buffalo


The advantages of small classes have been touted by parents and educators throughout modern history. Only in recent years, however, has there been a significant impetus for reducing class sizes in American public schools. This is partly because teachers, parents, policy makers, and the courts understand the importance of small classes for teaching and learning, because education has risen to the top of state and national agendas, and because high-quality research has demonstrated the academic and behavioral benefits of small classes, especially for children at risk. This report summarizes the current state of research on class-size reduction and its implications for educational policy – especially as it pertains to the academic performance of students at risk.


Class-size reduction research


The impact of class size on educational outcomes is among the most researched areas in education. By the 1980s, more than 200 studies had appeared on the topic. Some early studies did not establish a connection between smaller class sizes and student achievement, but mainly attempted to weigh the value of small classes against larger classes. Others suffered from problems of methodology and data collection. Most acceptable studies, however, supported the importance of smaller classes in promoting student success. In a review of early studies, Educational Research Service[84] and Robinson[85] concluded that reducing class sizes  in the primary grades to 22 or fewer students appeared to have a beneficial effect on reading and math scores, especially for economically disadvantaged pupils. Since that time, more sophisticated experiments have confirmed and extended this conclusion.

The first refined analysis to connect reduced class size to academic achievement was a 1978 meta-analysis by Glass and Smith of 77 earlier research studies.[86] This analysis found that not only did small classes improve the chances for academic achievement, but that small classes could also be used as a predictor of student success. Glass and Smith showed that “as class size increases, achievement decreases.” Repeated studies have provided evidence of important relationships between the number of students in the classroom and the success of teaching and learning in the same classrooms. This research demonstrated that an appropriate class size was fewer than 20 students, and that the greatest benefits of small classes are obtained in the early grades.

Large Scale Studies


Based on this early work – particularly the findings of benefits to poor students and to young students – beginning in the mid-1980s some large-scale projects and an actual experiment in class size and student outcomes were started. Among them were Indiana’s Prime Time; HB 72, which limited class sizes in Grades K-4 to 22 students in Texas; STAR and its related studies in Tennessee; Wisconsin’s SAGE Project; and California’s massive Class-Size Reduction (CSR) effort. Prime Time and STAR were particularly important because they provided the motivation for many districts, states, and the federal government to reduce class sizes on a large scale.  Several overviews of the more recent class-size research are available, including a book by Achilles[87] and monographs by Finn[88] and by Ehrenberg, Brewer, Gamoran, and Willms.[89]

Prime Time in Indiana


Between 1981 and 1983, Indiana launched Project Prime Time as a statewide initiative. Prime Time began its class-size reductions with first grade, but was not entirely a CSR initiative. In particular, it added teacher aides to classrooms to reduce the child-to-adult ratio – not truly resulting in small classes. Prime Time reported mixed results with some gains in student achievement on reading and math scores. Gains in reading were larger than those in math.[90] An important outcome of Prime Time was the demonstrated feasibility of large-scale efforts to change classroom organization in the pursuit of improved student learning.

Project STAR in Tennessee


From 1985 to 1989, the STAR (Student/Teacher Achievement Ratio) experiment was conducted in Tennessee. This large-scale (n=11,600) longitudinal study of class sizes provided the legislature and administrators with convincing data to support class-size reduction for students statewide. At each grade level K-3, a strictly controlled study was set up to examine whether small (13-17) classes made a difference in student accomplishments in the early years, when compared to regular (22-25) classes, or regular classes with a full-time teacher aide.[91]

Because of its magnitude and scientific rigor, the results of STAR carried more weight than the earlier studies. The most important findings are:

·        In every grade level (K-3) students in small classes outperformed students in larger classes on every achievement test administered – in all subject areas and on both norm-referenced and criterion-referenced achievement tests.

·        The benefits of small classes were greater for minority students and students attending inner-city schools than for white students or those in non-urban areas. In many cases, the advantages were two to three times as great for African-American students as for white students.

·        New analyses of the STAR data have shown that both starting early (K or 1) and continuous participation (3 to 4 years) in small classes lead to the greatest benefits.[92]

Students who had participated in Project STAR in K-3 were followed after they returned to full-size classes in Grade 4. The most important long-term findings are:

·        Pupils who attended small classes in K-3 performed significantly better in all academic subjects in all subsequent grades tested: Grades 4, 6, and 8.[93]

·        The more years pupils spent in small classes in K-3, the longer the benefits lasted into later grades. For example, at the end of Grade 6, pupils who had attended small classes for one year had a 1.2-month advantage in reading over pupils who attended full-size classes. Pupils who had attended small classes for two years had a 2.8-month advantage. Three years in small classes produced a 4.4-month advantage, and four years produced a 6-month advantage in reading.

·        Pupils who attended small classes in K-3 were more likely to graduate from high school and more likely to take SAT/ACT college admissions tests. The impact on minority students was particularly strong, thus reducing by 60% the gap in SAT/ACT rates between black students and white students.[94]
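The Grade 6 reading advantages listed above can be examined for how much each additional year in a small class adds. The short sketch below uses only the four figures reported in the text; the variable names are illustrative.

```python
# Grade 6 reading advantage (in months) by years spent in small classes in K-3,
# as reported in the text.
advantage_months = {1: 1.2, 2: 2.8, 3: 4.4, 4: 6.0}

years = sorted(advantage_months)
increments = [
    round(advantage_months[later] - advantage_months[earlier], 1)
    for earlier, later in zip(years, years[1:])
]

print(increments)  # gain in months for each additional year in a small class
```

Each additional year adds about 1.6 months, so the reported advantage grows roughly linearly with the number of years spent in small classes.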

Additional strength was added to the STAR results by secondary analysts at the University of London, the University of Chicago, and Princeton University who examined the STAR data using different statistical approaches.[95] All approaches yielded the same conclusions.

Other large-scale CSR efforts, described below, have confirmed the basic findings of STAR in other locations. Research using the STAR data continues today; researchers are examining the long-term effects of small classes on teen births[96] and on employment and schooling after high school.[97]

Besides the impact on academic achievement, Project STAR revealed that:

·        Teacher morale is increased in small classes, a finding consistent with all prior research.

·        Teachers of small classes spend more time on active teaching and less on classroom management, a finding substantiated in other research in addition to STAR.

·        There are fewer disruptions in small classes and fewer discipline problems, a finding replicated in other studies.

·        Students’ engagement in learning activities is increased.[98]

·        In-grade retentions are reduced.[99] Because retained students are disproportionately minority, male, and from low-income homes, the reduction in retentions also reduces the achievement gap in schooling.[100]

Project STAR found no achievement advantages associated with full-time teacher aides. In the most complete examination of this issue, researchers concluded that there were no differences in academic achievement “between ... students in teacher aide classes and students in regular classes on any test in any grade (K-3).”[101] The authors continue:

In several instances, students in aide classes performed more poorly than students in non-aide classes... In terms of learning behavior, again no significant differences were found ... In several instances, behavior was marginally poorer among students in classes with aides.[102]


Also, the problems teachers encounter in teaching and in managing classes “are not reduced when a teaching assistant is present.”[103]


STAR and the black-white achievement gap

The disproportionate impact of small classes on minority students and students attending inner-city schools reduced the achievement gap between black and white students. For example, the black-white gap in pass rates on the first grade reading mastery test was 14.3% in full-size classes – that is, 14.3% more whites mastered the reading tests. In small classes, the gap was reduced to 4.1%. Both black students and white students gained significantly by being in small classes, but black students gained more.[104] Other research has examined the achievement gap in more detail and reached the same conclusions.[105] Bingham performed a comparative analysis examining white vs. minority differences and also concluded that smaller class sizes are an effective strategy in reducing the gap. According to Bingham, the smallest white-minority gap was associated with small classes beginning no later than in Grade 1 and lasting for a minimum of two years. The finding of a reduced black-white gap in college aspirations, indicated by students’ taking SAT/ACT tests, shows a positive impact on behavior in later grades as well.[106] The effect of small classes on the achievement gap has been confirmed in other class-size initiatives, particularly Wisconsin’s Project SAGE, discussed below.
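The size of this gap reduction can be checked with simple arithmetic. The sketch below uses only the two pass-rate gaps quoted above (14.3 and 4.1 percentage points); the function and variable names are illustrative and do not come from the STAR reports.

```python
def relative_gap_reduction(gap_full_size: float, gap_small: float) -> float:
    """Return the fraction of the full-size-class gap that is closed in small classes."""
    if gap_full_size <= 0:
        raise ValueError("gap_full_size must be positive")
    return (gap_full_size - gap_small) / gap_full_size


# Black-white gap in first-grade reading mastery pass rates (percentage points),
# as reported in the text above.
gap_full = 14.3   # full-size classes
gap_small = 4.1   # small classes

absolute_drop = gap_full - gap_small
share_closed = relative_gap_reduction(gap_full, gap_small)

print(f"Absolute reduction: {absolute_drop:.1f} percentage points")
print(f"Share of the gap closed: {share_closed:.0%}")
```

On these figures the gap narrows by about 10 percentage points, or roughly seven-tenths of its size in full-size classes.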


Critique of Project STAR

Despite the exceptional research design used in STAR, some factors were beyond the control of the research team. In particular, students moved from one neighborhood to another and changed schools in the process. This led to some attrition from STAR schools over the four-year period and, in a small number of cases, students changing from one class type to another when they changed schools. Economist Eric Hanushek has suggested that these factors may have compromised STAR’s findings, a criticism echoed by Witte[107] as well as by Ehrenberg et al.[108] These issues have been addressed by several data analysts. Krueger[109] undertook a thorough analysis of attrition in STAR. His work showed that neither of these factors produces “bias” in the study’s main findings, that is, average differences in performance among the class types. Hedges and his colleagues[110] compared the Grade 3 performance of STAR participants who were still in the sample in Grades 4, 6, and 8 with that of participants who left the sample. Again, the difference between small-class and large-class students was the same for “stayers” and “leavers.” Although attrition did result in a somewhat selective long-term sample, the basic findings of the experiment still hold.


Other large-scale class-size initiatives


Project STAR provided the scientific support for the long-held belief of educators and parents that small classes in the early grades had many advantages. Because the impact was particularly strong for students at risk, STAR helped motivate many districts, states, and even the U.S. Department of Education to undertake further reduced class initiatives. By the year 2000, approximately 35 states had class-size legislation.

Wisconsin’s Project SAGE, the Burke County project in North Carolina, the massive CSR program in California, and the federal initiative begun during the Clinton administration are among the CSR initiatives that were accompanied by formal evaluations. These programs were not intended to be controlled experiments: their foremost purpose was to provide an intervention – small classes – whose efficacy had already been demonstrated. Occasionally, critics lose sight of that purpose and comment on these programs’ lack of tightly controlled research designs.[111] Despite this criticism, each of the programs was accompanied by an extensive evaluation and each produced results consistent with those of STAR.

The SAGE Program in Wisconsin


The Student Achievement Guarantee in Education (SAGE) program is a statewide effort to increase the academic achievement of children living in poverty by reducing the student-teacher ratio in kindergarten through Grade 3 to 15:1. The program began in 1996 and was targeted toward schools with a high proportion of students living in poverty. School districts in Wisconsin that had at least one school with 50% of children or more living below the poverty level were eligible to apply for participation in SAGE. Within these districts, any school with 30% of students or more below the poverty level was eligible to become a SAGE school. Funding was set at a maximum of $2,000 per low-income student enrolled in SAGE classrooms (K-3). During the 1996-97 school year, 30 schools in 21 school districts, including seven in Milwaukee, began the program in K-1. Grade 2 was added in these schools in 1997-98 and Grade 3 in 1998-99.
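The two-step eligibility rule described above can be expressed as a short sketch. Only the 50% and 30% thresholds come from the text; the `School` record, the function names, and the example districts are hypothetical and are not drawn from SAGE program documents.

```python
from dataclasses import dataclass

# Thresholds taken from the program description above.
DISTRICT_THRESHOLD = 50.0  # district needs at least one school at or above this poverty rate
SCHOOL_THRESHOLD = 30.0    # schools at or above this rate may become SAGE schools


@dataclass
class School:
    name: str           # hypothetical school name
    poverty_pct: float  # percent of students living below the poverty level


def district_is_eligible(schools: list[School]) -> bool:
    """A district may apply if at least one school meets the 50% threshold."""
    return any(s.poverty_pct >= DISTRICT_THRESHOLD for s in schools)


def eligible_schools(schools: list[School]) -> list[str]:
    """Within an eligible district, list the schools that meet the 30% threshold."""
    if not district_is_eligible(schools):
        return []
    return [s.name for s in schools if s.poverty_pct >= SCHOOL_THRESHOLD]


# Hypothetical districts for illustration only.
district_a = [School("Adams", 55.0), School("Baker", 32.0), School("Clark", 12.0)]
district_b = [School("Drew", 45.0), School("Ellis", 31.0)]

print(eligible_schools(district_a))  # ['Adams', 'Baker']
print(eligible_schools(district_b))  # []
```

In the second hypothetical district both schools exceed 30%, but neither reaches 50%, so the district as a whole could not apply.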

The program requires that participating schools implement four interventions: (a) reduce the pupil-teacher ratio within a classroom to 15 students per teacher, (b) establish “lighted schoolhouses” open from early in the morning until late in the evening, (c) develop “rigorous” curricula, and (d) create a system of staff development and professional accountability. While most class-size reductions were accomplished by assigning 15 or fewer students to a teacher within one classroom, some alternate configurations were also adopted. They included classrooms of approximately 30 students with two-teacher teams, shared space classrooms with two separate teaching spaces each with one teacher and about 15 students, and floating teacher classrooms where an additional teacher supports classes of about 30 students during reading and math instruction. The class-size reduction was an immediate intervention in the schools whereas the other SAGE provisions were implemented by schools with considerable variation and, at times, with considerable delays.[112]

To determine the impact of SAGE pupil-teacher reductions on student achievement, the SAGE evaluation uses a quasi-experimental, comparative change design. The quasi-experimental design was used because it was not possible to randomly assign students and teachers to classrooms and to keep classroom cohorts intact from year to year. The evaluation uses a control or comparison group of classrooms from districts participating in the SAGE program for the purpose of assessing the impact of SAGE class-size reductions. These comparison schools have normal class sizes, and, as a group, resemble SAGE schools in family income, achievement in reading, K-3 enrollment, and racial composition.

The longitudinal evaluation of the SAGE program has produced substantial scientific data on the effects of small classes in Grades K-3. The positive impact of small classes on student achievement in SAGE classrooms, especially for minority students, has been a consistent finding for four years and has confirmed earlier findings from STAR. The greatest achievement gains were made in first grade with second- and third-grade students maintaining the gains. Perhaps of greater significance, SAGE has provided guidance for policy makers and administrators about how best to implement small classes at the district and local level through extensive non-experimental data collection such as principal and teacher questionnaires and classroom observations and teacher interviews.[113]

Like STAR, Project SAGE has not been without its critics. Some criticisms concern weaknesses in the project’s experimental design and methods of analysis, for example, the lack of random assignment, student attrition, and a ceiling effect on some of the tests.[114] These comments may not be germane because SAGE, although it included a formal evaluation component, was not intended to be a controlled experiment. More pertinent are the comments that the expansion of SAGE has met with a shortage of qualified teachers and classroom space, especially in the Milwaukee Public Schools. To deal with these problems at some schools, teachers have “doubled up,” putting two teachers in one classroom with 30 students.[115] Team teaching presents both benefits and problems. Among the latter, teachers have to work well together and collaborate well in order for instruction to be optimal. Extensive advance planning is needed in order for this to occur, a principle also learned in California (below).


The Burke County Project in North Carolina


Studies of the effects of small classes in Burke County, North Carolina, reinforce SAGE and STAR findings, while addressing questions about financial and educational policy implications of CSR.[116] With the goal of improving education in relatively poor Burke County, a pilot program in 1991-1992 reduced class size to 18 in Grade 1 in four schools, and in Grades 2 and 3 in subsequent years. Pilot program results were highly positive. On the strength of these findings, the program was extended in 1995-1996 to all elementary schools, Grades 1-3, providing the same positive findings. By 2000, classes of about 17:1 were in all 17 schools with Grades 1-3. By comparing the CSR classes with the control classes, researchers reported higher rates of time on task for students and more emphasis on student interaction. The smaller classes significantly outperformed regular classes in math and reading at the end of Grades 1, 2, and 3, and later these same students continued to outperform the others after returning to regular classes in Grades 4 and 7. An important feature of the Burke County initiative was the ability of administrators to implement small class sizes with no increase of per-pupil expenditures for the district. This was accomplished through the careful reallocation of existing resources, especially the reassignment of qualified staff members who had not been teaching their own classes all day, to reduced size classes.


The California CSR Program


Class-size reduction began in California in 1996. Within a period of several months, new teachers were hired and placed in Grade K-3 classrooms across the state, reducing class sizes to 20 pupils or fewer. In three years of operation, this largest CSR initiative has resulted in 28,000 new teachers being deployed and virtually every classroom in Grades 1-2 being reduced in size. Since the program was implemented so quickly, very few large classes were available to serve as a comparison group for evaluators. The evaluation has focused on Grade 3, in which small but statistically significant achievement gains were reported in reading, language, and mathematics.[117] The benefits of small classes were in the range 0.05 to 0.10 standard deviations. Although these would be considered small effects, they replicate the results from Project STAR for pupils who entered small classes in Grade 3; in STAR, the largest effects were obtained for students who entered small classes in earlier years (K or 1).

California’s experience provided important insight into the types of planning needed before implementing a large-scale CSR initiative: The speed with which teachers were hired resulted in many teachers being placed in classrooms who had not even completed their formal teacher credentialing programs. As a result, in the first year of California’s CSR program, the percentage of K-3 teachers who were not fully credentialed rose from 1.8% to 4%; this figure increased to 12.5% and 13.4% in subsequent years.[118] Had the program been implemented in phases, the drop in the preparation and experience levels of California’s teachers could have been remedied.


Federal Initiatives


Begun in 1999-2000, the federal class-size reduction program provided funds to schools serving high-poverty populations. By the second year of operation the program supported CSR initiatives in 36 major urban school systems and increased its funding to $1.3 billion from $1.2 billion. School districts targeted their funds toward low-achieving schools and those identified as highest-need schools. Local school districts used 87% of the federal funds to hire new teachers. In its first-year report, The Class-Size Reduction Program: Boosting Student Achievement in Schools Across the Nation, the U.S. Department of Education highlights the expected benefits of class-size reduction. Federal class-size reduction funds were aimed at helping to make classrooms more manageable so that teachers could focus on teaching and learning. Further, teachers were expected to report more enthusiasm for teaching and opportunities to address students’ individual needs, accompanied by a boost in students’ reading scores and overall achievement scores.[119]

The federal class-size reduction program permitted schools to implement several models of small classes, including some that were not small classes at all. The latter included large classes (e.g., 32-40 pupils) that were team-taught by two full-time teachers, and pairs or triplets of larger classes (e.g., 30 pupils) that shared a “rotating” teacher who would spend part of the day in each classroom. Both of these models reduce the pupil-teacher ratio (PTR) in classrooms but do not reduce the actual class size, that is, the actual number of pupils in the room who interact with the teacher full-time each day. STAR researchers have pointed out that the strong findings of reduced-class benefits do not apply to these settings.[120]
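The difference between pupil-teacher ratio and class size can be made concrete with a small calculation. The sketch below is illustrative only: the team-taught class of 36 is an assumed value within the 32-40 range mentioned above, the single-teacher class of 18 is an assumed comparison case, and the function names are hypothetical.

```python
def pupil_teacher_ratio(pupils: int, teachers: int) -> float:
    """Pupils divided by all teachers serving them, wherever those teachers work."""
    return pupils / teachers


def class_size(pupils: int, rooms: int) -> float:
    """Average number of pupils actually sharing a room."""
    return pupils / rooms


# (label, pupils, teachers, rooms); all figures are illustrative assumptions.
arrangements = [
    ("One teacher, one small class", 18, 1, 1),
    ("Team-taught large class", 36, 2, 1),
    ("Three classes of 30 plus one rotating teacher", 90, 4, 3),
]

for label, pupils, teachers, rooms in arrangements:
    ptr = pupil_teacher_ratio(pupils, teachers)
    size = class_size(pupils, rooms)
    print(f"{label}: PTR = {ptr:.1f}, class size = {size:.0f}")
```

Under these assumptions the team-taught class has the same pupil-teacher ratio as the small class (18), yet twice as many pupils share the room.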

In the program’s first year of operation, approximately 29,000 new teachers were hired under the federal CSR initiative. An evaluation contract was awarded to Abt Associates, a Boston firm. However, the ensuing calendar year saw a change in administrations in Washington. President Bush’s education plan, “No Child Left Behind,” targets federal class-size reduction money for elimination, apparently disregarding the research base that supports class-size reduction. Nevertheless, with or without support from the federal government, small classes have become standard practice in many states and districts across the country and are producing noticeable benefits to teachers and pupils.


Research, Policy, and Practice


The research on class size supports a number of practices that can be implemented to enhance students’ academic performance. The benefits of small classes, especially for minority students and students from low-income homes, have been confirmed time and again. STAR and the studies to follow STAR have also drawn these conclusions:

Timing and continuity of class-size reduction. The most recent analyses of STAR data show that the greatest initial impact on student achievement is obtained when students enter reduced-size classes in kindergarten or Grade 1.[121] Pupils who attended small classes for at least three years had significant sustained benefits through Grade 8; the carry-over effects of fewer than three years were mixed. Several large CSR initiatives have started in Kindergarten or Grade 1 and expanded to Grades 2 and 3 in subsequent years. This is good policy, especially if the same students attend small classes for several years in a row.

What does ‘small class’ mean? Research on class size has been conducted according to high scientific standards; this cannot be said of any other educational intervention to improve pupil achievement. Project STAR has received praise from scientists and policy makers;[122] it has provided the starting point for several national conferences of researchers concerned with the need to base educational decisions (like medical decisions) on strong empirical evidence.[123]

The evidence provided by STAR, and by other CSR efforts that confirm STAR findings, is not relevant to other classroom arrangements. The results tell us little or nothing about programs that reduce pupil-teacher ratios without decreasing the number of students in the room. They tell us little or nothing about team-taught classrooms, about “push-in” or “pull-out” classrooms with a common teacher, or about part-time class-size reduction, for example, just for reading. The STAR results do tell us about one alternative reduced-ratio arrangement: a full-size class with a full-time teacher aide does not work.

Alternative class configurations such as team-taught classes or classes with support teachers for reading and math instruction need their own research to evaluate whether or not they offer viable options to increase student achievement. This research is important, especially given the shortage of space faced by many schools and districts. However, for schools to benefit from the strong findings about small classes, the accumulated body of research indicates that actual class sizes must be small: that is, fewer than 20 pupils for the entire school day.

Professional support and development for teachers of small classes. Due to the short lead time in hiring teachers for California’s CSR program, the quality of the entire state’s teaching force declined. In other locations, difficulties in locating and placing qualified teachers in newly created classrooms have created a level of disorganization that required weeks or months to settle.[124] These dynamics can easily offset the benefits that small classes provide.

The experiences of districts across the country show that CSR initiatives benefit from careful advance planning. The most effective settings were those in which school administrators, parents, and community leaders were informed about the program and what it was expected to accomplish.[125] Several initiatives were hindered by the lack of lead time to find space for CSR classes or to identify teachers before the school year began.

Professional support and development activities for teachers have been useful as well. Research has demonstrated clearly that the academic benefits of small classes can be obtained without programs of professional development. Project STAR demonstrated advantages with no intervention other than reduced classes (and teacher aides). Nonetheless:

·        Many teachers being placed in elementary classrooms are new to teaching, new to the classroom, and new to their school setting. They have a critical need for help “getting started” and for targeted on-the-job training.

·        Many veteran teachers are transferring from other kinds of settings to small classes. The instructional practices that may be ingrained from years of experience in these settings are often not current best practice.

·        It may be possible to enhance the benefits of small classes by taking advantage of the opportunities the class size provides; good professional development can help make this happen.

The recent report, “The Professional Development and Support Needs of Beginning Teachers,” discusses this issue in depth.[126] Particular classroom strategies and particular domains of professional support are identified in the report that are especially useful when implementing CSR programs.

The need to monitor CSR programs closely. In recent years, many districts have undertaken CSR programs, both with and without an accompanying evaluation. The absence of a systematic evaluation can create problems subsequently. It may not be necessary to document that academic achievement is improved by CSR in every site; the benefits have already been demonstrated scientifically. Follow-up evaluation is necessary, however, to make sure that smaller classes are implemented correctly and that problems are addressed quickly. Several evaluations, including one in Buffalo, New York, were able to identify implementation problems during the school year and to provide mid-course corrections. It is also important that basic information is available to administrators, parents, and legislators to demonstrate that the investment in small classes has been spent properly.

There is also a great deal yet to be learned about small classes and the opportunities they provide, as the SAGE, California, Burke County, and US Department of Education programs have demonstrated. A regular system for monitoring reduced-class programs, addressing problems that arise, and reporting progress to administrators and the public has been demonstrated to be an important ingredient of CSR initiatives. Several models have been forwarded to help districts monitor or conduct a formative evaluation of their CSR program.[127]


Questions that Remain


Many questions about small classes remain to be answered. For example:

·        How small does a class have to be in order to reap the benefits demonstrated by STAR and other studies? Most CSR interventions are using “fewer than 20 pupils” as their guideline, but research has not established a specific threshold that must be met.

·        What are the effects of small classes in later grades, for example, the middle school years or high school years? The early overviews of research on class size[128] reported mixed results based on a relatively small number of studies. Recent years have not seen an increased number of studies of class size in the upper grades. Several studies have been performed using a federal data set, the National Education Longitudinal Study of 1988. These have produced non-significant results[129] and mixed results,[130] respectively, for Grade 8. The complexity of the situation, with students moving from class to class for different subjects, has undoubtedly discouraged research in this arena.

·        What are the effects of combining small classes with other interventions, especially those targeted to students at risk, such as full-day kindergarten, preschool programs, remedial reading programs?

·        What are the long-term effects of small classes on high school and post-secondary outcomes, for example, college attendance and employment? Researchers are currently studying these questions.

The broadest question not fully answered to date is, “Why is it that small classes work as well as they do?” Many studies of teachers’ instructional strategies have compared teachers in small classes with teachers in full-size classes, but few if any systematic differences have been found.[131] It is clear that small classes make additional time available to teachers – time that would be spent on record keeping or classroom management in larger classes.[132] The time saved may be used to provide more active teaching to the class and, in theory, more individualized instruction. However, research has not shown consistently that students in small classes receive more individual attention or instruction directed to their specific needs.[133]

The strongest hypothesis about why small classes work concerns students’ classroom behavior. Evidence is mounting that students in small classes are more engaged in learning activities and exhibit less disruptive behavior.[134] Educational and psychological theory explain why this may occur. For example, in a small class, each student is constantly on the firing line; he or she may be called on at any time to answer questions or complete assignments. Students cannot escape by sitting in back corners of the room or avoiding the teacher’s attention. By the same token, teachers cannot ignore students that they might otherwise prefer not to attend to, for whatever reasons. Psychologists have forwarded the principle of “diffusion of responsibility” to explain why members of small groups tend to take more individual responsibility than do members of large groups – a principle supported by empirical research.[135] Further, if one’s classmates are well behaved and engaged in the learning process, then this behavior will become the norm that others will follow. Research on the socializing effect of group norms is also extensive.[136]

Further research is needed to explain fully why small classes have behavioral and academic benefits. However, the evidence to date suggests that it is the very feature of smallness that has the greatest impact. If this principle is correct, then it is also clear that large classes with two teachers (reduced pupil-teacher ratio but not reduced class sizes) are less likely to yield the same benefits.


Controversy over the value of reduced class size


Despite the appeal of small classes and despite the strong evidence of their value, the ideas have not gone unchallenged. In particular, economist Eric Hanushek has engaged in a vigorous campaign to convince policy makers and the public that small classes are not an efficient way to improve student performance. Few researchers take this position, but Mr. Hanushek has promulgated this view widely in the professional and public media. The view is consistent with his thesis of many years that fiscal resources spent on public education are not related to academic outcomes.

The conclusions are based on two sets of analyses, summarized in a monograph published by the University of Rochester, then Professor Hanushek’s institution,[137] and in a document giving both sides of the argument produced by the Economic Policy Institute.[138] The first analysis is an examination of pupil-teacher ratios and academic performance for the entire country from 1970 to 1995. According to Hanushek, although the ratios declined regularly during that period, academic performance as indicated by the National Assessment of Educational Progress (NAEP) did not increase. The second analysis is a meta-analysis of the results of 277 econometric studies of the relationship between educational “inputs” (including class size) and academic achievement. According to Hanushek, these studies show no systematic relationship with class size.

Hanushek’s position holds sway with some policy makers, and he has advised the current administration, which has marked reduced-class-size funds for elimination. A number of education researchers and other economists, not to mention most practitioners, dispute Hanushek’s conclusions, however. Among the points that have been forwarded to rebut Hanushek’s position are these:

All of the studies cited by Hanushek are studies of pupil-teacher ratios (PTRs), mainly computed at the district, state, or national level. Pupil-teacher ratios at these levels do not reveal the actual class sizes – that is, how many students are actually in classrooms. The PTR includes regular teachers, special education and Title-I teachers, teachers who don’t have their own regular classrooms (for example, remedial teachers, language, music, or art teachers, or librarians), administrators, and other staff members as well.[139] Pupil-teacher ratios at these highly aggregated levels reveal little or nothing about the actual classroom conditions in which pupils are learning. In fact, it has been shown that large urban districts tend to have low pupil-teacher ratios because of the large numbers of Title I and remedial teachers, yet often have badly overcrowded classrooms.[140] This distinction is discussed in depth in Ehrenberg et al., who concluded that “class size is not the same thing as the pupil/teacher ratio. Indeed, it is quite different.”[141]

Hanushek’s reviews do not include any of the studies of class size reviewed by either Glass and Smith or by Educational Research Service. He also does not include class-size studies such as Prime Time, Project STAR, or SAGE.

Project STAR, being a controlled scientific experiment, provides stronger evidence than is possible through “production function analysis,” the technique used in all the studies cited by Hanushek. A randomized experiment such as STAR is the highest quality research design available; it is the method of choice used by the Food and Drug Administration, for example. This point is acknowledged by Hanushek in these two manuscripts and others. For this reason, Princeton economist Alan Krueger concluded: “The design of the STAR experiment clearly produces results that are more persuasive than [all] the rest of the literature on class size.”[142]  

Hanushek’s conclusions are selected in order to show just one view of the data. For example, in order to show that NAEP scores did not increase in the period from 1970 to 1995, Hanushek focused on the reading performance of 17-year-olds, with no attention to the NAEP Grade 4 or Grade 8 results and no attention to topics that are taught explicitly to older students.

One extensive PTR study using NAEP data has been performed at Educational Testing Service.[143] The study involved a national sample of 10,000 fourth-grade students and 10,000 eighth-grade students. This study found significant gains in mathematics from reduced PTRs, with greater impact on fourth-grade students than on eighth-grade students. Also, the gains were larger for inner-city students than for any other group. This study is not included in the Hanushek review.

Hanushek’s methods of analysis have also come under attack. Researchers at The University of Chicago noted that Hanushek’s analyses did not take into account that some studies were more informative than others because they were based on larger samples.[144] They reanalyzed a portion of Hanushek’s data using meta-analysis methods that weight studies according to the sample sizes, and found the opposite conclusion – that resources (including class size) do have an impact on academic achievement.

Economist Alan Krueger performed an even more complete reanalysis of Hanushek’s studies.[145] First, Krueger noted that the 277 “studies” cited by Hanushek were in fact 59 studies from which 277 statistics (“effect sizes”) were drawn. Some studies contributed far more to Hanushek’s conclusions than others. (In fact, between them, two studies contributed 48 of the 277 effect sizes; as it happens, these two studies accounted for most of the negative findings reported by Hanushek.) Several other studies were misinterpreted or mis-coded before being entered into Hanushek’s analysis. Overlooking the latter issue, Krueger performed a complete reanalysis of Hanushek’s studies, counting each of the 59 investigations just once. In additional analyses, he also took into account that some studies were of higher quality than others, and that some studies had more atypical samples than others. In all three analyses, Krueger’s results were the reverse of Hanushek’s. He concluded that resources in general, and pupil-teacher ratios in particular, are significantly related to academic performance in the direction consistent with Project STAR: lower ratios associated with higher performance.
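The effect of counting each study once, rather than counting every estimate, can be illustrated with a toy tally. The sketch below is not Krueger’s analysis and uses invented data: five hypothetical studies, one of which contributes eight estimates, most of them unfavorable. A sign of +1 marks an estimate in which lower ratios go with higher achievement, and -1 marks the reverse.

```python
from collections import defaultdict

# (study label, sign of estimate); entirely hypothetical data for illustration.
estimates = (
    [("A", -1)] * 6 + [("A", +1)] * 2          # one study supplies 8 of the 12 estimates
    + [("B", +1), ("C", +1), ("D", +1), ("E", -1)]
)


def share_favorable_by_estimate(data):
    """Every estimate counts equally, so prolific studies dominate the tally."""
    favorable = sum(1 for _, sign in data if sign > 0)
    return favorable / len(data)


def share_favorable_by_study(data):
    """Every study counts once: average each study's own share of favorable estimates."""
    by_study = defaultdict(list)
    for study, sign in data:
        by_study[study].append(sign)
    study_shares = [
        sum(1 for s in signs if s > 0) / len(signs) for signs in by_study.values()
    ]
    return sum(study_shares) / len(study_shares)


per_estimate = share_favorable_by_estimate(estimates)
per_study = share_favorable_by_study(estimates)

print(f"Favorable share, counting estimates: {per_estimate:.0%}")
print(f"Favorable share, counting studies:   {per_study:.0%}")
```

With these invented numbers, the favorable share is a minority when estimates are counted and a majority when studies are counted, which shows how the unit of counting alone can reverse a conclusion.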


Summary and Recommendations


Class-size reduction is sound education policy. It has been shown to be effective time and again, and no serious challenge has been made to the research findings that support those conclusions. Educators have long known this. No school improvement effort relies on larger rather than smaller classes. Indeed, programs targeted to students with academic problems (for example, special education or other remedial programs) are all based on small-class arrangements. Parents often place children in private schools at least in part because of small classes. Many interventions, such as home schooling, Reading Recovery, or Success for All, rely on the ultimate small class, one-on-one instruction.

Research has now documented the advantages of small classes, especially in the elementary grades and especially for students who attend small classes for two, three, or four consecutive years. The effects are especially pronounced for minority students and those attending school in large urban districts. As a result, the achievement gap is reduced, both in the years while pupils attend small classes and later on when they consider applying to college. Teachers, meanwhile, benefit as well. They spend less time on classroom management and clerical tasks, and have more time available to get to know each student better. Reduced-size classes provide the opportunity for improved instruction and for increased learning to take place.

The weight of this evidence supports the following recommendations for policy-makers:

·        Resources should be provided to schools and districts serving low-income pupils to restrict class sizes in the primary grades to no more than 18 pupils.

·        To ensure that the research-documented benefits of small classes are realized, policies for implementing small classes should include the following provisions:

1)     Begin class-size reduction in K-1 and add additional grades in each subsequent year.

2)     Use the reduced-class model supported by the research: one teacher in a classroom with 18 or fewer pupils. Pupils assigned to small classes should represent a cross-section of students in the school, not just difficult-to-manage students.

3)     Plan for class-size reduction in advance, hiring fully-qualified teachers. Additionally, some programs of professional support and development are likely to be helpful.

·        Systems should be established to monitor class-size reduction initiatives continually and closely, providing feedback to administrators, policy makers, and parents about the successes of the program. Teachers should be afforded opportunities to discuss problems as they arise, and to have them addressed by the school administration.


3: Small Schools


Executive Summary


Summary of research findings


       Research on school size points to several conclusions about the benefits of smaller schools. Smaller school size has been associated with higher achievement under certain conditions. Smaller schools promote substantially improved equity in achievement among all students, and smaller schools may be especially important for disadvantaged students. Many US schools are too large to serve students well, while smaller schools, especially in impoverished communities, are widely needed.  The evidence favoring the benefits of small schools, however, cannot be generalized to so-called “Schools Within Schools,” which to date lack a substantial research base supporting the belief that they provide benefits equivalent to smaller schools.




       Policy makers should:

·          Find ways to sustain existing small schools, especially in impoverished rural and urban communities.

·          Acknowledge an upper limit for school size, an acknowledgment that implies many schools should be much smaller than that limit.

·          Not design, build, or sustain mega-schools serving upwards of 500 to 2,000 students, depending on educational level and grade-span configuration.

·          Design, build, and sustain much smaller schools in impoverished districts or districts with a mixed social-class composition.  In very poor communities, design, build, and sustain the smallest schools.

·          Not oversell smaller schools. Operating smaller schools in impoverished communities is good policy, but it is not a “magic bullet.”

·          Not believe that mega-schools serving affluent areas are necessarily excellent or even very good.  Most accountability schemes obscure this fact because they do not generally take socio-economic status into account.

·          Recognize that smaller schools in impoverished settings accomplish miracles even when their test scores are about average.



3: Small Schools


By Craig Howley

ERIC Clearinghouse on Rural Education and Small Schools, AEL




Even though the study of school effects has been a major sociological enterprise over the past two decades, empirical analyses tend to slight structural variables such as size.[146]

Matters have changed a bit since Morgan and Alwin made their observation in 1980, and, today, despite a surprisingly thin research literature, “small schools” is a concept in danger of becoming a slogan.  Because slogans can impede thoughtfulness, a critical assessment of the concept is now timely.

What are “small schools”?  What do different authorities mean by “small schools”?  Is there a difference between “small schools” as set off by quotation marks, and schools that just happen to be small?[147] What influence does school size exert on student achievement?  What do we know?  What don’t we know?  What relationship does school size bear to the achievement of poor children? What are the points of contention?  Given our inevitably limited knowledge, what are the implications for practice?

“Small schools,” in short, is not so simple a topic as it might seem at first glance.  This review aims to convey both the complexities and the practical applicability of research on small schools. In particular, it seeks to present the most substantive empirical work as the best chance for understanding this complex issue.[148]


Small Schools Research


Effusive praise of small schools is easily found in the education literature these days.  One of the most frequently cited syntheses, for instance, portrays small schools as superior on virtually all measures of concern.[149] Warrants for the conclusions drawn in that synthesis come from sources – magazine articles, evaluations of single projects, first-person narratives, and empirical studies (that is, actual research) – of widely varying quality, and readers are provided with no assessment of that quality.  Similar reports abound.[150]

In contrast with such syntheses, this one gives most weight to studies that exhibit larger sample sizes, peer-reviewed publication, and, for one set of studies, state-level replications.  Evaluations, syntheses, and anecdotal reports are used in the present review to support discussion of the focal studies.  This review also takes note of the substantial number of unknowns in the area of small-schools research, and of related methodological differences in the focal studies.


Defining Small Schools


The first challenge is to examine what we really mean by “small schools.”  The best empirical literature has focused its efforts simply on school size. 

Small schools exist everywhere, as a feature of the variability of school size.  Some states, however, maintain proportionally more small schools – sometimes far more –  than do others, but no agreement prevails, even among small-schools advocates, about what defines a small school.  Small in rural Vermont is apt to differ sharply from small in Queens, New York, and high schools in rural Vermont are considerably larger than they are in rural Montana.  This variability indicates that school size, more than class size, is an issue that requires research designs sensitive to within-state variability.[151]

In general, one can think of high schools enrolling 400 or fewer and K-8 or K-6 elementary schools enrolling 200 or fewer (on the basis of a 2:1 ratio with high schools) as small. The related positions taken in state-level policies vary very widely, and all of them lack solid justification from the research literature, which has not examined possible threshold effects of size.[152]

In cities and suburbs, “small schools” has recently become a reform movement.[153] Rural communities, however, struggle to maintain small schools in the face of states’ attempts to close them on business principles based on cheap inputs.[154] These differing interpretations have practical significance because confounding new, reformist small schools with extant, traditional small schools obscures the salient structural issues that are the actual object of most research related to small schools.


Norms of Size


In contrast to the constitutions of many nations, the U.S. Constitution is silent about the human right to education and leaves the provisions for schooling to the discretion of the various states.  The geography, history, economics, politics, and cultures of the states differ considerably, and, in consequence, school size varies substantially from state to state.

For instance, the percentage of 9-12 high schools enrolling 400 or fewer students (a small school by most definitions) ranges from 81% in Montana to 0% (none) in Hawaii, Rhode Island, and Vermont.[155]  Hawaii is also the state with the largest percentage of 9-12 high schools enrolling 1,000 or more students (92%).  Though there is a relationship between the rural nature of a state and the proportion of small schools it maintains, the relationship is not strong.  In comparison with the urban states of California (where 78% of the state’s high schools enroll 1,000 or more students), Florida (84%), Hawaii (92%), and Maryland (76%), such urbanized states as Illinois, New Jersey, and Massachusetts have only about 40% of high schools enrolling 1,000 or more students.[156] In the District of Columbia, just 22% of all 9-12 high schools enroll this many students, whereas 28% of DC high schools enroll 400 or fewer students. Thus, DC maintains proportionately more small high schools than Vermont.

There is an apparent relationship between school and district size as well.  States that have retained small districts are somewhat less likely to have created large high schools, all else being equal.  The data for Hawaii – which is administered as a single district – make sense in this light:  as a single huge district, it operates almost all high schools with 1,000 or more students and none with 400 or fewer.


Small Versus Smaller


Although many observers of the school size issue long for a uniform definition of small and large schools,[157] smaller and larger are by far the more useful terms, since, as suggested above, school size varies so dramatically by state.[158]  Look within states, rather than across states, for useful comparisons.  Vermont and California, for instance, confront dramatically different circumstances, and their de facto approaches to school size differ accordingly.  In making within-state comparisons, however, size rank (students per grade in rank order) needs to be viewed in consideration of grade-span configuration, educational level, and locale (rural, suburban, urban).  A small elementary school in Vermont will not be the same size as one in California.


Enrollment Per Grade as School Size


Why should the number of students a school enrolls be of much concern?  In fact, it turns out that school size is not best represented as total enrollment.  Surprisingly, exactly the same total enrollment can describe schools of quite different size.  This assertion is counterintuitive, but consider a 9-12 school with 800 students and a 9th grade academy enrolling 800 students.  Are they really the same size?  What about a 6-8 middle school with 800 students and a K-2 primary school with 800 students? It is easy to see that the 9th grade academy is really larger than a four-year high school with the same enrollment. Because it is both the expectation of the public and a professional norm that elementary schools are smaller than middle or high schools, the K-2 school is also “larger” than a middle school with the same enrollment. Thus in each case, the latter school is larger than the former, though total enrollment is the same in all four schools.

For this reason,  for both research and real-world action, enrollment per grade is a better metric of size than total enrollment. With this measure it’s easy to see that a ninth-grade academy with 1,500 students is really four times as large as a 9-12 high school with exactly the same total enrollment, just as a K-2 school enrolling 800 students is at least three times the size of a K-8 school enrolling 800 students.

If policy makers can better appreciate the role of grade span configuration in determining school size, they can avoid the misconception that merely reducing total enrollment in a school (by building new schools with narrower grade span configurations, a national trend) necessarily constitutes a reduction in school size.  More likely, this trend is resulting in larger schools.[159]  Reconfiguring the grade spans of schools is a time-honored tradition in American education used to make schools larger, but it could also be used to make schools smaller.[160]  For instance, imagine a district with 1,200 students in separate buildings that house Grades K-2, 3-5, and 6-8.  Each school houses 400 students, or 133 students per grade.  If, however, the same buildings were used to house three K-8 schools, the reconfigured schools would actually be smaller (400/9 = 44 students per grade).  Creating smaller schools, then, is probably easier than most educators and policy makers seem to realize.[161]  One research team has found substantial achievement benefits for smaller schools in impoverished communities, using this definition of size.[162]
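The arithmetic behind enrollment per grade can be sketched in a few lines. The function names are illustrative only, and treating kindergarten as grade 0 is an assumption made here for counting purposes.

```python
def grade_number(grade):
    """Map a grade label to a number, treating 'K' as grade 0 (an assumption)."""
    return 0 if str(grade).upper() == "K" else int(grade)


def enrollment_per_grade(total_enrollment, lowest_grade, highest_grade):
    """Total enrollment divided by the number of grades the school spans."""
    span = grade_number(highest_grade) - grade_number(lowest_grade) + 1
    return total_enrollment / span


# The district example from the text: 400 students per building.
k2_school = enrollment_per_grade(400, "K", 2)  # about 133 students per grade
k8_school = enrollment_per_grade(400, "K", 8)  # about 44 students per grade

# A ninth-grade academy versus a 9-12 high school, both enrolling 1,500.
academy = enrollment_per_grade(1500, 9, 9)
high_school = enrollment_per_grade(1500, 9, 12)
size_ratio = academy / high_school  # the academy is four times as large
```

The same total enrollment yields very different sizes once grade span is taken into account, which is the point of the comparisons above.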


The Upper Limits of Size


The notion that some size might be absolutely too large for a school is a comparatively recent development. Most of the 20th century was required to make schools as large as they are, and the emerging popular consensus on small schools probably reflects a widely held perception that schools have grown too large.[163]

Various authorities have now offered “informed judgments” about the absolute upper limits of school size.  Predictably, the opinions differ significantly.  The author[164] has advised 1,000 as the absolute upper limit for high schools and 500 as the absolute upper limit for K-8 or K-6 elementary schools. Tom Sergiovanni, on the other hand, believes that no school should enroll more than 300 students.[165]  Deborah Meier clearly agrees.[166]  Lawton concluded that fiscal studies point to an upper limit of 500 for a K-8 elementary school.[167]  The bases of these opinions vary.  Howley and Lawton claim a basis in different research literatures (student achievement and finance, respectively).  Sergiovanni and Meier base their opinions on long and thoughtful practice.

Official policy has, however, also addressed the issue.  Florida recently adopted legislation setting 900 as the upper limit for new high schools, 700 for new middle schools, and 500 for new elementary schools.[168] Hawaii, with the largest high schools in the nation,[169]  adopted, and then scuttled, upper-limit legislation.[170] In a 1999 speech to the American Institute of Architects, former Secretary of Education Richard Riley suggested 600 as the upper limit for any school.[171]  The Education Commission of the States opined that 1,000 was the boundary between “large”  and “too large.”[172]  Finally, representing professional organizations, the National Association of Secondary School Principals proposed 600 as the upper limit for high schools.[173] Once again, all of these limits reflect the previously noted general public expectation and the professional norm that elementary schools require a lower size limit than middle or high schools.

To set an upper limit is to advise against the construction of schools larger than the limit. As has already been explained, however (see note 7),  just because a school’s enrollment falls under that limit does not necessarily make it small. This is an issue of logic and language, not of research findings.

Many schools, though not all, should probably be substantially smaller than the upper limits.  Additional information – aside from “authoritative opinion” – is clearly needed to make good judgments about locally appropriate size:  the findings from research summarized shortly suggest how much smaller they should be, at least for the purpose of maximizing the academic achievement of impoverished students.


The School Within a School


Because of the prevalence of the school-within-a-school (SWAS) strategy for coping with the organizational challenges of mega-schools, it’s worth reiterating the structural view of size adopted here. A structural view recognizes that a whole system is more than the sum of its parts; if a structure is broken apart, the advantages of the structural whole vanish.  On this view, larger schools that adopt administrative simulations of smallness are unlikely to exhibit the benefits of structurally smaller size.  In fact, research evidence of the effectiveness of SWAS is negligible.[174]

Educators tend to believe that a practice proven effective in one setting can be transferred to another.  This belief is the assumption behind “what works” and “validated programs.” When, however, the practice itself and the setting (smaller school size) are one and the same, the assumption seems even more dubious than usual.  Can one transfer a setting out of its setting? It seems illogical. Unfortunately, educators’ faith that processes can be effectively abstracted from the real structures that house them has popularized SWAS as a “small schools option.” In fact, separate schools housed under a single roof need to be truly autonomous. Otherwise, they will not be small schools, but just another grouping stratagem.


School Size and Student Achievement: The Extant Literature


Despite widespread interest in small schools, few large-scale studies or replications have addressed the issue.[175] Certainly, a huge professional literature does address school size (largely a result of the 20th-century push to build larger schools), but a surprisingly small proportion of this literature constitutes the research base, and even fewer studies jointly address the issues of school size and poverty as a major contemporary concern.

The ERIC database now indexes approximately 2,750 items with the terms “small schools” or “school size.”  Among this very large number of resources, however, just 47 research reports have addressed the relationship of achievement, school size, and poverty in some fashion between 1966 and 2001.  More surprising still, only 23 research reports – during the whole period from 1966 to 2001 – define school size, socioeconomic status, and student achievement as a major focus of investigation.[176] Within this surprisingly small literature, the studies that are conceptually related to the Matthew Project[177] are the only ones that pursue the issue systematically in multiple replications.[178]

Surprise at the thinness of the research base should be tempered by the realization that, until very recently, researchers, practitioners, and policy makers alike generally assumed that smaller schools, in general, must be academically inferior to larger ones, especially at the secondary level. Given this legacy, the early part of the research literature related to academic achievement and school size aimed to demonstrate that there was no significant difference between the achievement of larger and smaller schools, once statistical controls for socioeconomic status (SES) were imposed.[179]  The previous literature had not deployed such controls.

Subsequent investigations, building on the work of Noah Friedkin and Juan Necochea,[180] suggested that the interaction between size and socioeconomic status may explain the apparent absence of a significant difference at the school and district level. Another line of investigation focused not on school- or district-level test scores, but on student-level gains, and concluded that smaller high schools had an advantage, regardless of SES.[181]


Selecting the Best Research


Three bodies of research, contributed by three different teams of researchers, represent the best empirical work done to date examining the influence of school size on academic performance with particular attention to poverty or socioeconomic status.  The work done by these teams includes prominent peer-reviewed publication, quantitative methodologies, large-scale research designs, and various replications and quasi-replications.  Issues of theory, method, and ability to generalize persist within this group of studies, and it would be wrong to say that all the evidence points to a single set of clearly demarcated conclusions.  Nonetheless, after presenting the evidence, the author offers a practical interpretation of the accumulated evidence for policy and administrative action.

The three bodies of work are those by:

1.)   Herbert Walberg and colleagues;[182]

2.)   Valerie Lee and colleagues;[183] and

3.)   Craig Howley and colleagues.[184]

The studies highlighted here all used some form of achievement test scores, not grades or GPA, as dependent variables.  They all used some form of regression analysis to estimate the influence of size on achievement.  The Walberg and Howley teams’ studies analyzed test scores at the school and district level at single points in time.  Lee and colleagues used individual students’ test scores, computed as gain scores (achievement change over time) rather than scores from a single point in time.[185]

Despite many similar qualities, then, these three bodies of work address somewhat different issues (school and district performance in the case of the Howley and Walberg teams, and growth in student learning in the case of the Lee team) and deploy different ways of looking at the issues (different regression models, national versus state data sets, and substantially different theoretical models and research questions).


School Size, Academic Achievement, and Socioeconomic Status


Circumstances influence student achievement in complex ways, of course. A simple comparison of achievement levels will often show smaller schools trailing larger ones merely because, in many (though not all) states, smaller schools tend to be located in poorer communities.


Dealing Responsively with SES


Valid comparison across schools and districts requires at least that the direct influence of poverty be accounted for in some legitimate fashion, since poverty (or SES) is one of the major influences on achievement; ignoring its well-documented influence is a mistake even worse than presuming that nothing can mitigate its influence.[186]  The three major lines of research assessed here adopted two methods of accounting for SES:  controlling for it (the usual method in educational studies) and theorizing about its particular interaction with school size.

Herbert Walberg and colleagues were among the first to control for SES in significant studies of the relationship of school size and district size to achievement.[187] These studies, in effect, removed the influence of SES, leveling the playing field.  Lee and Smith’s studies controlled for the influence of SES on the same principle as the Walberg team’s, but with a more complex model.[188]  For both teams, the relevant SES control variables are, in effect, additional (additive) terms in a linear equation.[189]

In contrast to the Walberg and Lee teams, the Howley team adopted a specific school- and district-size theory originated by Noah Friedkin and Juan Necochea,[190] which multiplies size and SES.  Friedkin and Necochea viewed the size of both schools and districts as a structural feature presenting opportunities and constraints in the realization of student achievement.[191]  They postulated that schools and districts differ in their capacity to realize opportunities and to overcome constraints. If this is the case, the effects of size should vary across settings rather than (as is assumed in other studies) remaining constant.  The key question is what feature of settings might make such variation regular and predictable, rather than chaotic and unpredictable.  Friedkin and Necochea observed:

Studies of the distribution of public funds ... suggest that the power of a system to extract resources from its environment, the wealth of the environment from which a system draws its resources, and the priority accorded to the delivery of high quality services all are associated with the SES of a system’s client population.[192]


Hypothetically, then, affluent communities would be in a good position to maximize the opportunities and minimize the constraints of size, but the reverse would hypothetically be true in impoverished communities.  In this model, the interaction is realized as a multiplicative term in the equation.  SES, then, is an environmental condition hypothetically capable of regulating the effects of school and district size.
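The contrast between the two ways of handling SES can be written schematically. The notation is an assumption introduced here for illustration and is not taken from the studies themselves: A stands for achievement, S for size, E for SES, the betas for regression coefficients, and epsilon for the error term.

```latex
% Additive control: the effect of size is one constant, \beta_1, in every setting
A = \beta_0 + \beta_1 S + \beta_2 E + \varepsilon

% Multiplicative interaction: size and SES are multiplied in an extra term
A = \beta_0 + \beta_1 S + \beta_2 E + \beta_3 (S \times E) + \varepsilon

% In the interaction model the effect of size depends on the level of SES
\frac{\partial A}{\partial S} = \beta_1 + \beta_3 E
```

Read this way, the hypothesis corresponds to a positive interaction coefficient: the effect of size would become more favorable as SES rises and less favorable, possibly negative, as SES falls.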


The Walberg Team


The small-schools-are-good line of evidence has been under development since the early 1980s, particularly by University of Chicago researcher Herbert Walberg in collaboration with various associates.[193] Others contributing significantly to this line of evidence include Mark Fetler.[194] Although Fetler is not part of the Walberg team, his important study on this issue is considered here because its findings favor smaller size generally rather than differentially, and his unit of analysis is the school rather than individual students.

Walberg’s investigations included a variety of influential variables such as various SES measures, expenditures, class size, teacher characteristics, and various measures of school and district size.[195]  With SES controlled, the Walberg team’s studies have focused on the influence of school and district size, using data sets from New Jersey.

As reported by Fowler and Walberg,[196] the influence of district size was several times as great as that of school size.[197]  The significance of this body of work, on the whole, is that it rigorously and consistently identified school and district size as negative influences on achievement.  The research established the possibility that smaller schools and districts were academically, and not just socially, advantageous regardless of SES.

In a study focusing on dropout rates, but using achievement as an independent variable, Fetler, working with data on California high schools, reported findings similar to those of Walberg and colleagues.[198] School enrollment in his study was negatively correlated with achievement without any controls for SES.[199] After controlling for size and SES, Fetler sought to determine whether schools with better aggregate achievement also exhibited higher dropout rates, which would suggest that the higher achievement was the result of lower-achieving students dropping out. His analysis showed the opposite:  With socioeconomic status (SES) and size controlled, higher achievement was actually associated with a lower dropout rate.  This finding suggests that equity and excellence not only can be realized simultaneously but might actually reinforce one another.[200]


The Lee Team


Lee and colleagues’ principal interest was school restructuring; they included school size as one feature of interest, rather than as the key focus of research.[201] Whereas the Walberg and Howley teams studied only public schools, the Lee team’s key study[202] also included Catholic schools and elite private schools, with sector as an additional control variable.  This body of work is based exclusively on data from the National Center for Education Statistics’ National Educational Longitudinal Study of 1988 (NELS:88), and the focal study analyzes the individual achievement gains of students over the course of their time in high school.

Lee and Smith formed eight high school size categories.[203]  Compared with students attending high schools of 1,201-1,500 students, those in schools enrolling 601-900 students and those enrolling 901-1,200 students showed higher achievement gains. Students in the 301-600 student category performed somewhat better in reading and somewhat worse in mathematics than those in high schools of 1,201-1,500 students.  Students in high schools enrolling fewer than 300 students performed significantly worse, however.[204]

Improvements in the equity of achievement gains,[205] however, were robust in high schools attended by NELS:88 students in the three smallest size categories. In other words, disparities in achievement gains based on SES were smallest in those categories of school. The improvement in the equity of gains in reading achievement was stronger than improvement in the equity of gains in math achievement and was highest in the 301-600 category.[206]

Lee and Smith derived these recommendations for policy makers:

1.)      many high schools should be smaller than they are;

2.)      high schools can be too small;

3.)      ideal size does not vary by type of student enrolled (i.e., low-SES or minority); and

4.)      size is more important in some types of schools, because disadvantaged students suffer disproportionate achievement costs in very large or very small schools.[207]  

Overall, Lee and Smith concluded that a one-size-fits-all ideal size (600-900) was the best equity and excellence compromise. The next section of this review will take exception to some of these findings and recommendations.


The Howley Team


The author and his colleagues extended the Friedkin-Necochea theory and investigations to a series of state-level replication studies.[208]  Like the Lee team, this team was concerned with both achievement excellence and equity. The studies, along with the original Friedkin and Necochea study in California in 1988, show that in affluent settings, the influence of school size on the excellence of student achievement (at the school and district level as measured with state-mandated tests) is positive, but in impoverished settings, the influence is negative.  In other words, larger sizes are academically beneficial in affluent communities, but they are harmful in impoverished communities, producing a differential excellence effect.[209] In addition, as with the Lee studies, achievement equity was substantially enhanced in smaller schools (schools in each state were divided by the median size).  Importantly, these findings apply equally to district size.

The strength of the differential excellence effects, however, varied markedly from state to state. For instance, predominately rural Montana maintains many small schools. The state showed weaker differential excellence effects, and generally higher achievement equity across the board, than did other states. The smaller half of schools in the state exhibited lower socioeconomic status than the larger, somewhat more affluent, half of schools. Despite that difference, achievement equity was so high in Montana that the smaller half of schools exhibited higher achievement levels than the larger, more affluent half. Even with a reduced correlation of poverty and achievement across the board, however, equity was greatest in the smaller schools and districts in the state. At some grade levels within the smaller half of schools, the relationship between poverty and achievement was not statistically significant.[210]

Evidence of the differential excellence effect of school size was strong in California, Georgia, Ohio, West Virginia, and Texas.  The Alaska study,[211] unlike the others, used student-level data and a host of control variables relevant to students, schools, and communities.  But even with such extensive controls in place, the interaction between SES and school size remained a statistically significant influence on individual-level achievement.

Bickel and Howley extended the Matthew Project investigations to a multi-level analysis using their Georgia data set, and examining schools within districts.[212]  The single-level Georgia analyses had not found a differential excellence effect at the district level. The multi-level study, however, found influences interacting in a variety of ways. Poverty at the school level, for instance, interacted with the overall size of a district. A number of other such interactions between multiple influences also were found.  The multi-level study also discovered a remarkable pattern to equity results among four groups of schools, created by dividing schools and districts at the medians of school and district size.[213]  Achievement was least equitable in larger schools in larger districts (many of these “larger districts” were rural countywide districts) and most equitable in smaller schools in smaller districts (some of which operated in urban locales).  Smaller schools in larger districts were the second most equitable configuration, and larger schools in smaller districts were the second most inequitable configuration.  In general this study showed that school- and district-level variables interacted complexly to influence achievement excellence and equity.
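The four-group comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the multi-level model the study used: all data values are invented, the function names are hypothetical, equity is approximated by the correlation between SES and test score within each group (a weaker correlation meaning greater equity), and the district median is taken over school records rather than over districts.

```python
from statistics import median


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def equity_by_group(schools):
    """Split schools at the medians of school and district size, then
    return the SES-score correlation within each of the four groups."""
    school_cut = median(size for size, _, _, _ in schools)
    district_cut = median(dsize for _, dsize, _, _ in schools)
    groups = {}
    for size, dsize, ses, score in schools:
        key = (
            "smaller school" if size <= school_cut else "larger school",
            "smaller district" if dsize <= district_cut else "larger district",
        )
        groups.setdefault(key, []).append((ses, score))
    return {
        key: pearson([ses for ses, _ in pairs], [score for _, score in pairs])
        for key, pairs in groups.items()
    }


# Invented records: (school_size, district_size, ses, score)
schools = [
    (100, 1000, 1, 50), (150, 1200, 2, 52), (200, 1100, 3, 49),
    (120, 9000, 1, 48), (160, 9500, 2, 50), (180, 8000, 3, 53),
    (600, 1500, 1, 45), (700, 1300, 2, 50), (800, 1400, 3, 56),
    (650, 10000, 1, 40), (750, 12000, 2, 50), (900, 11000, 3, 60),
]

result = equity_by_group(schools)
```

Each key in `result` names one of the four configurations, so the groups can be ranked from most to least equitable under this simplified measure.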


Critiquing the Best Research


Three consensus implications seem to lurk in this body of work:

1.)   smaller school size is associated with higher achievement under some conditions;

2.)   smaller schools promote substantially improved achievement equity; and

3.)   smaller schools may be especially important for disadvantaged students.

Without a broader critique of the limitations and the sharp differences among the works cited, however, translating these vague implications directly into practice is unwise.

The Walberg studies seemed to suggest that smaller schools and districts were universally more efficient and effective, but the findings pertain almost exclusively to New Jersey and are hardly generalizable to other states or to the nation as a whole.  The norms of school and district size are quite different from state to state.[214]  It’s quite possible that replications in contrasting states would yield substantially different results.

Nonetheless, the studies of the Walberg team were among the first to suggest the possibility that smallness might harbor an achievement advantage, a hypothesis that had not previously been taken very seriously by prominent researchers.[215]  The Walberg team’s district-level findings have been almost entirely ignored, as have those of the Howley team.[216]

Lee and Smith analyzed a national data set (NELS:88), rather than state data sets, largely because their research questions focused not on school size, but on national efforts to sponsor school restructuring. Use of a national data set to study school size specifically is problematic if the state-level variations in the norms of size are not accounted for.  This critique, in the author’s view, compromises the external validity of the focal study.[217]  Policy makers must regard claims about “ideal high school size” as unproven in the context of actual practice in the various states.[218]

The Howley team’s studies, like the Walberg team’s, focused not on student-level achievement but on school and district performance on a variety of state-mandated tests.  The problem with such analyses, however, is that school- and district-level scores exhibit less variability than do individual-level scores.  A complete model would examine individual-level achievement within classroom, school, district, and state contexts of size.  This insight links to the notion of the scaling of the educational system,[219] which has hardly been studied at all.[220]  Bickel and Howley (by examining schools within districts) and Lee and Smith (by examining individuals within schools), however, have made a beginning with two-level analyses.

In essence, the Walberg, Lee, and Howley teams studied different phenomena using different methods.  The Walberg team’s work was exploratory and conducted in one state; the Lee team modeled gains in student achievement, but ignored variability in state contexts and imputed a dubious “ideal” size; the Howley team replicated the California work of Friedkin and Necochea in six additional states, providing support for the original theory about school and district size.  Nonetheless, student-level variability is absent from this team’s work, which is more relevant to policy than to instructional manipulations themselves.


What Remains Unknown


Much more is unknown than is known about school size, despite the popularity of the issue in current writing for practitioners and in state and national legislation (e.g., the Feinstein amendment to the ESEA reauthorization).  In particular the author believes that assertions about “ideal size” are misleading abstractions, and that the schools-within-schools strategy of simulating smallness has emerged with no basis in research to suggest that it will produce the achievement advantages confirmed for extant smaller schools operating under certain circumstances.

Several of the key unknowns have hardly been addressed in the research literature at all. In the author’s judgment, the following unknowns merit substantial attention from scholars of schooling (with student achievement the dependent variable):


·        To what extent do the popular but unresearched administrative simulations of smaller size (i.e., houses, pods, “academies” or other such within-school grouping arrangements) realize achievement advantages (including improvements in achievement equity) comparable to those reported for actually small schools?

·        To what extent does “ideal size,” as asserted by Lee and Smith, vary by state and under what conditions (type of locale, educational level, grade span configuration)?

·        Do minimum and maximum size thresholds actually exist (and under what conditions – state, type of locale, educational level, grade span configuration), beyond which larger or smaller size magnifies the negative effects of poverty?[221]

·        What is the relationship of grade span configuration to student learning, given differing state policy contexts and the likely influence of community socioeconomic status?

·        What are the simultaneous and interacting relationships of class size, school size, district size, and state context to the achievement level (particularly achievement gains) of individual students in respect of SES?  What are the relationships of these interacting contexts to school-level achievement equity?


Many, many other unanswered questions exist.  For instance, why is smaller school size (variously defined) associated with higher and more equitable levels of achievement for individuals, schools, and districts?  Hypotheses abound, with most having to do with the care, attention, and respect enabled by smallness in the conduct of personal relations. Links between achievement level and equity and such possible influences have hardly been investigated at all, however.[222]


Summary and Recommendations[223]


It would be an educational tragedy for current and future generations if, after a decade or so of experimentation with “small schools,” policy makers were to conclude that “small schools don’t work.”  The danger is real, however, because there has been a tendency to confound schools-within-schools, which are established in the name of “small schools” reform but have not been seriously studied, with the school size research reviewed here. As a reform product, “small schools” has almost nothing to do with the extant research base on school size, and lacks a pertinent research base of its own.[224]

“Small schools” will become another fad unless approached thoughtfully in the realms of practice and policy.  Research can supply some, but not even most, of the necessary thoughtfulness.  Because so many small schools continue to exist, however, small schools are not principally a reform project, so far as the research into school size goes.  Some schools are smaller than others, and some smaller schools are awful places.  On average, though, smaller schools come out ahead of larger schools, but under certain conditions and not always.


Major conclusions


Three overarching implications seem warranted across all the works cited:

1.)   Many US schools are too large to serve students well.

2.)   Smaller schools are widely needed.

3.)   Smaller schools are particularly valuable in impoverished communities.

Common to many literature reviews on this topic,[225] such implications have been translated into practical decision-making principles for policy makers by the author:[226]

·        Find ways to sustain existing small schools, especially in impoverished rural and urban communities.

·        Acknowledge an upper limit for school size (even though one has not been confirmed by research); doing so implies that many schools should be much smaller than that limit.

·        Don’t design, build, or sustain mega-schools (serving upwards of 500 to 2,000 students depending on educational level and grade-span configuration).[227]  Schools this large provide no detectable advantage to affluent students (the elite New England private high schools, for instance, enroll about 1,000 students in grades 9-12) and probably do academic harm to impoverished students.

·        Design, build, and sustain much smaller schools in impoverished districts or districts with a mixed social-class composition.  In very poor communities, design, build, and sustain the smallest schools.[228]

·        Don’t oversell smaller schools.  Like other schools, smaller schools can be wonderful or awful, but, all else equal, their odds of being awful are reduced as compared to larger schools.  Operating smaller schools in impoverished communities is good policy, but it is not a “magic bullet.”

·        Do not believe that mega-schools serving affluent areas are necessarily excellent or even very good.  Most accountability schemes obscure this fact because they do not generally take SES into account.  Graduates of such schools, however, can articulate the problems:  cliques, careerism, anti-intellectualism, de facto tracking, and so forth.[229]

·        Recognize that smaller schools in impoverished settings accomplish miracles even when their test scores are about average.  Such schools exhibit a very real but almost entirely unacknowledged degree of excellence, compared to which the vaunted “excellence” of large, well-funded suburban schools is more properly understood as mediocrity.


Increasing the number of smaller schools


Three efforts need to be engaged simultaneously: retaining existing small schools in impoverished communities (especially necessary in rural communities), establishing new autonomous small schools in impoverished communities (especially necessary in urban communities), and helping struggling small schools to thrive.  Small in this instance means high schools enrolling approximately 400 or fewer students, and elementary schools enrolling approximately 200 or fewer students.  Recommendations for policy makers include the following:[230]

1.)   Provide capital outlay mechanisms not based on big-school norms.

2.)   Put an absolute enrollment cap of between 600 and 1,000 students on the size of new high schools and between 300 and 500 on the size of new elementary schools.[231]

3.)   In impoverished locales establish, sustain, and improve schools that are substantially smaller than the absolute upper limits.

4.)   Revise curriculum policies to implement small-school (rather than big-school) principles.

5.)   Implement a state-wide salary scale (which helps stabilize staffing, with stable staffing a foundation of school improvement).[232]

Superintendents, principals, and teachers work in particular communities and schools, and, with this fact in mind, the author has offered the following counsel:[233]

1.)   Become better informed about the recent literature on small schools.

2.)   Take communities’ desires to retain or to re-establish smaller schools seriously, and not as a symptom of sentimentality or as a wild pipe dream.

3.)   Engineer the political will locally to support smaller schools, if the district currently operates mega-schools, or if it serves either mixed-social-composition or impoverished communities.  Engineering this political will is a lengthy process, and waiting to discuss the issue when construction funds materialize is dangerously reactive.  Obviously, stable leadership is required.

4.)   Develop community purposes for smaller schools; smaller schools are most sustainable when levels of community engagement are high.[234]

5.)    Work with other administrators and with policy makers to facilitate appropriate policy changes (see above).

6.)   Regard claims made about “schools-within-schools” with great skepticism.  Research on the variety of SWAS options does not exist, and  this review regards claims about achievement-related benefits for “pods,” “houses,” and non-autonomous “schools” as unwarranted.


Surely it makes sense to reorganize mega-schools in the attempt to foil the anonymity and impersonality of bureaucratically oriented high schools.  It is, however, not necessary to justify this move with reference to the school size literature related to student achievement; to do so misuses the literature and, worse, misrepresents the facts.

The best way to capture the achievement benefits of smaller size is to establish new smaller schools and to sustain and to exert effort to improve the ones already in existence.  Schools everywhere need not be small – unless by “small” one means a school enrolling fewer than 1,000 students, the benchmark used in the Feinstein amendment.  One thousand students, however, is a large school. Nothing in the empirically based research literature on school size and achievement suggests that academic benefits of any sort accrue in schools larger than this, even in schools serving a very affluent clientele.

4: Time for School: Its Duration and Allocation


Executive Summary


Research Findings


Small marginal increases (10-15%) in the time allocated to schooling show no appreciable gains in student achievement. Alternative calendars on which the typical 180 days of schooling are offered (e.g., year-round calendars) show no increased benefits for student learning over the traditional 9-months-on/3-months-off calendar. Summer programs for at-risk students are probably effective, though more research is needed.




·          Small – 10-15% – increases in the time allocated for schooling would be expensive and would not be expected to produce appreciable gains in academic achievement.


·          Furthermore, changes in the calendar by which those 180 days are delivered are very unlikely to yield higher levels of pupil achievement. In terms of pupil achievement, it matters not at all whether those 180 days are interrupted by one long recess or four short ones.


·          There is no reason not to expect – but little research to support – that three months of summer school would result in the same rate of academic progress as any three months of the traditional academic calendar.


·          Within reason, the productivity of the schools is not a matter of the time allocated to them as much as it is a matter of how they use the time they already have.



4: Time for School: Its Duration and Allocation


By Gene V Glass

Arizona State University


On average, America’s children spend six hours each weekday and 180 days each year in school between the ages of 5 and 18. Roughly 25% of school districts have longer school years, and another 25% have school years shorter than 175 days.[235] The questions addressed here have to do with the duration of schooling (allocated time) within the yearly school calendar, and the arrangement of that school time throughout the year. Would adding hours to the school day or days to the school year increase the amount that students learn? Would rearranging a fixed number of days of schooling within the school year produce greater academic achievement? These are the central questions around which this review is organized. It is important to note that this report examines allocated time – the total amount of time students are in school. One commonly discussed and very visible aspect of school time will not be addressed in this review, namely, engaged time or academic learning time or time-on-task. Of the many hours children spend in school, the majority of them do not involve attention to learning the intended curriculum. Berliner estimated that American students are actively engaged in learning for less than 40% of the time they are in school.[236]
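To make the distinction between allocated and engaged time concrete, the figures in the paragraph above can be combined in a short calculation. This is a rough sketch that assumes the average day and year apply uniformly and treats Berliner's "less than 40%" as an upper bound; the variable names are illustrative only.

```python
HOURS_PER_DAY = 6          # from the text: six hours each weekday
DAYS_PER_YEAR = 180        # from the text: 180 days each year
ENGAGED_PERCENT_CAP = 40   # Berliner's estimate is "less than 40%", so this is a ceiling

# Allocated time: total hours a student is in school during one year.
allocated_hours = HOURS_PER_DAY * DAYS_PER_YEAR

# Engaged time: upper bound on the hours spent actively engaged in learning.
engaged_hours_upper_bound = allocated_hours * ENGAGED_PERCENT_CAP / 100

print(f"Allocated time per year: {allocated_hours} hours")
print(f"Engaged time per year: under {engaged_hours_upper_bound:.0f} hours")
```

Under these assumptions, well over half of the allocated hours in a year fall outside engaged time, which is why the review treats the two quantities separately.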

We are here dealing with the question of the potential effect on academic achievement of increasing the length of the school day, or increasing the number of days of schooling during a calendar year, or both. In addition, the research on alternative yearly calendars will be reviewed to see what advice it might have for increasing achievement. Other options not investigated here involve the assignment of homework as a means of increasing students’ learning time (this area has been thoroughly investigated by Walberg, Paschal and Weinstein[237] and more recently by Cooper[238]), or the rearrangement of a fixed  amount of allocated time within the school day or week, as in block scheduling (see, for example, Cobb and Baker,[239]  Veal and Schreiber[240]).


School Time Research


Allocated Time and Achievement


Attention to allocated time as an important factor in accounting for differences in academic achievement received a huge boost in 1974 with the publication of research by Wiley and Harnischfeger.[241] These authors published the results of secondary analyses of the Equality of Educational Opportunity[242] dataset that seemed to indicate that the amount of schooling a student receives is a powerful determinant of the degree to which that student achieves academically.  Wiley and Harnischfeger (hereafter W&H) based their analysis and conclusions on a group of sixth-grade students in 40 elementary schools in Detroit, Michigan. They quantified the amount of schooling present in a particular school by combining measures of “average daily attendance,” “days in the school year,” and “hours in the school day.” When W&H related quantity of schooling to achievement holding constant a school’s socio-economic status (as measured by “percent white,” “average-items-in-the-home,” and “average number of siblings”), they discovered what they regarded as an impressive effect of quantity of schooling on achievement. Indeed, W&H christened quantity of schooling (allocated time) a “potent path for policy.”[243]

The Wiley and Harnischfeger findings did not go long unchallenged, even by researchers who were quite sympathetic with W&H’s conclusions. Karweit[244] worried that the number of schools in W&H’s secondary analysis of the Coleman data was small (40) and that the effect of quantity of schooling only appeared for a subset of schools in the central city of Detroit.  Moreover, the attempt to equate schools operating under very different circumstances by performing statistical adjustments on only three background variables, imperfectly measured, could well have left unaccounted for variability in the achievement data that might have been improperly attributed to quantity of schooling by W&H. The best corrective for these problems would be to attempt to replicate the effect on different data sets, each with its unique strengths and limitations. Karweit set out to do just that. Using the same Coleman EEO data set, Karweit analyzed the effect of quantity of schooling on achievement for schools in the inner city of Philadelphia, Milwaukee, Washington D.C., Cleveland and Baltimore. In none of these instances was the W&H effect of quantity of schooling found. Next, Karweit conducted similar analyses using data for all schools in the state of Maryland. In this analysis, school-level test scores on the Iowa Test of Basic Skills (vocabulary, reading comprehension, mathematics and language) at Grades 3, 5, 7 and 9 served as the dependent variable and “percent in attendance” was used as the quantity of schooling variable, with background equating variables of “mother’s education,” “family income,” “cognitive ability,” and “percent disadvantaged.” Again, no appreciable effect of variations of quantity of schooling on academic achievement was found. Still other analyses that Karweit performed failed to reveal the powerful effects that W&H had claimed. 
Karweit arrived, somewhat reluctantly it seemed, at the following conclusion: “Whether we use the school as the unit of analysis and incorporate quantity as a mediating variable, whether we examine central city or suburban schools, whether we control or do not control for ability, whether we use the individual as the unit of analysis, in no case do we obtain the sizeable effects reported by Wiley and Harnischfeger.”[245]

The Wiley-Harnischfeger and Karweit exchange did not end the matter of allocated time and achievement for researchers. Subsequent studies tended to confirm Karweit’s findings that there is little relationship between small marginal variations in allocated time for schooling and measured academic achievement.


Learning Curves


Smith[246] correlated allocated time and achievement in social studies for about 70 sixth-grade classes and found no statistically significant relationship (r = 0.17 for allocated time and achievement gain). Brown and Saks[247] employed data from the Beginning Teacher Evaluation Study to fit “learning curves” relating allocated time to achievement. Their analysis showed small relationships between the two variables. When curves were fit separately for high-ability and low-ability students, the latter showed a slightly stronger relationship between allocated time and achievement.
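Smith's result can be checked with the standard t test for a correlation coefficient. The sketch below is illustrative only: the text gives the sample as "about 70" classes, so the value of 70 is an assumption, and the critical value is rounded rather than looked up exactly.

```python
import math

r = 0.17   # correlation reported in the text
n = 70     # "about 70" classes; the exact count is an assumption

# t statistic for testing whether a Pearson correlation differs from zero
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# The two-tailed 5% critical value for 68 degrees of freedom is roughly 2.0
T_CRITICAL_APPROX = 2.0

# Proportion of variance in achievement gain associated with allocated time
shared_variance = r**2

print(f"t = {t_stat:.2f} against a critical value of about {T_CRITICAL_APPROX}")
print(f"shared variance = {shared_variance:.1%}")
```

With these inputs the t statistic falls short of the critical value, consistent with the report of no statistically significant relationship, and allocated time accounts for only about 3% of the variance in achievement gain.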

The list of researchers who have found no important relationship between the length of the school day or school year and the achievement of  students is long; a partial roster would include  Blai,[248] Borg,[249] Cotton and Savard,[250] Fredrick and Walberg,[251] Honzay,[252]  Karweit,[253] Lomax and Cooley,[254] Mazzarella,[255] and Walberg and Tsai.[256] It must be noted, however, that in every instance, the variation in the amount of allocated time is not great. No one has asserted, and no researcher believes, that students attending school for 100 days a year will achieve at the same level as students who attend school for 200 days a year.


Costs and Benefits


Proposals to increase the length of the school year must be weighed in terms of their cost and the returns on such expenditures. Odden[257] estimated that extending the school day to eight hours or lengthening the school year from 180 to 200 days (marginal increases of 11% in allocated time) would cost the nation more than $20 billion yearly in 1980 dollars (or roughly $40 billion in year 2000 dollars). In a quantitative synthesis of the existing research on the relationship of allocated time to student achievement, Glass,[258] Levin and Glass,[259] and Levin, Glass and Meister[260] sought to relate the cost of increasing allocated time to the returns in terms of grade equivalent gains on standardized achievement tests. Their analyses, using the results of prior research, simulated the addition of one hour to each school day for an entire school year; this hour would be used equally for instruction in reading and mathematics (30 minutes each). This additional time represents increases of between 25% and 50%, depending on subject and grade level, in baseline allocated time for basic skills instruction. These authors estimated that such increases in allocated time would result in yearly increases in achievement of less than one month in grade-equivalent units (seven-tenths of a month in reading and three-tenths of a month in mathematics). Levin[261] suggested that increasing teacher salaries, hiring remedial specialists, or buying new equipment are all superior in cost-effectiveness to increasing allocated time. Levin, Glass and Meister[262] went on to compare the effects of a fixed financial investment in lengthening the school day with the effects of three other possible interventions intended to increase achievement in elementary school basic skills: computer-aided instruction, class-size reduction, and cross-age tutoring. Of these four interventions, increasing allocated time showed the smallest return per dollar spent.
Levin and Tsang[263] supported this conclusion with analysis that drew upon economic theory; they concluded that large and costly increases in allocated time would be needed to effect even small increases in academic achievement.
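The arithmetic behind the figures in this section can be laid out in a few lines. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the 180-day year used to total the added hours is the typical figure cited earlier rather than a value taken from the studies themselves.

```python
# Figures taken from the text
BASE_DAYS, EXTENDED_DAYS = 180, 200
ADDED_MINUTES_PER_SUBJECT = 30            # one added hour, split between two subjects
INCREASE_RANGE = (0.25, 0.50)             # stated increase over baseline instruction time
GAIN_MONTHS = {"reading": 0.7, "mathematics": 0.3}

# Marginal increase in allocated time from lengthening the school year
year_increase = (EXTENDED_DAYS - BASE_DAYS) / BASE_DAYS

# Baseline daily minutes implied by "30 added minutes is a 25% to 50% increase"
implied_baseline_minutes = tuple(
    ADDED_MINUTES_PER_SUBJECT / share for share in INCREASE_RANGE
)

# Total instruction added over a year by one extra hour per day
# (assumes the typical 180-day year)
added_hours_per_year = 1 * BASE_DAYS

print(f"Longer year: {year_increase:.1%} more allocated time")
print(f"Implied baseline per subject: {implied_baseline_minutes[1]:.0f} "
      f"to {implied_baseline_minutes[0]:.0f} minutes a day")
print(f"Added instruction: {added_hours_per_year} hours a year")
for subject, months in GAIN_MONTHS.items():
    print(f"Estimated gain in {subject}: {months} grade-equivalent months")
```

Seen this way, an added 180 hours of instruction a year corresponds to an estimated gain of less than one grade-equivalent month in each subject, which is the basis for the cost-effectiveness comparisons above.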


International Comparisons


As has been pointed out, children in typical public schools in the U.S. attend school for six hours each weekday for 180 days a year. Some other industrialized countries, e.g., the United Kingdom, operate schools for up to eight hours a day for as many as 220 days a year. The sensational character of international comparisons of educational achievement has done much to obscure the issue of allocated time for schooling and mislead the public and policy makers. Stigler and Stevenson[264] attributed the superiority of Japanese students in mathematics to their longer school year. Barrett,[265] in a journalistic account of the duration of school years in various countries, claimed that the cause of the low performance, particularly at higher grades, of U.S. students in algebra, calculus, and science was the relatively short U.S. school year. Such international comparisons as TIMSS (Third International Mathematics and Science Study) are frequently read as supporting the conclusion that certain high-scoring nations, which have longer school years than the U.S., owe their superior status to the greater amounts of allocated time for teaching and learning. In most cases, the differences between allocated time in the U.S. and in other nations are small and statistically insignificant. But more important, the assessments of achievement are undertaken in such non-standardized ways as to render any conclusions suspect, or patently invalid.

Bracey forcefully criticized the attempt to base policy decisions about America’s schools on the TIMSS data.[266] For example, consider the TIMSS assessment in science and mathematics. Although the U.S. ranks relatively high in achievement at Grade 8, most public attention focuses on the poor performance of the U.S. at “Grade 12.” When this poor standing is linked – rhetorically, not scientifically – to the relatively short U.S. school year, bad research is compounded by being invoked as the basis for bad policy recommendations. There are so many circumstances, particularly at the Grade 12 level, that differ among nations that little credibility is warranted for the TIMSS findings. For example, although the TIMSS assessment ostensibly assesses students in the “last year” of high school, the meaning of the “last year” differs from country to country, enrolling 19-year-olds in one nation and 17-year-olds in another; the U.S. high-school seniors are among the youngest assessed. The U.S. was among a small minority of nations that chose to disallow the use of calculators on the TIMSS test. And to make matters worse, the U.S. is the only TIMSS assessment site in which most instruction is not in the metric system, although the TIMSS tests use only the metric system where measurements are involved. U.S. seniors score relatively low in international assessments of educational achievement, and they spend relatively fewer days in school during the year; but there are many other factors that intervene in this relationship, and the conclusion that small marginal increases in the length of the school year would lead to greater achievement is not warranted.


Conclusion Regarding Increasing Allocated Time and Student Achievement


The import of a couple of decades of research on the effect on student achievement of small, marginal increases in the amount of time allocated to schooling is clear. Such increases have virtually no benefits for student achievement, and what small benefits there might be would not be justified by the increased cost of small increases in the length of the school day or the number of days per school year. This conclusion has a counter-intuitive ring to it: if any amount of schooling is effective – as it surely must be, or else unschooled children would achieve at levels equal to their schooled counterparts – then why shouldn’t more schooling be better? The answer probably lies in the intricacies of curriculum development and the organization of instruction. Virtually all of the research on allocated time for schooling has studied natural variation in the length of the school year and small differences therein. It is unlikely that an increase in the length of the school year of a few days (five or ten, for example) would prompt any important changes in the school curriculum. Most likely, teachers used the same textbooks and activities in the lengthened school year that they used in the shorter school year; more reviewing likely took place, and so on. Before major changes in curriculum and instruction take place, significant increases in the length of the school year would have to be attempted.


Changing the School Calendar (Year-Round Schools)


Given that increasing allocated time would likely yield small, insignificant increases in student achievement, are there ways of arranging the 180 days in the typical school year to promote greater academic achievement? Of all those ways of organizing a fixed amount of allocated time, only the proposals to deliver schooling on a “year-round” basis (equally spaced with intermittent vacations across twelve months) have gained much of a following among educators. Significantly, the original proposals to operate year-round schools (YRS) came from a consideration of the economics of school construction rather than any consideration of learning gains.

In year-round schools, as in traditional 9-month schools, students attend classes about 180 days spread throughout the twelve calendar months.  Typically, the student body is divided into three, four or five groups; school year starting dates are staggered so that at any one time, between one-third and one-fifth of the students are on vacation.  In the most popular year-round schedule, the 45-15 plan, four groups of students attend school for forty-five days, or about nine weeks, and then have fifteen days off.  Building capacity can be increased 25% because one-quarter of the student body is always on vacation.  The 45-15 plan is the most popular year-round attendance plan because all students have a summer vacation, even if it is shorter than the traditional 3-month summer recess of the 9-month/3-month calendar.  It is not, however, favored by high schools because the short, three-week vacations limit summer job opportunities.  In the Concept 6 year-round plan, the calendar year is divided into six 2-month blocks.  The students, in three tracks, have classes for four consecutive months and then a vacation for two months.  Concept 6 can accommodate a one-third increase in enrollment.  Because the students attend two 4-month terms a year, the administrative burdens of scheduling classes and recording grades are not as heavy as in the 45-15 plan.  One-third of the students will have no summer vacation at all; in areas with great seasonal temperature variations, this track will be unpopular.  Concept 6, then, can meet with a great deal of community resistance when the students’ tracks are mandated and not freely chosen.

Another year-round schedule is the quinmester.  Five 45-day terms, or quins, make up the year; students attend four of the five quins.  In some districts, the fifth quin is optional; students who desire acceleration or enrichment, or who need remediation, attend all five terms.  Obviously, if many students take advantage of this option, the district does not save money, because the enrollment remains the same as in traditional schools.  There are many other year-round schedules, such as the trimester or quarter systems.  The rationale for most, however, is the same: to avoid construction of new schools by increasing enrollment at existing schools.
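The capacity arithmetic behind these schedules can be expressed two ways, and a short sketch helps keep them apart. It assumes equal-size groups with exactly one group on vacation at any time, and the function name is hypothetical. Under those assumptions, the 25% figure given for the 45-15 plan and the one-third figure given for Concept 6 match the share of enrolled students on vacation; measured against the seats in the building, the enrollment that can be served rises by a somewhat larger fraction.

```python
from fractions import Fraction

# Number of equal-size groups (tracks) in each plan, as described in the text.
# Assumption: exactly one group is on vacation at any time.
plans = {"45-15": 4, "Concept 6": 3, "Quinmester": 5}

def capacity_figures(groups):
    """Return (share of enrollment on vacation, enrollment gain relative to seats)."""
    share_absent = Fraction(1, groups)
    # If only (groups - 1)/groups of the students are present at once, a building
    # with S seats can enroll S * groups / (groups - 1) students.
    gain_over_seats = Fraction(groups, groups - 1) - 1
    return share_absent, gain_over_seats

for name, groups in plans.items():
    absent, gain = capacity_figures(groups)
    print(f"{name}: {float(absent):.0%} of students on vacation, "
          f"enrollment can exceed seats by {float(gain):.0%}")
```

Which figure is appropriate depends on the question being asked: the share on vacation describes how much less space a given enrollment needs, while the gain over seats describes how many more students an existing building can serve.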

Determining which of the claimed advantages are in fact true requires a look at what has actually happened in year-round schools.


Do Year-round Schools Improve Academic Achievement?


Year-round schools are principally a cost-cutting measure. Their success in reaching this goal and the many advantages and disadvantages that ensue from the change to a year-round calendar are the subject of some published policy studies.[267] But the subject of this review is the potential benefits to learning and achievement of converting schools on conventional 9-month/3-month calendars to year-round calendars.

Proponents of the year-round calendar claim several advantages:

1.)    Students retain more over shorter vacations.

2.)    Learning proceeds via the psychologically more effective “distributed” rather than “massed practice” schedule.

3.)    Teachers spend less time reviewing previously learned material because of less forgetting during shorter vacations.

4.)    Because breaks will be more frequent, teachers experience less burnout.

Dempster[268] argued, in support of calendars such as the 45-15 year-round calendar, that spaced (or “distributed”) practice over several sessions is superior to the same amount of time concentrated into a single study session. These arguments often rely on data drawn from laboratory experiments where subjects memorize nonsense syllables or perform other non-meaningful tasks. The relevance of these studies to actual classroom practice is questionable.

Cherry Creek District 5 in the state of Colorado implemented year-round schools in 1974.  After one year, student achievement in three year-round schools was compared to achievement in traditional calendar schools.  Differences between standardized test scores in the two types of schools were found to be insignificantly small even after matching pupils on IQ.  Similar findings are reported for other year-round programs in Colorado and across the country. For example, examination of three years of standardized test scores for Mesa County Valley School District (CO) indicates that the year-round schedule does not in any way enhance learning. A closer look at the Mesa County (CO) study reveals a pattern common in research on the academic benefits of  the year-round calendar. In 1982, Chatfield Elementary School of the Mesa County Valley School District was converted from the conventional school calendar to the 45-15 year-round calendar. George and Glass[269] collected the SRA Achievement Test battery scores for all students at Chatfield who were tested in the Spring of 1981 (before conversion to the year-round schedule) and again in the Spring of 1982 (after one year on the year-round calendar). As a control, the district-wide SRA test scores were collected at the same two points in time; district averages were calculated after removing the scores of the Chatfield pupils. The results appear in the following table:


Average Percentile Gain (1981 to 1982) for Chatfield (YRS) and District-wide Pupils

[Table values not recoverable; the only surviving row label is “Chatfield (YRS).”]


These gains are statistically insignificant and should not be “over interpreted.” They indicate no superiority of one calendar over the other. They are indistinguishable from the kinds of yearly variation that all schools and school districts experience normally.

Many teachers and parents who favor year-round schedules believe that students learn more and faster when the learning process is interrupted for only short periods of time, as it is on the 45-15 plan.  Even in Concept 6 schools, such as those in Colorado Springs, Colo., most teachers in year-round schools rated their pupils’ vacation learning loss as less severe than in traditional schools.[270]  Smith and Glass[271] attempted to substantiate teachers’ perceptions in Colorado’s Cherry Creek District 5.  They found that although teachers in year-round schools spent less time reviewing pre-vacation material than teachers in schools on the traditional calendar, the actual achievement differences were insignificant on tests designed specifically to measure district objectives.


Other YRS Studies


The early findings in Colorado were replicated across the U.S. when researchers sought to compare achievement of students in YRS with their counterparts in schools on the traditional 9-month/3-month calendar. Several studies – by Naylor,[272] Zykowski,[273] Carriedo and Goren[274] – reached the conclusion that there is no significant difference in achievement between students in YRS and students in traditional calendar schools. Campbell[275] reported finding no significant achievement benefits due to year-round schools when compared with the traditional 9-month/3-month calendar in several Texas elementary schools. Webster and Nyberg[276] concluded that no evidence existed for the superiority of the year-round calendar at the secondary school level: “There appear to be no trends in any of the districts describing either improvements or decline in standardized achievement test scores as measured by district-administered tests and the California Assessment Program. Further evidence produced from interviews and a review of evaluation reports from Los Angeles Unified School District confirm that the impact of year-round education on achievement scores at the high school level has been inconclusive.”[277]

In a journalistic report on practitioners’ assessments of the learning benefits of the year-round calendar, Harp[278] cited the experiences of administrators in several states to the effect that the year-round calendar appeared to have no appreciable benefits for academic learning. For instance, Dr. N. Brekke, a Superintendent of Schools in Oxnard, reported that 17 years of the year-round calendar failed to raise students’ achievement to the California state average. Harp quoted administrators in Orange County, Florida, as saying that “many of the benefits associated with the year-round schedule have been more perceived than realized… people want you to prove that test scores are going up, but that’s a very difficult thing to do.”[279]

Not all studies have failed to find achievement advantages for the year-round calendar. Those that do claim advantages, however, stem disproportionately from an advocacy organization that has grown up around this issue: the National Association for Year-Round Education (NAYRE). (Institutional memberships range from $350 to $750 per year depending on the number of students that a school or school district has enrolled in year-round education.) NAYRE publishes its own research reports, and avoids established peer-reviewed scholarly journals; copies of research reports outlining the benefits of the year-round calendar sell for about $30. “Negative” studies have tended to come from researchers working in universities.


The “Summer Forgetting” Argument for YRS


A primary argument in favor of YRS is that the long summer vacation of the 9-month/3-month calendar causes a large negative effect on student achievement.  Allinder et al.[280]  studied the summer break “forgetting” phenomenon for Grades 2 through 5. They found statistically significant losses in spelling, but not in mathematics, at Grades 2 and 3; they also found losses in mathematics, but not in spelling, for Grades 4 and 5.

Tilley, Cox, and Staybrook[281] studied summer regression in achievement for students receiving no educational services for three months. They found that most students experience some regression during the summer recess. Cooper et al.[282]  reviewed  39 such studies and found  that achievement test scores do indeed decline over the summer vacation. Their meta-analysis revealed that the summer loss equaled about one month on a grade-level equivalent scale, or one tenth of a standard deviation relative to spring test scores. The effect of summer break was more detrimental for math than for reading and most detrimental for math computation and spelling. Also, middle-class students appeared to gain on grade-level equivalent reading recognition tests over summer while lower-class students lost on them. Possible explanations for the findings included the differential availability of opportunities to practice different academic material over summer (reading is much more easily practiced than mathematics) and differences in the material’s susceptibility to forgetting (factual knowledge is more easily forgotten than conceptual knowledge).

Both the Allinder et al. study and the Cooper et al. meta-analysis of the summer forgetting phenomenon place estimates on the loss of achievement over the traditional 3-month vacation that are smaller than many expected. This may in part help explain why the YRS calendar does not produce the dramatic effects on achievement that some hoped to see.

Year-round schools can accomplish their principal goal of saving money by avoiding construction of new buildings.  However, there is no credible evidence that the year-round calendar causes improved academic achievement. How is it, then, that an idea whose benefits have eluded all objective attempts to discover them nonetheless engenders enthusiasm and loyalty to such a degree that it has its own national organization? Perhaps the answer lies in the problems administrators have “selling” the idea of YRS to parents and teachers. YRS calendars can disrupt family life, including vacation schedules and traditional summer activities (baseball leagues, camping programs and the like). These problems can be particularly severe when one child in a family is on a year-round calendar and another attends school on a traditional calendar. Convincing parents that the inconveniences caused by the year-round calendar are worth the trouble is a task that falls to school principals. One argument used to make the case for conversion is that the year-round calendar is much superior to the traditional calendar in terms of academic learning. Unfortunately, this position lacks empirical support.


The Extended School Year


Of course, the obvious antidote to summer forgetting is to extend the school year throughout the summer. Without thinking much about it, parents in surveys give strong support (85%) to the idea that students who fail to meet academic standards should attend summer school.[283]  Such an extension for all students would represent an astronomical increase in the cost of schooling in the U.S. – on the order of $80 billion in current dollars. No such proposals have been seriously advanced and no research exists to suggest the potential returns in terms of academic achievement. “Extended school year” proposals have been limited almost entirely to services for handicapped or disabled students.[284]  Heyns’s analysis of summer programs for at-risk students in Atlanta schools revealed gains in academic achievement, but at rates considerably slower than during the regular academic year.[285]  The absence of research on the effectiveness of extending schooling through the summer months should not deter reasonable judgments of the potential success of such proposals, however. The elements in the successful delivery of schooling are not mysterious, after all. Well-trained and experienced teachers, good curriculum materials, adequate physical facilities – these ingredients in combination succeed day-in and day-out in teaching our nation’s children. There is no reason to believe that continuing the 9-month school year, with a high-quality program, through the three months of the traditional summer recess would result in any less academic achievement than is observed during the regular school year. Cooper and his colleagues[286] have based their recommendations for quality summer school programs on a meta-analysis of the literature.

The absence of a relationship between small marginal increases in the length of the school year or the school day and academic achievement must not be extrapolated to reach the conclusion that significant increases in allocated time for schooling (such as three months’ instruction throughout the summer) would not result in significant increases in academic achievement.




The research conducted on time allocated for schooling yields three broad conclusions:

·        Small – 10-15% – increases in the time allocated for schooling would be expensive and would not be expected to produce appreciable gains in academic achievement.

·        Furthermore, changes in the calendar by which those 180 days are delivered are very unlikely to yield higher levels of pupil achievement. To paraphrase a famous poet, “180 days is 180 days is 180 days.” And, at least in terms of pupil achievement, it matters not at all whether those 180 days are interrupted by one long recess or four short ones.

·        There is no reason not to expect – but little research to support – that three months of summer school would result in the same rate of academic progress as any three months of the traditional academic calendar.

Within reason, the productivity of the schools is not a matter of the time allocated to them. Rather it is a matter of how they use the time they already have.


5: Grouping Students for Instruction


Executive Summary


Research Findings


Ability grouping has been found to have few benefits and many risks. When homogeneous and heterogeneous groups of students are taught identical curricula, there appear to be few advantages to homogeneous grouping in terms of academic achievement. More able students make greater academic progress when separated from their fellow students and given an accelerated course of study. Less able students who are segregated from their more able peers are at risk of being taught an inferior curriculum and consigned to low tracks for their entire academic career. Teachers assigned to higher tracks and parents of bright students prefer ability grouping. Teachers in lower tracks are less enthusiastic and need support in the form of materials and instructional techniques to avoid the disadvantages of tracking.




·          Mixed or heterogeneous ability or achievement groups offer several advantages:

1.)    less able pupils are at reduced risk of being stigmatized and exposed to a “dumbed-down” curriculum;

2.)    teachers’ expectations for all pupils are maintained at higher levels;

3.)    opportunities for more able students to assist less able peers in learning can be realized.

·          Teachers asked to teach in a “de-tracked” system will require training, materials and support that are largely lacking in today’s schools.

·          Administrators seeking to “detrack” existing programs will require help in navigating the difficult political course that lies ahead of them.



5: Grouping Students for Instruction


By Gene V Glass

Arizona State University


The sorting of students into homogeneous ability and achievement groups is nearly as old as universal compulsory education in the United States. The grouping of students by ability or achievement forms a continuum that extends from “reading groups” (the redbirds, bluebirds, and canaries) at one end to tracking and even segregation of students between school districts at the other. While the one extreme may be a matter strictly of professional pedagogical judgment, the other extreme may represent the impact of broad social forces outside the control of any one educator or group of professionals. This review will touch on each point across this continuum.

The seemingly simple notion of grouping pupils by their ability for instruction proves, upon closer examination, to be very complex with many variations.  Within-class grouping, between-class grouping, the Joplin plan, XYZ grouping, gifted classes, academic tracks, charter schools – the inclination to sort students comes in many forms and has a long history.  Otto found evidence of homogeneous achievement grouping of pupils as far back as the nineteenth century in America’s schools.[287] The Santa Barbara Concentric Plan of the early 1900s divided classes into A, B and C groups who received three levels of curriculum based on their past performance.

The pedagogical justification for homogeneous grouping centers on the role of the teacher: with students grouped by ability or achievement, the teacher is able to focus more instruction at the level of all the students in the group; thus, time is not wasted as bright students wait for elementary explanations to be given to their slower classmates, and slow students are not troubled with instruction that is over their heads. Bright students are thought to need a faster pace and enriched material; low-ability students are thought to require remediation, repetition and more reviews. Slower students, it is felt, will be better off shielded from competition with their brighter classmates; more able students will not become complacent by comparing themselves with slow students, and they will be spurred to higher levels of achievement by competing with their own kind. These images, not unfamiliar to teachers and parents alike, are rife with assumptions about the nature of human intelligence, the conditions of learning, the development of students’ self-perceptions, and the behavior of teachers, only a few of which are tested in the research literature.

Ability grouping enjoyed wide professional and public acceptance beginning in the heyday of the “scientific” movement in education (from Edward L. Thorndike, to Lewis M. Terman, to the post-WW II era) and extending to the post-Sputnik era of emphasis on enriching curriculum for the gifted.[288]  Homogeneous grouping in the form of tracking received severe criticism in the last quarter of the 20th century. James Rosenbaum’s Making Inequality [289] and Samuel Bowles and Herbert Gintis’s Schooling in Capitalist America[290] saw ability grouping as not just perpetuating but creating disadvantages for poor and minority students.  Jeannie Oakes’s Keeping Track[291] prompted vigorous debate regarding the effects of homogeneous grouping. Tracking’s detractors leveled charges of  stigmatizing students, and consigning them to inferior and “dumbed-down” instruction. Homogeneous grouping was not completely without its supporters. Thomas Loveless concluded: “The primary charges against tracking are (1) that it doesn’t accomplish anything and (2) that it unfairly creates unequal opportunities for academic achievement. What is the evidence? Generally speaking, research fails to support the indictment.”[292]


Student Grouping Research


Researchers approaching this policy question from different points on the disciplinary compass have reached different conclusions about the value of homogeneous grouping. The issue of homogeneous grouping not only separates researchers and scholars, it separates social classes and ethnic groups as well. Ability grouping is nearly universally condemned by scholars from minority ethnic groups (e.g.,  Braddock[293], Darling-Hammond[294], Esposito[295]).  Why these various groups have arrived at conflicting conclusions and what educators should make of their conflicting recommendations is the central question to be resolved in this review.


The Prevalence of Homogeneous Grouping


How common is it for teachers and schools to separate students into groups of similar ability or achievement for purposes of instruction?

In part, estimates of the incidence of homogeneous grouping depend on how one asks the question. Public sentiment and professional judgment have turned against strict ability (IQ) grouping of the XYZ-type that first made an appearance in the 1920s. Beginning in 1919, the Detroit public schools administered intelligence tests, divided the distribution of students into strictly ordered ability groups – X, Y and Z – and taught the same curriculum to all three groups. This Huxleyesque scheme, so reminiscent of the Alphas and Betas in Brave New World, would be found unconstitutional in the present day. (Indeed, in Hobson v. Hansen, the tracking of students into ability groups in the Washington, D.C. schools was ruled to be a violation of Fourteenth Amendment rights.[296])   Ask educators today if they track pupils into “ability groups,” and they will probably say “No.” Ask them if they group students homogeneously by achievement to facilitate instruction, and their answer is likely to be “Yes.” While grouping is currently based on past performance rather than measured academic aptitude, the results are probably not much different, given the reasonably  high correlation between achievement and aptitude.[297]

Hoffer[298] reported data from the Longitudinal Study of American Youth that addressed the incidence of tracking in mathematics and science in more than 50 middle schools of the late 1980s.  About 40% of the schools tracked students for science teaching in Grade 7; this figure rose to 50% in Grade 8.  For mathematics instruction, 80% tracked in  Grade 7 and more than 90% tracked in  Grade 8. These data are supported by Epstein and MacIver’s[299] survey, also performed in the late 1980s. Survey respondents were asked “For which academic subjects are students assigned to homogeneous classes on the basis of similar abilities or achievement levels?”  Homogeneous grouping was practiced in two-thirds of the middle schools in some or all subjects at Grade 5 and in three-quarters of the schools at Grade 8. Surveys of homogeneous grouping in elementary grades would show even higher incidences, where the proverbial redbirds, bluebirds and canaries are almost ubiquitous.


Middle Schools Classified by Tracking in Some or All Subjects: 1988

With Percents by Columns (After Epstein & MacIver, 1990)


Tracking in…       Grade 5    Grade 6    Grade 7    Grade 8
All Subjects
Some Subjects
No Tracking

[Cell percentages not recoverable.]


These surveys may underestimate the incidence of homogeneous grouping in the nation’s public schools. Even the most inexperienced administrator knows that this issue divides teachers, parents and other stakeholders in our schools. Complete candor on questionnaires received in the mail or in reply to questions posed by some ephemeral visitor to one’s school is not only unlikely, it could even be disruptive. Visitors to schools who enter them on different terms and who press for deeper answers might place the incidence of  homogeneous grouping at levels even higher than these surveys. In an interview on his book Savage Inequalities, Jonathan Kozol remarked: “Virtually every school system I visit, with a few exceptions, is entirely tracked, although they don’t use that word anymore.”[300] Whichever figure one might accept – two-thirds, three-quarters, or 100% – the conclusion seems inescapable that homogeneous grouping of students by ability or achievement is virtually endemic in American education.

Open enrollment plans in which students choose from among a set of courses also produce stratification of schools by ability groups.  Sam Lucas has documented this phenomenon in his book Tracking Inequality: Stratification and Mobility in American High Schools.[301]  Welner has observed the same pattern of sorting entering the school system in the form of choice programs:

…tracking under a choice regime resembles tracking under the more rigid tracking regimes of the past.[302]


for many district students, choice was more apparent than real. Scheduling conflicts constrained some students’ choices in ways that perpetuated tracking (e.g., taking a lower-level math class prevented scheduling of a higher-level English class). For other students, … the course selection process amounted to little more than accepting the schools’ recommendations.[303]


Who Wants to Group Students: Teachers or Parents?


A survey published by the National Education Association in 1968 indicated that at least 75% of teachers preferred to teach homogeneously grouped classes.[304] Teachers’ affinity for ability grouping disappears among teachers who are assigned the lower-tracked classes.[305] Contemporary surveys, though lacking, would likely duplicate this finding. It is not difficult to understand why teachers’ jobs are made easier by teaching students in groups of similar achievement levels. However, it is not clear whether homogeneous grouping is intrinsically more effective or whether it is preferred because of an absence of curriculum materials and instructional techniques designed for heterogeneous groups.

Teachers’ preferences for homogeneous grouping must surely be matched or even exceeded by parents’ preferences for the same, at least the preferences of educated and wealthier parents to have their children placed in the highest groups. Parents’  interventions into tracking decisions are common.  Highly educated parents have been found more likely to push for high track placements than other parents.[306] Oakes and Wells studied 10 middle schools and high schools in their research on “detracking” secondary education and found that middle-class suburban values and norms are strong reinforcers of tracking.[307]


Multiple Perspectives on the Effectiveness of Ability Grouping


The topic of grouping students for instruction has been studied by researchers from quite different perspectives. On the one hand, educational psychologists have focused on academic achievement narrowly construed as performance on paper-and-pencil tests and self-esteem scales. Sociologists have taken a broader view that encompasses students’ academic careers, and the opportunities and services offered students in different groups and tracks. Indeed, on this particular topic, it is fair to say that two different disciplines – psychology, and particularly educational psychology on the one hand, and sociology on the other – have focused on different aspects of this phenomenon and have arrived at different conclusions.


Educational Psychologists’ View


Research on ability grouping by educational psychologists has a very long history, dating from the very beginnings of educational research itself. As early as 1916, Whipple[308] studied students in the Urbana, Illinois, school system who had been grouped into homogeneous gifted classes. A handful of major studies – which themselves review and integrate the findings of dozens of primary studies extending over several decades – now forms the empirical basis of most persons’ opinions about the effects of ability grouping on achievement: the Kulik and Kulik[309] and the Slavin[310]  meta-analyses for elementary and secondary school ability grouping.

Since the meta-analyses[311] play a key role in forming an evaluation of the efficacy of homogeneous grouping, a brief explanation of this technique is in order. Meta-analysis is a statistical technique used to combine and integrate the findings – themselves expressed statistically – of many individual empirical studies. In its simplest form, as an example, a meta-analysis might collect a hundred studies of the correlation of achievement and ability and report that the correlation coefficients ranged from 0.25 to 0.85 with an average correlation of  0.62. When the primary research studies being “meta-analyzed” involve comparing two groups – for example, students taught in homogeneous (condition A) v. heterogeneous (condition B) groups – it is common to express the findings of each primary study in a form known as an effect size. An effect size that describes the difference between two groups is defined as a mean difference (between conditions A and B) in units of the within-condition standard deviation:


$$ES = \frac{\text{Mean}(A) - \text{Mean}(B)}{SD_{\text{within}}}$$



The value of ES reveals the degree of superiority of condition A over condition B (or, B over A in the event that ES has a negative value). Under the assumption of normally distributed scores, an average ES  of +1.0 indicates that the average student in condition A scores above 84% of the students in condition B. The concept of the effect size applied to standardized achievement test data enjoys a fortuitous coincidence. It is an empirical fact that the standard deviation of most achievement tests is 1.0 years in grade equivalent units. Consequently, an effect size of 1.0 implies that the average superiority of condition A over condition B is one year in grade equivalent units. Likewise, an effect size of 0.50 implies that students in A achieve, on average, 5 months in grade equivalent units above students in condition B.
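
The arithmetic in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration, not the procedure used in any of the meta-analyses reviewed here. The function names are hypothetical, the pooled within-condition standard deviation is one common choice that the text does not specify, and the 10-month grade-equivalent year is inferred from the statement that an effect size of 0.50 corresponds to 5 months.

```python
from statistics import NormalDist, mean, stdev


def effect_size(scores_a, scores_b):
    """Mean difference (A minus B) in units of the within-condition SD.

    Assumption: the within-condition SD is the pooled SD of the two groups.
    """
    n_a, n_b = len(scores_a), len(scores_b)
    var_a, var_b = stdev(scores_a) ** 2, stdev(scores_b) ** 2
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (mean(scores_a) - mean(scores_b)) / pooled_sd


def percent_of_b_below_average_a(es):
    """Percent of condition B scoring below the average A student,
    assuming normally distributed scores."""
    return NormalDist().cdf(es) * 100


def grade_equivalent_months(es, sd_in_years=1.0, months_per_year=10):
    """Convert an effect size to months in grade-equivalent units.

    Assumptions: the test's SD is 1.0 year in grade-equivalent units and a
    grade-equivalent year spans 10 months, as the text implies.
    """
    return es * sd_in_years * months_per_year


print(round(percent_of_b_below_average_a(1.0), 1))  # 84.1
print(grade_equivalent_months(0.5))                 # 5.0
```

Under the same assumptions, the effect sizes reported in the meta-analyses discussed below can be read as months of grade-equivalent achievement by passing them to grade_equivalent_months.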


The Kuliks’ Meta-analyses.

Kulik and Kulik[312] integrated the findings of 52 experimental and quasi-experimental studies of the effect of ability grouping on achievement of secondary school students. The results of their analysis showed that the benefits in terms of academic achievement of ability grouping were virtually absent in all cases, with the exception of the comparisons of high-ability students in gifted classes vs. their counterparts in mixed-ability classes. When the effects for different subjects (math, science, reading, social studies), standardized vs. locally relevant tests, and objective vs. non-objective tests were examined, no consistent benefits were seen for ability grouping. When Kulik and Kulik examined the effects of ability grouping at the elementary school level, they found small but positive effects in reading and mathematics for both within-class and between-class ability grouping. Effect sizes were approximately 0.30 for high-ability students and declined to less than 0.20 for low-ability students. There emerges in the Kuliks’ meta-analyses the first hint that the benefits of ability grouping may be due to the fact that high-ability students receive an enriched curriculum in homogeneous classes (as described, for example, by Oakes[313]). This conclusion was given further substantiation in Kulik’s meta-analysis of enrichment and accelerated programs for gifted and talented students that, when compared to gifted students in heterogeneous classes, yielded effect sizes of 0.40 for enrichment classes and 0.90 for accelerated classes. The Kuliks’ meta-analyses were the first to challenge reviews like that of Good and Marshall[314] that recommended against all forms of ability grouping.


The Slavin Meta-analyses.

The Kulik and Kulik meta-analyses contrast somewhat with the meta-analyses (called “best evidence syntheses”) published by Robert Slavin[315] in 1986 and 1990. Slavin, who relied on a good deal more selectivity in forming the database of studies on ability grouping before attempting to integrate their findings, drew conclusions from the body of work that questioned the efficacy of homogeneous grouping for instruction at the secondary school level:

“Comprehensive between-class ability grouping plans have little or no effect on the achievement of secondary students, at least as measured by standardized tests. This conclusion is most strongly supported in Grades 7-9, but the more limited evidence that does exist from Grades 10-12 also fails to support any effect of ability grouping.”[316]


At the elementary school level, Slavin[317] concluded that the research supported modest but reliable benefits of within-class ability grouping for mathematics at the intermediate grades and benefits for reading achievement of the Joplin plan for all elementary grades. (In the Joplin plan, students are grouped across grades into intact classes for reading instruction, in which reading is taught in the same manner to the whole class, or at most two groups within the class; students then return to their principal grade assignment for all other instruction.) The seeming discrepancy between Slavin’s and Kulik and Kulik’s conclusions (Slavin being considerably more pessimistic about homogeneous grouping at the secondary school level than the Kuliks) is resolved when the criteria for inclusion of studies in the meta-analyses are examined. Whereas Kulik and Kulik threw a fairly broad net over the body of literature traditionally identified as ability grouping research, Slavin excluded studies that did not attempt to standardize curriculum among the various homogeneously formed groups that were compared. Slavin’s interest was in isolating the unique effect of having students learn in homogeneous groups, not in evaluating how curriculum may become differentiated (enriched in high ability groups, “dumbed down” in low ability groups) among homogeneous groups. Indeed, Slavin issued a warning that is seldom acknowledged in brief or journalistic accounts of this research:

“…there is an important limitation to this conclusion [of no beneficial effect of ability grouping]. In most of the studies that compared tracked to untracked grouping plans…, tracked students took different levels of the same courses (e.g., high, average, or low sections of Algebra 1). Yet much of the practical impact of tracking, particularly at the senior high school level, is on determining the nature and number of courses taken in a given area. The experimental studies do not compare students in Algebra 1 to those in Math 9…. The conclusions drawn … are limited, therefore, to the effects of between-class grouping within the same courses, and should not be read as indicating a lack of differential effects of tracking as it affects course selection and course requirements.”[318]

[Added emphasis shown in boldface.]
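
The way inclusion criteria can drive the average effect size of a meta-analysis may be easier to see in a small sketch. The five "studies" below are entirely hypothetical, invented only to show the mechanism; they are not drawn from the Kuliks' or Slavin's reviews, and the function name and flag are likewise illustrative assumptions.

```python
from statistics import mean

# Hypothetical (effect size, curriculum standardized?) pairs for five
# imaginary studies. These numbers are invented for illustration only and
# are NOT taken from Kulik & Kulik, Slavin, or any other source.
HYPOTHETICAL_STUDIES = [
    (0.02, True),
    (-0.05, True),
    (0.04, True),
    (0.35, False),
    (0.40, False),
]


def mean_effect_size(studies, require_standardized_curriculum):
    """Average the effect sizes of the studies that pass the inclusion rule."""
    included = [
        effect
        for effect, standardized in studies
        if standardized or not require_standardized_curriculum
    ]
    return mean(included)


# A "broad net" keeps every study; the strict rule keeps only studies in
# which the compared groups followed the same curriculum.
broad_net = mean_effect_size(HYPOTHETICAL_STUDIES, require_standardized_curriculum=False)
strict_rule = mean_effect_size(HYPOTHETICAL_STUDIES, require_standardized_curriculum=True)

print(f"Broad inclusion:  mean effect size = {broad_net:.2f}")
print(f"Strict inclusion: mean effect size = {strict_rule:.2f}")
```

With these made-up numbers the broad rule yields a noticeably larger average than the strict rule, which is the kind of divergence that differing inclusion criteria alone can produce.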


The findings of the Kulik & Kulik and the Slavin meta-analyses are summarized in the following table:


Average Effect Sizes from the Kulik & Kulik and Slavin Meta-Analyses of Ability Grouping Studies

Column headings: Ability Grouping Type; Grade Level; Kulik & Kulik

Rows: Joplin Plan; XYZ Ability Grouping; Enriched for Gifted; Accelerated for Gifted

[The numerical entries of this table are missing from the source text.]

What becomes clear from examination of the above results is that, whatever benefits may accrue from the grouping of students into homogeneous ability groups for instruction, these benefits pale beside the benefits that accrue to gifted students when they are separated from their classmates and given enriched and accelerated curricula.

Proponents of ability grouping have sometimes made extraordinary reaches to supply their position with empirical warrants. Allan reached toward the research on “peer modeling” from educational psychology:

Further, the idea that lower ability students will look up to gifted students as role models is highly questionable. Children typically model their behavior after the behavior of other children of similar ability who are coping well with school. Children of low and average ability do not model themselves on fast learners.[319] It appears that “watching someone of similar ability succeed at a task raises the observer’s feelings of efficiency and motivates them to try the task.”[320] Students gain most from watching someone of similar ability “cope” (that is, gradually improve their performance after some effort), rather than watching someone who has attained “mastery” (that is, can demonstrate perfect performance from the outset).[321]


These are extraordinary claims, if true, because they seem to oversimplify the complex dynamics of children’s lives in real classrooms. Indeed, the most generous thing that may be said for the research basis of this claim is that it is oversimplified and was never intended as justification for such positions. Schunk’s review of “peer models and children’s behavioral change” focuses entirely on short-term (a few minutes or hours), staged incidents in laboratories where children observe “models” performing artificial tasks, for the most part. In fact, Schunk excluded from his review studies of “natural peer interactions, [and] … tutoring or peer teaching.”[322] Moreover, this literature lacks any definition of what a “peer” is. At one point, Schunk concluded that, of four experiments involving observational learning of cognitive skills or novel responses, “Each of these studies supports the idea that model competence enhances observational learning.”[323] Schunk continues:

Social cognitive theory [predicts both that] Children should be more likely to pattern their behaviors after models who perform successfully than to emulate less-successful models, [and that] models who are dissimilar in competence to observers exert more powerful effects on children’s behavior. … Similarity in competence may be more important in contexts where children cannot readily discern the functional value of behavior; for example, when they lack task familiarity, when there is no objective standard of performance, or when modeled actions are followed by neutral consequences.[324]


In other words, similar competence may be important – this theory seems to say – in those circumstances where children have no basis for inferring what the competence of the “model” is. If the reader thinks that this entire line of research bears scarcely a tenuous relationship to classroom practice and education policy, he or she is joined in those doubts by Schunk himself, who wrote: “Given the present lack of classroom-based research, drawing implications for educational practices is a speculative venture.”[325] No research appeared to correct this “lack” between Schunk’s review in 1987 and Allan’s use of it in 1991.


Sociologists’ View


Not surprisingly, psychologists acted like psychologists when they studied the effects of ability grouping: they contrived experiments, wrote paper-and-pencil tests, and sought objective evidence of superior test performance. When sociologists turned their attention to the tracking of students into ability groups, they acted like sociologists: spending time in schools observing; interviewing teachers, parents and students; asking questions about opportunities, preconceptions; and wondering about what this form of schooling had to do with the larger society of which it was one small part.

Gamoran[326] found that students in low tracks or ability groups were less likely to attend college than students in higher tracks. That lower tracks receive a poorer quality curriculum, less experienced teachers, and teachers with lower expectations for their students’ performance has been observed by several researchers, including Gamoran,[327] Oakes,[328] Persell,[329] and Rosenbaum.[330]

Jeannie Oakes has been a consistent critic of homogeneous grouping of students at all levels of the educational system. Her research,[331] dating from the late 1970s, has drawn on the evidence accumulated in literally thousands of person-hours of observation of teachers and students in tracked classes and schools. She has presented her findings forthrightly and forcefully:

Tracking does not equalize educational opportunity for diverse groups of students. It does not increase the efficiency of schools by maximizing learning opportunities for everyone…. Tracking does not meet individual needs. Moreover, tracking does not increase student achievement.


What tracking does, in fact, appears to be quite the opposite. Tracking seems to retard the academic progress of many students – those in average and low groups. Tracking seems to foster low self-esteem among these same students and promote school misbehavior and dropping out. Tracking also appears to lower the aspirations of students who are not in the top groups. And perhaps most important, in view of all of the above, is that tracking separates students along socioeconomic lines, separating rich from poor, whites from nonwhites. The end result is that poor and minority children are found far more often than others in the bottom tracks.[332]


Even proponents of tracking into ability groups have acknowledged that research “has verified again and again . . . that many low-track classes are deadening, non-educational environments.”[333] What is more, assignment to a low track is seldom followed by later reassignment to middle or high tracks. The professed intention that assignment to lower tracks serve as a transitional remedial period for bringing students back up to speed is seldom realized.[334]

In summarizing research on tracking from the sociological perspective, Welner and Mickelson wrote:

In a nutshell, this substantial body of research demonstrates that low-track classes are consistently characterized by lowered expectations, reduced resources, rote learning, less-skilled teachers, amplified behavioral problems, and an emphasis on control rather than learning. …The extant empirical research has also demonstrated that low-track classes are rarely remedial; that is, students placed in a lower track tend not to move later to higher tracks and, in fact, suffer from decreased ambitions and achievement…. Track placements, while increasingly subject to parental and student choice, remain highly rigid and highly correlated to race and class – over and above measured academic achievement….[335]

Although he does not present himself as a sociologist, Jonathan Kozol has earned a reputation over nearly forty years as a perceptive and credible observer of America’s schools, particularly the schools that suffer the multiple insults of severe poverty. Kozol’s 1991 book, Savage Inequalities: Children in America’s Schools, detailed his observations of the extreme inequities experienced by the poor and particularly the ethnic minority poor in U.S. schools. Tracking played a prominent role in most of the schools he visited. In an interview for the magazine Educational Leadership, Kozol was asked the following question:

Interviewer: Let’s talk a little bit about curriculum innovations–for instance, the idea of reaching at-risk kids in ways that are usually reserved for the gifted. Teaching algebra to remedial students, for instance. Dissolving the tracking system. What are your opinions about these solutions to problems of inequity?


Kozol: Tracking! When I was a teacher, tracking had been thoroughly discredited. But during the past 12 years, tracking has come back with a vengeance. …We have these cosmetic phrases like “homogeneous grouping.” It’s tracking, by whatever name, and I regret that very much. It’s not just that tracking damages the children who are doing poorly, but it also damages the children who are doing very well, because, by separating the most successful students–who are often also affluent, white children–we deny them the opportunity to learn something about decency and unselfishness. We deny them the opportunity to learn the virtues of helping other kids. All the wonderful possibilities of peer teaching are swept away when we track our schools as severely as we are doing today.[336]


Why Such Different Views?


Two groups of scholars – educational psychologists on the one hand and educational sociologists on the other – come to quite different conclusions on the value of homogeneous grouping of students for instruction. Why?  The answer lies in what they look for and how they look for it. Psychologists have tended to focus on short-run comparisons of different ability groups exposed to the same curriculum; they have evaluated the effects of grouping with paper-and-pencil tests of achievement. For example, only nine of the 52 studies in the Kuliks’ meta-analysis of secondary school ability grouping involved any formal adaptation of the curriculum to the ability level of the students.

Sociologists have taken a broader view of the various effects that ensue from the separation of students into homogeneous groups: the curriculum they receive, the type of instruction they are given, the social climate that is created and how it might shape their long-range plans, and the like. In large part, then, these two groups have been observing different phenomena, and operating with different disciplinary assumptions that have led them to draw conclusions that, if they don’t contradict each other, at least place emphases on different outcomes. Psychologists’ efforts to control independent variables have led them to focus on experiments that held curriculum constant and varied group composition: homogeneously formed groups in one school, heterogeneously formed groups in another. Sociologists, by contrast, have employed methods more akin to naturalistic observation, finding tracked schools and observing all of the consequences that ensue, including markedly differentiated curricula between tracks. These different perspectives account, perhaps, for the relatively benign view of tracking taken by educational psychologists.




One’s position on the ability grouping question will probably turn on the value one attaches to academic achievement of traditional types versus the broader goals of education. Those who construe the purpose of schooling as primarily preparing students – particularly the more academically able students – for higher education or the workforce, and who feel they see clearly the demands of those future roles, are likely to accept homogeneous grouping as an appropriate instructional strategy. On the other hand, those who see education as sorting children and reproducing social and economic class inequalities and protecting the privileges of already privileged social and ethnic groups are likely to regard homogeneous grouping as a principal means of achieving this goal. Loveless,[337] in his much-cited book The Tracking Wars, sketches a view of education that virtually presupposes the superiority of ability grouping: Schools are “places for students to learn content that is designated, authoritatively, by someone else”[338] (p. 13). This authoritative designation involves “deciding what students should know (content), deciding what they are capable of learning (ability), and finally, reconciling the content with students’ ability to learn it.”[339] The educator’s responsibility is that of “matching students with curriculum” and having “a legitimate party [decide what] students should learn.”[340] This authoritarian, content-centered view of schooling has as many detractors as it has supporters.

Welner summarized the situation with respect to tracking in language stripped of vagueness and euphemisms:

Ultimately, tracking is philosophically premised on the belief that some children are so academically different from other children that these two (or more) groups should not be in the same classroom. Accordingly, the academically inferior children are placed in separate classrooms where, in theory, they catch up (remediate) but where, in practice, they usually fall further behind. Tracking, then, is about the rationing of opportunities. From the perspective of the low-track student, it’s about deciding that this student should not be exposed to curriculum and instruction that would prepare him or her for subsequent serious learning. From the perspective of the high-track student, it’s about enhancing the schooling environment for some students by shielding (segregating) them from other students.[341]


The teacher who worries about the potential injustice to poor and minority students of tracking them into homogeneous groups will find little support for dealing with the special challenges that heterogeneous grouping presents. Commercially available curriculum materials are unlikely to aim at the same goals while differentiating the approach for students of differing levels of ability. Cross-ability tutoring, which has the potential to significantly raise the achievement of the tutors as well as those students being tutored,[342] is seldom provided for in today’s schools and almost never included among the techniques imparted during pre-service teacher training. Often, the most vocal and active parents in a school will request ability grouping when their children stand a good chance of being assigned to the fast track. It is little wonder that teachers prefer homogeneous groups for instruction, unless they are confined to teaching the lowest tracks. However, the challenge that must be faced whenever students are separated into homogeneous achievement groups is to avoid the “dumbing down” of the curriculum, to make the content and activities of the class as engaging and interesting as the curriculum of the highest tracks, whether they are called “gifted,” “accelerated,” or “advanced.” One of the few efforts to reverse the ill effects of tracking at-risk students into low-achieving homogeneous groups is Henry Levin’s[343] accelerated schools movement, in which curriculum and teaching methods thought to be appropriate only for high-track students are adapted for the education of all students. Tomlinson has recently offered advice on how instruction can be differentiated in mixed-ability classrooms without suffering the many ills that can result from segregating students into homogeneous ability groups.[344]

Administrators wishing to “detrack” traditionally tracked schools will face a considerable challenge. Welner and Oakes have offered plans for navigating the choppy political waters that must be crossed when schools that have evolved to primarily serve the interests of the brightest students are transformed into schools that serve all students’ needs.[345]

Ability grouping, achievement grouping, within-class, between-class, Joplin plan, gifted programs, tracking, advanced placement – all of these devices may spring from the same basic motivation. Since the empirical research on academic progress shows nothing much more than small benefits to bright students of any of these forms of grouping per se, and large benefits from enriching and accelerating the curriculum for select students, the prevalence of these forms themselves probably represents another expression of the wish of middle-class and upper-middle-class parents to secure some advantage or privilege for their children within the public school system. Is this bad? In a schooling system already markedly segregated on the basis of housing patterns and in which poor and academically deprived children already suffer not just from sub-standard schooling but from the indignity of racial and socio-economic segregation (as noted by Kozol[346] and by Orfield and Eaton[347]), the homogeneous grouping of students for instruction is one more advantage conferred on those who already enjoy many. Jonathan Kozol has called the tracking of poor and minority students into “special-needs” classes while white middle-class students are accelerated in classes for the gifted “one of the great, great scandals of American education.”[348]




·        Mixed or heterogeneous ability or achievement groups offer several advantages:

1.)    less able pupils are at reduced risk of being stigmatized and exposed to a “dumbed-down” curriculum;

2.)    teachers’ expectations for all pupils are maintained at higher levels;

3.)    opportunities for more able students to assist less able peers in learning can be realized.

·        Teachers asked to teach in a “de-tracked” system will require training, materials and support that are largely lacking in today’s schools.

·        Administrators seeking to “detrack” existing programs will require help in navigating the difficult political course that lies ahead of them.


6: Parental and Family Involvement in Education


Executive Summary


Summary of research findings


       This paper reviews the research evidence relevant to understanding the relationship between parental involvement and children’s performance in school.  Indicators of parental involvement with school (e.g., attendance at school events, parent/teacher conferences, PTO) have mixed associations with children’s school performance.  In contrast, measures of parental involvement at home (e.g., talking to children about school-related matters, high educational expectations, warm and consistent discipline) show consistent associations with children’s school success.  But even this evidence – based on correlations – may not represent causal relationships, and so some critics maintain that what parents do has little effect on children’s school performance.




·          Programs designed to promote parent/teacher interaction should be continued, but with greater emphasis on initiatives designed to improve the parent/child relationship.

·          Programs should be promoted that increase the amount of time low-income children are exposed to school-based activities, whether through more after-school programs, summer activities, or year-round schooling.




6: Parental and Family Involvement in Education


By Douglas B. Downey

The Ohio State University


By the age of eighteen, children have typically spent only 13% of their waking life at school,[349] and there are credible reasons for believing that parents have a role in shaping whether the remaining 87% is spent in a way that promotes school success. The current research evidence provides some guidance for understanding the kinds of parental involvement that most likely improve children’s school performance, although limitations of this work merit attention.
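
A rough back-of-the-envelope calculation shows how a figure in the neighborhood of 13% can arise. The inputs below (a 180-day school year, 6-hour school days, 13 years of schooling from kindergarten through grade 12, and 16 waking hours per day) are illustrative assumptions, not values taken from the cited source.

```python
# Back-of-the-envelope estimate of the share of waking life spent at school
# by age 18. All inputs are illustrative assumptions, not figures from the
# cited source.

SCHOOL_DAYS_PER_YEAR = 180   # assumed length of the school year
SCHOOL_HOURS_PER_DAY = 6     # assumed length of the school day
YEARS_OF_SCHOOLING = 13      # assumed: kindergarten through grade 12
WAKING_HOURS_PER_DAY = 16    # assumed: about 8 hours of sleep per night
YEARS_OF_LIFE = 18
DAYS_PER_YEAR = 365


def share_of_waking_life_at_school() -> float:
    """Return hours spent at school divided by total waking hours to age 18."""
    school_hours = SCHOOL_DAYS_PER_YEAR * SCHOOL_HOURS_PER_DAY * YEARS_OF_SCHOOLING
    waking_hours = YEARS_OF_LIFE * DAYS_PER_YEAR * WAKING_HOURS_PER_DAY
    return school_hours / waking_hours


if __name__ == "__main__":
    share = share_of_waking_life_at_school()
    print(f"Share of waking life at school: {share:.1%}")
```

Under these assumptions the share comes out at roughly 13%; a somewhat longer school day or fewer waking hours would shift it by a point or two.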


Parental and Family Involvement Research


Research on parental involvement in their children’s education covers two broad areas: the effects of parental interaction and involvement in the school, and the impact of parental involvement in the home. Research has examined both the norms of parental-school interaction at various levels of society, and the efficacy of special efforts to enhance parental involvement with school activities.


Parents at School 


Parents and teachers


There are several reasons for believing that good parent-teacher relationships are conducive to children’s school performance.  Izzo, Weissberg, Kasprow, and Fendrich[350] explain: “When parents communicate constructively with teachers and participate in school activities, they gain a clearer understanding of what is expected of their children at school and they may learn from teachers how to work at home to enhance their children’s education.”[351]  When parents attend parent/teacher conferences, for example, it creates continuity between the two dominant spheres of influence in the child’s life, home and school,[352] and likely signals to children the value their parents place on education.  In addition, some have argued that children learn more when they receive consistent messages from home and school.[353] Epstein[354] writes that the “main reason...for better communications and exchanges among schools, families, and community groups is to assist students at all grade levels to succeed in school and in life.”[355]

But what is the evidence that children’s school performance is enhanced by a strong parent-teacher relationship? Stevenson and Baker report that children performed better in school (as measured by teacher ratings of how well the child performed in school and whether the child performed up to his or her ability) when teachers rated the parents as actively involved in school activities such as PTO and parent-teacher conferences in their sample of 179 children drawn from the Time Use Longitudinal Panel Study.[356] Similarly, Grolnick and Slowiaczek studied 300 11- to 14-year-olds and found a strong association between teachers’ reports of parental involvement (measured as frequency of attendance at parent-teacher conferences, open school night, and school activities and events, such as the PTO) and teacher-reported grades, controlling for parents’ education.[357]

But several studies report the opposite pattern: an inverse relationship between parent/school contact and children’s school success.[358] Desimone analyzed the National Education Longitudinal Study (NELS), a nationally representative sample of nearly 25,000 eighth graders collected in 1988, and found negative associations between parents’ contact with the school regarding academic matters and students’ math and reading test scores and grades.[359]  Rigsby, Stull and Morse-Kelly suggest that one reason for this puzzling pattern is that parents may become involved with adolescents’ schooling when the youths experience either behavioral problems or poor grades.[360]  Unfortunately, cross-sectional data do not allow us to assess that possibility.
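
The reverse-causation possibility raised above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of NELS or of any study cited here: the data are synthetic, the function names and coefficients are invented for illustration, and parent contact is given no causal effect on scores at all.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_contact_score_correlation(n_students=5000, seed=0):
    """Simulate a world where low scores prompt contact, not the reverse.

    Hypothetical model: each student's standardized score is drawn at random
    and is unaffected by parent contact. Parents then contact the school more
    often when the score is low.
    """
    rng = random.Random(seed)
    scores, contacts = [], []
    for _ in range(n_students):
        score = rng.gauss(0, 1)
        # Contact rises as the score falls; it cannot go below zero.
        contact = max(0.0, 2.0 - 1.0 * score + rng.gauss(0, 1))
        scores.append(score)
        contacts.append(contact)
    return pearson(contacts, scores)


if __name__ == "__main__":
    r = simulate_contact_score_correlation()
    print(f"Cross-sectional correlation between contact and score: {r:.2f}")
```

In this synthetic setting the cross-sectional correlation is negative even though contact does nothing to scores, which is why a single wave of data cannot distinguish harmful involvement from involvement triggered by poor performance.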

Some study designs avoid the limitations of correlational research by comparing children involved in an intervention program with those who did not experience the intervention. Moses et al. report the results of an intervention in which parents were involved in children’s schooling in several ways: as project leaders, through informational meetings, through participation in workshops, and by acting as voluntary classroom helpers.  In this study, students demonstrated a marked increase in math performance compared to the achievement of students from previous years lacking this parental involvement intervention.[361]  Although it is impossible to know if the intervention program was the only major difference in the children’s experiences across the different school years, the results of this study are consistent with the claim that nurturing parental involvement in the classroom can improve school performance.


School-level parental involvement


Children may experience some benefits from their parents’ involvement at school, but do they also fare better merely by attending a school where many other parents are highly involved?  One argument is that children benefit from school-level parental involvement because it promotes information sharing and greater normative control over children’s behavior.  Coleman described how “social closure,” i.e., environments in which parents know each other, facilitates children’s identification with school.[362]  Podolny and Baron[363] explain that “a cohesive network conveys a clear normative order within which the individual can optimize performance, whereas a diverse, disconnected network exposes the individual to conflicting preferences and allegiances within which it is much harder to optimize.”[364]  As an illustration, if most parents strictly enforce homework rules, it becomes more difficult for any single child to resist, because that child is exposed to an environment where doing homework is normative.  In this way, children benefit not only from their own parents’ school involvement but also from attending a school where many parents are involved.[365]

While this argument has face validity, to date it receives only modest empirical support.  Carbonaro found mixed support for Coleman’s claims.  He reported that social closure was related to better performance on mathematics test scores and a decrease in the probability of dropping out, but had no effect on reading test scores or grades.[366]  Importantly, other researchers analyzing the same data concluded that social closure was associated with lower math test scores,[367] and so the debate regarding the benefits of social closure in school persists.[368]

Other researchers have asked how much variation in students’ scores on achievement tests can be attributed to school-level differences in parental involvement.  The answer, apparently, is very little.  Sui-Chu and Willms analyzed NELS data and concluded that while schools did differ in the level of involvement associated with parental volunteers or attendance at parent-teacher conferences, school-level parental involvement plays only a very small role in explaining students’ math and reading test scores.[369]  They concluded that while schools vary in the degree to which parents are involved in school activities, relatively few schools have a strong influence in shaping the learning climate at home, the dimension of parental involvement most closely related to students’ school success.[370]
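
One way to see why school-level factors can account for only a small part of score variation is to split total variance into a between-school and a within-school component. The sketch below is a simplified stand-in, not the hierarchical model used by the researchers cited above; the three schools and their scores are hypothetical, and the function name is an illustrative assumption. Because a school-level characteristic takes the same value for every student in a school, it can explain at most the between-school share.

```python
from statistics import mean

# Hypothetical test scores for students in three imaginary schools.
# Invented for illustration only; not data from NELS or any cited study.
HYPOTHETICAL_SCORES = {
    "School A": [48, 52, 55, 45],
    "School B": [50, 54, 47, 53],
    "School C": [49, 51, 56, 44],
}


def between_school_share(scores_by_school):
    """Return the fraction of total score variance that lies between schools.

    Computed as the between-school sum of squares divided by the total sum
    of squares (sometimes called eta squared).
    """
    all_scores = [s for scores in scores_by_school.values() for s in scores]
    grand_mean = mean(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in scores_by_school.values()
    )
    return ss_between / ss_total


if __name__ == "__main__":
    share = between_school_share(HYPOTHETICAL_SCORES)
    print(f"Between-school share of score variance: {share:.1%}")
```

When school means are close together, as in this made-up example, nearly all of the variation is among students within the same school, leaving little for any school-level measure to explain.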

Intervention studies also show little evidence that school-level parental involvement has any significant impact on students’ school performance.  For example, a recent intervention termed CoZi (Co for James Comer and Zi for Edward Zigler) involved:

1.)    parent and teacher participation in school-based decision making that is grounded in child development principles;

2.)    parent outreach and education beginning at the birth of the child;

3.)    child care for preschoolers and before- and after-school care for kindergarten through sixth graders; and

4.)    parent involvement programs.

In initial evaluations comparing one CoZi and one non-CoZi elementary school, the CoZi school had better school climate and parental involvement than the comparable non-CoZi school, but parent-child interactions and children’s level of achievement were not improved.[371] Of course, it is possible that the children experienced no improvements in school performance because the program had been in effect for only one year.

Taken as a whole, the current research evidence suggests that parent involvement in children’s schools via attending parent-teacher conferences, contacting school officials, attending school events, and developing a close-knit community where many parents know each other, probably has modest positive effects on children’s school performance.  If parents are serious about helping their children do well in school, improving their relationship with teachers and involvement in school activities are worthy goals. The bulk of research evidence, however, suggests that how parents interact with their children at home matters more.


Parents at Home


There are many reasons for believing that what parents do at home plays an important role in shaping children’s school-related skills.  One piece of evidence comes from the recently collected nationally representative Early Childhood Longitudinal Study–Kindergarten Cohort of 1998-99.  Eighteen percent of children entering kindergarten in the U.S. in the fall of 1998 did not know that print reads left to right, where to go when a line of print ends, or where the story ends in a book.[372] At the other end of the spectrum, a small percentage of children beginning kindergarten could already read words in context.  These large differences in beginning skills likely represent varying levels of exposure to print in the home. More evidence that what happens at home is important comes from researchers making seasonal comparisons–comparing students’ cognitive gains during the summer and winter.  Three independent longitudinal studies reach the same conclusion:  disadvantaged children lose ground primarily during the summer, when school is not in session and parents’ influence is primary.[373]  But if home practices matter so much, what exactly do parents do that promotes children’s school success?


Parenting Style


To the extent that school-related skills, both cognitive and social, are shaped by parenting approaches, parents play an important role in preparing children to meet teachers’ demands. One characteristic of parents that is consistently related to children’s school performance is the expectation parents have for their child’s educational future.  Children with parents who hope and expect them to do well are more likely to do well in school than their counterparts with parents who do not have high educational expectations for their children.[374]

But other work suggests that the best parenting approach combines high expectations with parental responsiveness or warmth.  One idea popular among developmental psychologists is that an authoritative parenting style, characterized by a balance between parents’ expectations and responsiveness, promotes children’s self-esteem, mastery, and ultimately school success.[375]  The argument is that children benefit from authoritative parenting because parents establish and consistently enforce rules and standards for their children’s behavior using nonpunitive methods of discipline.  Authoritative parents are warm and supportive and encourage communication with their children while validating the child’s individual point of view.  In contrast, children’s development is said to be less consistent when they are exposed to permissive parenting (low expectations and high responsiveness) or authoritarian parenting (high expectations and low responsiveness).

Some empirical evidence is consistent with this view.  Dornbusch, Ritter, Leiderman, Roberts, and Fraleigh studied 7,836 high school students in the San Francisco Bay area and found associations between the parents’ style of interaction (reported by the student) and students’ grades that persisted despite statistical controls for parents’ education, race, family structure, and the child’s sex. Students describing their parents as employing an authoritative style performed best in school, while students with authoritarian and, to a lesser extent, permissive parents were more likely to have lower school grades, net of controls.[376]  Similarly, in their study of adolescents in nine different high schools, Steinberg, Lamborn, Dornbusch, and Darling found that students with authoritative parents took greater responsibility for their school outcomes.[377]

Other studies, although not employing Baumrind’s “authoritative/authoritarian” nomenclature, supplement our understanding of parental practices that are related to children’s school success.  The Children of the National Longitudinal Survey of Youth (CNLSY) and Infant Health and Development Program (IHDP) employ the Home Observation for Measurement of the Environment (HOME) scale to assess the quality of the child’s home environment.  The scale is based on interviewers’ observations and questions of the mother.  It includes measures of learning experiences outside of the home (e.g., trips to museums, visits to friends, trips to the grocery store), literary experiences within the home (e.g., child has more than ten books, mother reads to the child, family members read newspaper), cognitively stimulating activities within the home (e.g., materials that improve learning of skills such as recognition of letters, numbers, colors, shapes, and sizes), punishment (whether child was spanked during the home visit; maternal disciplinary style), maternal warmth (mother kissed, caressed, or hugged the child during the visit; mother praised the child’s accomplishments during the visit), and the physical environment (whether the home is reasonably clean and uncluttered; whether the child’s play environment is safe).  A one standard deviation increase on the HOME scale was associated with a 9-point gain on the PPVT-R vocabulary test.[378]  Phillips et al. conclude: “For parents who want their children to do well on tests (which means almost all parents), middle-class parenting practices seem to work.”[379]

Similarly, in once-monthly observations of 40 families over a two-and-a-half-year period, Hart and Risley[380] found several dimensions of parenting style that were related to the child’s subsequent performance on IQ tests.  They conclude that three primary dimensions of parenting are what matter:

1.)    the absolute amount of parenting per hour (e.g., how often the parent is in the child’s presence, the percentage of child activities in which the parent took a turn, the number of words the parent speaks to the child);

2.)    parents’ social interaction with the children (e.g., the percentage of child’s initiations the parent responds to); and

3.)    the quality of speech to the child (e.g., how often did the parent repeat child utterances, the percentage of parent utterances that were questions, and the absence of prohibitions such as “stop,” “quit,” or “don’t”).

The third factor, quality of speech to the child, was the strongest predictor of the child’s later IQ.  They conclude that “[t]he major differences associated with differences in IQ were the extensive amount of time, attention, and talking that higher SES parents invest in their children and their active interest in what their children have to say.”[381]

Clark’s 1983 study also is consistent with Baumrind’s emphasis on warm and responsive parenting.  Clark  studied 10 African-American children, half of whom were successful academically and half of whom were not.  Clark reported that parents of high-achieving students had a distinct style of interacting with their children.  They created emotionally supportive home environments and provided reassurance when the children encountered failure.[382] 

Other studies also show evidence of parental involvement in the child’s school planning as important.  Using the NELS, Sui-Chu and Willms developed four dimensions of parental involvement: 1.) home discussion, 2.) home supervision, 3.) school communication, and 4.) school participation.  They report that “of the four types of involvement, home discussion was the most strongly related to academic achievement.”[383] The pattern they found was replicated by others.[384] This association may represent greater parental interest in the child’s progress, greater involvement in negotiating course selection, guidance in how to handle school problems, or a number of other ways parents help their children with schooling.

Children whose parents provide structured, adult-supervised activities at home tend to do better on cognitive tests and earn better grades.[385] Clark found that parents of successful students actively helped them organize their daily and weekly schedules and monitored this schedule closely to ensure that it was followed.[386]  Similarly, Taylor reports that family routines (e.g., “My family has certain routines that help our household run smoothly”) are associated with success in school.[387] Children may benefit from structure because it promotes the development of school-related habits that teachers tend to reward (e.g., consistent attendance, attentiveness, consistently turning in homework, not disrupting class).

Parents’ linguistic styles are also related to children’s school success.  Children do better in school when their parents verbalize instructions frequently and specifically.[388]  Parents’ use of verbal variety and detailed instruction is associated with high academic achievement among children.  Further, the parents of high-achieving children tend to be closely attuned to the cognitive level of their children and to respond more to individual cues their children give than to preconceived expectations or status rules for children.[389]


Reading to Children


Not surprisingly, several strands of research suggest that children whose parents read to them 20 minutes or more a day during the pre-school years have substantially higher pre-reading skills when they enter kindergarten.[390] When analyzing the Children of the National Longitudinal Survey of Youth (CNLSY), Phillips et al. note that five- and six-year-olds’ vocabulary scores are about 4 points higher (one-quarter of a standard deviation) when their mothers read to them daily as opposed to not at all, net of background controls.[391] Furthermore, a British study suggests that parental reading may be more effective than reading with someone else.  The authors report that children benefited more from being read aloud to two to four times a week from books sent home from school than did children receiving additional assistance at school from a tutor.[392]


Educational opportunities


Some research suggests that children’s school performance is better when the home has a variety of educational objects, such as books, newspapers, a computer, magazines, and a place to study.[393] While children would obviously not benefit from books in the home if never opened, the presence of books or a computer provides the child with the opportunity to develop school-related skills.  In addition, there is an association between the amount of money parents save for children’s educational future and school performance.[394]  It is not clear whether this money directly influences children (by providing the message that they are expected to go to college) or if it is merely correlated with other parental characteristics that matter.




Finn suggests that helping with homework is a concrete way that parents demonstrate the commitment they have to education.[395]  Surprisingly, however, there is little research support for this claim.  Many studies based on the NELS sample of eighth graders show an inverse association between parents’ help with homework (or rules about homework) and youths’ performance in school,[396] although most suspect that this association is a result of parents deciding to help a struggling child.   In addition, parents’ effectiveness may depend on their level of education. Balli, Wedman, and Demo found that students whose parents held a college degree benefited more from parental involvement with homework than did students whose parents lacked a college degree.[397]


Cultural capital


Bourdieu posited that students receive academic rewards not just for course knowledge, but also for signaling affiliation with elite groups (i.e., “cultural capital”) through their speech, style, mode of dress, and other habits.[398]  Bourdieu viewed cultural capital as arbitrary – he argued that the cultural practices of the elite are not inherently “better” than those of the disadvantaged – but cultural capital associated with elite culture tends to be rewarded in the classroom.  From this perspective, some of the skills or habits children need to develop for school success are not necessarily “good” but are simply the ones rewarded by teachers.[399]  Consistent with these claims, DiMaggio reported that U.S. high school students received higher grades, net of socioeconomic status, when they reported interests (e.g., interest in being a composer) and involvement (e.g., attending literature readings) in art, music, and literature.[400]  Other researchers have also noted that children tend to do better in school when they have been exposed to events or activities outside of school such as art and history museums, or music and dance lessons.[401] 

What are some of these cultural skills for which children are rewarded?  Swidler describes a tool kit of cultural skills, habits, and styles as largely ingrained behaviors.[402]  These might be as simple as understanding appropriate kinds of responses to teachers’ questions about a book[403] or understanding that print reads left to right, where to go when a line of print ends, or where the story ends, three skills that nearly one in five children entering kindergarten in America lack.[404]  This cultural tool kit may also contain non-cognitive skills that are important for negotiating the student role.  For example, students who can demonstrate the appropriate level of attentiveness, persistence at tasks, eagerness to learn, and organizational skills are more likely to earn good grades.[405]


Low-Income Parents


Many of the parental practices described above are highly correlated with socioeconomic status, and so it is likely that one of the reasons children from disadvantaged backgrounds do less well in school than their more advantaged counterparts is because their parents’ interaction style less successfully prepares them for school.  Indeed, some scholars report that the typically positive effects of socioeconomic status on children’s school performance are mediated entirely by parenting practices.[406]  It is difficult to discern precisely how related parenting styles and socioeconomic status are, but it is clear that there is substantial overlap. 

In his classic 1969 book, Class and Conformity, Melvin Kohn offers one reason for this overlap.  He argued that parents’ style of interaction with their children is influenced in important ways by the parents’ occupations.  Parents who work in jobs with little autonomy (e.g., data entry) and are rewarded for adherence to external standards (e.g., being on time, being neat, obedience to authority), tend to parent in ways that prepare their children for success in these same kinds of jobs.   Kohn found that working-class parents put more emphasis on obedience than did middle-class parents.  In contrast, parents in occupations that allow for more self-determined activities  and decision-making tend to promote their children’s skills for assuming these kinds of middle-class occupations.  The middle-class parent, therefore, uses a less punitive style of discipline and puts greater emphasis on developing children’s internal controls. From Kohn’s perspective, both low- and high-socioeconomic parents want what is best for their children; they are simply teaching their children the skills they deem necessary for success in the world.  Through the working-class parent’s interaction style, however, he or she unwittingly increases the likelihood that the child will remain in the same social class position.[407]

Socioeconomic position is also related to how parents interact with teachers and school officials. Lareau observed parent/teacher relationships in a working-class and a middle-class community and reported that teachers in both communities made active efforts to involve parents, but that low-income parents were less involved. Working-class parents were less likely to attend parent-teacher conferences, for example, in part  because the costs of attending – in terms of obtaining transportation, securing child care,  and rearranging work schedules – were typically greater for working- than for middle-class  parents.  In addition, working-class parents were more likely than middle-class parents to espouse a view that it is the school's job to educate their children.[408]  Lareau writes: “Working-class culture ... promotes independence between the spheres of family life and schooling.”[409]  In contrast, middle-class parents were more likely to view their child’s education as partly their own responsibility, along with the school’s.  Working-class parents were less involved with teachers for other reasons too.  They were less comfortable interacting with teachers, in part, because they reported feeling unqualified to discuss academic problems.  When they did have contact with teachers, working-class parents often discussed non-academic issues such as bus schedules or playground activities.

Others note how language differences across class end up shaping success in school. Bernstein describes how parents of low socioeconomic status tend to use a “restricted” language code in which language is embedded in context, reflects the status of individuals, and minimizes the need to make one’s meaning explicit.  In contrast, higher socioeconomic parents use an “elaborated” code that is less context-based and more individualistic so that language is used to make meaning more explicit.[410]  To illustrate this difference Bernstein offers two vignettes of a mother and a child riding a bus.  In the lower socioeconomic pair, the mother’s mode of control relies on commands with little explanation (e.g., “Hold on tight”) and reflects the hierarchical view of the adult-child relationship (“I told you to hold on tight, didn’t I?”).  In the middle socioeconomic group, the interactions are less hierarchical, and the mother provides a learning opportunity by using language to explore the situation (“If you don’t hold on tight, you will be thrown forward and you will fall,” “If the bus stops suddenly, you’ll jerk forward on to the seat in front.”) Bernstein notes that an important educational consequence of these two different approaches to language is that the relatively context-independent style used by the middle-class parent matches that expected by school teachers.[411]

In addition, low-income parents experience greater financial stress and health-related problems than other parents, and both of these may impede their ability to develop consistent routines.  Children perform better in school when their learning is not compromised by hunger, distracting physical ailments, lack of adequate sleep, unattended visual limitations, or other health-related problems.  Ear infections during the early years (before age four) pose a special problem because they can alter the functioning of the middle ear and thus affect the child’s hearing and, consequently, language development. A report from the National Institute on Early Childhood suggests that treating middle ear infections is crucial to children's language development.[412] Kellaghan et al. note that iodine deficiency during pregnancy, zinc deficiency, and iron deficiency have long-lasting consequences for children’s development.[413]  There is also greater drug and alcohol abuse among the poor, factors that work against consistent routines.  While some low-income parents may benefit from instruction on developing home routines, for those low-income parents who suffer from drug and alcohol abuse or experience stress related to financial problems, health problems, or both, it is unlikely that they will make substantial progress in developing home-based routines while these underlying problems persist.


Implications For Practice And Policy


The current evidence suggests that there may be some profit in improving parent/teacher relations, but that a more effective way to improve children’s school performance involves improving parent/child relations.  This is disconcerting news for policymakers, because parent/child relations are much more difficult to affect via policy than parent/teacher relations.  And for low-income families, part of parents’ interaction style – linguistic style, for instance – is likely rooted in class position and may not be fundamentally altered unless  class position changes.  If parents are unlikely to change what they do at home unless their class position is improved, one policy approach is to increase the amount of time that children are with teachers via after school programs or year-round schooling.  Given what has been already noted from seasonal  comparison research – that the gap in cognitive skills between advantaged and disadvantaged children emerges primarily during the summer – low-income children would likely benefit the most from more exposure to schooling.

The record for changing parents’ home behaviors in ways that affect children’s school performance is not encouraging.  White, Taylor and Moss carefully reviewed the results of 172 studies ranging from those training parents to improve children’s developmental skills (e.g., motor, language) to those where parents were classroom aides.[414] Surprisingly, they find little evidence that parental involvement matters. They conclude that “there is no convincing evidence that the ways in which parents have been involved in previous early intervention research studies result in more effective outcomes.”[415]

However, one recent study shows success. Children participating in the Chicago Child-Parent Center Program enrolled in half-day preschool at ages three to four years and were exposed to rigorous reading lessons in small classes while their parents were involved in activities with other parents (e.g., educational workshops, reading groups, and craft projects), volunteered in the classroom, attended school events and field trips, and were encouraged to complete high school.  Further, the program included health and nutrition services, health screening, speech therapy, and nursing and meal services. Results suggest that children in the program were more likely to graduate from high school and less likely to be arrested 15 years later than similarly matched children.[416]  Because involvement in the program was not a result of parent initiative – parents were actively recruited for the program – it is unlikely that the advantages for participants merely represent the selectivity of more involved parents.[417]


What Have Critics Said?


The vast majority of research on parenting practices is correlational, and so an important concern is that the observed associations between parenting practices and students’ school performance represent mere correlations, not causal relationships.  This position has received considerable attention lately from behavioral geneticists who assert that the role of parental behaviors has been overstated in the social sciences and that genetic influences have been understated.[418]  With respect to the impact parents have on children through shaping the home environment, Scarr writes:

It is clear that there are family differences, but it is also clear that most of those differences are not environmental.  Among families in the mainstream of Western Europe and North American societies, differences in family environments seem to have little effect on intellectual and personality differences among their children, unless they are seriously deprived of opportunities and support... [g]ood enough, ordinary parents probably have the same effects on their children’s development as culturally defined super-parents.[419]


For purposes of understanding how parental involvement influences children’s school performance, this debate is especially important because, if the behavioral geneticists’ position is correct, most parents cannot affect their children’s school success much. 

An example of the problem of determining causality with correlational studies can be illustrated with the frequently found negative relationship between how often parents help with homework and children’s school performance.[420] The idea that more help is associated with poorer performance strikes one as counterintuitive.[421]  But this association probably represents parents’ response to children’s need for help.  The kinds of children needing help are different (probably poorer students) than the kinds of children who easily complete their homework on their own.  In a typical correlational study, researchers try to address this possibility by statistically equalizing the two groups on characteristics such as income, education, family structure, race, urban/suburban/rural location, and other factors they suspect might be different between the kinds of parents who supervise children’s homework versus those who do not.  They would also statistically control for the child’s previous performance in school. 

These attempts to isolate the unique effect of “parental involvement in homework” are limited, however, because we usually cannot measure or even conceive of all of the ways the two groups of parents and their children may be different.  As a result, despite statistical controls we probably fail to obtain unbiased estimates of the true effect of parental help with homework.  Of course it is possible that children’s school efforts really are hampered rather than helped by their parents, but few researchers espouse this view.  A more likely explanation is that the statistical controls used to equalize the two groups are inadequate.   But if this kind of problem affects our ability to estimate accurately the effects of parental involvement with homework, it likely affects our estimates of other parental behaviors too, casting doubt on nearly all of the parental involvement literature.[422]
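The limits of statistical controls described above can be illustrated with a small simulation. The sketch below is purely hypothetical: the variable names, the coefficients, and the assumed positive "true" effect of help (0.2) are invented for illustration and are not drawn from any of the studies cited here. It assumes that an unmeasured need for help both prompts parents to help and lowers grades, and that prior grades are only a noisy stand-in for that need.

```python
import random


def cov(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def slope(y, x):
    """OLS slope of y on x with no controls."""
    return cov(x, y) / cov(x, x)


def slope_with_control(y, x, c):
    """OLS coefficient on x when y is regressed on x and one control c."""
    num = cov(x, y) * cov(c, c) - cov(c, y) * cov(x, c)
    den = cov(x, x) * cov(c, c) - cov(x, c) ** 2
    return num / den


def simulate(n=20000, seed=1, true_effect=0.2):
    """Return the estimated effect of homework help under three designs.

    All coefficients below are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    need, prior, help_, grade = [], [], [], []
    for _ in range(n):
        d = rng.gauss(0, 1)              # child's need for help (unmeasured)
        p = -0.5 * d + rng.gauss(0, 1)   # prior grades: a noisy proxy for need
        h = 0.8 * d + rng.gauss(0, 1)    # parents help more when need is high
        g = true_effect * h - 1.0 * d + rng.gauss(0, 1)  # later grades
        need.append(d)
        prior.append(p)
        help_.append(h)
        grade.append(g)
    naive = slope(grade, help_)                       # no controls
    partial = slope_with_control(grade, help_, prior) # control for prior grades
    full = slope_with_control(grade, help_, need)     # control for need itself
    return naive, partial, full


if __name__ == "__main__":
    naive, partial, full = simulate()
    print("no controls:            ", round(naive, 3))
    print("control for prior grades:", round(partial, 3))
    print("control for true need:  ", round(full, 3))
```

Under these assumptions the estimated effect of help is negative with no controls and remains negative after controlling for prior grades, even though help is beneficial by construction. Only a control for the child's actual need, which a researcher could not normally observe, recovers the assumed positive effect.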

Another example of the behavioral geneticists’ position can be understood by considering the associations between “good” parenting practices and children’s school success.  Behavioral geneticists note that parents influence children in two ways, by providing their home environment but also by passing on genes.  If parents who create good environments are also parents with good genes, associations between good parenting behaviors and students’ school success may have little to do with parenting actions and may simply represent the genetic advantages typical of parents who also happen to use good parenting practices.  This line of thinking posits that the correlation between “good” parenting (authoritative) and children’s school success may be a function of parents with genetic advantages (high intelligence, easy disposition) having children with similar advantages and also parenting in the culturally prescribed way.  Among middle-class Americans of European descent this means an authoritative approach.  Because other racial/ethnic groups favor other parenting styles, genetically advantaged parents in other groups might not use authoritative parenting.  Asian Americans, for example, more often use authoritarian parenting, yet Asian-American children often do well in school.[423]  Correlations from most parenting studies could be reinterpreted as the effects of good genes rather than good parenting.

It is difficult to distinguish between environmental and genetic explanations with correlational data, so one approach is to look at whether adopted children are more like their adoptive parents (who provide their environment) or their biological parents (who provide their genes).  In terms of scores on intelligence tests, it appears that adopted children are more like their biological parents, even if they were adopted at birth.[424]  Another analytically powerful comparison is to look at children who are similar genetically but who have experienced different environments: identical twins reared apart.  Of course, identical twins are rare themselves and so finding identical twins raised apart is nearly impossible. Researchers at the University of Minnesota have collected data on more than 100 pairs, however. Analyses of these data show surprising results – the twins are more like each other than we would expect, even when unaware of each other’s existence for most of their lives.[425]  While the position that children’s outcomes are more readily understood via genetics rather than environment may strike many as unlikely, it is not easily dismissed based on the current empirical evidence.

The implications of this position – that parental involvement matters little – are clear for policy:  Only programs designed to raise children out of the very worst environmental conditions would be effective.

Several important issues regarding heritability studies are still debated.  For example, critics point out that it is not clear how much contact occurred between some of the identical twins raised apart in heritability studies.  Identical twins in these studies vary in many ways (e.g., the age at which they were separated and the difference in the kinds of home environments in which they were raised); ideally all would have been separated at birth and raised in randomly different environments, but, of course, it would not be ethical to set up such an experiment prospectively. Another issue has to do with the attempt to neatly separate environmental and genetic effects.  Critics claim that genetic and environmental effects interact and so typical heritability studies underestimate environmental contributions.[426] For example, a temperamentally difficult child may be difficult for genetic reasons, but this child also evokes harsh parenting.  Perhaps identical twins reared apart are similar to each other because they end up evoking similar environments (they look alike), and only modestly so because of shared genes.  If this interpretation of heritability studies proves true, then how parents interact with their children matters.


Summary and Recommendations


The research available to date on the subject of parental involvement in education yields conclusions about what we know as well as what we don’t know.

It is unlikely that increasing parents’ participation at PTA meetings and in helping with homework alone will have a substantial impact on children’s school performance.  Programs that successfully raise children’s school performance via parental involvement do so by meeting the broad needs of parents. For example, the success of the Chicago Child-Parent Center Program is probably a function of the wide range of services provided to parents, including educational workshops, reading groups, and craft projects, health and nutrition services, health screening, speech therapy, and nursing and meal services.

There is little reason to believe that the kinds of policy initiatives employed in the past – even the Chicago Child-Parent Center Program – will dramatically affect the gap in performance among students from low- and high-income families.  After-school and year-round programs will probably benefit low-income children the most.

It is not clear that children’s performance in school is solely or perhaps even primarily a function of parenting style.  While children’s school success is associated with parenting approaches, this association is culturally specific (e.g., authoritative parenting is used among the parents of successful white students but authoritarian parenting is used among the parents of successful Asian-American students in the U.S.) and may represent, in part, the genetic similarity between parent and child.

These conclusions support the following policy recommendations to enhance parental involvement:

·        Programs designed to promote parent/teacher interaction should be continued, but with greater emphasis on initiatives designed to improve the parent/child relationship.   Programs designed to meet the broad needs of parents (e.g., improving parents’ reading skills, reducing financial stress, meeting health and nutritional needs) are likely to be the most successful.

·        Programs should be promoted that increase the amount of time low-income children are exposed to school-based activities, whether through more after-school programs, summer activities, or year-round schooling.



7: Public Schools and Their Communities


Executive Summary


Summary of Research Findings


Although limited largely to case studies, research has documented a wide range of programs that have expanded public schools’ involvement with the communities in which they operate. Such programs face a variety of challenges that range from institutional rivalries to competition for scarce financial resources. Operated effectively, however, they can contribute to improved achievement by students living in poverty.




·          Basic parental involvement programs should be enhanced to include multiple opportunities for formal and informal communication between school personnel and parents.

·          Parental involvement programs should be developed that embrace the ethnic, linguistic, cultural, racial, and religious diversity of the parents.

·          Parental involvement programs should be designed to be sensitive to the special needs of poor parents, single parents, parents with large families, and those families where both parents work outside of the home.

·          Written materials should be provided in the language with which parents are the most familiar.

·          Schools and other social organizations wishing to provide school-linked services should carefully consider the scope, funding needs, organizational and professional complexities, and types of services to be offered.

·          Funding for new community involvement projects should be kept consistent and stable. The bigger and more complex the project, the greater the need for adequate funding.

·          Extra-curricular programs should be kept vital to help foster strong parental involvement.

·          Educational leaders and policy makers should be encouraged to reconceptualize the public school as a vital economic resource that must be nurtured.



7: Public Schools and Their Communities


By Catherine Lugg

Rutgers University


The interplay between public schools, their respective communities and child welfare has been an area of public policy concern for well over a century. From as early as the late 19th century, various educational and social reformers have sought to strengthen the ties between schools and communities in hopes of bolstering better outcomes for children, as well as building stronger, more functional communities.[427] Yet, many of the same problems reformers faced over a century ago stubbornly remain: low parental involvement, the deleterious effects of concentrated poverty, inappropriate pedagogy and policy, racial and ethnic economic isolation, dysfunctional families, and ineffectual political leadership.[428] Each issue can hinder an individual child’s educational achievement, but the interaction of multiple factors can be devastating.[429]

This report seeks to map out this history and the contemporary research literature regarding the interaction of public schools, their communities, and student outcomes, especially academic achievement. It reviews some of the major consistencies within the research literature, particularly during the past 15 years. It also notes some of the major criticisms regarding school-to-community outreach, including some of the lingering paradoxes. This report pays particular attention to what reforms seem to work best with poor children and concludes with recommendations for the best choices in educational practice and policy making.

Schools and Community Research


An Historical Overview


During the late 19th and early 20th centuries, educational and social reformers pushed for an expanded role for public education. Many were deeply concerned by the exploding numbers of poor and destitute children who seemed to overwhelm local schools, particularly in the nation’s booming urban centers. Cities were also faced with an ever-enlarging immigrant population, many of whom had little education or economic resources.[430] In hopes of improving the lives of children, educators and social reformers sought to expand the mission of the public school. Not only would the public school educate, but it would also bathe, feed and inoculate needy children. Their mission did not stop there: all children, many of whom were either immigrants themselves or children of recent immigrants, would be “Americanized.” They would learn the dominant social, political, and cultural norms of mainstream – and at that time, largely Anglo – America.[431] Reformers of the era viewed the public school as a linchpin in the process of “child-saving.”[432] By the 1910s, numerous city schools offered gyms, school nurses, playgrounds, shower facilities, and even school lunches.[433] Some locations offered adult education classes for parents, held typically at night, not only to build their own language skills and knowledge base, but also to learn new parenting skills. In other instances, teachers would visit their students’ homes in hopes of fostering better communication between the school and parents, as well as building a consistency of academic and behavioral expectations. Urban districts began to use the “school newsletter” as a means of communication with parents and the public at large.[434]

These efforts to better link the schools with their communities were rooted in the late 19th century sociological notion of building “social ecology,” or improving the overall environment in which children and their parents lived. For many children, their lives did improve. Public schools not only ameliorated the harshest effects, but also offered children the promise of a way out of poverty. Attendance rates soared as immigrant children in particular streamed into the public schools. By 1908, a larger percentage of immigrant children attended public schools than did their “native-born” peers.[435]

These services came at costs that were both personal and fiscal, however. The personal costs were generally borne by those who were receiving the help. To become “Americanized” meant that children had to relinquish the cultural practices and norms from the “old country.” In practical terms, it meant that many immigrant children were taught that their heritage (and by implication, their parents) was inferior. Accordingly, teachers and administrators treated immigrant parents with more than a whiff of condescension. As one educator explained:

They must be made to understand what it is we are trying to do for the children. They must be made to realize that in forsaking the land of their birth, they were also forsaking the customs and traditions of that land; and they must be made to realize an obligation, in adopting a new country, to adopt the language and customs of that country.[436]


In addition to problematic relations between the schools and parents, both the textbooks and teachers could be hostile towards non-Anglo children, with more than a few hurling racial, ethnic and religious slurs.[437] In 1903, reporter Adele Marie Shaw recounted that one elementary teacher bellowed at one child, “You dirty little Russian Jew, what are you doing?”[438] 

Finally, the help tended to be imposed whether or not students and their parents believed they needed assistance.[439] The assumption of the era was that professional educators were far better prepared to assess the welfare of children than were their immigrant parents.

The greatest drawbacks to extending more services to “children at risk,” though, were fiscal, and these financial drawbacks were rooted in the politics of the era. These efforts were subjected to intense scrutiny on the heels of the 1917 Russian Revolution, with some commentators noting that such social programs were dangerously “socialistic.”[440] For years, public education had been under the policy microscope regarding its seeming lack of fiscal accountability, possible political radicalism,[441] and instructional inefficiencies. Thanks to the churning political environment, much of the tax money for greater social services, which in some places were extensive and expensive, evaporated.[442] In hopes of maintaining political and fiscal support, public school leaders scrambled to deflect criticism, and many embraced the new “science” of public relations, touting public education’s ever-increasing efficiency.[443] Until the late 1980s, efforts to do community outreach and communication to bolster student academic outcomes would become largely one-way, with information flowing only from the schools to the families.[444]


Schools and Communities Today


By the late 1980s and early 1990s, researchers, educators, social service providers, and policy makers were alarmed at the rising number of children in crisis, particularly in poor urban areas. Many states had curtailed social service provisions to offset budget shortfalls. Additionally, the federal government had greatly reduced its level of fiscal involvement with poor children beginning in the early 1980s.[445] Concurrently, the number of children in poverty was rising. As researcher Joy Dryfoos observed in 1994:

By 1991, more than fourteen million children – 22 percent of all children – lived in families below the poverty line, the highest number and rate since 1965. As in no other period of time, disadvantage shifted from the oldest people to the youngest. And those children living in mother-only households have become the most deprived of all, with more than 55 percent living in poverty.[446]


Such social and economic turbulence was adversely affecting many students and their academic achievement. This turbulence was also coupled with increased political concern regarding public education and its possible adverse effects on the nation’s economic competitiveness.[447] Public school leaders, community members, social service providers, policy makers and researchers took a renewed interest in rebuilding the social ecology of local public schools in hopes of fostering better academic outcomes and, in turn, stabilizing the social environment and thus revitalizing the national economy.[448]

States and the federal government began to explore the notion of “systemic reform,” or coordinating the various governmental policies that affect children in a more holistic fashion to improve both their current lives and their long-term life chances.[449] For education, and urban education in particular, this meant involving various branches of government in efforts to better link schools to the communities they serve.[450]

Many of these new reform efforts drew on the work of the sociologist James S. Coleman. In the early to mid-1980s, Coleman and his colleagues had studied the academic effectiveness of urban Catholic schools. He theorized that the reason Catholic schools seemed to generate better outcomes for their students was that these schools and their students enjoyed a high degree of “social capital.” Coleman further theorized that these schools in their particular communities were “functional communities,” because their members shared a high degree of what he called “intergenerational closure.” Additionally, the communities and the schools shared a strong interest in the general welfare of the students. Parents knew each other and each other’s children. The implication was that these schools functioned in relatively close-knit communities. Parents, school personnel, and community members cultivated the relationships and shared norms (i.e., the social capital) that were critical to successful child rearing and schooling.[451]

There were criticisms of the Coleman studies, particularly regarding their possible utility and applicability for public schools. The critics noted three key differences between Catholic urban and public urban schools. First, Catholic schools tended to “cream” the academically strongest students (and their parents) away from the distressed urban schools. Additionally, students who attended Catholic schools did so voluntarily, unlike many of their public school peers. Finally, Catholic schools were free to expel students who failed to conform to either academic or behavioral expectations.[452]

Nevertheless, researchers and policy makers began to explore the possibilities that public schools, in conjunction with other community and social service groups, could build, rebuild or even expand the social capital of their communities. Reformers also drew on the earlier efforts of Progressive-Era reformers to strengthen the social ecology of school neighborhoods. Subsequently, a variety of blueprints were designed, all aimed at bringing various stakeholders together.[453]


Full-Service Schools


By the early 1990s, well over 800 projects were aimed at fostering greater ties between schools and their communities.[454] States from California to New Jersey were experimenting with vastly expanded social service provision as well as experimenting with differing organizational structures,  including interagency collaboration and full-service school programs. These terms, as well as school-linked services, have been used in the research literature. They describe efforts to bring various social service providers together within a formal organizational structure—sometimes sharing a building, typically a public school—to share staff, resources, and responsibilities. All were to better serve children, their parents and the larger community.[455] In an age of continuing budget constraint, some early proponents of this approach argued that providers might even realize budgetary cost savings if the collaborating agencies could eliminate needless service duplication.[456]

These projects tended to be idiosyncratic in nature. As Joy Dryfoos noted in 1994, full-service schools, by design, were to be highly sensitive to the local contexts. There has been no one model of a “full-service school.” The disparate interagency collaborations have included personnel from public schools, child protective services, juvenile justice agencies, mental health agencies, public health departments, the medical system, as well as parents and other community members.[457]

Most of the extant evaluation research of these projects has been in the form of single-case or multi-case studies. However, common similarities across project sites include better attendance rates, lower substance abuse, and lower dropout rates. Additionally, “[s]tudents, parents, teachers, and school personnel report a high level of satisfaction with school clinics and centers and particularly appreciate their accessibility, convenience, and caring attitude.”[458]

Despite the encouraging signs, stubborn organizational and legal issues have been hard to resolve in these expansive undertakings. Some of the most vexing issues have been those of professional “turf,” client confidentiality and budgetary authority.[459] In the area of professional turf, for example, some school counselors have felt threatened by the presence of social workers from child protective services and were reluctant to share information.[460] Furthermore, child protective agencies and the criminal justice system at times were barred by law from sharing crucial information regarding children with school officials.[461] Finally, a number of collaborative efforts got snarled in various budgetary directives, many of which demanded single, rather than shared, lines of fiscal accountability.[462]

Another issue facing proponents of full-service schools has been maintaining consistent funding. A mixture of state and federal funds and private foundation grants has paid for many of these collaborative projects. These projects have been particularly vulnerable to shifting political winds. For instance, the movement suffered a setback with the withdrawal of funding by one major foundation. After a disappointing preliminary evaluation of an inter-organizational collaboration in June of 1994, the Pew Charitable Trusts withdrew from a highly ambitious 10-year, $60 million project dubbed The Children’s Initiative. Pew concluded that to realize the positive changes envisioned by the initial project, even greater expansion of social service provision was needed. For the initiative to have even greater influence on children’s lives, it was going to have to move into areas such as housing, employment and drug abuse. It was already a large-scale and highly complex initiative that called for various service providers to fundamentally reconceptualize their professional roles and behaviors, while they continued to work in traditional bureaucratic environments. The weak initial evaluations regarding student outcomes in a political climate that had been hostile to tax-based social service provision made the project too politically risky for Pew to maintain its presence.[463]

Disappointing as this has been, the demise of Pew’s Children’s Initiative is congruent with what we know about educational reform. Historians David Tyack and Larry Cuban surveyed more than 100 years of educational and social reforms to determine which ones had staying power. They concluded the reforms that were institutionalized all had the following characteristics:

1.)        The reforms were adaptable to local circumstances.

2.)        Successful reforms were modest in approach and design.

3.)        Policy makers and regulators solicited and incorporated continuous input from those who had to implement the reforms (teachers, administrators, parents, etc.).

4.)        Successful reforms enjoyed strong and consistent political and fiscal support. Popular at the grassroots, these reforms encountered little opposition.

5.)        Successful reforms were relatively easy to implement and maintain (for example, structural or programmatic add-ons—adding kindergarten programs, the development of the junior high school, expanding the school lunch program to include a breakfast program, offering computer classes to parents after hours, etc.).[464]

Given these findings it is understandable that the Pew initiative was not sustainable. Yet, as Tyack and Cuban have demonstrated, there are effective reform efforts targeted at community building and parental outreach, which this report now explores.


Successful efforts at linking schools and communities


Parental involvement


In bolstering school-community outreach, public school educators have used numerous strategies. Many of these are centered on increasing parental involvement in their children’s education and school. As researchers Daniel J. McGrath and Peter J. Kuriloff observe:

For policy makers, parent involvement in schools represents a method for, first, improving schools' services to families by making schools more accountable to parents; second, strengthening ties between schools and families traditionally underserved by schools; and, third, better serving students by taking advantage of parents' rich stores of knowledge about their children.[465]


Additionally, the research base regarding the efficacy of parental involvement is strong, and these findings have generally demonstrated that parental involvement can have positive effects on student academic achievement. Students whose parents are involved with their education tend to have fewer behavior problems in school, fewer absences, and higher rates of academic achievement and graduation than those students whose parents do not get involved.[466] Additionally, those students who are failing can improve dramatically if parental assistance is cultivated by school personnel. In particular, ethnic minority students or those with learning disabilities can enjoy significant benefits if their parents are involved with their schooling.[467]

Many public schools use the traditional methods of soliciting parental involvement: hosting the open house or “parents’ night,” soliciting parent volunteers to help work during a special event, maintaining a PTA/PTO, sending parents a school newsletter, using infrequent notes and phone calls, and of course, issuing a regular report card. While these efforts are a good start, they have significant limitations. First, the more traditional approaches to cultivating parental involvement can lead to parents being guided and sometimes manipulated by teachers and administrators. Parents are carefully steered away from voicing concerns around contentious professional issues like grading policies and teaching style. The second criticism is that the traditional forms of parental involvement tend to be constrained. By design, the information is to flow in one tightly controlled direction – from the school to the parents. Parents tend to be viewed as “non-professionals” and hence as having limited value in shaping the larger policy issues of the school. The third criticism is that the traditional forms of parental involvement tend to be representational, since many contemporary parents cannot participate. The traditional forms assume a stable, two-parent family, with one parent (typically the mother) working full-time as a homemaker. Given that the vast majority of adults with children work outside of the home (whether they are parents, grandparents, aunts, uncles, etc.), only a few “representative” parents can participate. The parents who can participate tend to be white and middle- to upper-middle-class. And finally, the fourth criticism of the traditional forms of parental involvement is that they expect parents to be passive. Parents are to receive information from the school, but the school does not seem to want much information from the parents.[468]

While the traditional forms of parental involvement do include some parents, there remains the potential to do more. Additionally, the traditional forms of parental involvement are strikingly ineffective at reaching out to families that are (a) large, (b) headed by a single parent (usually female), (c) poor, (d) non-English speaking, or (e) abusive, or that (f) include parents and older siblings who dropped out of school.[469]


Experiments to expand participation


Recognizing the problems of the traditional forms, public educators have experimented with a variety of reforms to encourage greater parental participation. One of the more recent innovations is the “open school.” In this approach, the school opens itself up to each member of the community and actively seeks their input. This has been called the “warts-and-all” approach, because community members get to see the school staff at their best, and possibly at their worst. Parents and other adults can drop in at any time of the day to see how their children are doing and what else is going on within the school, and to have meaningful conversations with teachers and educational leaders regarding their child’s education. Contemporary parents and guardians may not have schedules or consistently reliable transportation that permit them to visit the school during a scheduled (and formal) meeting. The open school demands a fair measure of flexibility on the part of the school personnel, many of whom have been socialized to view “their” school as a pedagogical island, removed from external forces and pressures, including parents. Yet, an open school grants parents and the community greater and very real access. It also provides parents with a meaningful sense of ownership, not only of the school, but also of their children’s education.[470]


Parental Education


Another reform aimed at boosting parental involvement is parental education. Some parents, particularly those who have had poor school experiences themselves, may need experiences as a co-learner, advocate, and decision-maker, so they can become their child’s educational advocate. Parent education programs encourage parents to become their children’s resident teacher, as well as the critical caretaker and nurturer.[471]

For example, Norwood and her colleagues designed a model program of parental involvement through the University of Houston’s Graduate School of Social Work and College of Education. They provided parent education to a school within the Houston public school system. The school had a high percentage of students who were considered at risk for academic failure and came from poor socioeconomic backgrounds. All were African-American. Their parents were recruited to participate in an experimental parental education program that was focused on skills and that was also culturally and linguistically sensitive. Additionally, a sense of community among the researchers and participating parents was carefully cultivated, and the researchers took pains to blur distinctions between parents and the researchers. Participants were surveyed prior to the beginning of the program to determine what their needs and concerns were. Soliciting detailed input from parents also helped to establish ownership of the program by parents.[472]

The actual program focused on building parenting skills, as well as parents’ teaching or coaching skills. Throughout the sessions, parents were invited to share their knowledge and experiences in raising their children. This helped to validate parents’ knowledge and broadened the knowledge base of all participants. The parents also engaged in role-playing and various school- and home-related scenarios with Norwood and colleagues, so parents could practice their newly acquired skills.[473]

Six months after the program concluded, parents were asked to evaluate the program. All were very enthusiastic, and they had put their newly acquired skills immediately to work. As one woman explained:

Most parents just run up to the school, but she [the instructor] helped us to see there are two sides to every story. We did role-playing, one was the parent and one of us was the teacher. We also practiced how to ask teachers for the things we need. I used this when my little boy didn't have homework. I went to his teacher and she gave me some homework for him. (Ms. C)[474]


Another participant noted:

I felt the information on the parent-teacher conferences was very good. Now when I have to talk to my child's teacher, I am not as timid or afraid as I used to be. I have some say in his education. (Ms. W)[475]


The researchers then examined the subsequent academic achievement of these parents’ children. They scored significantly higher on standardized measures of achievement, in math and reading, than did the children whose parents did not participate in the program. The degree of difference surprised Norwood and her colleagues, since they were expecting only modest academic gains at best.[476]

The Houston parental education program succeeded in large part because it was attentive to the needs and concerns of urban, minority parents, as well as being respectful of their backgrounds. Parents were treated with respect and their cultural, linguistic, and racial backgrounds honored. This result is consonant with other researchers’ recommendations that public school personnel who solicit parental involvement need to be sensitive to the needs of an increasingly multicultural parent population if they are to achieve greater and more meaningful parental involvement with the schools.[477]


School-Based Management


Another parental involvement program that has been part of a larger school reform effort is school-based management (SBM). In this model, the authority for most decisions is delegated to the school site. In turn, the individual school establishes an SBM committee or council that is typically composed of teachers, parents, administrators, and perhaps, additional community members. The idea behind the reform is that those closest to real students know what policies, programs and budget expenditures will serve them best. Hence, the SBM council is empowered to make most of the policy decisions that affect that specific school. The ultimate goal is to improve the decision-making process and empower those closest to children to such an extent that student achievement improves. The research regarding SBM’s effectiveness in bolstering student achievement is mixed, although it does seem to improve the morale of the teachers who participate.[478]

A final note is in order regarding some of the more overlooked and undervalued parental involvement programs. Extra-curricular offerings have been a traditional form of parental involvement, perhaps the most popular of all informal forms of parental involvement. These programs, which range from athletics, music, drama, and arts programs to various student interest clubs, have historically involved students who are highly disparate in terms of socioeconomic status, race, ethnicity, religion, gender, etc. These activities have produced enthusiastic parental involvement, regardless of background. Additionally, members from the larger community tend to get involved, if only as spectators. Research indicates that extra-curricular activities promote student academic achievement, in that they inhibit students from dropping out. The direct influence on improving student achievement is more tenuous.[479] In many distressed urban areas, extra-curricular venues are most vulnerable to budget reductions. This may be unwise given the strong connections that appear to be generated by these activities among students, their families, the schools, and members of the larger community.


Community Development


A more recent notion of strengthening school, community and parental interaction views the public school system as a critical economic resource. That is, like any other industry or business, it provides both services and employment to individuals located within a certain geographic area.[480] Researcher Charles Kerchner argues that instead of viewing the school systems as a sometimes-crushing municipal burden, cities should aggressively support their public schools.

Schools build cities in two ways. They develop the economy, both indirectly by adding to a location's stock of human capital and directly through programs that enhance neighborhoods. Schools become part of a microeconomic policy. Schools also serve as agents for community development, the creation of cohesion and civic relations among neighbors.[481]


Kerchner theorizes that a public school system could greatly enhance a community’s economic stability in four ways, by:

1.) providing jobs for professional and service staff;

2.) enhancing the human capital of the children in the community through quality education;

3.) encouraging local businesses through targeted contracts for goods and services; and

4.) enhancing property values while concurrently holding down property taxes.[482]

This economic vitality, in turn, can rebuild the social stability of the area. A stronger local economy reduces many of the social pathologies faced by urban areas.

Greater social and economic stability for parents also has a direct positive influence on student achievement, since the social capital of the area is enhanced. An additional benefit is that more people, those with and without children, will move to the area because of its relative economic health and growing social stability. When these new residents become involved in the services that the school district offers, such as concerts, plays, athletic events, computer classes, and the like, the community’s social appeal is further enhanced.[483]

When Lawrence Picus and Jimmy Bryan examined the economic and fiscal influences that the Los Angeles area school systems have on both the local and state economies, they discovered public education in the LA region to be an enormous enterprise:

In Los Angeles County, the school districts provide education for nearly 1.6 million children, spend almost $6.9 billion, and employ some 133,500 people. As a business concern, the Los Angeles County schools would rank 190th on the Fortune 500, larger than such companies as Northrop Grumman (192), Coca-Cola (196), Levi Strauss (198), and even Microsoft (219). On its own, the Los Angeles Unified School District (LAUSD), with its $4.9 billion budget, 55,767 employees, and 650,000 students, would rank 270 on the Fortune 500.[484]


Obviously, large urban school systems have a far greater economic influence on an area’s economy and possible social stability than many businesses—including professional sports franchises.


A New Conceptual Lens


Viewing the public school system as a major community economic resource, as well as a social service, may provide local and state educational policy makers with a new conceptual lens. Public schools will no longer be seen as a never-ending drain on the community, but as a source of the community’s economic and social well being. This vision of the public schools will also enable community members to view the system as critical to the welfare of the local economy and, therefore, a vital social institution. While there is much research to be done in this specific area, these initial explorations do offer intriguing possibilities regarding building stronger links between public schools, their communities and student academic achievement.


What Cannot be Concluded from the Research


This report has presented historical and contemporary overviews of the research pertaining to schools, their communities, and student academic achievement. While much of this research is stimulating and offers school personnel and policy makers various conceptual plans, a basic problem in the research base is that it is composed almost entirely of single-case or multi-case studies. This is not surprising given the degree of autonomy that some districts have in developing and implementing community outreach programs. Additionally, since public education is largely a state responsibility, there is great variation among the 50 states and Washington, DC. Fragmentation is a hallmark of the US educational system. Unfortunately, this makes conducting large-scale, experimental studies most difficult. While case studies do provide us with some compelling insights and broad guidelines, they make specific and highly prescriptive recommendations difficult.


Summary and Recommendations


Public schools have been reaching out to parents and communities since their inception. With over 100 years of data, we know what programs can enhance student academic achievement. The challenge is to devise the right mixture of services and programs in organizational situations that are highly idiosyncratic.[485] As Joy Dryfoos notes, there is no single model for social service provision.[486] This is congruent with educational historian David Tyack’s observation that there is “no best system” for public education.[487] Yet there is enough information to permit informed decisions regarding what might be feasible to implement.

Despite the limitations of the case-study approach, the documented results of efforts to create deeper ties between schools and the communities in which they operate and which they serve warrant efforts to further enhance school-community relationships. The history of more than 100 years of research and experience in involving schools in community life points to several potential policy initiatives:

·        Basic parental involvement programs should be enhanced to include multiple opportunities for formal and informal communication between school personnel and parents. Open, engaged, mutual, and honest communication should be encouraged. As much as possible, public schools should move towards an open, or “warts and all,” approach to school-community relations.

·        Parental involvement programs should be developed that embrace the ethnic, linguistic, cultural, racial, and religious diversity of the parents.

·             Parental involvement programs should be designed to be sensitive to the special needs of poor parents, single parents, parents with large families, and those families where both parents work outside of the home. This might mean providing transportation and child care for some, while planning meetings around work/home schedules for others.

·             Written materials should be provided in the language with which parents are the most familiar.

·        Schools and other social organizations wishing to provide school-linked services should carefully consider the scope, funding needs, organizational and professional complexities, and types of services to be offered. While perhaps not as compelling or intellectually stimulating, incremental types of school-linked services should be pursued if providers are dedicated to institutionalizing the project.[488]

·        Funding for new community involvement projects should be kept consistent and stable. The bigger and more complex the project, the greater the need for adequate funding.

·        Extra-curricular programs should be kept vital to help foster strong parental involvement.

·        Educational leaders and policy makers should be encouraged to reconceptualize the public school as a vital economic resource that must be nurtured.


What works best with poor urban children?


In addition to the above recommendations, programs particularly targeted to assist children and communities living in poverty should take into account the following principles:

·        Programmatic offerings need to be stable, consistent and long-lived. Poor urban children’s lives are marked by chaos. Public schools and the services they provide may be the only stable “thing” in many children’s lives.

·        New services should be carefully expanded, ensuring they become institutionalized over time. For example, it might be advantageous to expand the free lunch program to include all students. Once this has been established and consistently maintained over several years, it might be time to include or expand a school breakfast program.

·        Schools facing a budget shortfall should focus on maintaining extra-curricular activities that are relatively low cost and can serve broader numbers of students. This logic might make the choral program more appealing than the more expensive and litigation-prone football program.

·        Parental education programs should focus on parents’ knowledge and skills in child raising and work to build on this foundation. School personnel and other service providers must be aware of the parents’ own needs and wishes for their children, and design programs so these are addressed.

·        Parental education programs need to be sensitive to the racial, ethnic, cultural and religious backgrounds of participating parents. These programs must also attend to the realities that families in poverty confront. This might include offering transportation to the program and offering on-site child care, and even providing an evening meal for the families attending.

·        City and educational leaders need to view the public school system as a foundation for community revitalization initiatives.


8: Teacher Characteristics


Executive Summary


Research Findings


Traditional psychometric techniques (using ability, achievement, other paper-and-pencil tests, GPAs, and the like) to predict teaching effectiveness (in terms of student achievement) have failed. Certification status appears to be causally related to improved student achievement: regularly certified teachers produce higher student achievement than non-certified or emergency certified teachers. Teacher experience generally has been shown to be positively related to student achievement when other variables are statistically controlled. Little research has been published on the unique characteristics of teachers that make them successful in teaching children in poverty.




·          Paper-and-pencil tests are not useful predictors of teaching candidates’ potential to teach successfully and accordingly should not be used for that purpose.

·          A teaching candidate’s academic record (e.g., GPA) is not a useful predictor of his or her eventual success as a teacher. A candidate’s record of success in pre-service (undergraduate) technical courses (mathematics and science, for example) may contain useful information about that candidate’s success in teaching secondary school mathematics and science.

·          Other things equal, 1) students of regularly licensed teachers achieve at higher levels than students of emergency certified teachers; and 2) more experienced teachers produce higher student achievement than less experienced teachers. Teacher selection policies should reflect these facts.

·          The selection of teachers who will best contribute to their students’ academic achievement should focus on peer and supervisor evaluation of interns, student teachers, substitute teachers and teachers during their probationary period.



8: Teacher Characteristics


By Gene V Glass

Arizona State University




How can one identify, in advance of a decision to hire, which teachers will most improve their students’ measured achievement? What are the characteristics of promising teachers that will permit an accurate prediction of their ability to teach children well?

This review deals with those characteristics of teachers that might be identified and used in the initial hiring of teachers to increase their students’ achievement. These characteristics can include qualities of teachers that are viewed as personal – such as mental ability, age, ethnicity, gender and the like – or as “experiential” – such as certification status, educational background, previous teaching experience and the like. Some characteristics are combinations – in unknown amounts – of personal and experiential qualities, e.g., candidates’ performance on teacher-certification tests such as the National Teacher Examinations and state-mandated tests.

This review will not examine characteristics of teachers that it would be impractical to assess in the initial hiring and selection process, such as deep personality traits. The term “teacher characteristics” typically refers to qualities of teachers that can be measured with tests or derived from their academic or professional records. It does not generally refer to the direct observation of their impact on students’ learning in terms of either students’ test performance or teaching behaviors (both of which are addressed elsewhere in the present work). Rather, the approaches dealt with here are those that fall traditionally into the province of personnel psychology or personnel selection.[489] These distinctions are particularly important because of the conclusions at which the present review arrives, namely, that psychometric selection is inappropriate in the initial selection of teachers and should defer to the evaluation of probationary teachers (teachers in the first few years of their employment).


Research on Teacher Characteristics


Micro-Studies and Macro-Studies


The research literature on teacher characteristics and student achievement encompasses two quite different kinds of study. One type – here referred to as micro-studies – uses individual teachers as the unit of analysis. Correlation coefficients are calculated from data descriptive of individual teachers and their students’ achievement (usually expressed as a class average). Studies of this type yield findings most relevant to the question whether there are characteristics of teachers that predict their ability to improve the achievement of their students.

The second type of study is here called macro-studies. These studies measure characteristics of groups of teachers, such as “percentage of teachers in the school district who hold Masters degrees.” Macro-studies attempt to exercise statistical controls by means of complex multiple regression analyses, often taking account of the multiple levels (states, districts, schools) of organization that tie individual teachers together. Macro-studies often inform policy at high levels but give limited direction to administrators who face individual selection decisions. Frequently, they do not express relationships in a form that permits the calculation of the actual benefits of selecting an elementary school teacher in terms of increased student achievement. Moreover, these macro-studies – useful though they are for addressing state or national level policy questions – seldom achieve the levels of control needed to reach consensus among their readers. In spite of their contribution, macro-studies of the relationship between teacher characteristics as a school, district, or state “input” and student achievement as an “output” have several limitations: they must rely on imperfectly measured “background characteristics” of students to equate unequal conditions; they cannot, without substantial and seldom realized extensions, resolve the ambiguity of the direction of the causal influence (Does a high percentage of Masters degrees raise student achievement, or do districts with able students who learn quickly and easily attract teachers with Masters degrees?); and they typically fail to address the ambiguities present in ecological correlation analysis. (For instance, it is unclear whether the teachers holding the Masters degrees in the school district are the teachers actually responsible for the increased student achievement.)
Nevertheless, macro-studies of the relationship between teacher characteristics and student achievement are visible and influential at policy levels and will be reviewed here.


The Micro-Studies


Aptitude and Intelligence


Two major reviews of research[490] on the relationship of teachers’ measured intelligence and their students’ achievement arrived at the same conclusion:  there is no important correlation between the two variables. Various explanations have been advanced for the failure to find a relationship that many expected would exist:  the truncated variability of the intelligence scale for a population of teachers already highly selected for academic aptitude; the unreliability and lack of content validity of measures of student achievement; as well as the essential irrelevance of high levels of measured intelligence for effective teaching, particularly at the elementary school level.


Academic Preparation


Research suggests that there is a modest relationship between teachers’ college course work in the subject area in which they subsequently teach and their students’ achievement.[491] Monk[492] analyzed data for almost 3,000 high school students from the Longitudinal Study of American Youth. Students took tests in mathematics and science, and supplied information on their backgrounds. Their math and science teachers were also questioned. Monk correlated teacher characteristics with student achievement, taking into account students’ earlier achievement, background characteristics and teacher inputs. The greater the number of college-level mathematics or science courses (or math or science teaching courses) teachers had taken, the better their students did on the mathematics and science tests. Goldhaber and Brewer[493] found similar relationships in a secondary analysis of more than 5,000 high school sophomores and their teachers. The number of college-level math courses taken by the teachers was the only variable that accounted for any appreciable variation in students’ achievement.


The National Teacher Examinations (NTE)


The National Teacher Examinations (NTE), developed and administered by the Educational Testing Service of Princeton, New Jersey, are widely used and are an influential model for the state-level paper-and-pencil licensure exams that are currently proliferating throughout the United States. The validity of the NTE was the subject of an extensive review published in 1973 by Quirk, Witten and Weinberg.[494] Subsequent reviews have neither substantially added to nor altered their conclusions. Quirk et al. documented the nearly 30-year history of NTE research attempts to correlate NTE scores with such “concurrent validity” measures as high school GPA, undergraduate GPA, graduate GPA, ability tests (GRE-V, GRE-Q), as well as grades in specialized education courses. (Such correlations are referred to as “concurrent validity” coefficients because the two measures correlated are taken at roughly the same stage, in this case, during a prospective teacher’s pre-service career.) Such criteria are only presumptively related to student learning; but even so, the concurrent validity evidence was not impressive. The highest correlations were with paper-and-pencil tests of academic ability and were in the region of 0.60. Paper-and-pencil tests correlate with other paper-and-pencil tests; that much might have been expected. Correlations of NTE scores with GPAs were in the region of 0.30. Most significantly, the two studies that produced correlations of NTE with grades in practice teaching yielded the following results: Shea[495] correlated NTE scores with grades in practice teaching for 110 pre-service teachers who had graduated from Worcester State Teachers College and obtained an r of –0.01; Walberg[496] correlated performance on the NTE with practice teaching grades for 280 pre-service teachers and found an r of –0.04. These are sobering findings indeed for those who hope for paper-and-pencil test information that will predict teaching effectiveness.

The usefulness of the NTE for predicting principals’ ratings of various qualities of in-service teachers is similarly wanting. Research over 30 years in a wide variety of settings has shown correlations of NTE test scores and principals’ ratings ranging from –0.15 to 0.50 with an average r of about 0.10.[497] In the face of these discouraging results, researchers have been prone to blame the professionals’ evaluations of their peers and subordinates, suggesting that they are unreliable or biased or distorted by friendships or prejudices or unsophisticated views of quality teaching. The fault, however, may lie more with the inadequacies of paper-and-pencil tests as measures of teachers’ abilities to manage the complex demands of educating groups of children.

Quirk, Witten and Weinberg found only a single study in which NTE scores were correlated with students’ average gain in performance from pretest to posttest, and this study by Lins,[498] published in 1946, produced data on only seven teachers. The correlation of NTE score with pupils’ gain scores was 0.45; unfortunately, with so small a sample one can only assert with reasonable statistical confidence that a much larger sample would produce a validity coefficient somewhere between –0.50 and +0.90.[499]
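The width of that interval can be checked with the Fisher z transformation, a standard way of putting a confidence interval around a correlation. The sketch below is illustrative only and is not taken from the review: the function name and the 95% level (z_crit = 1.96) are assumptions, and the method presumes roughly bivariate normal data.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z.

    z_crit=1.96 corresponds to a 95% interval; this level is an assumption,
    since the review does not say which level it used.
    """
    z = math.atanh(r)              # move r onto the z scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

# Lins (1946): r = 0.45 based on seven teachers
low, high = fisher_ci(0.45, 7)
print(f"Approximate 95% interval: {low:+.2f} to {high:+.2f}")
```

Under these assumptions the interval runs from a moderately negative to a very high positive value, in line with the range quoted above: seven teachers cannot distinguish a useless test from an excellent one.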

The State of Massachusetts has instituted one of the most controversial paper-and-pencil teacher licensure tests. Haney and his colleagues found no empirical evidence that the Massachusetts teacher tests could predict student learning.[500]


Certification (Licensure)


A job candidate’s certification status has become a visible consideration in recent decades as a result of a variety of reforms and economic pressures placed on the educational system. Class-size reduction efforts, most notably in California in the mid-1990s, not surprisingly created an acute need for teachers that could not be met by the existing supply of regularly certified personnel. The difficulty of recruiting certified teachers for schools in the deteriorating core of large cities prompted the hiring of college graduates without pre-service training or teaching experience – “Teach for America” being the most visible program of this type.[501] In addition, the market ideology that has influenced both the discussion and the implementation of education policy proposals since the 1980s questioned the need for state-operated systems of teacher certification. Some believe that any educated person, with or without a college degree, can teach.[502] Educators are left with the question, What value is represented by the teacher license? Should certification status be considered in the hiring of new teachers?

Darling-Hammond wrote that “…reviews of research over the past thirty years, summarizing hundreds of studies, have concluded that even with the shortcomings of current teacher education and licensing, fully prepared and certified teachers are … more successful with students than teachers without this preparation.”[503] Ashton[504] noted that teachers with regular state certification receive higher supervisor ratings and produce higher student achievement than teachers who do not meet standards, but this observation was based on data on which virtually no statistical controls had been imposed. In spite of the quantity of research on the benefits of teacher certification for student learning that Darling-Hammond refers to, little of the past research exercised controls over student “inputs” that would give the critical reader confidence in the findings. One recent study addressed the effect of certification status with a series of controls that engendered this missing confidence.

Laczko and Berliner[505] studied the impact of certification status on student achievement in two large urban school districts. These school districts provided information about teachers hired for the 1998-1999 and 1999-2000 school years. Information included the school where they were currently teaching, the grade level taught, the teacher’s certification status, highest degree earned, date and institution where it was achieved, age, and number of years of teaching experience. Teachers were eliminated from the sample if they taught a grade level or subject (e.g., art and music) that was not assessed by the Stanford Nine (SAT 9) achievement test battery, the measure of achievement used in the study.

Emergency certified teachers were matched with regularly certified teachers in the following manner: matches were first made by grade level; secondarily, matching was based on highest degree attained; whenever possible, matches were made within the same school,  otherwise, matches were made within the same school district; cross-district matching was not allowed.  Matching the two samples produced 23 pairs of teachers for the 1998-1999 school year and 29 pairs of teachers for the 1999-2000 school year.

Stanford Achievement Test-Version 9 scores aggregated at the class level for the 52 matched pairs of teachers were collected.  Correlated t-tests were conducted to analyze the difference in the student achievement scores between emergency certified and standard certified teachers.  The principal findings from the Laczko and Berliner study appear in Table 1.


Table 1


NCE Differences and Effect Sizes (ES)[506]

for 1998-1999 and 1999-2000

(After Laczko & Berliner, 2001)



SAT 9 Sub-Test      1998-1999      1999-2000      Mean ES

Reading                  13.9            9.2         0.50
Math                     -0.2           11.1         0.24
Language                  9.4           10.7         0.44


Note. The NCE differences are between certified and emergency teachers. Effect sizes (ES) were calculated with a standard deviation of 23 NCE units.


Using the NCE (Normal Curve Equivalent) scale to express the results, Laczko and Berliner found, for example, that in the 1998-’99 school year, students taught by certified teachers outscored their counterparts taught by uncertified teachers by almost 14 NCE points in Reading. The similar margin in the 1999-2000 school year was greater than 9 points. Expressed as a proportion of the standard deviation of the NCE scale, these differences averaged across the two years yield an effect size of one-half (0.50) standard deviation (equivalent to five months grade-equivalent units). One would expect, based on these findings, then, that the students of certified teachers would make an additional five months academic growth in reading when compared to the students of uncertified teachers across an entire school year. The advantage for students of certified teachers in mathematics and language is one-quarter (0.25) standard deviation (about 2.5 months in grade-equivalents) and four-tenths (0.40) a standard deviation (about four months GE), respectively. These are, perhaps, the most convincing data yet produced by research on the effect of teacher certification on student achievement. (It should be noted that these differences in means expressed in standard deviation units correspond to correlations between certification status and student achievement of roughly 0.25, for effect sizes of 0.50, and 0.15, for effect sizes of 0.30 to 0.25.[507])
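The arithmetic behind these effect sizes can be reproduced in a few lines. The sketch below is illustrative: the NCE differences and the standard deviation of 23 come from Table 1, while the function names are hypothetical and the conversion from effect size to correlation uses the common equal-group-size formula r = d / sqrt(d² + 4), which is an assumption and may differ from the method used in the cited note.

```python
import math

SD_NCE = 23.0  # standard deviation stated in the note to Table 1

# NCE differences (certified minus emergency certified) for the two years
nce_diffs = {
    "Reading":  (13.9, 9.2),
    "Math":     (-0.2, 11.1),
    "Language": (9.4, 10.7),
}

def effect_size(diffs, sd=SD_NCE):
    """Mean NCE difference across years, expressed in SD units."""
    return (sum(diffs) / len(diffs)) / sd

def d_to_r(d):
    """Convert a standardized mean difference to a correlation.

    Assumes two groups of equal size (an assumption of this sketch).
    """
    return d / math.sqrt(d * d + 4.0)

results = {name: effect_size(diffs) for name, diffs in nce_diffs.items()}
for name, es in results.items():
    print(f"{name:<9} ES = {es:.2f}   approx. r = {d_to_r(es):.2f}")
```

The computed effect sizes round to the Mean ES column of Table 1, and an effect size of 0.50 converts to a correlation of roughly 0.24, close to the 0.25 cited in the text.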


Successful Teachers of Poor Students


Poor students are disproportionately taught by less experienced teachers who are less likely to be licensed and who leave the profession sooner than teachers of the children of middle-class or wealthy families. Researchers have largely ignored the question of whether there are special characteristics of  teachers who will be successful in teaching poor children.

One of the few quantitative studies of the relationship between teacher characteristics and student achievement for poor children is due to Murnane and Phillips.[508]  Using data collected in a study of a federal welfare reform project in a large Midwestern city, the researchers fit regression equations to account for the variability of vocabulary scores on the Iowa Test of Basic Skills in terms of teacher behaviors and other characteristics. The teachers were predominantly black, female and held Masters degrees. The researchers concluded: “Overall, the results … suggest that variables describing teacher behavior and variables describing teacher characteristics are both important in predicting teacher effectiveness.”[509] Teacher characteristics of race, prestige of the undergraduate college, whether the teacher earned a Masters degree and verbal ability were not significantly related to students’ achievement. However, “years of teaching experience” was related to student achievement. This relationship for Grades 4 and 6 is depicted in Figure 1. The relationship for Grade 1 was weaker, but still positive, and non-existent for Grade 5. No reasonable explanation for the interaction of the relationship with grade level exists, and a prudent conclusion would hold that teacher experience and student achievement are positively related in these circumstances.

Figure 1. Teacher Experience and Student Achievement for Inner-city Children (After Murnane, 1981).

Another of the very few attempts to address this question was made by Martin Haberman in his book Star Teachers of Children in Poverty.[510] Drawing on years of interviewing hundreds of teachers in poor urban schools, Haberman advanced a view of what makes for success for a teacher of poor children. These successful teachers, whom he named “star teachers,” display the following characteristics: star teachers do not punish students, but instead use “logical consequences” to direct students to learn appropriate behaviors; star teachers believe that discipline problems are best handled by making learning interesting, meaningful, and engrossing; star teachers are persistent. Haberman saw these teachers dealing with the organization of the school in a uniquely productive way. They did not attempt to undermine the school’s administration, nor did they ignore the directives of officials; however, they did not use bureaucratic directives as excuses to keep from achieving their objectives in the classroom. Star teachers engaged in what Haberman called “gentle teaching.” Gentle teaching promotes kindness in classroom interactions; it pointedly avoids the discord that can characterize interactions in schools that emphasize compliance with rules instead of learning.

Haberman suggested that there may be ways to predict which teachers will be the star teachers. Candidates for teaching positions should be selected on the basis of criteria other than good grades and high test scores. New teachers, if they are to develop into Haberman’s star teachers, should  not be judgmental;  they should be tolerant and avoid moralistic attitudes; they must be open, understanding, and not easily shocked; and they must be capable of open and authentic communication with their superiors and colleagues.

Haberman has produced one of the few research-based works aimed at understanding the characteristics of teachers that make for success with poor children, and yet, his work has been criticized as methodologically weak.[511] No demographic description of the group of teachers interviewed is given; no explanation of the criteria by which the star teachers were recognized as successful is offered. Haberman may well be right, but the path traveled to reach his understandings is hidden from view.


The Macro-Studies


Large-scale studies that use school districts or states as the unit of analysis and attempt with multiple regression analysis to control for pre-existing differences among these units have addressed many of the same concerns analyzed in the micro-studies. The first large study of this type was Coleman’s Equality of Educational Opportunity.[512]  Coleman et al.  measured seven characteristics of teachers: years of experience, highest degree attained, vocabulary test performance, ethnic group, parents’ educational attainment, whether the teachers grew up where they were teaching, and the teacher’s attitude toward teaching middle-class students.  These teacher characteristics  accounted for less than 1% of the variation in student achievement – meaning that a correlation of teacher characteristics with student achievement, holding other factors constant, would be less than +0.10.  Coleman et al.,  as well as Bowles and Levin,[513] felt that they detected slight relationships between teachers’ verbal intelligence and student achievement. Summers and Wolfe[514] indicated that this relationship, though quite weak in statistical terms, was more important in some areas of the curriculum than in others.  Hanushek[515] joined these early researchers in finding no strong relationship between teacher characteristics and student achievement.

A pair of meta-analyses of macro-level studies arrived at differing conclusions on the question of whether teachers’ measured ability influences student achievement. Greenwald, Hedges and Laine[516] reviewed a number of studies of the relationship between school inputs and student outcomes and concluded that teacher ability, teacher education, and teacher experience appeared to be related to student achievement. Hanushek’s[517] synthesis of research studies arrived at a contrary conclusion regarding the relationship between teacher characteristics and student achievement. Less than a year later, Hanushek[518] published an “update” of his 1996 article in which he reported the following summary of studies that investigated the relationship (in terms of regression coefficients) between student achievement and their teacher’s “years of experience.”



Table 2


Direction and Statistical Significance of

Regression Coefficients for Student Achievement

Related to Teacher Experience

(After Hanushek, 1997)


                                  Stat. Signif.        Non-Signif.

Ind. Var.         # of studies      +       –        +       –       ?

Teacher Exper.        207          29%      5%      30%     24%     12%

Note. “?” marks coefficients whose sign could not be determined from the report.



Although a statistically significant regression coefficient for “teacher experience” was six times more likely to be positive than negative, Hanushek nonetheless read the results of Table 2 as negative for the effects of teacher experience on achievement. He wrote of the results: “A higher [than class size or teacher education] proportion of estimated effects of teacher experience are positive and statistically significant: 29%. Importantly, however, 71% still indicate worsening performance with experience or less confidence in any positive effect.”[519] The logic of this conclusion is elusive. Of results that reach statistical significance, 85% (60/70) are positive, indicating that students of more experienced teachers achieve at higher levels. Of the statistically non-significant results whose direction can be determined, 55% are positive, but fail to reach conventional levels of significance. Hanushek creates an impression of no effect of teacher experience by lumping together under the category “indicative of worsening performance or less confidence of beneficial performance” all significant but negative coefficients (5%), all non-significant coefficients whether positive or negative (30% + 24%) and, remarkably, the 12% of the coefficients that were so incompletely reported that it could not be determined whether they were positive or negative. The treatment of these data is hardly even-handed. By such logic, ten “positive studies,” “no negative studies” and 100 studies so poorly reported that the results could not be discerned would lead to a conclusion of no confidence in a positive result. This author’s reading of Table 2 is much different from Hanushek’s. The data therein can be reasonably interpreted as evidence that regression studies have generally shown a positive relationship between teacher experience and student achievement.
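The competing readings of Table 2 come down to how the five percentages are grouped. The short sketch below recomputes both tallies from the figures reported in the table; the variable names are illustrative and not part of either author’s analysis.

```python
# Percentages of the 207 regression coefficients, as reported in Table 2
pct = {
    "sig_pos": 29,      # statistically significant, positive
    "sig_neg": 5,       # statistically significant, negative
    "nonsig_pos": 30,   # non-significant, positive
    "nonsig_neg": 24,   # non-significant, negative
    "unknown": 12,      # sign could not be determined
}
assert sum(pct.values()) == 100

# Reading 1: compare positive with negative within each significance class
sig_share_positive = pct["sig_pos"] / (pct["sig_pos"] + pct["sig_neg"])
nonsig_share_positive = pct["nonsig_pos"] / (pct["nonsig_pos"] + pct["nonsig_neg"])
pos_to_neg_ratio = pct["sig_pos"] / pct["sig_neg"]

# Reading 2: everything that is not "significant and positive" in one bin
residual_bin = 100 - pct["sig_pos"]

print(f"Significant results that are positive:     {sig_share_positive:.0%}")
print(f"Non-significant results that are positive: {nonsig_share_positive:.0%}")
print(f"Significant positive-to-negative ratio:    {pos_to_neg_ratio:.1f}")
print(f"Residual bin under the second reading:     {residual_bin}%")
```

The same five numbers thus support either “85% of significant findings are positive” or “71% are not significant and positive,” depending only on how the categories are combined.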

Fetler[520] investigated the relationship between measures of mathematics teacher skill and student achievement in California high schools. Test scores were analyzed in relation to teacher experience and education and student demographics. The results are consistent with the hypothesis that there is a shortage of qualified mathematics teachers in California and that this shortage is associated with low student scores in mathematics. After controlling for poverty, teacher experience and preparation significantly predict test scores.

Darling-Hammond[521] utilized data from a survey of all 50 states’ policies, the 1993-’94 Schools and Staffing Surveys of the U.S. Department of Education, and the National Assessment of Educational Progress to study the relationship between teacher qualifications and student achievement. The findings suggested that policy investments in the quality of teachers may be related to improvements in student performance. Measures of teacher preparation and certification were the strongest correlates of student achievement in reading and mathematics, both before and after controlling for student poverty and language status (limited English fluency v. full English fluency). “The most consistent highly significant predictor of student achievement in reading and mathematics in each year tested is the proportion of well-qualified teachers in a state: those with full certification and a major in the field they teach (r between 0.61 and 0.80, p<0.001). The strongest, consistently negative predictors of student achievement, also significant in almost all cases, are the proportions of new teachers who are uncertified (r between -0.40 and -0.63, p<0.05) and the proportions of teachers who hold less than a minor in the field they teach (r between -0.33 and -0.56, p<0.05).” (It must be noted that these correlation coefficients, in the area of 0.50 and above, are calculated on state-level aggregated data and are much higher than would be obtained if similar variables were correlated at the level of individual teachers.) Darling-Hammond’s analyses suggest that state policies regarding teacher education, licensing, hiring, and professional development may make an important difference in the qualifications and capacities of teachers, and, as a consequence, in the achievement of their students.


Implications for Personnel Selection


Correlations and Base Rates


It is common in research on the relationship of teacher characteristics and student achievement to express the relationship in terms of correlation coefficients. Such coefficients have distinct disadvantages in communicating the benefits of selecting teachers on the basis of their entry characteristics (such as college GPA, NTE scores, scores on teacher certification exams, Teacher Perceiver profiles and other similar measures of potential). Correlations of beginning teacher characteristics and their students’ eventual achievement are typically in the range of 0.15 to 0.35, as was seen in the research reviewed above. The lay reader is frequently misled into thinking that such relationships possess a practical benefit when the finding is referred to as “statistically significant.” This may not be the case and, in the present application of psychometrics, probably is not. “Statistical significance” is a quality of statistical findings that refers only to their reliability or “inferential stability,” that is, the likelihood that a particular finding has not arisen by chance sampling from a population in which the two variables correlated are completely unrelated. Statistical significance results from taking large samples, and generally means nothing more than that the statistical finding was based on a large sample. The finding itself could be of no practical value and still be “statistically significant.” Persons’ heights and their IQs might correlate 0.02 in a sample of 100,000 persons and be deemed “statistically significant”; but that finding will be of no value whatsoever.[522]
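The height-and-IQ illustration can be made concrete. The sketch below is an assumption-laden illustration, not part of the original argument: it uses the Fisher z normal approximation to obtain a two-sided p-value for a correlation, and the function name is hypothetical.

```python
import math

def p_value_for_r(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)       # approximately standard normal under H0
    p = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided tail area
    return z, p

r = 0.02
z_large, p_large = p_value_for_r(r, 100_000)   # the sample size in the text
z_small, p_small = p_value_for_r(r, 100)       # same r, small sample
variance_explained = r ** 2                    # share of variance accounted for

print(f"n = 100,000: z = {z_large:.2f}, p = {p_large:.1e}")
print(f"n = 100:     z = {z_small:.2f}, p = {p_small:.2f}")
print(f"Variance explained: {variance_explained:.2%}")
```

With 100,000 cases the correlation is significant far beyond conventional thresholds, yet it accounts for only four hundredths of one percent of the variance; with 100 cases the identical correlation is nowhere near significance.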

The benefits, if there are any, of selecting teachers on the basis of such weak correlational evidence – validity coefficients in the range of 0.35 and below – are not clearly seen in correlation coefficients. The meaning of these relationships is more clearly seen in statistics such as “hit rates” or measures of “false positives” and “false negatives” – for example, the differences in percentages of teachers who will not survive their probationary evaluation between those who score high on some characteristic, such as college GPA, and those who score low on that characteristic.

Consider what will prove to be a typical situation: the district’s assistant superintendent for personnel has available the college GPA of all applicants for openings in elementary education.  There are twice as many applicants as there are openings, so she selects the top half of the applicants on the basis of their GPA. Suppose further that the correlation between teaching candidates’ GPA and their students’ learning is 0.35 – a not unreasonable assumption, surely not an underestimate. Furthermore, suppose that 5% of the probationary teachers in this district are not rehired after two years and that the rehire decision is based solely on their ability to engender student learning.[523]


Table 3

Hypothetical Relationship Between

Selection Criterion and Success Criterion


                  Selected (high GPA)    Rejected (low GPA)
Re-hired                  490                    460
Not re-hired               10                     40
Total                     500                    500


The above table shows counts of teaching candidates selected or rejected on the basis of their college GPAs and the result of the decision to continue employment after their probationary period. The data in Table 3 correspond to a correlation of GPA and “teaching success” of approximately 0.35 with a selection rate of 50% and a success rate of 5%. Meehl and Rosen[524] pointed out nearly 50 years ago that the utility of a correlation in predicting an event (like success in teaching as evidenced by continuing employment) depends on: a) the size of the correlation, b) the costs of errors in prediction (of rejecting a person who would succeed or accepting a person who will eventually fail), and c) the “base rate” of the event being predicted. (Also see Wainer’s application of these concepts to the Massachusetts Teacher Tests).[525] The major implication of Meehl and Rosen’s argument is this: if the event being predicted has a very low incidence of occurring (a “low base rate”), then very large correlations of predictors with the criterion are needed or else one makes fewer errors by using no predictor whatsoever.
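One way to see how a correlation of about 0.35, a 50% selection rate, and a 5% failure rate fit together is to compute the implied split of failures numerically. The sketch below is illustrative only. It assumes that GPA and teaching success follow a standard bivariate normal distribution, an assumption the text does not state, and the function names are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def failure_split(rho, fail_cutoff_z, steps=20_000, lower=-8.0):
    """Return (P(fail and top-half GPA), P(fail and bottom-half GPA)).

    Assumption: GPA (X) and teaching success (Y) are standard bivariate normal
    with correlation rho; "fail" means Y < fail_cutoff_z, "top half" means X > 0.
    """
    width = (fail_cutoff_z - lower) / steps
    top = 0.0
    for i in range(steps):
        y = lower + (i + 0.5) * width  # midpoint rule over the failure region
        # P(X > 0 | Y = y) when X given Y is normal with mean rho*y, variance 1 - rho^2
        p_top_given_y = normal_cdf(rho * y / math.sqrt(1.0 - rho * rho))
        top += normal_pdf(y) * p_top_given_y * width
    total_fail = normal_cdf(fail_cutoff_z) - normal_cdf(lower)
    return top, total_fail - top

# A cutoff of about -1.6449 standard deviations leaves roughly 5% below it.
top, bottom = failure_split(rho=0.35, fail_cutoff_z=-1.6449)
print(f"failures per 1,000 applicants: high GPA {1000 * top:.0f}, low GPA {1000 * bottom:.0f}")
```

Under this assumption the failures divide very unevenly between the two halves of the applicant pool, in rough agreement with the counts used in the hypothetical table.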

One can see this phenomenon at work in the above table. If teaching candidates are selected because they have high (top half) GPAs, 10 of the 500 selected candidates will not be re-hired, and 460 of the 500 rejected candidates, who would have succeeded if they had been hired, will never get a chance to show that they could have succeeded. Applicants with a high GPA (who are selected) have a 2% probability of “failing” (i.e., not being rehired). But applicants with a low GPA (who would not have been selected) have only an 8% probability of failing (i.e., not surviving the probationary period). Using the GPA to select new teachers thus lowers the failure rate only from 8% to 2%, and this gain comes at the cost of rejecting a group of applicants of whom 92% eventually would have proved to be successful. In most people’s system of values, turning away a group in which 92% would have succeeded in order to achieve a 98% success ratio in prediction is unfair to a large number of applicants. Psychometricians say that in these circumstances the cost of “false negatives” is too high.
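The percentages in this paragraph follow directly from the four cell counts, and the same counts show Meehl and Rosen's point about prediction errors. The sketch below simply restates that arithmetic. The variable names are hypothetical, and the count of 40 low-GPA failures is inferred from the 8% figure in the text.

```python
# Counts per 1,000 applicants, taken from the hypothetical table discussed above.
selected = {"rehired": 490, "not_rehired": 10}  # top half on GPA
rejected = {"rehired": 460, "not_rehired": 40}  # bottom half on GPA (outcomes had they been hired)

def failure_rate(group):
    return group["not_rehired"] / (group["rehired"] + group["not_rehired"])

fail_if_selected = failure_rate(selected)        # 10 / 500
fail_if_rejected = failure_rate(rejected)        # 40 / 500
wrongly_rejected_share = 1.0 - fail_if_rejected  # rejected applicants who would have succeeded

# Treat "low GPA" as a prediction of failure and count the wrong predictions:
# rejected applicants who would have succeeded plus selected applicants who fail.
errors_with_gpa = rejected["rehired"] + selected["not_rehired"]
# Use no predictor and predict that everyone succeeds: only actual failures are wrong.
errors_without_predictor = selected["not_rehired"] + rejected["not_rehired"]

print(fail_if_selected, fail_if_rejected, errors_with_gpa, errors_without_predictor)
```

With a base rate of failure this low, predicting success for everyone produces far fewer wrong predictions than using the GPA cut, which is the sense in which using no predictor can be the better rule.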

Furthermore, when an administrator can control the overall rate of “success” (say, for example, when 95% of teachers receive “merit pay” bonuses and the discretion exists to raise that rate to 100%), it is frequently the case that even a good predictor of that 95% will create more erroneous decisions than declaring all 100% of teachers successful, hence using no selection criterion at all. Validity coefficients are not sufficient for evaluating the practical utility of a test or other selection technique:  “... when the base rates of the criterion classification deviate greatly from a 50 percent split, use of a test sign having slight or moderate validity will result in an increase of erroneous clinical decisions.”[526]


Between and Within District Variation


A second problem exists in translating the research on teacher characteristics into the real world of personnel decisions. In research studies, an effort is made to sample a full range of subjects (persons) along the continuum of the characteristics being correlated with student learning. But in the real world of schools, teacher applicants and students are clustered into schools and districts that represent selected portions of these continua. It may often be the case that a teacher characteristic that has shown modest correlations with student achievement in research studies will have no relationship with achievement within the particular school district attempting to select the best teachers for its students. This possibility – which is a highly likely circumstance – is illustrated in Figure 2.


Figure 2. Illustration of Between and Within School District

Relationships of a Teacher Characteristic and Student Learning


Figure 2 illustrates a hypothetical situation in which 12 teachers are measured in each of four school districts on a characteristic (such as college GPA, for example) and on their contribution to their students’ learning. It should be noted that the degree of relationship between a teacher characteristic and student learning depicted in Figure 2 is far greater than anything ever demonstrated in an actual research study, but this exaggeration will strengthen rather than vitiate the point being illustrated. Within each school district there is zero correlation between the measured teacher characteristic and the students’ learning; however, among the four districts, the teacher characteristic and student learning are highly correlated, perhaps as high as a coefficient of 0.80. The import of this situation is significant. What this arrangement of variation between and within districts implies is that the teacher characteristic is of no use whatsoever for selecting teachers within any one school district. And since it is within particular school districts that administrators live and work, knowledge of the teacher characteristic is of no value to them in selecting teachers who will enhance their students’ learning.
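The situation described here can be reproduced with constructed data. The sketch below is only an illustration: the district centers, the circle radius, and the function names are invented so that the within-district correlation is exactly zero while the pooled correlation comes out near 0.80. They are not the values behind Figure 2.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def make_district(center, radius=0.79, teachers=12):
    """Place teachers evenly on a circle around the district's center, so the
    characteristic and learning are uncorrelated within the district."""
    angles = [2.0 * math.pi * k / teachers for k in range(teachers)]
    characteristic = [center + radius * math.cos(a) for a in angles]
    learning = [center + radius * math.sin(a) for a in angles]
    return characteristic, learning

# Four hypothetical districts whose average levels rise together on both measures.
districts = [make_district(center) for center in (0.0, 1.0, 2.0, 3.0)]

within = [pearson(x, y) for x, y in districts]
pooled_x = [v for x, _ in districts for v in x]
pooled_y = [v for _, y in districts for v in y]
pooled = pearson(pooled_x, pooled_y)

print("within-district correlations:", [round(r, 6) for r in within])
print("pooled correlation:", round(pooled, 3))
```

All of the pooled relationship in this construction comes from differences between district averages, so an administrator choosing among candidates inside one district would gain nothing from the characteristic.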

This point may appear to be simply argumentative and counter-intuitive. The implication of this observation is real, however, and not simply some statistical sleight of hand. It dampens enthusiasm for the meager correlations that have been found; and coupled with the earlier observation on the relationship between correlation coefficients and hit ratios, it underlies the ultimate recommendation made here on the matter of initial teacher selection.

Finally, one more point must be raised that will further temper one’s expectations of finding here clear statistical evidence for selecting teachers who can promote student learning. A proper predictive validity study would involve randomly assigning students to groups (or some careful matching of students across groups to ensure their initial equivalence), then randomly assigning groups to teachers, measuring teacher characteristics, allowing instruction to proceed for some substantial period, measuring student learning, and then correlating the groups’ learning gains with the teacher characteristics for many teachers. It would be crucial to measure student learning by means of their gains in performance from before to after instruction. Simply to correlate teacher characteristics with students’ achievement, as has been done repeatedly in the research literature, would not accomplish the purpose of relating teacher characteristics to student learning. Because of the many factors that influence which teachers are employed in which schools in the world outside the research laboratory – teachers with higher GPAs, and measured aptitude, perhaps, are employed in schools whose students enjoy many advantages over schools that face the challenges of poverty and discrimination – the correlation of teacher characteristics with (uncorrected) student achievement test scores measures little more than the often remarked upon sorting of more able teachers into privileged schools. Nothing like this research has ever been published, in part because of the obvious expense, the impracticality of arbitrarily constituting actual school classes of students and randomly assigning them to teachers, and, perhaps, because of researchers’ sense that the payoff in terms of useful predictive information would be meager. (The “micro-teaching” studies of the 1960s and early 1970s at Stanford University approximate this ideal design in terms of controls, but the focus there was on teacher behaviors that promote student learning.) A thorough literature review in the preparation of the current work revealed a single study that even approached the conditions stated above for a proper study, and that study[527] was published more than 50 years ago.


Summary and Recommendations


The early promise of psychometric techniques for the initial selection of teachers seems to have all but disappeared from the agenda of researchers; it may never have held a prominent place in the actual practice of educators.[528] Though rare exceptions can be found (e.g., the Montgomery County, Va., schools in the 1980s, as described by Wise et al.[529]), actual selection of teachers in America’s schools is today based on interviews and personal interactions that reveal evidence of the candidate’s appearance, enthusiasm, personal style and similar attributes. Measurement of ability, past achievements, or the candidate’s ability to produce learning gains for students plays virtually no role in the selection of new teachers. This is not to say that the current practice is to be disapproved of. Current practice in teacher selection probably reflects an understanding that the cohesiveness of a school’s staff is more critical to the success of the school and its students than is the level of teachers’ performance on paper-and-pencil tests of dubious validity.

The customary procedure for selecting new teachers is based more often on first-hand experience with the candidate’s teaching than it is on psychometric evidence in the form of test scores, GPAs or other evidence of personal characteristics believed to be predictive of successful teaching.[530] Schools often choose their new teachers from among interns and student teachers for whom the teaching staff has direct knowledge of their teaching abilities. Alternatively, substitute teachers are observed and evaluated as potential candidates. The arguments marshaled here against psychometric selection of new teachers, because of low correlations of teacher characteristics with student learning and very low base rates of releasing probationary teachers, have already worked their way into the existing system of evaluating candidates for new hires. The need is not for better instruments to measure initial teachers’ aptitudes and dispositions, but for better methods of evaluating more directly the ability of probationary teachers to foster learning in their students.

The measurement of the direct contribution that a teacher makes to the learning of his or her students is an enormously difficult technical problem that, in the opinion of the author, has no adequate solution that can be applied with confidence under real world conditions. The attempt to base teachers’ rewards (salary increases, for example) on measured student progress is even more problematic,[531] as is noted elsewhere in this report.

The claim that psychometric measures of teacher characteristics are not useful for initial teacher selection implies that candidates be selected by other means – staff interviews, recommendations by peers or past supervisors, and the like. Some might think that this approach is an abdication of responsibility; but instead, it is a realization of the limits of psychometric approaches to personnel selection. The true abdication of responsibility is when professional educators – whether they are tenured teachers, administrators or professors engaged in pre-service education of teachers – fail to conduct adequate evaluations of pre-service and in-service teachers who are practicing their profession under the supervision of their superiors.

These findings, then, yield the following recommendations:

·        Paper-and-pencil tests are not useful predictors of teaching candidates’ potential to teach successfully and should not be used as such.

·        Teaching candidates’ academic record (e.g., GPA) is not a useful predictor of their eventual success as teachers. A candidate’s record of success in pre-service (undergraduate) technical courses (mathematics and science, for example) may contain useful information about that candidate’s success in teaching secondary school mathematics and science.

·        Other things equal, 1) students of regularly licensed teachers achieve at higher levels than students of emergency certified teachers; and 2) more experienced teachers produce higher student achievement than less experienced teachers. Teacher selection policies should reflect these facts.

·        The selection of teachers who will best contribute to their students’ academic achievement should focus on peer and supervisor evaluation of interns, student teachers, substitute teachers and teachers during their probationary period.


9: Converging Findings on Classroom Instruction


Executive Summary


Summary of Research Findings


The past 30 years have seen major advances in research on cognitive processing; in studies of teachers whose classes made the highest achievement gains compared to other classes; and in research on helping students learn and apply cognitive strategies in their learning. The research on cognitive processing underlies a major goal of education: helping students develop well-organized knowledge structures. A number of strategies have been found that consistently help students effectively acquire strong knowledge structures.




·          Present new material in small steps so that the working memory does not become overloaded.

·          Help students develop an organization for the new material.

·          Guide student practice by supporting students during initial practice, and providing for extensive student processing.

·          When teaching higher-level tasks, support students by providing them with cognitive strategies. 

·          Help students learn to use the cognitive strategies by providing them with procedural prompts and modeling the use of these procedural prompts.

·          Provide for extensive student practice.



9: Converging Findings on Classroom Instruction


By Barak Rosenshine

University of Illinois at Urbana


The past 30 years have seen three major advances in research on instruction and teacher behavior. These advancements are:

1.) research on cognitive processing,

2.) studies of teachers whose classes made the highest achievement gain compared to other classes, and

3.) research on helping students learn and apply cognitive strategies in their learning.

This report examines the impact that teacher behavior can have on the achievement of students, particularly of students living in poverty.


Classroom Instruction Research


Cognitive Processing:

The Importance of Well-Connected Knowledge Structures


A major area of research, one with important implications for teaching, has been the research on cognitive processing, research on how information is stored and retrieved. It is currently thought that the information in our long-term memory is stored in interconnected networks called knowledge structures.  The size of these knowledge structures, the number of connections between pieces of knowledge, the strength of the connections, and the organization and richness of the relationships are all important for processing information and solving problems.

The importance of background knowledge should not be underestimated. Simon and Hayes wrote that “there is no substitute for having the prerequisite knowledge if one is to solve a problem.”[532] In discussing how expertise is acquired, Chase and Chi wrote:

The most obvious answer is practice, thousands of hours of practice. For the most part, practice is by far the best predictor of performance.  Practice can produce two kinds of knowledge ... a storage of patterns and a set of strategies or procedures that can act on the patterns.[533]


It is easier to learn new information and easier to solve new problems when one has 1.) a rich, well-connected knowledge structure and 2.) stronger ties between the connections. When the knowledge structure on a particular topic is large and well-connected, new information is more readily acquired and prior knowledge is more readily available for use. When information is “meaningful” to students, they have more points in their knowledge structures to which they can attach new information. Education is a process of developing, enlarging, expanding, and refining our students’ knowledge structures.[534]

Helping students to organize information into well-connected patterns has another advantage.  When a pattern is unified, it only occupies a few bits in the working memory.  Thus, having larger and better-connected patterns frees space in our working memory.  This available space can be used for reflecting on new information and for problem solving.  For example, when U.S. history is organized into well-connected patterns, these patterns occupy less space in the working memory and the learner has additional space in the working memory to use to consider, assimilate, and manipulate new information.  A major difference between an expert and a novice is that the expert’s knowledge structure has a larger number of knowledge items, the expert has more connections between the items, the links between the connections are stronger, and the structure is better organized.  A novice, on the other hand, is unable to see these patterns, and often ignores them.  This development of well-connected patterns and the concomitant freeing of space in the working memory is one of the hallmarks of an expert in a field.[535]

To summarize, well-connected and elaborate knowledge structures are important because they allow for easier retrieval of old material, they permit more information to be carried in a single chunk, and they facilitate the understanding and integration of new information.


Helping Students Develop Background Knowledge


What can be done to help students develop well-connected bodies of knowledge? One important instructional procedure is providing for extensive reading, review, practice, and discussion. These activities serve to help students increase the number of pieces of information that are in the long-term memory, organize those pieces, and increase the strength and number of these interconnections. The more one rehearses and reviews information, the stronger these interconnections become. Thus, the research on cognitive processing supports the need for a teacher to assist students by providing for extensive reading of a variety of materials, frequent review, testing, and discussion and application activities.


Providing for Student Processing


New material is stored in the long-term memory when one processes it.  The quality of storage can depend on the “level of processing.”  For example, the quality of storage is stronger when we read a passage and focus on its meaning than it would be if we read to find a single word answer.  Similarly, the quality of storage would be stronger if one summarized or compared the material in the passage, rehearsed, reviewed, and drew connections.  The connections would be weaker if one hurriedly skimmed the material.

Thus, the research on cognitive processing supports the importance of a teacher initiating activities that require students to process and apply new information.  Such processing strengthens the knowledge network that the student is developing.  Classroom discussion and projects that require students to organize information, summarize information, or compare new material with prior material are all activities that should help students develop and strengthen their cognitive structures.  In addition, Palincsar and Brown wrote:

Understanding is more likely to occur when a child is required to explain, elaborate, or defend his position to others; the burden of explanation is often the push needed to make him or her evaluate, integrate, and elaborate knowledge in new ways.[536]


Other examples of such processing activities include asking students to do any of the following: read a variety of materials; explain the new material to someone else; compare material from different sources; justify their conclusions; write papers and engage in inquiry; or write daily summaries.


Helping Students Organize Knowledge


Information is organized into knowledge structures.  Without these structures, new knowledge tends to be fragmented and not readily available for recall and use.  However, students frequently lack these knowledge structures when they are learning new material. Without direction, there is the danger that students will develop a fragmented, incomplete, or erroneous knowledge structure. 

Graphic organizers.

One way of helping students expand their knowledge structures in content areas and also allowing for a check on misconceptions  is to teach students to use graphic organizers and develop concept maps.  These structures allow a student to show connections between concepts. An outline is an example of such an organizer; concept maps are another example. These structures help students organize the elements of the new learning and such organization can serve to facilitate retrieval.  In addition, having such organizers can enable the student to devote more working memory to the content.

Another approach is to teach students how to develop their own graphic organizers for new material.  Providing students with a variety of structures that they can use to construct their own graphic organizers facilitates this process.  When teaching students to develop a graphic organizer, it is useful for the teacher to model the process and also provide models of thinking and thinking aloud while constructing the maps.

When students are encouraged to construct ideas and develop conclusions, there is also a danger they will develop misconceptions. Research shows we sometimes develop misconceptions in an effort to make sense of our environment.[537] (A notorious example is the belief that the sun is closer to the Earth during summer.) Allowing students to work independently before they are ready increases the danger that they may develop misconceptions. Therefore, teachers need to supervise students when they are working independently and to check their understanding before they begin independent work.

In summary, the research on cognitive processing has identified the importance of developing well-connected knowledge structures. Such structures might be developed by encouraging extensive reading and practice, providing for student processing of new information, and helping students organize their new knowledge.


Research on Teacher Effects


A second important body of research is the teacher effects studies.  This line of research, which took place in the 1960’s and 1970’s, used extensive classroom observation in an attempt to identify those teacher behaviors that were most related to student achievement gain.

Design of the studies. There were three parts to the design of these studies. The first part consisted of systematic observation of the instructional behaviors of teachers and students. Observers sat in a number of existing classrooms, usually 20 to 30 classrooms, and observed and recorded the frequency with which those teachers used a variety of instructional behaviors such as the cause, frequency, and type of praise, the cause, frequency, and type of criticism, the number and type of questions that were asked, the quality of the student answers, and the responses of a teacher to a student’s answers. Many investigators also recorded how much time was spent in activities such as review, presentation, guided practice, and supervising seatwork. Others recorded how the teachers prepared students for seatwork and homework, and the attention-level during teacher-led discussion and during seatwork.

At the end of the observation period or at the end of the semester, each class took a posttest in the subject that was observed, usually reading or mathematics. These class posttest scores were then statistically adjusted, using a variety of regression techniques, for the initial (pretest) scores of these students. That is, the pretest was used as the independent variable in the analysis, and was used to statistically adjust the posttest scores, the dependent variables, for initial standing. In the final step, each of the observed teacher and student behaviors in each classroom was then correlated with the adjusted posttest scores.
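The adjustment-and-correlation steps described above can be sketched in a few lines. This is a simplified illustration, not the procedure of any particular study: it assumes a single linear regression of class posttest means on pretest means, and the six classrooms, their scores, and the observed behavior are invented for the example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def adjusted_gains(pretest, posttest):
    """Residuals from a least-squares regression of posttest on pretest:
    how far each class finished above or below what its pretest predicted."""
    mx, my = mean(pretest), mean(posttest)
    slope = (sum((x - mx) * (y - my) for x, y in zip(pretest, posttest))
             / sum((x - mx) ** 2 for x in pretest))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pretest, posttest)]

# Hypothetical class means for six classrooms (invented numbers, illustration only).
pretest = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
posttest = [52.0, 50.0, 63.0, 60.0, 72.0, 69.0]
questions_per_lesson = [30.0, 12.0, 34.0, 15.0, 33.0, 14.0]  # one observed teacher behavior

gains = adjusted_gains(pretest, posttest)
behavior_gain_r = pearson(questions_per_lesson, gains)
print("adjusted gains:", [round(g, 2) for g in gains])
print("correlation of behavior with adjusted gain:", round(behavior_gain_r, 2))
```

Because the residuals are uncorrelated with the pretest by construction, a behavior that correlates with them is related to achievement beyond what initial standing alone would predict, which is the logic of the correlational studies.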

In effect, these are studies of master teachers. That is, based on the test scores, the investigators were able to identify those teachers whose classrooms made the greatest adjusted achievement gain during the semester, and those teachers whose classrooms made the least adjusted gain during the semester. The investigators were able to take the results of their systematic observation and use them to identify the instructional procedures that the master teachers used and compare these instructional procedures with those used by the less-effective teachers. The significant results are described later in this section.

Usually 20-30 classrooms were in each study, although the study by Stallings and Kascovitz[538] involved 108 first grade and 58 third grade classrooms, and studies by Robert Soar and Ruth Soar involved 55 middle grade classrooms,[539] 59 fifth grade classrooms,[540] and 289 Follow Through and comparison classrooms.[541]

Although a number of studies of this type were conducted as early as 1948 by Barr,[542] the two most famous studies that initiated the teacher-effects research were those by Flanders[543] and by Medley and Mitzel.[544] The best known of the later studies were those by Stallings and Kascovitz[545] in Follow Through classrooms, Good and Grouws[546] in fourth-grade mathematics, and Brophy and Evertson[547] in first grade reading.

These correlational studies were frequently followed by experimental studies in which one group of teachers – the experimental group – was taught to use the findings of the correlational studies in their teaching and another group of similar teachers continued to teach in their usual manner.  By and large, these studies were successful in that the teachers in the experimental groups used more of the new behaviors and the posttest scores of their classrooms – adjusted by regression for their initial scores – were significantly higher than scores in classrooms taught by the control teachers.

Rosenshine summarized the earliest studies in 1971.[548] The correlational studies and the experimental studies in this tradition are described in detail by Brophy and Good,[549] and the experimental studies are also described by Gage and Needles.[550]

Validity.  One argument for the validity of these findings is that the correlational results were replicated in subsequent correlational studies. These studies represented cumulative research.  Second, the correlational results were also replicated in a number of experimental studies.  Finally, the instructional findings that emerged from this research also appear in an independent line of research, that of cognitive strategy instruction, a topic which will be covered later.

Rosenshine and Stevens concluded that those teachers whose classrooms made the greatest gains in reading or mathematics usually used the following procedures:[551] 

·        Begin a lesson with a short review of previous learning.

·        Begin a lesson with a short statement of goals.

·        Present new material in small steps, providing for student practice after each step.

·        Give clear and detailed instructions and explanations.

·        Provide a high level of active practice for all students.

·        Ask a large number of questions, check for student understanding, and obtain responses from all students.

·        Guide students during initial practice.

·        Provide systematic feedback and corrections.

·        Provide explicit instruction and practice for individual exercises and, where necessary, monitor students during their individual work.


Rosenshine and Stevens further grouped these instructional procedures under six teaching “functions” as shown in Table 1.[552] 


Table 1

Functions for Teaching Well-Structured Tasks

1. Review

a)      Review homework

b)      Review relevant previous learning

c)      Review prerequisite skills and knowledge for the lesson


2. Presentation

a)      State lesson goals or provide outline

b)      Present new material in small steps

c)      Model procedures

d)      Provide positive and negative examples

e)      Use clear language

f)      Check for student understanding

g)      Avoid digressions


3. Guided Practice

a)      Spend more time on guided practice

b)      High frequency of questions

c)      All students respond and receive feedback

d)      High success rate

e)      Continue practice until students are fluent


4. Corrections and Feedback

a)      Provide process feedback when answers are correct but hesitant

b)      Provide sustaining feedback, clues, or re-teaching when answers are incorrect

c)      Re-teach material when necessary


5. Independent practice

a)      Students receive overview and/or help during initial steps

b)      Practice continues until students are automatic (where relevant)

c)      Teacher provides active supervision (where possible)

d)      Routines are used to provide help for slower students


6. Weekly and monthly reviews




Small Steps, Practice, and Success


Four strategies that are particularly relevant to teaching are:

1.)    teaching in “small steps,”

2.)    guiding student practice,

3.)    ensuring a high student success rate, and

4.)    providing extensive practice. 


Present New Material in Small Steps


When the most effective teachers in these studies taught new material, they taught it in “small steps.”   That is, they only presented small parts of new material at a single time, and then guided students in practicing this  material. In contrast, the least effective teachers in these studies would present an entire lesson, and then pass out worksheets and tell students to work the problems.

The importance of teaching in small steps fits well with the findings from cognitive psychology on the limitations of our working memory.  Our working memory, the place where we process information, is small.  It can only handle a few bits of information at once – too much information swamps our working memory. The procedure of first teaching in small steps and then guiding student practice represents an appropriate way of dealing with the limitation of our working memory.


Guide Student Practice


A second major finding from the teacher effects literature was the importance of guided practice.[553]

As noted, the most effective teachers presented only small amounts of material at a time. After this short presentation, these teachers then guided student practice. This guidance often consisted of the teacher working a few problems at the board and discussing the steps out loud. This instruction served as a model for the students. This guidance also included asking students to come to the board, work problems, and discuss their procedures. Through this process the students at their seats would see additional models.

In contrast, the least effective teachers would present an entire lesson, and then pass out worksheets and tell the students to work the problems.  When this happened, it was observed that many students were confused and made errors on the worksheets and the teachers would be seen going from student to student and explaining the material.  In this case, the amount of material that was presented was too large, and swamped the working memory.

The process of guiding practice also includes checking the answers of the entire class in order to see whether some students need additional instruction.  Guided practice has also included asking students to work together, in pairs or in groups, to quiz and explain the material to each other.  Guided practice may occur when a teacher questions and helps a class with their work before assigning independent practice.

Guiding practice also fits the cognitive processing findings on the need to provide for student processing.  Guided practice is the place where the students – working alone, with other students, or with the teacher – engage in the cognitive processing activities of organizing, reviewing, rehearsing,  summarizing, comparing, and contrasting.  However, it is important that all students engage in these activities.  The least effective teachers often asked a question, called on one student to answer, and then assumed that everyone had learned this point.  In contrast, the most effective teachers attempted to check the understanding of all students and to provide for processing by all students.

Another reason for the importance of guided practice comes from the fact that we construct and reconstruct knowledge.  We cannot simply repeat what we hear word for word.  Rather, we connect our understanding of the new information to our existing concepts or “schema,” and we then construct a mental summary: “the gist” of what we have heard.  However, when left on their own, many students make errors in the process of constructing this mental summary.  These errors occur, particularly, when the information is new and the student does not have adequate or well-formed background knowledge.  These constructions are not errors so much as attempts by the students to be logical in an area where their background knowledge is weak.  These errors are so common that there is a literature on the development and correction of student misconceptions in science.  Providing guided practice after teaching small amounts of new material, and checking for student understanding, can help limit the development of misconceptions.


Provide for Extensive Practice


The most effective teachers also provided for extensive and successful practice. As noted in the cognitive processing research, students need extensive practice in order to develop well-connected networks.  The most effective teachers made sure that such practice took place after there had been sufficient guided practice, so that students were not practicing errors and misconceptions.


Provide For a High Success Rate


In two of the major teacher-effects studies the investigators found that students in classrooms of the more effective teachers had a higher success rate as judged by the quality of their oral responses and their individual work.  The need for a high success rate follows from the previous research on the need to provide extensive and successful practice. 

Yet teachers often struggle to obtain a high success rate, particularly when they are providing whole-class instruction to heterogeneous students. One solution is the above-mentioned “teaching in small steps.”  Another solution is for students to meet in heterogeneous groups during the independent practice and work problems together.  In these settings, students who have learned the material re-explain the material to the other students.

Other schools have dealt with this problem by regrouping students, by achievement, across classrooms, for reading and for mathematics.  In such settings, it is easier for the teachers to explain, supervise, and re-teach to the entire class because all the students in this setting are at similar levels.

The need for a high success rate, and the need for students to master one step before they proceed to the next step, is the major idea behind Mastery Learning. In Mastery Learning there is explicit provision for bringing all students to mastery on one section of the material before they proceed to the next section.


Teaching Cognitive Strategies


The third major instructional advance has been the development and teaching of cognitive strategies.  Cognitive strategies are guides that support learners as they develop new internal procedures, procedures that enable them to perform higher-level operations in areas such as reading comprehension and scientific problem solving.

Until the late 1970’s, students were seldom provided with any help in reading comprehension.  Durkin[554] observed 4,469 minutes of reading instruction in fourth-grade classrooms and noted that only 20 minutes of this total was spent in comprehension instruction.  Durkin found that teachers spent almost all of the instructional time asking questions, but spent little time teaching students comprehension strategies they could use to answer the questions.  Duffy and Roehler[555] noted a similar lack of comprehension instruction in elementary classrooms:

There is little evidence of instruction of any kind. Teachers spend most of their time assigning activities, monitoring to be sure the pupils are on task, directing recitation sessions to assess how well children are doing and providing corrective feedback in response to pupil errors.  Seldom does one observe teaching in which a teacher presents a skill, a strategy, or a process to pupils, shows them how to do it, provides assistance as they initiate attempts to perform the task and assures that they can be successful.[556]


As a result of these astonishing findings, and as a result of emerging research on cognition and information processing, investigators began to develop and validate cognitive strategies that could help students.  For example, one approach that has been used successfully to help students improve their reading comprehension has been to teach students to ask themselves questions about their reading.  In these studies students would read passages and use prompts such as “who” and “why” to ask questions about the passage.  And, as a result of this practice, comprehension improved when the students were tested on new passages. 

What happened?  Asking oneself a question, obviously, does not lead directly to improved comprehension on new passages. Rather, it is believed that the process of asking questions changed the way students read – it led them to search the text and combine information – and it was this change in processing that led to improved comprehension on new passages.

Throughout the 1980’s, investigators developed and taught students specific cognitive strategies such as question-generation and summarization that could be applied to reading comprehension.[557] Cognitive strategy procedures have also been developed and taught in mathematics problem solving,[558] physics problem solving,[559] and in writing.[560] These intervention studies, in reading, writing, mathematics, and science, together with a description of the cognitive strategies and the instructional procedures that were used, have been assembled in an excellent volume by Pressley et al.[561]

The concept of cognitive strategies provides a general approach that can be applied to the teaching of higher-order tasks in the content areas.  The profession has made much progress.  In place of Durkin’s observation that there was little evidence of cognitive strategy instruction in reading, there are now studies that have succeeded in providing instruction in cognitive strategies in a number of content areas.


Instructional Elements in Teaching of Cognitive Strategies


The process of teaching students cognitive strategies is distinctive in that the investigators used a variety of supports, or scaffolds, to teach students to use the strategies.  Many of these instructional elements had not appeared in the teacher-effects literature.  These elements – which are described in this section – can now be used by teachers, profitably, to help students not only in the learning of cognitive strategies, but also in a variety of other learning situations.



Cognitive strategies are taught by providing students with cognitive supports or scaffolds.[562] A scaffold is a temporary support that is used to assist a learner during initial learning.  Scaffolds operate to reduce the complexities of the problems and break them down into manageable chunks that the child has a real chance of solving.[563] Scaffolds help students bridge the gap between their current abilities and the goal. The scaffolds are gradually withdrawn as learners become more independent, although some students may continue to rely on scaffolds when they encounter particularly difficult problems. Scaffolds include simplified problems, modeling of the procedures by the teacher, thinking aloud by the teacher as he or she solves the problem, prompts, suggestions, and guidance as students work problems. Scaffolds may also be tools, such as cue cards or checklists, or a model of the completed task against which students can compare their work.[564]

Collins, Brown, and Newman originated the term Cognitive Apprenticeship to refer to the entire process of teaching cognitive strategies and providing scaffolds to aid students.[565] Students are learning strategies during this apprenticeship that will enable them to become competent readers, writers, and problem solvers.  They are aided by a Master who models, coaches, provides supports, and withdraws the supports and scaffolds as the students become independent.

A number of these supports and scaffolds, drawn from the research, are presented here.

1. Provide Procedural Prompts That Can Guide Student Processing.

In these studies, the first step in teaching a cognitive strategy was the development of a procedural prompt.[566] Procedural prompts are concrete aids that supply the students with specific procedures or suggestions that facilitate the completion of the task.  Learners can temporarily rely on these hints and suggestions until they create their own internal structures.[567]

As noted, the words “who,” “what,” “why,” “where,” “when,” and “how” are procedural prompts that help students learn the cognitive strategy of asking questions about the material they have read.[568] These prompts are concrete references on which students can rely for support as they learn to apply the cognitive strategy.

Another example of procedural prompts comes from a study by King,[569] who also taught students to generate questions. In her studies, however, she provided students with a list of question stems:

How are _____ and _____ alike?

What is the main idea of __________?

What do you think would happen if __________?

What are the strengths and weaknesses of __________?

In what way is _____ related to ______ ?

What do you think causes __________?

How does _____ tie in with what we have learned before?

What do I (you) still not understand about . . .?


Students practiced in groups, using these stems to ask each other questions about passages.  King found that students who practiced using these prompts were superior to control students in comprehension of new material.  Apparently, using these stems to develop and answer questions led the students to develop new internal approaches to reading text, and these approaches helped them when they later read new material.

A wide variety of excellent procedural prompts have been developed for reading comprehension, for writing, and for vocabulary.  Investigators have also developed a number of “concept maps” and “graphic organizers” that have been shown to help students learn from text.  Twenty-four of these procedural prompts and details on their use – mostly derived from successful studies – have been assembled in a useful book published by the Wisconsin State Department of Public Instruction.[570]


2. Demonstrate Use of the Prompt through Modeling and Thinking Aloud.

In one sense, demonstrating the use of procedural prompts is similar to traditional demonstrations by a teacher.  What is new is the addition of two cognitive supports: modeling of the cognitive strategy, and “thinking aloud” that provides an insight into how experts solve problems.  These supports would seem useful in a variety of instructional situations.


Provide  Models of the Appropriate Responses.  The literature on cognitive strategies has introduced us to the concept of a teacher modeling appropriate responses.  Excellent teachers have undoubtedly modeled difficult learning for centuries, but it was the cognitive strategy literature that highlighted this important instructional procedure.

For example, when the prompts were “who,” “what,” and “where,” a teacher would model questions starting with those words.  This modeling occurred at the start of the lessons and also during the lesson when students were having problems developing questions.[571]

Modeling is particularly appropriate when using prompts for writing essays or arguments. The author once watched a class where the teacher spent the entire period modeling and leading the class as he completed an essay prompt using material from the play Macbeth, which the class had read and discussed.  The next day, he led the class as they completed the prompt using a second argument. The third day, he supervised the students as they worked alone and used the prompt to develop a third argument.


Think Aloud as Choices Are Being Made. Another scaffold, similar to modeling, is thinking aloud: literally vocalizing the internal thought processes one goes through when using the cognitive strategy.  A teacher might think aloud while summarizing a paragraph, illustrating the thought processes that occur as one first determines the topic of the paragraph and then uses the topic to generate a summary sentence.

Thinking aloud by the teacher and more capable students provides novice learners with a way to observe “expert thinking” that  is usually hidden from the student.  Garcia and Pearson (1990) refer to this process as the teacher “sharing the reading secrets” by making them overt.[572]  Indeed, identifying the hidden strategies of experts so that they can become available to learners has become a useful area of research.[573]

Anderson[574] worked with adolescent readers who were competent decoders but poor in comprehension.  These readers were reluctant to identify or to attempt to solve problems that occurred during their reading.  The students met in groups, read somewhat difficult passages, and attempted to make sense of the passages.  Anderson illustrated the procedure she was trying to teach by modeling how one might attempt to clarify a difficult passage:

I don’t get this.   It says that things that are dark look smaller.   I know that a white dog looks smaller than a black elephant, so this rule must only work for things that are about the same size.   Maybe black shoes would make your feet look smaller than white ones would.


Anderson also modeled how they might summarize important information:

I’ll summarize this part of the article.   So far, it tells where the Spanish started in North America and what parts they explored.   Since the title is “The Spanish in California,” the part about California must be important.   I’d sum up by saying that Spanish explorers from Mexico discovered California.   They didn’t stay in California, but lived in other parts of America.   These are the most important ideas so far.


3. Guide Initial Practice through Techniques That Reduce the Difficulty of the Task.

Typically, after the modeling, the teacher guided students during their initial practice.  As they worked through text,  the teacher gave hints, reminders of the prompts, reminders of what was overlooked, and suggestions of how something could be improved.[575]

Much of this guided practice is similar to the guided practice that emerged from the teacher effects research.[576] Now, however, the guided practice is being applied to the learning of  higher-level tasks.  A number of investigators also developed procedures that facilitate practice by reducing the initial demands on the students.  These procedures, described next, would seem useful for teaching a variety of skills, strategies, and concepts, not just those illustrated here.


Regulate the Difficulty of the Task.  One approach to guiding practice has been to regulate the difficulty of the material by having the students begin with simpler material and then gradually move to more complex materials.  For example, when Palincsar taught students to generate questions, the teacher first modeled how to generate questions about a single sentence.[577] This was followed by class practice.  Next, the teacher modeled and provided practice on asking questions after reading a paragraph.  Finally, the teacher modeled and then the class practiced generating questions after reading an entire passage.  Similar procedures were used by other investigators.[578]

The same simple-to-complex procedure was used to teach the strategy of summarizing.[579] Students first learned to write summaries of single paragraphs, and then progressed, with guidance and modeling from the teacher, to producing summaries of longer passages.

In another study where summarization was taught, the initial difficulty was reduced by starting the practice with material that was one or two grade levels below the students’ reading level.[580]  In a study by Blaha,[581] the teacher divided up the strategy.  She first taught a part of a strategy, then guided student practice in first identifying and then applying the strategy.  After that, the teacher taught the next part of the strategy, and guided student practice.