Further debate over mindset interventions


Following up on this post, “Study finds ‘Growth Mindset’ intervention taking less than an hour raises grades for ninth graders,” commenter D points us to this post by Russell Warne that’s critical of research on growth mindset.

Here’s Warne:

Do you believe that how hard you work to learn something is more important than how smart you are? Do you think that intelligence is not set in stone, but that you can make yourself much smarter? If so, congratulations! You have a growth mindset.

Proposed by Stanford psychologist Carol S. Dweck, mindset theory states that there are two perspectives people can have on their abilities. Either they have a growth mindset–where they believe their intelligence and their abilities are malleable–or they have a fixed mindset. People with a fixed mindset believe that their abilities are either impossible to change or highly resistant to change.

According to the theory, people with a growth mindset are more resilient in the face of adversity, persist longer in tasks, and learn more in educational programs. People with a fixed mindset deny themselves of these benefits.

I think he’s overstating things a bit: First, I think mindsets are more continuous than discrete: Nobody can realistically think, on one hand, that hard work can’t help you learn, or, on the other hand, that all people are equally capable of learning something if they just work hard. I mean, sure, maybe you can find inspirational quotes or whatever, but no one could realistically believe either of these extremes. Similarly, it’s not clear what is meant by hard work being “more important” than smarts, given that these two attributes would be measured on different scales.

But, sure, I guess that’s the basic picture.

Warne summarizes the research:

On the one side are the studies that seriously call into question mindset theory and the effectiveness of its interventions. Li and Bates (2019) have a failed replication of Mueller and Dweck’s (1998) landmark study on how praise impacts student effort. Glerum et al. (in press) tried the same technique on older students in vocational education and found zero effect. . . .

The meta-analysis from Sisk et al. (2018) is pretty damning. They found that the average effect size for mindset interventions was only d = .08. (In layman’s terms, this would move the average child from the 50th to the 53rd percentile, which is extremely trivial.) Sisk et al. (2018) also found that the average correlation between growth mindset and academic performance is a tiny r = .10. . . .

On the other hand, there are three randomized control studies that suggest that growth mindset can have a positive impact on student achievement. Paunesku et al. (2015) found that teaching a growth mindset raised the grade point averages of at-risk students by 0.15 points. (No overall impact for all students is reported.) Yeager et al. (2016) found that at-risk students’ GPAs improved d = .10, but students with GPAs above the median had improvements of only d = .03. . . .

So, mixed evidence. But I think that we can all agree that various well-publicized claims of huge benefits of growth mindset are ridiculous overestimates, for the same reason that we don’t believe that early childhood intervention increases adult earnings by 42%, etc etc etc. On the other hand, smaller effects in some particular subsets of the population . . . that’s more plausible.
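
As a quick check on the percentile arithmetic in Warne's summary, here is a minimal sketch that converts a standardized effect size into a percentile shift. It assumes normally distributed outcomes with equal variance in both groups, which is the usual convention behind this kind of translation but is not something the quoted studies guarantee. The function name is made up for illustration.

```python
from statistics import NormalDist


def percentile_after_shift(d: float, start_percentile: float = 50.0) -> float:
    """Percentile of the untreated distribution reached by a student who
    starts at start_percentile and improves by d standard deviations.

    Assumption: outcomes are normal with the same SD with and without treatment.
    """
    std_normal = NormalDist()
    z_start = std_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * std_normal.cdf(z_start + d)


# d = .08 is the average intervention effect Warne quotes from Sisk et al. (2018).
print(round(percentile_after_shift(0.08), 1))  # 53.2
```

Under those assumptions the 50th-to-53rd percentile figure checks out. Whether a shift of that size counts as trivial is a separate question from whether the arithmetic is right.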

Then Warne lays down the hammer:

For a few months, I puzzled over the contradictory literature. The studies are almost evenly balanced in terms of quality and their results.

Then I discovered the one characteristic that the studies that support mindset theory share and that all the studies that contradict the theory lack: Carol Dweck. Dweck is a coauthor on all three studies that show that teaching a growth mindset can improve students’ school performance. She is also not a coauthor on all of the studies that cast serious doubt on mindset theory.

So, there you go! Growth mindsets can improve academic performance—if you have Carol Dweck in charge of your intervention. She’s the vital ingredient that makes a growth mindset effective.

I don’t think Warne really believes that Dweck’s presence makes growth mindset work. I think he’s being ironic and that what he’s really saying is that the research published by Dweck and her collaborators is not to be trusted.


I sent the above to David Yeager, first author of the recent growth-mindset study, and he replied:

I [Yeager] don’t see why there has to be a conflict between mindset and IQ; there is plenty of variance to go around. But that aside, I think the post reflects a few outdated ways of thinking that devoted readers of your papers and your blog would easily spot.

The first is a “vote-counting” approach to significance testing, which I think you’ve been pretty clear is a problem. The post cites Rienzo et al. as showing “no impact” for growth mindset and our Nature paper as showing “an impact.” But the student intervention in Rienzo showed an ATE of .1 to .18 standard deviations (pg. 4 https://files.eric.ed.gov/fulltext/ED581132.pdf). That’s anywhere from 2 to 3.5X the ATE from the student intervention in our pre-registered Nature paper (which was .05 SD). But Rienzo’s effects aren’t significant because it’s a cluster-randomized trial, while ours are because we did a student-level randomized trial. The minimum detectable effect for Rienzo was .4 to .5 SD, and I’ve never done a mindset study with anywhere near that effect size! It’s an under-powered study.

In a paper last year, McShane argued pretty persuasively that we need to stop calling something a failed replication when it has the same or larger effect as previous studies, but wider confidence intervals. The post you sent didn’t seem to get that message.
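
Yeager's point about cluster randomization can be made concrete with a rough power calculation. The sketch below uses the textbook approximation for a two-arm trial with equal allocation and no covariates, inflating the variance by the design effect 1 + (m - 1) × ICC when whole clusters of size m are randomized. The sample size, cluster size, and intraclass correlation are hypothetical values chosen for illustration; they are not taken from Rienzo et al. or from the Nature paper.

```python
from math import sqrt
from statistics import NormalDist


def mde(n_total: int, cluster_size: int = 1, icc: float = 0.0,
        alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate minimum detectable effect in SD units.

    Assumptions: two arms of equal size, unit-variance outcome, no covariate
    adjustment, normal approximation (no small-sample correction for the
    number of clusters).
    """
    std_normal = NormalDist()
    multiplier = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    # Design effect is 1 when individual students are randomized.
    design_effect = 1 + (cluster_size - 1) * icc
    standard_error = sqrt(4 * design_effect / n_total)
    return multiplier * standard_error


# Hypothetical trial: 6000 students in total.
n_students = 6000
print(round(mde(n_students), 2))                              # student-level randomization
print(round(mde(n_students, cluster_size=200, icc=0.15), 2))  # 30 schools of 200 randomized
```

With these made-up inputs the same 6000 students give an MDE of roughly 0.07 SD under student-level randomization and roughly 0.40 SD under school-level randomization, which is the sense in which a cluster-randomized trial can estimate a larger effect and still fail to reach significance.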

Second, the post uses outdated thinking about standardized effect sizes for interventions. The .1 to .18 in Rienzo are huge effects for adolescent RCTs. When you look at the I3 evaluations, which have the whole file drawer and pre-registered analyses, you can get an honest distribution of effects, and almost nothing exceeds .18 (Matt Kraft did this analysis). The median for adolescent interventions is .03. If the .18 is trustworthy, that’s massive, not counterevidence for the theory.

Likewise, the post says that an ATE of .08, which is what Sisk et al. estimated, is “extremely trivial.” But epidemiologists know really well (e.g. Rose’s prevention paradox) that a seemingly small average effect could mask important subgroup effects, and as long as those subgroup effects were reliable and not noise, then depending on the cost and scalability of the intervention, an ATE of .08 could be very important. And seemingly small effects can have big policy implications when they move people across critical thresholds. Consider that the ATE in our Nature paper was .05, and the effect in the pre-registered group of lower-achievers was .11. That corresponded to an overall 3 percentage point decrease in failing to make adequate progress in 9th grade, and a 3 point increase in taking advanced math the next year, both key policy outcomes. This is pretty good considering that we already showed the intervention could be scaled across the U.S. by third parties, and could be generalized to 3 million students per year in the U.S. I should note that Paunesku et al. 2015 and Yeager et al. 2016 also reported the D/F reduction in big samples, and a new paper from Norway replicated the advanced math result. So these are replicable, meaningful policy-relevant effects from a light-touch intervention, even if they seem small in terms of raw standard deviations.
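
The threshold argument can be illustrated with a small calculation: if an outcome is roughly normal and a fixed cutoff defines "failing," then shifting the whole distribution up by d standard deviations lowers the share of students below the cutoff. The sketch below assumes normality and uses a hypothetical baseline failure rate of 40%; it is not a reproduction of the analysis in the Nature paper, only a way to see how a shift of about .11 SD can translate into a change of a few percentage points.

```python
from statistics import NormalDist


def share_below_threshold(baseline_share: float, d: float) -> float:
    """Share of students below a fixed cutoff after the outcome
    distribution shifts up by d standard deviations.

    Assumption: outcomes are normal and the intervention shifts the mean
    without changing the spread.
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(baseline_share)  # cutoff in baseline SD units
    return std_normal.cdf(cutoff - d)


baseline = 0.40  # hypothetical share below the cutoff without the intervention
after = share_below_threshold(baseline, d=0.11)  # .11 is the subgroup effect quoted in the email
print(f"change: {100 * (after - baseline):.1f} percentage points")
```

The size of the percentage-point change depends on where the cutoff sits in the distribution, so the same standardized effect can matter more or less for different thresholds.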

Unfortunately, unrealistic thinking about effect sizes is common in psychology, and it is kept alive by the misapplication of effect size benchmarks, like you see in Sisk et al. Sisk et al. stated that the “average effect for a typical educational intervention on academic performance is .57,” (pg. 569) but Macnamara is citing John Hattie’s meta-analysis. As Slavin put it, “John Hattie is wrong.” And in the very paper that Macnamara cites for the .57 SD “typical effect,” Hattie says that those are immediate, short-term effects; when he subsets on longer-term effects on academic outcomes, which the mindset interventions focus on, it “declined to an average of .10.” (pg. 112). But Sisk/Macnamara cherry-pick the .57. I don’t see how Sisk et al. reporting .08 for the ATE and more than twice that for at-risk or low-SES groups is “damning.” .08 ATE seems pretty good, considering the cost and scalability of the intervention and the robust subgroup effects.

The third outdated way of thinking is that it is focused on main effects, not heterogeneous effects. In a new paper that Beth Tipton and I wrote [see yesterday’s post], we call it a “hetero-naive” way of thinking.

One way this post is hetero-naive is by assuming that effects from convenience samples, averaged in a meta-analysis, give you “the effect” of something. I don’t see any reason to assume that meta-analysis of haphazard samples converges on a meaningful population parameter of any kind. It might turn out that way by chance sometimes, but that’s not a good default assumption. For instance, Jon Krosnick and I show the non-correspondence between meta-analyses of haphazard samples and replications in representative samples in the paper I sent you last year.

The post’s flawed assumption really pops out when this blog post author cites a meta-analysis of zero-order correlations between mindset and achievement. I don’t see any reason why we care about the average of a haphazard sample of correlational studies when we can look at truly generalizable samples. The 2018 PISA administered the mindset measure to random samples from 78 OECD nations, with ~600,000 respondents, and they find mindset predicts student achievement in all but three. With international generalizability, who cares what Sisk et al. found when their meta-analysis averaged a few dozen charter school kids with a few dozen undergrads and a bunch of students on a MOOC?

Or consider that this post doesn’t pay attention to intervention fidelity as an explanation for null results, even though that’s the very first thing that experts in education focus on (see this special issue). I heard that, in the case of the Foliano study, up to 30% of the control group schools already were using growth mindset and even attended the treatment trainings, and about half of the treatment group didn’t attend many of the trainings. On top of that, the study was a cluster-randomized trial and had an MDE larger than our Nature paper found, which means they were unlikely to find effects even with perfect fidelity.

I don’t mean to trivialize the problems of treatment fidelity; they are real and they are hard to solve, especially in school-level random assignment. But those problems have nothing to do with growth mindset theory and everything to do with the challenges of educational RCTs. It’s not Carol Dweck’s fault that it’s hard to randomize teachers to PD.

Further, the post is turning a blind eye to another important source of heterogeneity: changes in the actual intervention. We have successfully delivered interventions to students, in two pre-registered trials: Yeager et al., 2016, and Yeager et al., 2019. But we don’t know very much at all yet about changing teachers or parents. And the manipulation with no effects in Rienzo was the teacher component, and the Foliano study also tried to change teachers and schools. These are good-faith studies but they’re ahead of the science. Here’s my essay on this. I think it’s important for scientists to dig in and study why it’s so hard to create growth mindset environments, ones that allow the intervention to take root. I don’t see much value in throwing our hands up and abandoning lots of promising ideas just because we haven’t figured out the next steps yet.

In light of this, it seems odd to conclude that Carol Dweck’s involvement is the special ingredient to a successful study, which I can only assume is done to discredit her research.

First, it isn’t true. Outes et al. did a study with the World Bank and found big effects (.1 to .2 SD), without Carol, and there’s the group of behavioral economists in Norway who replicated the intervention (I gave them our materials and they independently did the study).

Second, if I was a skeptic who wondered about independence (and I am a skeptic), I would ask for precisely the study we published in Nature: pre-registered analysis plan, independent data collection and processing, independent verification of the conclusions by MDRC, re-analysis using a multilevel Bayesian model (BCF) that avoids the problems with null hypothesis testing, and so on. But we already published that study, so it seems weird to be questioning the work now as if we haven’t already answered the basic questions of whether growth mindset effects are replicable by independent experimenters and evaluators.

The more sophisticated set of questions focuses on whether we know how to spread and scale the idea up in schools, whether we know how to train others to create effective growth mindset content, etc. And the answer to that is that we need to do lots of science, and quickly, to figure it out. And we’ll have to solve perennial problems with teacher PD and school culture change, problems that affect all educational interventions, not just mindset. I suspect that will be hard work that requires a big team and a lot of humility.

Oh, and the post mentions Li and Bates, but that’s just not a mindset intervention study. It’s a laboratory study of praise and its effects on attributions and motivation. It’s not that different from the many studies that Susan Gelman and others have done on essentialism and its effects on motivation. Those studies aren’t about long-term effects on grades or test scores so I don’t understand why this blog post mentions them. A funny heterogeneity-related footnote to Li and Bates is that they chose to do their study in one of the only places where mindset didn’t predict achievement in the PISA — rural China — while the original study was done in the U.S., where mindset robustly predicts achievement.