Understanding the “average treatment effect” number

In statistics and econometrics there’s lots of talk about the average treatment effect. I’ve often been skeptical of the focus on the average treatment effect, for the simple reason that, if you’re talking about an average effect, then you’re recognizing the possibility of variation; and if there’s important variation (enough so that we’re talking about “the average effect” rather than simply “the effect”), then maybe we care enough about this variation that we should be studying it directly, rather than just trying to reduce-form it away.

But that’s not the whole story. Consider an education intervention such as growth mindset. Sure, the treatment effect will vary. But if the treatment’s gonna be applied to everybody, then, yeah, let’s poststratify and estimate an average effect: this seems like a relevant number to know.

What I want to talk about today is interpreting that number. It’s something that came up in the discussion of growth mindset.

The reported effect size was 0.1 points of grade point average (GPA). GPA is measured on something like a 1-4 scale, so 0.1 is not so much; indeed one commenter wrote, “I hope all this fuss is for more than that. Ouch.”

Actually, though, an effect of 0.1 GPA point is a lot. One way to think about this is that it’s equivalent to a treatment that raises GPA by 1 point for 10% of people and has no effect on the other 90%. That’s a bit of an oversimplification, but the point is that this sort of intervention might well have little or no effect on most people. In education and other fields, we try lots of things to try to help students, with the understanding that any particular thing we try will not make a difference most of the time. If mindset intervention can make a difference for 10% of students, that’s a big deal. It would be naive to think that it would make a difference for everybody: after all, many students have a growth mindset already and won’t need to be told about it.
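The arithmetic behind that decomposition is worth making concrete. Here's a minimal sketch of the stylized all-or-nothing model described above (an illustration only, not the actual mindset data): out of 100 students, 10 "responders" gain a full GPA point and the other 90 gain nothing, and the average works out to 0.1.

```python
# Stylized all-or-nothing model of a heterogeneous treatment effect:
# 10 students out of 100 gain a full point, the other 90 gain nothing.
effects = [1.0] * 10 + [0.0] * 90

avg_effect = sum(effects) / len(effects)
print(avg_effect)  # 0.1
```

The same bookkeeping works for any split: 20% of students gaining half a point also averages to 0.1, which is exactly why the average alone doesn't pin down the composition.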

That’s all a separate question from the empirical evidence for that 0.1-point increase. My point here is that thinking about an average effect can be misleading.

Or, to put it another way, it’s fine to look at the average, but let’s be clear on the interpretation.

I think this comes up in a lot of cases. Various interventions are proposed, and once the hype dies down, the average effects turn out to be small. Of course there’s no one-quick-trick or even one-small-trick that will raise GPA by 1 point or that will raise incomes by 44% (to use one of our recurring cautionary tales; see for example section 2.1 of this paper). An intervention that raised average GPA by 0.1 point or that raised average income by 4.4% would still be pretty awesome, if what it’s doing is acting on 10% of the people and having a big benefit for this subset. You try different interventions in the hope that, for any particular person, maybe one of them will help.

Again, this discrete formulation is an oversimplification—it’s not like the treatment either works or doesn’t work on an individual person. It’s just helpful to understand average effects as compositional in that way. Otherwise you’re bouncing between the two extremes of hypothesizing unrealistically huge effect sizes or else looking at really tiny averages. Maybe in some fields of medicine this is cleaner because you can really isolate the group of patients who will be helped by a particular treatment. But in social science this seems much harder.
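To see why the discrete story is only a convenience, here's a quick sketch (a hypothetical distribution, not fit to any data) in which per-student effects vary continuously, with most near zero and a few large, yet the average is the same 0.1 as in the all-or-nothing version:

```python
import random

random.seed(0)

# Hypothetical continuous heterogeneity: per-student effects drawn from an
# exponential distribution with mean 0.1 (most effects tiny, a few sizable).
# lambd for random.expovariate is 1/mean.
effects = [random.expovariate(1 / 0.1) for _ in range(100_000)]

avg_effect = sum(effects) / len(effects)
print(round(avg_effect, 2))  # 0.1
```

Many very different shapes of heterogeneity are consistent with the same average, which is the sense in which the average alone underdetermines the story.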