The latest Perry Preschool analysis: Noisy data + noisy methods + flexible summarizing = Big claims

Dean Eckles writes:

Since I know you’re interested in Heckman’s continued analysis of early childhood interventions, I thought I’d send this along: The intervention is so early, it is in their parents’ childhoods.

See the “Perry Preschool Project Outcomes in the Next Generation” press release and the associated working paper.

The estimated effects are huge:

In comparison to the children of those in the control group, Perry participants’ children are more than 30 percentage points less likely to have been suspended from school, about 20 percentage points more likely never to have been arrested or suspended, and over 30 percentage points more likely to have a high school diploma and to be employed.

The estimates are significant at the 10% level. Which may seem like quite weak evidence (perhaps it is), but actually the authors employ a quite conservative inferential approach that reflects their uncertainty about how the randomization actually occurred, as discussed in a related working paper.

My quick response is that using a noisy (also called “conservative”) measure and then finding p less than 0.10 does not constitute strong evidence. Indeed, the noisier (more “conservative”) the method, the less informative any given significance level becomes. This relates to the “What does not kill my statistical significance makes me stronger” fallacy that Eric Loken and I wrote about (and here’s our further discussion), and it applies even more strongly here, as the significance is at the 10% rather than the conventional 5% level.
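To make this concrete, here is a minimal simulation of the significance filter. The numbers are invented for illustration and have nothing to do with the Perry data: assume a modest true effect of 5 percentage points, compare a relatively precise estimate with a noisy (“conservative”) one, and keep only the results that clear a two-sided 10% threshold:

```python
# A minimal sketch of the significance filter with noisy estimates.
# All numbers are made up for illustration; nothing here comes from the
# Perry Preschool data.
import numpy as np

rng = np.random.default_rng(0)
true_effect = 5.0        # hypothetical true effect, in percentage points
n_sims = 100_000
z_crit = 1.645           # critical value for a two-sided 10% test

for se in (5.0, 15.0):   # standard error of the estimate: precise vs. noisy
    estimates = rng.normal(true_effect, se, n_sims)
    significant = np.abs(estimates) > z_crit * se
    sig_estimates = estimates[significant]
    power = significant.mean()
    exaggeration = np.mean(np.abs(sig_estimates)) / true_effect
    wrong_sign = np.mean(sig_estimates < 0)
    print(f"SE = {se:4.1f} pp: power = {power:.2f}, "
          f"mean |significant estimate| = {exaggeration:.1f} x true effect, "
          f"Pr(wrong sign | significant) = {wrong_sign:.2f}")
```

The noisier estimator has lower power, and the estimates that survive the filter overstate the assumed true effect by a much larger factor: a significant result from a noisy procedure is weaker evidence, not stronger.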

In addition, I see lots and lots and lots of forking paths and researcher degrees of freedom in statements such as, “siblings, especially male siblings, who were already present but ineligible for the program when families began the intervention were more likely to graduate from high school and be employed than the siblings of those in the control group.”
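For a sense of how easily this can happen, here is a rough sketch of the forking-paths problem. The counts of subgroups and outcomes below are hypothetical, not taken from the working paper; the point is only that with many subgroup-by-outcome comparisons and small samples, pure noise will usually deliver at least one nominally significant finding:

```python
# A rough illustration of forking paths: with several subgroups and outcomes
# to look at, pure noise will usually produce at least one "significant"
# comparison. The counts below are hypothetical, not the actual analysis
# choices in the working paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims = 5_000
n_per_group = 30          # hypothetical small subgroup size
subgroups = 4             # hypothetical: e.g., all siblings, male siblings, ...
outcomes = 5              # hypothetical: graduation, employment, arrests, ...

hits = 0
for _ in range(n_sims):
    for _ in range(subgroups * outcomes):
        treated = rng.normal(0.0, 1.0, n_per_group)   # true effect is zero
        control = rng.normal(0.0, 1.0, n_per_group)
        if stats.ttest_ind(treated, control).pvalue < 0.10:
            hits += 1
            break

print(f"Pr(at least one p < 0.10 out of {subgroups * outcomes} "
      f"null comparisons) = {hits / n_sims:.2f}")
```

Even before any data-dependent choices, roughly 20 comparisons give close to a 90% chance of at least one p < 0.10 under the null; choices made after seeing the data (the forking paths) only make things worse, and harder to correct for.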

Just like everyone else, I’m rooting for early childhood intervention to work wonders. The trouble is, there are lots and lots of interventions that people hope will work wonders. It’s hard to believe they all have such large effects as claimed. It’s also frustrating when people such as Heckman routinely report biased estimates (see further discussion here). They should know better. Or they should at least know enough to know that they don’t know better. Or someone close to them should explain it to them.

I’ll say this again because it’s such a big deal: If you have a noisy estimate (because of biased or noisy measurements, small sample size, inefficient estimation (perhaps chosen for conservatism or robustness), or some combination of these), this does not strengthen your evidence. It’s not appropriate to give extra credence to your significance level, or confidence interval, or other statement of uncertainty, based on the fact that your data collection or statistical inference is noisy.
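A quick bit of arithmetic makes the same point. At a fixed two-sided 10% level, the smallest estimate that can come out statistically significant is about 1.645 times the standard error, so a more “conservative” (larger) standard error simply guarantees that whatever gets reported as significant is big. The standard errors below are hypothetical, not from the paper:

```python
# The smallest estimate that clears a two-sided 10% test is about 1.645 * SE,
# so a noisier ("more conservative") analysis can only ever report large
# significant effects. These standard errors are hypothetical.
for se_pp in (5, 10, 15, 20):          # standard error, in percentage points
    print(f"SE = {se_pp:2d} pp -> any significant estimate exceeds "
          f"{1.645 * se_pp:.0f} percentage points")
```

If effects in the tens of percentage points are implausible a priori, then an analysis whose standard errors can only certify effects of that size is not producing strong evidence when it crosses the threshold.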

I’d say that I don’t think the claims in the above report would replicate, but given the time frame of any potential replication study, I don’t think they will be put to that test one way or the other, so a better way to put it is that I don’t think the estimates are at all accurate or reasonable.

But, hey, if you pick four point estimates to display, you get this:

That and favorable publicity will get you far.

P.S. Are we grinches for pointing out the flaws in poor arguments in favor of early childhood intervention? I don’t think so. Ultimately, our goal has to be to help these kids, not just to get stunning quotes to be used in PNAS articles, NPR stories, and TED talks. If the researchers in this area want to flat-out make the argument that exaggeration of effects serves a social good, that these programs are so important that it’s worth making big claims that aren’t supported by the data, then I’d like to hear them make this argument in public, for example in comments to this post. But I think what’s happening is more complicated. I think these eminent researchers really don’t understand the problems with noise, researcher degrees of freedom, and forking paths. I think they’ve fooled themselves into thinking that causal identification plus statistical significance equals truth. And they’re supported by an academic, media, and governmental superstructure that continues to affirm them. These guys have gotten where they are in life by not listening to naysayers, so why change the path now? This holds in economics and policy analysis, just as it does in evolutionary psychology, social psychology, and other murky research areas. And, as always, I’m not saying that all or even most researchers are stuck in this trap; just enough for it to pollute our discourse.

What makes me sad is not so much the prominent researchers who get stuck in this way, but the younger scholars who, through similar good intentions, follow along these mistaken paths. There’s often a default assumption that, as the expression goes, with all this poop, there must be a pony somewhere. In addition to all the wasted resources involved in sending people down blind alleys, and in addition to the statistical misconceptions leading to further noisy studies and further mistaken interpretations of data, this sort of default credulity crowds out stronger, more important work, perhaps work by some junior scholar that never gets published in a top 5 journal or whatever because it doesn’t have that B.S. hook.

Remember Gresham’s Law of bad science? Every minute you spend staring at some bad paper, trying to figure out reasons why what they did is actually correct, is a minute you didn’t spend looking at something more serious.

And, yes, I know that by giving attention to bad work here I’m violating my own principles. But we can’t spend all our time writing code. We have to spend some time unit testing and, yes, debugging. I put a lot of effort into doing (what I consider to be) exemplary work, into developing and demonstrating good practices, and into teaching others how to do better. I think it’s also valuable to explore how things can go wrong.