What’s the evidence on the effectiveness of psychotherapy?

Kyle Dirck points us to this article by John Sakaluk, Robyn Kilshaw, Alexander Williams, and Kathleen Rhyner in the Journal of Abnormal Psychology, which begins:

Empirically supported treatments (or therapies; ESTs) are the gold standard in therapeutic interventions for psychopathology. Based on a set of methodological and statistical criteria, the APA [American Psychological Association] has assigned particular treatment-diagnosis combinations EST status and has further rated their empirical support as Strong, Modest, and/or Controversial. Emerging concerns about the replicability of research findings in clinical psychology highlight the need to critically examine the evidential value of EST research. We therefore conducted a meta-scientific review of the EST literature.

And here’s what they found:

This review suggests that although the underlying evidence for a small number of empirically supported therapies is consistently strong across a range of metrics, the evidence is mixed or consistently weak for many, including some classified by Division 12 of the APA as “Strong.”

It was hard for me to follow exactly which are the therapies that clearly work and which are the ones where the evidence is not so clear. This seems like an important detail, no? Or maybe I’m missing the point. The difference between significant and not significant is not statistically significant, right?
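To make that last point concrete, here's a minimal sketch in Python with made-up numbers (nothing here comes from the article). Suppose one hypothetical treatment has an estimated effect of 25 with standard error 10, and another has an estimate of 10 with the same standard error. The first clears the conventional 5% threshold and the second doesn't, but the comparison between the two is nowhere near significant. The function name and all inputs are just for illustration, and the calculation assumes independent, roughly normal estimates.

```python
import math

def z_and_p(estimate, se):
    """Return the z-score and two-sided normal p-value for an estimate."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of the normal
    return z, p

# Hypothetical effect estimates (illustrative only), each with standard error 10
est_a, se_a = 25.0, 10.0
est_b, se_b = 10.0, 10.0

z_a, p_a = z_and_p(est_a, se_a)
z_b, p_b = z_and_p(est_b, se_b)

# Compare the two treatments directly; assumes the estimates are independent
diff = est_a - est_b
se_diff = math.sqrt(se_a**2 + se_b**2)
z_d, p_d = z_and_p(diff, se_diff)

print(f"Treatment A: z = {z_a:.2f}, p = {p_a:.3f}")
print(f"Treatment B: z = {z_b:.2f}, p = {p_b:.3f}")
print(f"A minus B:   estimate = {diff:.0f}, se = {se_diff:.1f}, z = {z_d:.2f}, p = {p_d:.3f}")
```

So sorting treatments into "works" and "doesn't work" based on which side of the threshold each one landed can create a sharp distinction that the data don't support.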

They also write:

Finally, though the trend towards increased statistical power in EST research is a positive development, there must be greater continued effort to increase the evidential value—broadly construed—of the EST literature . . . EST research may need to eschew the model of small trials. A combined workflow of larger multi-lab registered reports (Chambers, 2013; Uhlmann et al., 2018) coupled with thorough analytic review (Sakaluk, Williams, & Biernat, 2014) would yield the highest degree of confirmatory, accurate evidence for the efficacy of ESTs.

This makes sense. But, speaking generally, I think it’s important when talking about improved data collection to not just talk about increasing your sample size. Don’t forget measurement. I don’t know enough about psychotherapy to say anything specific, but there should be ways of getting repeated measurements on people, intermediate outcomes, etc., going beyond up-or-down summaries to learn more from each person in these studies.
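Here's a rough sketch of why measurement can matter as much as sample size, under a simple model that I'm assuming purely for illustration: each person's observed outcome is a true outcome (standard deviation sd_true across people) plus independent measurement noise (standard deviation sd_noise), and we average k repeated measurements per person. Then the standard error of the group mean with n people is sqrt((sd_true^2 + sd_noise^2 / k) / n). The numbers are made up, and real repeated measurements are typically correlated, so take this as a cartoon and not a design recommendation.

```python
import math

def se_of_mean(n, k, sd_true, sd_noise):
    """Standard error of a group mean with n people and k measurements each.

    Assumes independent measurement noise, so averaging k measurements
    divides the noise variance by k (an illustrative simplification).
    """
    return math.sqrt((sd_true**2 + sd_noise**2 / k) / n)

# Hypothetical values: noisy measurement relative to true variation
sd_true, sd_noise = 1.0, 2.0

baseline = se_of_mean(n=50, k=1, sd_true=sd_true, sd_noise=sd_noise)
double_n = se_of_mean(n=100, k=1, sd_true=sd_true, sd_noise=sd_noise)
four_meas = se_of_mean(n=50, k=4, sd_true=sd_true, sd_noise=sd_noise)

print(f"n=50,  1 measurement each:  se = {baseline:.3f}")
print(f"n=100, 1 measurement each:  se = {double_n:.3f}")
print(f"n=50,  4 measurements each: se = {four_meas:.3f}")
```

With these made-up numbers, measuring the same 50 people four times each does slightly better than doubling the sample to 100. When measurement is less noisy relative to the variation across people, the gain from repeated measurement shrinks.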


So, there’s lots going on here, statistically speaking, regarding the very important topic of the effectiveness of psychotherapy.

First, I’d like to ask, Which treatments work and which don’t? But we can’t possibly answer that question. The right thing to do is to look at the evidence on different treatments and summarize as well as we can, without trying to make a sharp dividing line between treatments that work and treatments that don’t, or are unproven.

Second, different treatments work for different people, and in different situations. That’s the real target: trying to figure out what to do when. And, for reasons we’ve discussed, there’s no way we can expect to approach anything like certainty when addressing such questions.

Third, when gathering data and assessing evidence, we have to move beyond procedural ideas such as preregistration and the simple statistical idea of increasing N, and think about design and data quality linked to theoretical understanding and real-world goals.

I’ve put that last paragraph in bold, as perhaps it will be of the most relevance to many of you who don’t study psychotherapy but are interested in experimental science.

Tomorrow’s post: What does a “statistically significant difference in mortality rates” mean when you’re trying to decide where to send your kid for heart surgery?