Yesterday we discussed difficulties with the concept of average treatment effect.

Part of designing a study is accounting for uncertainty in effect sizes. Unfortunately there is a tradition in clinical trials of making optimistic assumptions in order to claim high power. Here is an example that came up in March, 2020. A doctor was designing a trial for an existing drug that he thought could be effective for high-risk coronavirus patients. I was asked to check his sample size calculation: under the assumption that the drug increased survival rate by 25 percentage points, a sample size of N = 126 would assure 80% power. With 126 people divided evenly in two groups, the standard error of the difference in proportions is bounded above by √(0.5*0.5/63 + 0.5*0.5/63) = 0.089, so an effect of 0.25 is at least 2.8 standard errors from zero, which is the condition for 80% power for the z-test.

When I asked the doctor how confident he was in his guessed effect size, he replied that he thought the effect on these patients would be higher and that 25 percentage points was a conservative estimate. At the same time, he recognized that the drug might not work. I asked the doctor if he would be interested in increasing his sample size so he could detect a 10 percentage point increase in survival, for example, but he said that this would not be necessary.

It might seem reasonable to suppose that a drug might not be effective but would have a large individual effect in case of success. But this vision of uncertainty has problems. Suppose, for example, that the survival rate was 30% among the patients who do not receive this new drug and 55% among the treatment group. Then in a population of 1000 people, it could be that the drug has no effect on the 300 of people who would live either way, no effect on the 450 who would die either way, and it would save the lives of the remaining 250 patients. There are other possibilities consistent with a 25 percentage point benefit—for example the drug could save 350 people while killing 100—but we will stick with the simple scenario for now. In any case, the point is that the posited benefit of the drug is not “a 25 percentage point benefit” for each patient; rather, it’s a benefit on 25% of the patients. And, from that perspective, of course the drug could work but only on 10% of the patients. Once we’ve accepted the idea that the drug works on some people and not others—or in some comorbidity scenarios and not others—we realize that “the treatment effect” in any given study will depend entirely on the patient mix. There is no underlying number representing the effect of the drug. Ideally one would like to know what sorts of patients the treatment would help, but in a clinical trial it is enough to show that there is some clear average effect. My point is that if we consider the treatment effect in the context of variation between patients, this can be the first step in a more grounded understanding of effect size.

This is an interesting example because the outcome is binary—live or die—so the variation in the treatment effect is obvious. By construction, the treatment effect on any given person is +1, -1, or 0, and there’d be no way for it to be 0.25 on everybody. Even in this clear case, however, I think the framing in terms of average treatment effect causes problems, as illustrated in the story above.