Getting all negative about so-called average power

Blake McShane writes:

The idea of retrospectively estimating the average power of a set of studies via meta-analysis has recently been gaining a ton of traction in psychology and medicine. This seems really bad for two reasons:

1. Proponents claim average power is a “replicability estimate” and that it estimates the rate of replicability “if the same studies were run again”. Estimation issues aside, average power obviously says nothing about replicability in any real sense that is meaningful for actual prospective replication studies. It perhaps only says something about replicability if we were able to replicate in the purely hypothetical repeated sampling sense and if we defined success in terms of statistical significance.

2. For the reason you point out in your Bababekov et al. Annals of Surgery letter, the power of a single study is not estimated well:
taking a noisy estimate and plugging it into a formula does not give us “the power”; it gives us a very noisy estimate of the power
Having more than one study in the average power case helps, but not much. For example, in the ideal case of k studies all with the same power, the mean z-statistic satisfies z.bar ~ N(z.true, 1/k), and mapping this estimate to power results in a very noisy distribution except for k large (roughly 60 in this ideal case). If you also try to adjust for publication bias as average power proponents do, the resulting distribution is much noisier and requires hundreds of studies for a precise estimate.
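To get a feel for the noise described in point 2, here is a minimal simulation sketch of the ideal case only: k studies with identical power and no publication bias. It assumes a two-sided z-test at alpha = 0.05, draws the mean z-statistic from N(z.true, 1/k), and maps each draw to power. The function names and the choice z_true = 1.96 (true power of about 0.5) are illustrative assumptions, not anything taken from McShane's paper.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_from_z(z, alpha=0.05):
    """Power of a two-sided z-test whose z-statistic has expectation z."""
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(z - crit) + STD_NORMAL.cdf(-z - crit)


def simulated_power_estimates(z_true, k, n_sims=20000, seed=1):
    """Draw the mean z-statistic from N(z_true, 1/k); map each draw to power."""
    rng = random.Random(seed)
    sd = 1 / k ** 0.5
    return sorted(power_from_z(rng.gauss(z_true, sd)) for _ in range(n_sims))


def central_interval(sorted_draws, level=0.95):
    """Empirical central interval from draws that are already sorted."""
    n = len(sorted_draws)
    lo = sorted_draws[int(n * (1 - level) / 2)]
    hi = sorted_draws[int(n * (1 + level) / 2) - 1]
    return lo, hi


if __name__ == "__main__":
    z_true = 1.96  # hypothetical value chosen so that true power is about 0.5
    for k in (1, 5, 20, 60, 200):
        lo, hi = central_interval(simulated_power_estimates(z_true, k))
        print(f"k={k:4d}  95% interval for estimated power: ({lo:.2f}, {hi:.2f})")
```

Running the script prints how wide the distribution of the plug-in power estimate is for each k. The interval narrows as k grows, and this sketch leaves out the publication-bias adjustment, which by the argument above would only make things noisier.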

In sum, people are left with a noisy estimate that doesn’t mean what they think it means and that they do not realize is noisy!

With all this talk of negativity and bearing in mind the bullshit asymmetry principle, I wonder whether you would consider posting something on this or having a blog discussion or guest post or something along those lines. As Sander and Zad have discussed, it would be good to stop this one in its tracks fairly early on before it becomes more entrenched so as to avoid the dreaded bullshit asymmetry.

He also links to an article, “Average Power: A Cautionary Note,” with Ulf Böckenholt and Karsten Hansen, where they find that “point estimates of average power are too variable and inaccurate for use in application” and that “the width of interval estimates of average power depends on the corresponding point estimates; consequently, the width of an interval estimate of average power cannot serve as an independent measure of the precision of the point estimate.”