What’s the American Statistical Association gonna say in their Task Force on Statistical Significance and Replicability?

Blake McShane and Valentin Amrhein point us to an announcement (see page 7 of this newsletter) from Karen Kafadar, president of the American Statistical Association, which states:

Task Force on Statistical Significance and Replicability Created

At the November 2019 ASA Board meeting, members of the board approved the following motion:

An ASA Task Force on Statistical Significance and Reproducibility will be created, with a charge to develop thoughtful principles and practices that the ASA can endorse and share with scientists and journal editors. The task force will be appointed by the ASA President with advice and participation from the ASA BOD. The task force will report to the ASA BOD by November 2020. . . .

Based on the initial meeting, these members decided “replicability” was more in line with the critical issues than “reproducibility” (cf. National Academy of Sciences report, bit.ly/35YBLbu), hence the title of the task force is ASA Task Force on Statistical Significance and Replicability. . . .

Blake and Valentin and I are a little bit concerned that (a) this might become an official “ASA statement on Statistical Significance and Replicability” and could thus have an enormous influence, and (b) the listed committee seems like a bunch of reasonable people, no bomb-throwers like us or Nicole Lazar or John Carlin or Sander Greenland or various others to represent the voice of radical reform. We’re all reasonable people too, but we’re reasonable people who start from the perspective that, whatever its successes in engineering and industrial applications, null hypothesis significance testing has been a disaster in areas like social, psychological, environmental, and medical research—not the perspective that it’s basically a good idea that just needs a little bit of tinkering to apply to such inexact sciences.

I respect the perspectives of the status-quo people, the “centrists,” as it were—they represent a large group of the statistics community and should be part of any position taken by the American Statistical Association—but I think our perspective is important too.

I also don’t think that concerns about null hypothesis significance testing should be placed into a Bayesian/frequentist debate, with a framing that the Bayesians are foolish idealists and the frequentists are the practical people . . . that might have been the case 50 years ago, but it’s not the case now. As we have repeatedly written, the problem with thresholds is that they are used to finesse real uncertainty, and that’s an issue whether the threshold is based on p-values or posterior probabilities or Bayes factors or whatever. Again, we recognize and respect opposing views on this; our concern here is that the ASA discussion should represent our perspective too, a perspective we believe is well supported on theoretical grounds and is also highly relevant to the recent replication crises in many areas of science.

This post is meant to stimulate some publicly visible discussion before the task force reports to the ASA board and in particular before the ASA board comes to a decision. The above-linked statement informs us that the leaders of this effort welcome input and are working on a mechanism for receiving comments from the community.

So go for it! As usual, feel free in the comments to disagree with me.