Filling/emptying the half empty/full glass of profitable science: Different views on retiring versus retaining thresholds for statistical significance.

Unless you are new to this blog, you likely will know what this is about.

Now, by profitable science in the title is meant repeatedly producing logically good explanations which, “through subjection to the test of experiment, lead to the avoidance of all surprise and to the establishment of a habit of positive expectation that shall not be disappointed.” – CS Peirce

It all started with a Nature commentary by Valentin Amrhein, Sander Greenland, and Blake McShane. Then came the discussion, then thinking about it, then an argument that it is sensible and practical, then an example of statistical significance not working, and then a dissenting opinion by Deborah Mayo.

Notice the lack of finally!

However, Valentin Amrhein, Sander Greenland, and Blake McShane have responded with a focused and concise account of why they think retiring statistical significance will fill up the glass of profitable science, while maintaining hard default thresholds for declaring statistical significance will continue to empty it: statistical significance gives bias a free pass. This is their just-published letter to the editor (JPA Ioannidis) on TA Hardwicke and JPA Ioannidis’ Petitions in scientific argumentation: Dissecting the request to retire statistical significance, where Hardwicke and Ioannidis argued (almost) the exact opposite.

“In contrast to Ioannidis, we and others hold that it is using – not retiring – statistical significance as a “filtering process” or “gatekeeper” that “gives bias a free pass”.”

The two-sentence excerpt I liked most was: “Instead, it [retiring statistical significance] encourages honest description of all results and humility about conclusions, thereby reducing selection and publication biases. The aim of single studies should be to report uncensored information that can later be used to make more general conclusions based on cumulative evidence from multiple studies.”

However, the full letter to the editor is only slightly longer than two pages, so it should be read in full – Statistical significance gives bias a free pass.

I also can’t help but wonder how much of the discussion that ensued from the initial Nature commentary could have been avoided if the page limits had been less strict.

Now it may seem strange that an editor who is also an author of the paper drawing a critical letter to the editor would accept that letter. It happens, but not always. I also submitted a letter to the editor on this same paper, and the same editor rejected it without giving a specific reason. That full letter of mine is below for those who might be interested.

My letter was less focused but made three main points: someone with a strong position on a topic who undertakes to do a survey themselves displaces the opportunity for others without such strong positions to learn more; univariate summaries of responses can be misleading; and (minor) pre-registration violations and respondent comments (only given in the appendix) can provide insight into the quality of the design and execution of the survey. For instance, the authors had anticipated analyzing nominal responses with correlation analysis.



Those with a truly scientific attitude should look forward to (or even be thrilled by) opportunities to learn how they were wrong. This survey has provided some. In particular, potential signatories should have been informed about what a signatory was endorsing and perhaps given a list of options to choose from. Before I give my own personal sense of why I chose to be a signatory, I believe that I should first disclose that I did not respond to the survey.

This was primarily due to my lack of confidence, initially in the survey software and then in the survey itself – especially given the lack of any ethics approval. When I initially read through the online survey to get a sense of the full set of questions before I replied, it accepted my “submission” without a confirmation prompt. I contacted the authors about the problem and they indicated they could remove my response. Later, when I clicked on my link, my unfinished survey surprisingly appeared on my screen, containing some initial text I had entered. I had not been warned about this possibility and might well have shared my link with others.

Perhaps because of these concerns, I assessed the survey itself more thoroughly. I found it poorly designed and, to my mind, ill thought out. The disclosure in the supplementary information that the pre-registered protocol had anticipated examining correlations between nominal responses suggests that I may not have been too wrong in my assessment. I would encourage interested readers to read all the open-response comments in the supplementary information to get a better sense of other concerns.

My own personal sense was that, as a signatory, I was endorsing that the commentary was worth serious consideration, with the understanding that various arguments could be disregarded or set aside if found not compelling. Something that was not to be ignored, but rather only possibly disregarded after due consideration. It was comforting to see that in question 8 (which addressed the expected benefit of the “petition”), 83% of the respondents chose “A: I felt it would draw attention to the argument”. But the authors’ text highlights that “almost a third [of respondents] felt it would make the argument more convincing”. Now, 31% did choose “B: I felt that it would make the argument more convincing”, but the accusation that they committed a logical fallacy – “argumentum ad populum” – should not immediately follow.

That accusation is unfair for two reasons. First, the statement “B: I felt that it would make the argument more convincing” can be interpreted descriptively or normatively, and a normative interpretation is required for the logical fallacy. Many of the respondents may well have been thinking of it descriptively – although readers should not accept the authority of large numbers, many actually will. Unfortunately, as we all know, many readers of methodology papers do not do what they should but rather what they wish. Second, the response to this question was multivariate (tick all that apply), and the univariate reporting in the main paper is potentially misleading. This is only clear in the supplementary information, in table S2. Only 6% of respondents chose B as their sole response. On the other hand, 83% of those who chose B also chose A. Arguably, the joint response AB is far less supportive of an accusation of “argumentum ad populum” than B on its own.

More generally, the authors chose to proceed with the survey without any ethics approval. They indicate that some sort of advice on this was sought from the leadership of QUEST, but no information was provided about who they are or what they actually did. Research ethics is not just about protecting participants and preventing conflicts of interest, but, among other things, about maximizing the value of research. It is hard to believe a qualified group would have missed the incorrect anticipation that nominal responses could appropriately be analyzed with correlation – that is, if they had carefully reviewed it. These quality issues complicate any careful interpretation of the survey and definitely suggest selective non-response.

I know that at the start of the survey there were numerous emails and Twitter comments among signatories suggesting that the survey should not be responded to. Additionally, there are many wary comments from respondents in the survey responses. Given this, I think more would have been learned from the survey if a group without a previous strong position on the petition had been asked to conduct it. If the authors had asked such a group to do this, they simply would have needed to indicate so. If another group was not available or willing to do the survey, research ethics approval should have been sought rather than presumed unnecessary.