Commenter Thanatos Savehn pointed to an official National Academy of Sciences report on Reproducibility and Replicability that included the following “set of criteria to help determine when testing replicability may be warranted”:
1) The scientific results are important for individual decision-making or for policy decisions.
2) The results have the potential to make a large contribution to basic scientific knowledge.
3) The original result is particularly surprising, that is, it is unexpected in light of previous evidence and knowledge.
4) There is controversy about the topic.
5) There was potential bias in the original investigation, due, for example, to the source of funding.
6) There was a weakness or flaw in the design, methods, or analysis of the original study.
7) The cost of a replication is offset by the potential value in reaffirming the original results.
8) Future expensive and important studies will build on the original scientific results.
I’m ok with items 1 and 2 on this list, and items 7 and 8: You want to put in the effort to replicate studies on problems that are important, and where the replications will be helpful. One difficulty here is determining whether “The scientific results are important . . . potential to make a large contribution to basic scientific knowledge.” Consider, for example, Bem’s notorious ESP study: if the claimed results are true, they could revolutionize science. If there’s nothing there, though, it’s not so interesting. This sort of thing comes up a lot, and it’s not clear how we should apply items 1 and 2 above in the context of such uncertainty.
But the real problem I have is with items 3, 4, 5, and 6, all of which would seem to favor replications of studies that have problems.
In particular consider item 6: “There was a weakness or flaw in the design, methods, or analysis of the original study.”
I’d think about it the other way: If a study is strong, it makes sense to try to replicate it. If a study is weak, why bother?
Here’s the point. Replication often seems to be taken as a sort of attack, something to try when a study has problems, an attempt to shoot down a published claim. But I think that replication is an honor, something to try when you think a study has found something, to confirm something interesting.
ESP, himmicanes, ghosts, Bigfoot, astrology etc.: all very interesting if true, not so interesting as speculations not supported by any good evidence.
So I recommend changing items 3, 4, 5, and 6 of the National Academy of Sciences list. Instead of replicating studies with problems, let’s replicate the good studies.
To put it another way: The problem with the above guidelines is that they implicitly assume that if a study doesn’t have obvious major problems, it should be believed. Thus, they see the point of replications as checking up on iffy claims. But I’d say it the other way: unless a study’s design, data collection, and results are unambiguously clear, we should default to skepticism, and so replication can be valuable in giving support to a potentially important claim.
Tomorrow’s post: Is “abandon statistical significance” like organically fed, free-range chicken?