The value (or lack of value) of preregistration in the absence of scientific theory

https://statmodeling.stat.columbia.edu/2020/03/26/the-value-or-lack-of-value-of-preregistration-in-the-absence-of-scientific-theory/

Javier Benitez points us to this 2013 post by psychology researcher Denny Borsboom.

I have some thoughts on this article—in particular I want to compare psychology to other social science fields such as political science and economics—but first let me summarize it.

Preregistration and open science

Borsboom writes:

In the past few months, the Center for Open Science and its associated enterprises have gathered enormous support in the community of psychological scientists. While these developments are happy ones, in my view, they also cast a shadow over the field of psychology: clearly, many people think that the activities of the Center for Open Science, like organizing massive replication work and promoting preregistration, are necessary. That, in turn, implies that something in the current scientific order is seriously broken. I think that, apart from working towards improvements, it is useful to investigate what that something is. In this post, I want to point towards a factor that I think has received too little attention in the public debate; namely, the near absence of unambiguously formalized scientific theory in psychology.

Interesting. This was 6 years ago, and psychology researchers are continuing to have these arguments today.

Borsboom continues:

Scientific theories allow you to work out, on a piece of paper, what would happen to stuff in conditions that aren’t actually realized. So you can figure out whether an imaginary bridge will stand or collapse in imaginary conditions. You can do this by simply just feeding some imaginary quantities that your imaginary bridge would have (like its mass and dimensions) to a scientific theory (say, Newton’s) and out comes a prediction on what will happen. In the more impressive cases, the predictions are so good that you can actually design the entire bridge on paper, then build it according to specifications (by systematically mapping empirical objects to theoretical terms), and then the bridge will do precisely what the theory says it should do. . . .

That’s how they put a man on the moon and that’s how they make the computer screen you’re now looking at. It’s all done in theory before it’s done for real, and that’s what makes it possible to construct complicated but functional pieces of equipment. This is, in effect, why scientific theory makes technology possible, and therefore this is an absolutely central ingredient of the scientific enterprise which, without technology, would be much less impressive than it is. . . .

My [Borsboom’s] field – psychology – unfortunately does not afford much of a lazy life. We don’t have theories that can offer predictions sufficiently precise to intervene in the world with appreciable certainty. That’s why there exists no such thing as a psychological engineer. And that’s why there are fields of theoretical physics, theoretical biology, and even theoretical economics, while there is no parallel field of theoretical psychology. . . .

Interesting. He continues:

And that’s why psychology is so hyper-ultra-mega empirical. We never know how our interventions will pan out, because we have no theory that says how they will pan out . . . if we want to know what would happen if we did X, we have to actually do X. . . .

This has important consequences. For instance, as a field has less theory, it has to leave more to the data. Since you can’t learn anything from data without the armature of statistical analysis, a field without theory tends to grow a thriving statistical community. Thus, the role of statistics grows as soon as the presence of scientific theory wanes.

Not quite. Statistics gets used in atomic physics and in pharmacometrics: two fields where there is strong theory, but we still need statistics to draw inference from indirect data. The relevant buzzword here is “inverse problem.”

Borsboom’s comments all seem reasonable for the use of statistics within personality and social psychology, but I think he jumped too quickly to generalize about statistics in other fields.

Even within psychology, I don’t think his generalization quite works. Consider psychometrics. Theory in psychometrics isn’t quite as strong as theory in physics or pharmacometrics, but it’s not bad; indeed, psychometricians can sometimes even break down the skill of solving a particular problem into its relevant sub-skills. To fit such models, with all their latent parameters, you need sophisticated statistical methods.
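To give a sense of what “latent parameters” means in practice, here’s a minimal sketch (my own illustration, not anything from Borsboom’s post): simulate item responses from a simple Rasch model and recover the latent item difficulties by maximum likelihood. The sample sizes, parameter values, and sum-to-zero constraint are all assumptions chosen just for the example.

```python
# Illustrative sketch only: a Rasch-type item response model, the simplest
# psychometric model with a latent parameter for every person and every item.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_persons, n_items = 200, 10
ability = rng.normal(0, 1, n_persons)           # latent person parameters (assumed)
difficulty = np.linspace(-1.5, 1.5, n_items)    # latent item parameters (assumed)

# Simulate binary responses: Pr(correct) = inverse-logit(ability - difficulty)
logits = ability[:, None] - difficulty[None, :]
responses = rng.binomial(1, 1 / (1 + np.exp(-logits)))

def neg_log_lik(params):
    theta = params[:n_persons]                      # person abilities
    beta_free = params[n_persons:]                  # item difficulties (all but one)
    beta = np.append(beta_free, -beta_free.sum())   # sum-to-zero identification
    lp = theta[:, None] - beta[None, :]
    return -np.sum(responses * lp - np.logaddexp(0, lp))

fit = minimize(neg_log_lik, np.zeros(n_persons + n_items - 1), method="L-BFGS-B")
beta_hat = np.append(fit.x[n_persons:], -fit.x[n_persons:].sum())
print(np.round(np.c_[difficulty, beta_hat], 2))     # true vs. roughly recovered difficulties
```

Even this toy version has 209 free parameters for 2000 binary observations; real psychometric models (multidimensional, hierarchical, with measurement error) are where the sophisticated statistics comes in.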

So, yes, they use fancy statistics in junk science like ESP and embodied cognition, but they also use fancy statistics in real science like psychometrics, which involves multivariate inference from sparse data.

Borsboom continues:

It would be extremely healthy if psychologists received more education in fields which do have some theories, even if they are empirically shaky ones, like you often see in economics or biology.

Again, what about psychometrics? “Psychology” is not just personality and social psychology, right?

The post also has several comments, my favorite of which is from Greg Francis, another psychology researcher, who writes:

I [Francis] liked many of the points raised in the article, but I do not understand the implied connection between the lack of theoretical ideas and the need for preregistration. . . .

If we do not have a theory that predicts the outcome of an experiment, then what is the point of preregistration? What would be preregistered? Guesses? It does not seem helpful to know that a researcher wrote some guesses about their experimental outcomes prior to gathering the data. Would experimental success validate the guesses in some sense? What if the predictions were generated by coin flips? . . .

I agree with Francis here. Indeed, we have an example of the problems with preregistration in a study published a year ago that purportedly offered “real-world experimental evidence” that “exposure to socioeconomic inequality in an everyday setting negatively affects willingness to publicly support a redistributive economic policy.” The study was preregistered, but with such weak theory that when the result came in the opposite direction from what was predicted (in a way consistent with noise, as discussed for example in this comment), the authors immediately turned around and proposed an opposite theory to explain the results, a theory which in turn was presented as being so strong that the study was said to “[advance] our understanding of how environmental factors, such as exposure to racial and economic outgroups, affect human behavior in consequential ways.”

Now, it’s tricky to navigate these waters. We can learn from surprises, and it’s fine to preregister theories, including those that do not end up supported by data. The point here is that the theory in this example is pretty close to valueless. Or, to put it more carefully, the theory is fine for what it is, but it does not interact usefully with the experiment, because the experimental data are too noisy. In the same way, the theory of relativity is fine for what it is, but it does not interact usefully with a tabletop experiment of balls rolling down inclined planes.

So, yes, as Greg Francis says, you don’t get much scientific value from preregistering guesses.

The other thing that came up in the discussion of the above example, and that Borsboom did not explicitly mention, is that a lot of social scientists seem to turn off their critical thinking when a randomized experiment is involved. I wrote that Borsboom didn’t explicitly mention this point, but you could say that he was considering it implicitly, in that lots of junk science of the psychology variety involves experimentation, and there does seem to be the attitude that:

random assignment + statistically significant p-value = meaningful scientific result,

and that:

random assignment + statistically significant p-value + good story = important scientific result.

The random assignment plays a big part in this story, which is particularly unfortunate in recent psychology research, given that psychologists are traditionally trained to be concerned with validity (are you measuring something of interest?) and reliability (are your measurements stable enough to allow you to learn anything generalizable from your sample?).

There’s a lot going on in this example, but the key points are:

– I agree with Borsboom that personality and social psychology have weak theory.

– I also agree that when researchers have weak theory, they can rely on statistics.

– I disagree with the claim that “the role of statistics grows as soon as the presence of scientific theory wanes”: highly theoretical fields can also rely heavily on statistics.

– I agree with Francis that preregistration doesn’t have much value in the absence of strong theory and measurement.

Comparing preregistration to fake-data simulation

You might wonder about this last statement, given that I published a preregistered study of my own on a topic where we had zero theory. And, more generally, I recommend that, before gathering new data, we start any research project with fake-data simulation, which is a sort of preregistered hypothesis, or model, of the underlying process as well as the data-generating mechanism. I think the value of preregistration, or fake-data simulation, is that it clarifies our research plans and it clarifies our model of the world. Here I’m thinking of “model” as a statistical or generative model of the data, not as a theoretical model that would, for example, explain the direction and size of an effect.
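As a concrete, purely hypothetical illustration of what I mean by fake-data simulation as a preregistered generative model: before collecting any data, write down assumed values for the effect size, the variation, and the design, simulate a dataset from them, and run the planned analysis on the simulated data. All the numbers below are placeholder assumptions, not estimates from any real study.

```python
# A minimal sketch of fake-data simulation before data collection.
# Effect size, noise level, and sample size are assumed placeholders.
import numpy as np

rng = np.random.default_rng(0)
n = 100                    # planned sample size
assumed_effect = 0.2       # assumed treatment effect, in outcome units
assumed_sd = 1.0           # assumed residual standard deviation

# Generative model: random assignment, continuous outcome
z = rng.binomial(1, 0.5, n)
y = assumed_effect * z + rng.normal(0, assumed_sd, n)

# The planned analysis, run on the fake data
est = y[z == 1].mean() - y[z == 0].mean()
se = np.sqrt(y[z == 1].var(ddof=1) / (z == 1).sum() +
             y[z == 0].var(ddof=1) / (z == 0).sum())
print(f"estimate = {est:.2f}, standard error = {se:.2f}")
```

Writing this down forces us to commit to a generative model and an analysis plan; it doesn’t require, or supply, a scientific theory of why the effect should go one way or the other.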

I think this distinction between a statistical generative model and a scientific theoretical model is worth exploring further, but let’s set it aside for now.

There is, of course, another reason that’s given for preregistration, which is to avoid forking paths and thus allow p-values to have their nominal interpretations. That’s all fine but it’s not so important to me, as I’m not usually computing p-values in the first place. I like the analogy between preregistration, random sampling, and controlled experimentation.

To put it another way, preregistration is fine, but it doesn’t solve your problem if studies are sloppy, variation is high, and effects are small. That said, fake-data simulation can be helpful in (a) making a study less sloppy, and (b) giving us a sense of whether variation is high compared to effect sizes. What’s needed here, though, is what might be called quantitative preregistration: the point is not to lay down a marker about prior belief in the direction of an effect, but to make quantitative assumptions about effect sizes and variation.
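Here’s a sketch of what that quantitative step might look like, continuing the hypothetical setup above: repeat the simulation many times under the assumed effect size and variation and see how often the planned estimate gets the sign wrong or exaggerates the magnitude. Again, all numbers are illustrative assumptions.

```python
# Sketch of a simulation-based design analysis under assumed effect size and variation.
import numpy as np

rng = np.random.default_rng(0)
n, assumed_effect, assumed_sd, n_sims = 100, 0.1, 1.0, 10_000

ests = np.empty(n_sims)
for i in range(n_sims):
    z = rng.binomial(1, 0.5, n)                       # random assignment
    y = assumed_effect * z + rng.normal(0, assumed_sd, n)
    ests[i] = y[z == 1].mean() - y[z == 0].mean()     # planned estimate

print("Pr(estimate has the wrong sign):", (ests < 0).mean())
print("average exaggeration, |estimate| / true effect:",
      round(np.abs(ests).mean() / assumed_effect, 1))
```

With a small assumed effect and high variation, this kind of check shows right away that the planned study will be too noisy to be informative, preregistered or not.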

Comparing psychology to other sciences

Unfortunately, in recent years some political scientists have been following the lead of psychology and making grand claims from small experiments with zero theory (or with theory that would appear to contradict the experimental data).

On the other hand, we draw all sorts of generalizations from, say, 16 presidential elections, and it’s not like we have a lot of theory there. I mean, sure, voters react to economic performance. But that’s not quite a theory, right?

I asked Borsboom what he thought, and he wrote:

On the other hand, the culture of psychology has changed for the better and has done so rather quickly (especially given the general inertia of culture). People are really a lot more open about everything and I would say the field is in a much better state now than say ten years ago.

Yet even after all those years it strikes me that psychology’s reaction to the whole crisis has basically been the same as it’s always been: gather more data. Of course: data are now gathered in a massive multicenter open reproducible data settings and analyzed with bayesian statistics, model averaging, and multiverses. But data is data is data, it doesn’t magically turn into good scientific theory if you just gather enough of it.

I sometimes think that psychology should just stop gathering data for a year or two. Just organize the data we have gathered over the past century and think deeply about how to explain the reliable patterns we have observed in these data. Or perhaps 10% of the scientists could do that. Rather than 100% of the people chasing new data 100% of the time.

He also points to this article by Paul Smaldino, Better methods can’t make up for mediocre theory.