“The good news about this episode is that it’s kinda shut up those people who were criticizing that Stanford antibody study because it was an un-peer-reviewed preprint. . . .” and a P.P.P.S. with Paul Alper’s line about the dead horse

People keep emailing me about this recently published paper, but I already said I’m not going to write about it. So I’ll mask the details.

Philippe Lemoine writes:

So far it seems you haven’t taken a close look at the paper yourself and I’m hoping that you will, because I’m curious to know what you think and I know I’m not alone.

I really think there are serious problems with this paper and that it shouldn’t have been published without doing something about them. In my opinion, the most obvious issue is that, just looking at table **, one can see that people in the treatment groups were almost ** times as likely to be placed on mechanical ventilation as people in the control group, even though the covariates they used seem balanced across groups. As tables ** in the supplementary materials show, even after matching on propensity score, there are more than twice as many people who ended up on mechanical ventilation in the treatment groups as in the control groups.

If the control and treatment groups were really comparable at the beginning, it seems very unlikely there would be such a difference between them in the proportion of people who ended up being placed on mechanical ventilation, so I think it was a huge red flag that the covariates they used weren’t sufficient to adequately control for disease severity at baseline. (Another study with a similar design published recently in the NEJM used more covariates to control for baseline disease severity and didn’t find any effect.) They should at least have tried other specifications to see if that affected the results. But they didn’t and only used propensity score matching with exactly the same covariates in a secondary analysis.

In the discussion section, when they talk about the limitations of the study, they write this extraordinary sentence: “Due to the observational study design, we cannot exclude the possibility of unmeasured confounding factors, although we have reassuringly [emphasis mine] noted consistency between the primary analysis and the propensity score matched analyses.” But propensity score matching is just a non-parametric alternative to regression; it still assumes that treatment assignment is strongly ignorable. So how could consistency between the two analyses be reassuring that unmeasured confounding factors didn’t bias the results?

I actually have an answer to that one! In causal inference from observational data, you start with the raw-data comparison, then you adjust for basic demographics, then you adjust for other available pre-treatment predictors such as pre-existing medical conditions, smoking history, etc. And then you have to worry about adjustments for the relevant pre-treatment predictors you haven’t measured. At each step you should show what your adjustment did. If adjusting for demographics doesn’t change your answer much, and adjusting for available pre-treatment predictors doesn’t change your answer much, then it’s not so unreasonable to suppose that adjustment for other, harder-to-measure, variables won’t do much either. This is standard reasoning in observational studies (see our 1990 paper, for example). I think Paul Rosenbaum has written some more formal arguments along those lines.
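To make that progression concrete, here is a minimal sketch in Python (pandas plus statsmodels) of the sequence described above: raw comparison, adjustment for demographics, adjustment for the other available pre-treatment predictors, and then a crude propensity-score match on the same covariates. Everything in it is made up for illustration, including the synthetic data, the column names (treated, outcome, age, sex, smoker, diabetes, hypertension), and the simple 1-to-1 nearest-neighbor matching; it is not the analysis from the disputed paper, just the reasoning.

```python
# A minimal sketch (not the disputed paper's analysis) of the adjustment
# sequence described above, on synthetic data. All column names and the
# data-generating process here are made up for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# --- Fake observational data: age confounds both treatment and outcome ---
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "sex": rng.integers(0, 2, n),
    "smoker": rng.integers(0, 2, n),
    "diabetes": rng.integers(0, 2, n),
    "hypertension": rng.integers(0, 2, n),
})
p_treat = 1 / (1 + np.exp(-((df.age - 60) / 10 - 1)))  # older patients treated more often
df["treated"] = rng.binomial(1, p_treat)
p_out = 1 / (1 + np.exp(-(0.05 * (df.age - 60) + 0.3 * df.treated - 1)))
df["outcome"] = rng.binomial(1, p_out)

demographics = ["age", "sex"]
clinical = ["smoker", "diabetes", "hypertension"]

def treatment_log_odds(data, covariates):
    """Coefficient on `treated` from a logistic regression of the outcome
    on treatment plus the given pre-treatment covariates."""
    rhs = " + ".join(["treated"] + covariates)
    fit = smf.logit("outcome ~ " + rhs, data=data).fit(disp=0)
    return fit.params["treated"]

# Step 1: raw comparison, no adjustment.
raw = treatment_log_odds(df, [])
# Step 2: adjust for basic demographics.
demo = treatment_log_odds(df, demographics)
# Step 3: adjust for the other available pre-treatment predictors.
full = treatment_log_odds(df, demographics + clinical)

# Step 4: crude 1-to-1 nearest-neighbor propensity-score match on the same
# covariates (assumes fewer treated patients than controls).
ps_model = smf.logit("treated ~ " + " + ".join(demographics + clinical),
                     data=df).fit(disp=0)
df = df.assign(ps=ps_model.predict(df))
treated_rows = df[df.treated == 1]
controls = df[df.treated == 0].copy()
matches = []
for _, row in treated_rows.iterrows():
    j = (controls.ps - row.ps).abs().idxmin()   # closest remaining control
    matches.append(controls.loc[j])
    controls = controls.drop(j)                 # match without replacement
matched = pd.concat([treated_rows, pd.DataFrame(matches)])
ps_matched = treatment_log_odds(matched, [])

print(f"raw: {raw:.2f}  +demographics: {demo:.2f}  "
      f"+clinical: {full:.2f}  PS-matched: {ps_matched:.2f}")
# If the estimate barely moves as covariates are added, that's the informal
# reassurance about unmeasured confounders; if it jumps around, it isn't.
```

The matching here is deliberately the simplest possible version (nearest neighbor, without replacement, on an estimated propensity score); the point is only to show each estimate next to the previous one so you can see what each adjustment did.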

Lemoine continues:

Of course, those are hardly the only issues with this paper, as many commenters have noted on your blog. In particular, I think **’s analysis of table ** is pretty convincing, but as I noted in response to his comment, if he is right that it’s what the authors of the study did, what they say in the paper is extremely misleading. They should clarify what they did and, if the data in that table were indeed processed, which seems very likely, a correction should be made to explain what they did.

Frankly, I don’t expect ** to have anything other than a small effect, whether positive or negative, so I don’t really care about that issue. But I fear that, since the issue has become politicized (which is kind of crazy when you think about it), many people are unwilling to criticize this study because the conclusions are politically convenient and they don’t want to appear to side with **. I think this is very bad for science and that it’s important that post-publication peer review proceeds as it normally would.

I just wanted to encourage you to dig into the study yourself because I’m curious to know what you think. Moreover, if you agree there are serious issues with it and say that on your blog, the authors will be more likely to respond instead of ignoring those criticisms, as they have been doing so far.

My reply:

By now, enough has been said about this study that I don’t need to look into it in detail! At this point, it seems that nobody believes the published analysis or conclusion, and the main questions revolve around what the data actually are and where they came from. It’s become a pizzagate kind of thing. It’s possible that the authors will be able to pull a rabbit out of the hat and explain everything, but given their responses so far, I’m doubtful. As we’ve discussed, ** (and journals in general) have a poor record of responding to criticisms of the papers they publish: at best, the most you’ll usually get is a letter published months after the original article, along with a bag of words from the original authors explaining how, surprise! none of their conclusions have changed in any way.

The good news about this episode is that it’s kinda shut up those people who were criticizing that Stanford antibody study because it was an un-peer-reviewed preprint. The problem with the Stanford antibody study is not that it was an un-peer-reviewed preprint; it’s that it had bad statistical analyses and the authors supplied no data or code. It easily could’ve been published in JAMA or NEJM or Lancet or whatever and had the same problems. Indeed, “Stanford” played a similar role as “Lancet” in giving the paper instant credibility. As did “Cornell” with the pizzagate papers.

As Kelsey Piper puts it, “the new, fast scientific process (and even the old, slow scientific process) can allow errors — sometimes significant ones — to make it through peer review.”

P.S. Keep sending me cat pictures, people! They make these posts soooo much more appealing.

P.P.S. As usual, I’m open to the possibility that the conclusions in the disputed paper are correct. Just because they haven’t made a convincing case and they haven’t shared their data and code and people have already found problems with their data, that doesn’t mean that their substantive conclusions are wrong. It just means they haven’t supplied strong evidence for their claims. Remember evidence and truth.

P.P.P.S. I’d better explain something that comes up sometimes with these Zombies posts. Why beat a dead horse? Remember Paul Alper’s dictum: “One should always beat a dead horse because the horse is never really dead.” Is it obsessive to post multiple takes on the same topic? Remember the Javert paradox. It’s still not too late for the authors to release their code and some version of their data and to respond in good faith on the pubpeer thread, and it’s not too late for the journal to do something.

What could the journal do? For one, they could call on the authors to release their code and some version of their data and to respond in good faith on the pubpeer thread. That’s not a statement that the published paper is wrong; it’s a statement that the topic is important enough to engage the hivemind. Nobody’s perfect in the design of a study or in data analysis, and it seems absolutely ludicrous for data and code to be hidden so that, out of all the 8 billion people in the world, only 4 people have access to this information from which such big conclusions are drawn. It’s kind of like how in World War 2, so much was done in such absolute secrecy that nobody but the U.S. Army and Joseph Stalin knew what was going on. Except here the enemy can’t spy on us, so secrecy serves no social benefit.