So. I opened the newspaper today and saw this article by Roni Caryn Rabin, “Two Retractions Hurt Credibility of Peer Review.” It was about the Surgisphere scandal, which we’ve discussed a few times in this space, going from “Doubts about that article claiming that hydroxychloroquine/chloroquine is killing people” to “How should those Lancet/Surgisphere/Harvard data have been analyzed?”
The news article had this revealing bit:
In interviews with The New York Times, Dr. Richard Horton, the editor in chief of The Lancet, and Dr. Eric Rubin, editor in chief of the N.E.J.M., said that the studies should never have appeared in their journals but insisted that the review process was still working. . . .
Dr. Horton called the paper retracted by his journal a “fabrication” and “a monumental fraud.” But peer review was never intended to detect outright deceit, he said, and anyone who thinks otherwise has “a fundamental misunderstanding of what peer review is.”
“If you have an author who deliberately tries to mislead, it’s surprisingly easy for them to do so,” he said.
I hate hate hate hate hate this attitude of using the fraud to get himself off the hook.
It’s not about the fraud
The fraud is the least of it.
As regular readers of this blog will recall, the original criticism of the recent Lancet/Surgisphere/Harvard paper on hydro-oxy-whatever was not that the data came from a Theranos-like company that employs more adult-content models than statisticians, but rather that the data, being observational, required some adjustment to yield strong causal conclusions—and the causal adjustment reported in that article did not seem to be enough.
As James “not the racist dude who assured us that cancer would be cured by 2000” Watson wrote:
This is a retrospective study using data from 600+ hospitals in the US and elsewhere with over 96,000 patients, of whom about 15,000 received hydroxychloroquine/chloroquine (HCQ/CQ) with or without an antibiotic. The big finding is that when controlling for age, sex, race, co-morbidities and disease severity, the mortality is double in the HCQ/CQ groups (16-24% versus 9% in controls). This is a huge effect size! Not many drugs are that good at killing people. . . .
The most obvious confounder is disease severity . . . The authors say that they adjust for disease severity but actually they use just two binary variables: oxygen saturation and qSOFA score. The second one has actually been reported to be quite bad for stratifying disease severity in COVID. The biggest problem is that they include patients who received HCQ/CQ treatment up to 48 hours post admission. . . . This temporal aspect cannot be picked up by a single severity measurement.
In short, seeing such huge effects really suggests that some very big confounders have not been properly adjusted for. . . .
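To see how an unadjusted confounder can manufacture an effect of roughly this size, here’s a toy simulation. All the numbers are invented for illustration; this is not the Surgisphere data. Disease severity drives both who gets the drug and who dies, the drug itself does nothing, and yet the naive treated-vs-control comparison shows mortality roughly doubling in the treated group:

```python
import random

random.seed(1)

n = 100_000
treated = treated_deaths = control = control_deaths = 0
for _ in range(n):
    severe = random.random() < 0.30          # 30% of patients are severe
    # Confounding: sicker patients are more likely to get the drug
    p_treat = 0.40 if severe else 0.10
    got_drug = random.random() < p_treat
    # The drug has NO effect on mortality here; only severity matters
    p_death = 0.30 if severe else 0.05
    died = random.random() < p_death
    if got_drug:
        treated += 1
        treated_deaths += died
    else:
        control += 1
        control_deaths += died

print(f"treated mortality: {treated_deaths / treated:.1%}")
print(f"control mortality: {control_deaths / control:.1%}")
```

Analytically, the treated group ends up about 63% severe and the control group about 22% severe, which works out to roughly 21% vs. 11% mortality: about a doubling, from a drug with zero effect. That’s the sense in which a huge observed effect size is itself evidence of incomplete adjustment.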
I’m not saying that the editor of Lancet should’ve caught this. After all, he’s not a statistician, and indeed his journal has a track record of falling for statistically innumerate but politically convenient policy claims (see here for a discussion of one example).
Before giving up, Lancet doubled down
Here’s a quote from Team Lancet, five days after the problems with that hydroxychloroquine paper came out:
On May 29, Jessica Kleyn, a press officer at The Lancet journals, informed The Scientist in an emailed statement that the authors had corrected the Australian data in their paper and redone one of the tables in the supplementary information with raw data rather than the adjusted data Desai said had been shown before.
“The results and conclusions reported in the study remain unchanged,” Kleyn adds in the email. “The original full-text article will be updated on our website. The Lancet encourages scientific debate and will publish responses to the study, along with a response from the authors, in the journal in due course.”
As I wrote at the time, the real scandal is that the respected medical journal Lancet aids and abets in poor research practices by serving as a kind of shield for the authors of a questionable paper, by acting as if secret pre-publication review has more validity than open post-publication review.
Give credit where due
Unfortunately, the NYT article did not quote or even mention James Watson and Peter Ellis, the two researchers who exposed the problems with the Surgisphere study.
Watson pointed out the statistical problems and the data irregularities; Ellis did some investigation and found out that Surgisphere had no there there.
Why give credit where due? Not just out of fairness to Watson and Ellis. Also because post-publication review is what did the job. Unlike alcohol (in Homer Simpson’s famous phrase), post-publication review is the solution to, not the cause of, these particular problems.
The peer review system did not catch the clear statistical problems with the paper. Also, the peer review system did not catch the fraud.
Post-publication review caught the statistical problems and the fraud.
I’m frustrated with the NYT article because it had about a hundred paragraphs on peer review and just about nothing on post-publication review.
There was this quote from a former editor of the New England Journal of Medicine:
If outside scientists detected problems that weren’t identified by the peer reviewers, then the journals failed.
But that’s not quite right. There’s no way that peer review can even come close to post-publication review. Peer review is done by 3 or 4 insiders; post-publication review is done by thousands of outsiders. There’s no contest.
Here’s a good line
I was amused by this bit from the Times article:
Shot: Dr. Mehra is well respected in scientific circles.
Chaser: Both editors pointed out that Dr. Mehra had signed statements indicating he had access to all of the data and took responsibility for the work, as did other co-authors.
Ouch! It looks like he traded in all that respect for the chance to be first author on a Lancet paper. Maybe not the best way to spend your scientific reputation.