bla bla bla PEER REVIEW bla bla bla

OK, I’ve been saying this over the phone to a bunch of journalists during the past month so I might as well share it with all of you . . .

1. The peers . . .

The problem with peer review is the peers. Who are “the peers” of four M.D.’s writing up an observational study? Four more M.D.’s who know just as little about the topic. Who are “the peers” of a sociologist who likes to bullshit about evolutionary psychology but who doesn’t know much about the statistics of sex ratios? Other sociologists who like to bullshit about evolutionary psychology but who don’t know much about the statistics of sex ratios. Who are “the peers” of a couple of psychologists who like to imagine that hormonal changes will induce huge, previously undetected changes in political attitudes, and who think this can be detected using a between-person study of a small and nonrepresentative sample? That’s right, another couple of psychologists who like to imagine that hormonal changes will induce huge, previously undetected changes in political attitudes, and who think this can be detected using a between-person study of a small and nonrepresentative sample. Who are “the peers” of a contrarian economist who likes to make bold pronouncements based on almost no data, and whose conclusions don’t change even when people keep pointing out errors in his data? That’s right, other economists who like to make bold pronouncements based on almost no data, and whose conclusions don’t change even when people keep pointing out errors in their data. Who are “the peers” of a wacky business-school professor who cares more about cool experiments than data management and who doesn’t seem to mind if the numbers in his tables don’t add up? Yup, it’s other business-school professors who care more about cool experiments than data management and who don’t seem to mind if the numbers in their tables don’t add up. Who are “the peers” of fake authors of postmodern gibberish? Actual authors of postmodern gibberish, of course.

I think you get the idea.

Peer review is fine for what it is—it tells you that a paper is up to standard in its subfield. Peer reviewers can catch missing references in the literature review. That can be helpful! But if peer review catches anything that the original authors didn’t understand . . . well, that’s just lucky. You certainly can’t expect it.

So, when the editor of Lancet writes:

And when an M.D. who doesn’t know poop about data or statistics but is willing to write this:

I just think they don’t know what they’re talking about.

2. I get it . . .

I get why you like peer review even if you’re not one of the winners, even if you haven’t directly benefited from the peer-review system the way the above people have. The reason you like “peer review” is that it seems better than two alternatives: (1) “political review” and (2) “pal review.” For all its flaws, peer review is (usually) about the quality of the paper, not about politics, logrolling, trading of favors, etc. Sure, this sometimes happens—sometimes a journal editor will print flat-out lies because he’s friends with the author of an article—but peer review, with its layers of anonymity, really can allow papers by outsiders to get accepted and papers by insiders to get rejected. Not always—some politics remains—but I see the appeal of peer review as a preferable alternative to a pure old boys’ network.

But . . .

3. The actual alternative to peer review is . . .

Instead of thinking of the alternative to peer review as backroom politics, think of the alternative to peer review as post-publication review, which, in addition to all its other benefits (most notably, you can get out of the circle of ignorance of “the peers”), has the benefit of efficiency.

4. All the papers that don’t get retracted . . .

One problem with the attention given to the fatally flawed papers that get retracted is that we forget all the fatally flawed papers that aren’t retracted.

For example, from that long paragraph above:

– The beauty-and-sex-ratio paper: Not retracted. 58 citations. Endorsement from Freakonomics never rescinded.

– The ovulation-and-voting paper: Not retracted. 107 citations. Fortunately this one never got taken seriously by the news media.

– The gremlins paper: Not retracted. 1064 citations. Might still be influencing some policy debates.

Etc etc etc.

And, if we just want to look at papers published in Lancet:

– That seriously flawed Iraq survey: Never retracted. 779 citations. Came up in policy debates.

– That hopeless-from-the-start paper on gun control based on an unregularized regression with 50 data points, 25+ predictors, and a bunch of ridiculous conclusions: OK, this one had only 75 citations and, fortunately, it was blasted in the press when it came out. So, too bad that something like 75 people thought this paper was worth citing (and, no, a quick glance at the citations suggests that these are not 75 papers using this as an example to show people how not to do policy analysis). As I wrote the other day, the useless study is included in a meta-analysis published in JAMA—and one of the authors of that meta-analysis is the person who said he did not believe the Lancet paper when it came out! But it’s in the literature now and it’s not going away. (If you want a sense of why I call a regression like that hopeless from the start, see the little simulation sketch below.)
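Here’s a minimal simulation sketch of that point. This is my own toy illustration, not the paper’s data or model: I’m just assuming 50 observations, 25 pure-noise predictors, and a pure-noise outcome, and then fitting plain unregularized least squares in Python:

import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 25                                   # roughly the dimensions mentioned above
X = rng.normal(size=(n, p))                     # predictors: pure noise
y = rng.normal(size=n)                          # outcome: pure noise, unrelated to X

X1 = np.column_stack([np.ones(n), X])           # add an intercept
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # ordinary (unregularized) least squares
resid = y - X1 @ beta
r2 = 1 - resid.var() / y.var()                  # R-squared

# naive t-statistics, as an unwary analyst might report them
sigma2 = resid @ resid / (n - p - 1)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X1.T @ X1)))
t_stats = beta / se

print(f"R^2 on pure noise: {r2:.2f}")
print(f"coefficients with |t| > 2: {(np.abs(t_stats[1:]) > 2).sum()} of {p}")

With those dimensions you should expect an R-squared of around 0.5 and, typically, a coefficient or two with |t| greater than 2, all from data that contain no signal whatsoever. That’s the kind of setup we’re talking about.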

5. So . . .

When the editor of Lancet bragged about his journal’s peer-review process, he didn’t say, “Yeah, but our bad about that vaccine denial and the Iraq survey and the gun control analysis. For political reasons we can’t actually retract these papers, but we’ll try to do better next time.” No, he didn’t mention those articles at all.

6. A statistical argument . . .

It might be that everything I’m saying here is wrong. Not wrong in the details—I’m pretty sure these particular articles are fatally flawed—but wrong in the conclusions I’m implicitly drawing. After all, Lancet publishes, oh, I dunno, 1000 papers a year? 2000 maybe? Even if it’s just 1000, then I’m saying that they published (and didn’t retract) 2 bad papers in the past 15 years? That’s a failure rate of 2/15000. Even if I’ve missed a few and there are 2 fatally flawed papers a year, that’s still only a 0.2% failure rate, which would be not bad at all! My own failure rate (as measured by the number of papers I’ve had to issue major corrections for, divided by the total number of papers I’ve published) is about 1%. So here I am criticizing Lancet for being maybe 5 times more reliable than I am!
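If you want that back-of-the-envelope arithmetic spelled out, here it is, using the same made-up volume of 1000 papers a year:

papers_per_year = 1000                        # my rough guess at Lancet's volume, as above
years = 15

rate_known = 2 / (papers_per_year * years)    # the 2 bad papers I actually named
rate_pessimistic = 2 / papers_per_year        # suppose I've missed some: 2 bad papers a year
my_rate = 0.01                                # about 1% of my own papers needed major corrections

print(f"named bad papers:   {rate_known:.3%}")                    # about 0.013%
print(f"pessimistic guess:  {rate_pessimistic:.1%}")              # 0.2%
print(f"my rate vs. theirs: {my_rate / rate_pessimistic:.0f}x")   # about 5x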

I don’t know. I really don’t know. The only way to really get a bead on this, I think, would be to take a random sample of papers from the journal and carefully review them. I see bad papers because people send me bad papers. Sometimes they send me good papers too, but I probably see a lot of the worst.
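Just to sketch what such an audit might look like: suppose you drew a random sample of a journal’s papers and had qualified readers carefully review each one. The numbers below are placeholders I made up, not the result of any actual audit:

import math

n_sampled = 100   # papers drawn at random for careful review (made-up number)
k_flawed = 2      # papers judged fatally flawed (made-up number)
z = 1.96          # for a roughly 95% interval

p_hat = k_flawed / n_sampled
denom = 1 + z**2 / n_sampled
center = (p_hat + z**2 / (2 * n_sampled)) / denom            # Wilson score interval
half = z * math.sqrt(p_hat * (1 - p_hat) / n_sampled + z**2 / (4 * n_sampled**2)) / denom
print(f"estimated failure rate: {p_hat:.0%}, "
      f"rough 95% interval: {center - half:.1%} to {center + half:.1%}")

Even a careful review of 100 papers that turns up 2 fatally flawed ones leaves you with an interval running from well under 1% to about 7%, which is part of why my impressions from the papers people send me don’t settle anything.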

How bad are things? Back in 2013 or so, I think Psychological Science was really bad. I remember looking at their This Week in Psychological Science feature a few times, and it seemed that more than half the papers they were featuring were junk science; see slides 14-16 of this presentation. I didn’t do a careful survey, and maybe I just caught the journal at a bad time, but it really seemed that they had no control over what they were publishing. The cargo cult scientists had basically hacked the system. They’d figured out the cheat codes and were driving their Pac-Men all over the board.

Lancet can’t be that bad.

I’ll say this, though. It may well be that 99%, or 90%, of Lancet articles are just fine, in the sense that their flaws, such as they are (and just about no research paper is flawless), are not overwhelming, so that the articles represent real contributions to science (including negative contributions such as, “This treatment does not do much”). If so, great. But it may well be that 99%, or 90%, of medRxiv articles are just fine too. I just don’t know.

7. Half-full or half-empty . . .

So, the peer-review system is either the last bastion protecting us from a revived old boys’ network, or a waste of time and resources that could better be spent on post-publication review. It’s either an efficient if imperfect tool for sifting through millions of research articles published each year, or an absolute disaster. Probably it’s both.

I don’t know what to think. Consider computer science. They mostly seem to have abandoned journals; instead they have something like 10 major conferences a year, and the idea is to publish a bunch of papers in each conference. Getting published in a conference proceedings is competitive, but it’s different than publishing in a journal. Or maybe it’s more like publishing in a medical journal, I’m not sure. Hype seems to be important. You gotta show that your method is an order of magnitude better than the alternatives. Which is tough: how can you improve performance by an order of magnitude, 10 times a year? But people manage to do it, I guess using the same methods of hype that led James Watson to think in 1998 that cancer was two years away from being cured.

8. Again . . .

Surgisphere appears to be the Theranos, or possibly the Cornell Food and Brand Lab, of medical research, and Lancet is a serial enabler of research fraud (see this news article by Michael Hiltzik), and it’s easy to focus on that. But remember all the crappy papers these journals publish that don’t get retracted, cos they’re not fraudulent, they’re just crappy. Retracting papers just cos they’re crappy—no fraud, they’re just bad science—I think that’s never ever ever gonna happen. Retraction is taken as some kind of personal punishment meted out to an author and a journal. This frustrates me to no end. What’s important is the science, not the author. But it’s not happening. So, when we hear about glamorous/seedy stories of fraud, remember the bad research, the research that’s not evilicious but just incompetent, maybe never even had a chance of working. That stuff will stay in the published literature forever, and journals love publishing it.

As we say in statistics, the shitty is the enemy of the good.

9. Open code, open data, open review . . .

So, you knew I’d get to this…

Just remember, honesty and transparency are not enuf. Open data and code don’t mean your work is any good. A preregistered study can be a waste of time. The point of open data and code is that they make it easier to do post-publication review: if you’re open, it’s easier for other people to find flaws in your work. And that’s a good thing.

An egg is just a chicken’s way of making another egg.

And the point of science and policy analysis is not to build beautiful careers. The purpose is to learn about and improve the world.