(Some) forecasting for COVID-19 has failed: a discussion of Taleb and Ioannidis et al.

Nassim Taleb points us to this pair of papers:

On single point forecasts for fat-tailed variables, by Nassim Taleb

Forecasting for COVID-19 has failed, by John Ioannidis, Sally Cripps, and Martin Tanner

The two articles agree in their mistrust of media-certified experts.

Here’s Taleb:

Both forecasters and their critics are wrong: At the onset of the COVID-19 pandemic, many researcher groups and agencies produced single point “forecasts” for the pandemic — most relied on the compartmental SIR model, sometimes supplemented with cellular automata. The prevailing idea is that producing a numerical estimate is how science is done, and how science-informed decision-making ought to be done: bean counters producing precise numbers.

Well, no. That’s not how “science is done”, at least in this domain, and that’s not how informed decision-making ought to be done. Furthermore, subsequently, many criticized the predictions because these did not play out (no surprise there). This is also wrong. Both forecasters (who missed) and their critics were wrong — and the forecasters would have been wrong even if they got the prediction right. . . .
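To make concrete the kind of calculation Taleb is criticizing, here's a minimal sketch of how a compartmental SIR simulation gets turned into a single point “forecast”: pick one set of parameter values, run the dynamics, read off one number. This is just an illustration with made-up parameters, not anyone's actual model:

```python
# Minimal SIR sketch: one parameter setting in, one "point forecast" out.
# Parameter values are made up for illustration, not estimates for any real epidemic.

def sir_point_forecast(beta=0.3, gamma=0.1, n=1_000_000, i0=100, days=180):
    s, i, r = n - i0, i0, 0
    peak_infected = i
    for _ in range(days):                      # daily time steps
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak_infected = max(peak_infected, i)
    return {"peak_infected": round(peak_infected), "total_infected": round(n - s)}

print(sir_point_forecast())  # a single number, with no uncertainty attached
```

Nothing in that calculation expresses uncertainty about beta, gamma, or the model structure itself, which is exactly the complaint.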

Here are Ioannidis et al.:

COVID-19 is a major acute crisis with unpredictable consequences. Many scientists have struggled to make forecasts about its impact. However, despite involving many excellent modelers, best intentions, and highly sophisticated tools, forecasting efforts have largely failed. . . . Despite these obvious failures, epidemic forecasting continued to thrive, perhaps because vastly erroneous predictions typically lacked serious consequences. . . .

But Taleb makes a point that the others miss:

What’s relevant is the distributional forecast, not the point forecast. This came up last month, when political columnist John Fund criticized Imperial College epidemiologist Neil Ferguson for getting a bunch of forecasts wrong. But it turned out that what Fund was doing was taking the upper bounds of Ferguson’s forecasts for past public health crises and pointing out that they overestimated the actual number of deaths in each case. That was just wrong on Fund’s part: it’s the nature of upper bounds that they will generally be too high.
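Here's a toy simulation, with made-up numbers rather than Ferguson's actual forecasts, of why a calibrated upper bound will “overestimate” the outcome nearly every time:

```python
# Toy illustration: a calibrated 97.5% upper bound exceeds the realized outcome
# about 97.5% of the time by construction. Numbers are made up, not anyone's forecasts.
import numpy as np

rng = np.random.default_rng(0)
n_events = 10_000

# Suppose each outcome really is drawn from the forecaster's predictive distribution
# (i.e., the distributional forecast is perfectly calibrated).
mu = rng.uniform(3, 6, size=n_events)        # log-scale centers of the predictive distributions
sigma = 0.5
outcomes = rng.lognormal(mean=mu, sigma=sigma)

upper_bounds = np.exp(mu + 1.96 * sigma)     # the reported "worst case" (97.5th percentile)
print((upper_bounds > outcomes).mean())      # roughly 0.975
```

Counting how often the upper bound exceeded the outcome tells you nothing about whether the distributional forecast was any good.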

Ioannidis et al. do show problems with the distributional forecasts from the University of Washington’s IHME group, and indeed Cripps and Tanner, Ioannidis’s coauthors on the recent article, are also coauthors of an earlier paper pointing out problems with the IHME forecasts. The IHME model has lots of problems, and I don’t think it’s right to take its flaws as representative of more serious statistical models of epidemic progression. It’s fairer to fault the news media for sometimes presenting IHME forecasts uncritically.
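For what it's worth, the natural check on forecasts of the IHME sort is empirical coverage: how often do the realized death counts land inside the model's stated 95% intervals? Here's a sketch of that check; the column names are just placeholders, not IHME's actual data format:

```python
# Sketch of an interval-coverage check for a distributional forecast.
# The column names ("lower_95", "upper_95", "observed") are placeholders,
# not the actual format of IHME's files.
import pandas as pd

def coverage_95(df: pd.DataFrame) -> float:
    """Fraction of observed values falling inside the stated 95% interval.
    A well-calibrated forecast should come out near 0.95."""
    inside = (df["observed"] >= df["lower_95"]) & (df["observed"] <= df["upper_95"])
    return inside.mean()

# Made-up example: observed daily deaths vs. stated intervals for five days.
forecasts = pd.DataFrame({
    "lower_95": [120, 130, 140, 150, 160],
    "upper_95": [180, 190, 200, 210, 220],
    "observed": [175, 210, 195, 260, 300],
})
print(coverage_95(forecasts))  # 0.4 here, far below 0.95: overconfident intervals
```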

One thing that bothers me about the Ioannidis et al. article is that it does not at all address the previous statistical failures of Ioannidis’s work in this area.

1. In their above-linked 11 June article, Ioannidis et al. write:

Despite these obvious failures, epidemic forecasting continued to thrive, perhaps because vastly erroneous predictions typically lacked serious consequences. Actually, erroneous predictions may have been even useful. . . .

But just two months earlier, on 9 Apr, Ioannidis said:

If I were to make an informed estimate based on the limited testing data we have, I would say that covid-19 will result in fewer than 40,000 deaths this season in the USA.

That’s fine: Ioannidis was careful at the time to condition his estimate on the limitations of the available data, and you can learn a lot in two months. Still, 40,000 was an erroneous prediction, and at the very least this error should cause him to reassess his assumptions. And if he’s gonna write about erroneous predictions, he could mention his own.

2. In their above-linked 11 June article, Ioannidis et al. list the following problems with forecasts: “Wrong assumptions in the modeling,” “High sensitivity of estimates,” “Lack of incorporation of epidemiological features,” “Lack of transparency,” “Errors,” “Lack of expertise in crucial disciplines,” “Groupthink and bandwagon effects,” and “Selective reporting.”

All of these problems arose with the much discussed Stanford antibody study. Ioannidis was only the 16th of 17 authors on this study, so I’m not blaming him for the wrong assumptions, lack of transparency, errors, etc., but I don’t think he’s disavowed that paper either. My point is not to use this as a “gotcha” on Ioannidis but rather to say that it’s hard to know what to make of these criticisms given that they all apply to work that he stands by. Maybe his article should’ve been titled, “Our forecasting for COVID-19 has failed,” and he could’ve criticized the errors and lack of expertise in that Stanford study.
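To give one concrete example of what “high sensitivity of estimates” meant for that antibody study: with a low raw positive rate, the implied prevalence depends heavily on the assumed specificity of the test. The numbers below are illustrative, in the general ballpark of what was discussed at the time, not the study’s exact figures:

```python
# Standard correction of a raw positive rate for test sensitivity and specificity:
#   prevalence = (raw_rate - (1 - specificity)) / (sensitivity + specificity - 1)
# Numbers are illustrative, not the study's exact figures.

def corrected_prevalence(raw_rate, sensitivity, specificity):
    return max(0.0, (raw_rate - (1 - specificity)) / (sensitivity + specificity - 1))

raw_rate = 0.015  # say 1.5% of samples test positive
for spec in (0.998, 0.995, 0.990, 0.985):
    print(spec, round(corrected_prevalence(raw_rate, sensitivity=0.85, specificity=spec), 4))
# At specificity 0.985 the implied prevalence is 0: the data would be consistent
# with essentially all the positives being false positives.
```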

I don’t think the Imperial College models (for example, here) are so flawed. They’re not perfect—wrong assumptions and high sensitivity are unavoidable!—but they are transparent and I think they’re a way forward. Full disclosure: I’ve worked with the first author of that paper, and my colleagues and I helped him with some of the modeling.

The sad truth, I’m afraid, is that Taleb is right: point forecasts are close to useless, and distributional forecasts are really hard. We have to try our best and use all available resources.
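And if we want distributional forecasts, we should evaluate them as distributions, not just eyeball their point summaries. One standard tool is the continuous ranked probability score, which can be estimated directly from forecast samples. Here’s a minimal sketch of that estimator, not any particular group’s evaluation pipeline:

```python
# Sample-based estimate of the continuous ranked probability score (CRPS):
#   CRPS(F, y) ~= mean|X - y| - 0.5 * mean|X - X'|, with X, X' drawn from the forecast F.
# Lower is better; the score rewards forecasts that are both calibrated and sharp.
import numpy as np

def crps_from_samples(samples, observed):
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - observed))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

rng = np.random.default_rng(1)
vague = rng.normal(1000, 400, size=2000)   # wide predictive distribution
sharp = rng.normal(1000, 100, size=2000)   # sharper one, centered in the same place
print(crps_from_samples(vague, observed=1050))  # larger score (worse)
print(crps_from_samples(sharp, observed=1050))  # smaller score (better)
```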

P.S. Maybe we could get law professor Richard Epstein to weigh in on this one. He’s the real expert here.