Estimates of the severity of COVID-19 disease: another Bayesian model with poststratification

Following up on our discussions here and here of poststratified models of coronavirus risk, Jon Zelner writes:

Here’s a paper [by Robert Verity et al.] that I think shows what could be done with an MRP approach.

From the abstract:

We used individual-case data from mainland China and cases detected outside mainland China to estimate the time between onset of symptoms and outcome (death or discharge from hospital). We next obtained age-stratified estimates of the CFR by relating the aggregate distribution of cases by dates of onset to the observed cumulative deaths in China, assuming a constant attack rate by age and adjusting for the demography of the population, and age- and location-based under-ascertainment. We additionally estimated the CFR from individual line-list data on 1,334 cases identified outside mainland China. We used data on the PCR prevalence in international residents repatriated from China at the end of January 2020 to obtain age-stratified estimates of the infection fatality ratio (IFR). Using data on age-stratified severity in a subset of 3,665 cases from China, we estimated the proportion of infections that will likely require hospitalisation.
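To make the "adjusting for the demography of the population" step concrete, here is a minimal sketch of poststratification in Python. The age bands, stratum-level fatality ratios, and population shares are all made-up numbers for illustration; they are not taken from Verity et al., and the function name `poststratify` is hypothetical.

```python
# Hypothetical strata: (age band, stratum-level fatality ratio).
# These rates are invented for illustration only.
rates = [("0-39", 0.001), ("40-59", 0.005), ("60+", 0.040)]

# Two hypothetical populations with different age structures (shares by band).
younger_pop = {"0-39": 0.50, "40-59": 0.30, "60+": 0.20}
older_pop = {"0-39": 0.30, "40-59": 0.30, "60+": 0.40}

def poststratify(rates, shares):
    """Population-weighted average of stratum-level rates."""
    total = sum(shares[band] for band, _ in rates)
    return sum(rate * shares[band] for band, rate in rates) / total

overall_younger = poststratify(rates, younger_pop)
overall_older = poststratify(rates, older_pop)
print(f"younger population: {overall_younger:.4f}")
print(f"older population:   {overall_older:.4f}")
```

The same stratum-level rates give different overall numbers depending on the age structure they are averaged over, which is why a crude ratio from one population does not transfer directly to another.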

And here’s what they found:

We estimate the mean duration from onset-of-symptoms to death to be 18 days and from onset-of-symptoms to hospital discharge to be 23 days. We estimate a crude CFR of 3.7% in cases from mainland China. Adjusting for demography and under-ascertainment of milder cases in Wuhan relative to the rest of China, we obtain a best estimate of the CFR in China of 1.4% with substantially higher values in older ages. Our estimate of the CFR from international cases stratified by age (under 60 / 60 and above) are consistent with these estimates from China. We obtain an overall IFR estimate for China of 0.7%, again with an increasing profile with age.

I edited the above paragraph by rounding all numbers and removing the 95% intervals. The intervals are model-based and look way too narrow compared to actual uncertainty. For example, their estimate of the mean duration from onset to death is 17.8 days with a 95% interval of 16.9–19.2 days. There’s no way they can know this so precisely. And their estimate of the crude CFR from mainland China is 3.67% with a 95% interval of 3.56%–3.80%. Again, this interval is too narrow to tell us anything. With an interval so narrow, we might as well just take the point estimate.
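One way to see where such a narrow interval can come from is a back-of-the-envelope calculation: treat the reported crude-CFR interval as if it were a symmetric normal-approximation interval for a simple binomial proportion, and solve for the sample size that would produce it. This is only a rough sketch under that assumption (the paper's interval comes from a Bayesian model, not this formula), and the function name `implied_n` is hypothetical.

```python
# Crude CFR point estimate and 95% interval as quoted above.
p_hat = 0.0367
lower, upper = 0.0356, 0.0380

def implied_n(p_hat, lower, upper, z=1.96):
    """Sample size at which a binomial proportion p_hat would have a
    normal-approximation 95% interval as wide as (lower, upper)."""
    se = (upper - lower) / (2 * z)      # implied standard error
    return p_hat * (1 - p_hat) / se**2  # invert se = sqrt(p(1-p)/n)

n_eff = implied_n(p_hat, lower, upper)
print(f"implied sample size: {n_eff:,.0f}")
```

Under this approximation the interval is about what pure sampling variability would give with tens of thousands of cases. It says nothing about under-ascertainment, reporting delays, or other sources of error that do not shrink as the case count grows.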

I’ve not looked at the substance of their data or model but I did notice they used Bayesian inference, which I think is a good idea given the need to integrate different data sources when studying this problem. I’m pretty sure they could fit their model in Stan, which would be a good idea as it would allow them to incorporate more structure in the model without requiring onerous programming effort to fit it. Also I skimmed through the paper and have some issues with their prior distributions (compare to general principles here). But that’s fine, there’s room for improvement. This and other models will need to be re-fit with new data in any case.

There’s some literature on this problem. For example, Verity et al. cite a 2005 article by A. C. Ghani et al., “Methods for Estimating the Case Fatality Ratio for a Novel, Emerging Infectious Disease,” in the American Journal of Epidemiology. That earlier work is non-Bayesian, though, which will create challenges if you’re dealing with sparse data or trying to combine multiple sources of information.