Coronavirus age-specific fatality ratio, estimated using Stan, and attempting to account for underreporting of cases and the time delay to death

Julien Riou writes:

Stan epidemiologist here. We actually just released a preprint [estimating death rates of people infected with coronavirus, breaking down the population by age and then poststratifying] using Stan.

Crude estimates of case fatality ratio obtained by dividing observed deaths by observed cases are biased in two ways:

1) Deaths are underestimated because of the delay between disease onset and death (right censoring);

2) Total cases are underestimated because surveillance efforts focus on severe cases and miss asymptomatic and mild cases.

We attempted to correct for both these biases using data from China and a few assumptions. It might still need some refinement though, happy to hear any comment.
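To make the two biases concrete, here is a minimal Python sketch. It is not the model in the preprint, which is a transmission model fit in Stan. The case counts, the exponential onset-to-death delay with a 20-day mean, the 50% reporting rate, and all function names are made up for illustration, and the sketch assumes every death is reported.

```python
import math

def delay_cdf(days, mean_delay=20.0):
    """Hypothetical onset-to-death delay CDF (exponential, illustration only)."""
    return 1.0 - math.exp(-days / mean_delay) if days > 0 else 0.0

def crude_cfr(cases_by_onset_day, deaths_so_far):
    """Observed deaths divided by observed cases, with no correction."""
    return deaths_so_far / sum(cases_by_onset_day)

def delay_adjusted_cfr(cases_by_onset_day, deaths_so_far, mean_delay=20.0):
    """Correct for right censoring: weight each case by the probability that
    its death, if it is going to happen, would already have been observed."""
    last_day = len(cases_by_onset_day) - 1
    effective_cases = sum(
        n * delay_cdf(last_day - day, mean_delay)
        for day, n in enumerate(cases_by_onset_day)
    )
    return deaths_so_far / effective_cases

def reporting_adjusted_cfr(cfr_among_reported, reporting_rate):
    """Correct for surveillance bias: if only a fraction of cases is reported
    (and all deaths are), the denominator grows by 1 / reporting_rate."""
    return cfr_among_reported * reporting_rate

# Made-up epidemic curve (cases by day of onset) and cumulative deaths.
cases = [10, 20, 40, 80, 160]
deaths = 2

crude = crude_cfr(cases, deaths)
delay_adj = delay_adjusted_cfr(cases, deaths)
both_adj = reporting_adjusted_cfr(delay_adj, reporting_rate=0.5)

print(f"crude: {crude:.3f}")
print(f"delay-adjusted: {delay_adj:.3f}")
print(f"delay- and reporting-adjusted: {both_adj:.3f}")
```

The two corrections push in opposite directions: accounting for the delay to death raises the estimate, and accounting for unreported cases lowers it, which is why the crude ratio can be off in either direction.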

And here’s the paper by Julien Riou, Anthony Hauser, Michel Counotte, and Christian Althaus:

We [Riou et al.] estimated the age-specific case fatality ratio (CFR) by fitting a transmission model to data from China, accounting for underreporting of cases and the time delay to death. . . . We find that 1.6% (1.4-1.8) of individuals infected with COVID-19 [in Hubei between 1 Jan and 11 Feb] with or without symptoms died or will die, with even more important differences by age group than suggested by the raw data. The probability of death among infected individuals with symptoms is estimated at 3.3% (2.9-3.8), with a steep increase over 60 years old to reach 36% over 80 years old.

The narrowness of these intervals implies that these are inferential uncertainties conditional on the model and do not account for uncertainty in the model itself.
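As a quick sanity check on how the two headline numbers relate, here is a back-of-the-envelope calculation in Python. It uses the 3.3% figure from the abstract and the 49% symptomatic proportion that the paper's limitations section (quoted further down) says was used in the study. The simplifying assumption, which is mine and not something the paper is quoted as saying, is that infections without symptoms carry no risk of death.

```python
# Figures taken from the quoted text; variable names are illustrative.
cfr_symptomatic = 0.033   # probability of death among symptomatic infections
prop_symptomatic = 0.49   # share of infections that are symptomatic

# Assumption: asymptomatic infections do not lead to death, so the fatality
# ratio among all infections is the symptomatic ratio scaled by that share.
fatality_all_infected = cfr_symptomatic * prop_symptomatic

print(f"{fatality_all_infected:.1%}")  # prints 1.6%
```

Under that assumption the product lands on the reported 1.6%, so the two estimates appear to differ mainly through the assumed symptomatic proportion.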

Here are some more graphs from the paper:

Strengths of the analysis

Here’s how Riou et al. describe the strengths of their work:

(1) We use a mechanistic model for the transmission of and the mortality associated with COVID-19 that is a direct translation of the data-generating mechanisms leading to the biased observations of the number of deaths (because of right-censoring) and of cases (because of surveillance bias). Our model also accounts for the effect of control measures on disease transmission. (2) Our model is stratified by age group, which has been shown as a crucial feature for modelling emerging respiratory infections [16]. (3) The estimates rely on routinely collected surveillance data such as incident cases by disease onset, incidence deaths, and the age distribution of cases and deaths, and does not require individual-level data nor studies in the general population.


The paper continues with a list of limitations:

(1) Our results depend on the central assumption that the cause of the deficit of reported cases among younger age groups is a surveillance bias and does not reflect a lower risk of infection in younger individuals. The reason for this age shift is unknown [10]. Retrospective testing for COVID-19 of samples from influenza-like-illness surveillance found no positive test among children, but the sample sizes were small (20 per week including both adults and children) [10]. Uneven age distributions in the risk of infection can be attributed to immunological features, such as the lower circulation of H1N1 influenza in older individuals due to residual immunity [17]. An immunological explanation of the opposite phenomenon, with a lower susceptibility of younger individuals, seems unlikely, and there is no indication of pre-existing immunity to COVID-19 in humans [10]. Different contact patterns could play a role in a limited outbreak, but not in such a widespread infection, especially as household transmission seems to play a major role [10]. The last explanation that we assume here is that younger individuals, when symptomatic, have milder symptoms that decrease the probability of seeking care and being identified.

(2) In a related matter, our results depend on the assumption that older individuals have more severe symptoms and are more likely to be identified. In the absence of an outside reference point, the reporting rate cannot be estimated from surveillance data only. We chose to fix to 100% the reporting rate of infected individuals that have symptoms and are aged 80 and more, and estimate the reporting rates in other age groups relatively to that of older individuals. If further data, coming from a study in the general population, shows that this assumption is violated, this would lead to an overestimation of the CFR in our study.

(3) There is important uncertainty around the proportion of asymptomatic infections. Currently, the detection of asymptomatic patients in China is limited by the focus on symptomatic patients seeking care and the lack of seroprevalence data [18]. The proportion of symptomatic infections has been estimated to 58% (95% confidence interval: 33-83) in a small sample of cases exported to Japan [19]. During the outbreak on the ship “Diamond Princess”, nearly all individuals were tested regardless of symptoms, leading to an average proportion of symptomatic infections of 49% in a sample size of 619, which was used in the present study [13]. Still, uncertainty about the proportion of symptomatic infections will remain until a large retrospective seroprevalence study is conducted in the general population, and our results are dependent on this estimate. Additionally, the dichotomization of infection into asymptomatic and symptomatic is a simplification of reality; the infection with SARS-CoV-2, will likely cause a gradient of symptoms in different individuals depending on age, sex and comorbidities [10]. The proportion of asymptomatic infections might show an age-dependent structure.

(4) Our findings regarding the CFR are specific to the context, and should be interpreted in that light. The findings describe the situation in Hubei from 1 January to 11 February, 2020. It was demonstrated there, that mortality rates have changed over time as a result of an improvement of the standard of care [10]. The standard of care and, as a result, the CFR is setting-dependent and cannot be directly applied to other contexts.
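Limitations (1) and (2) rest on one identifying assumption: reporting among symptomatic people aged 80 and over is fixed at 100%, and every other age group is measured relative to that. Here is a rough Python sketch of that idea, assuming the risk of infection is the same in every age group. It is a simplified stand-in and not the paper's method, which estimates these rates inside the transmission model. The age groups, case counts, and population sizes are invented, and the function name is hypothetical.

```python
def relative_reporting_rates(reported_cases, population, reference_group):
    """Reporting rate of each age group relative to a reference group,
    assuming equal per-capita risk of infection across groups."""
    per_capita = {g: reported_cases[g] / population[g] for g in reported_cases}
    reference = per_capita[reference_group]
    return {g: per_capita[g] / reference for g in per_capita}

# Invented numbers, for illustration only.
reported = {"0-39": 200, "40-79": 1500, "80+": 300}
population = {"0-39": 500_000, "40-79": 450_000, "80+": 50_000}

rates = relative_reporting_rates(reported, population, reference_group="80+")
for group, rate in rates.items():
    print(f"{group}: {rate:.2f}")
```

If the reference group is itself underreported, every relative rate is too high, the implied number of infections is too low, and the fatality ratio is overestimated, which is the direction of bias the authors flag in limitation (2).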

I have not read this paper carefully or tried to evaluate their model or their claims.