Coronavirus model update: Background, assumptions, and room for improvement

https://statmodeling.stat.columbia.edu/2020/03/09/coronavirus-model-update-background-assumptions-and-room-for-improvement/

Julien Riou, coauthor of one of the models we discussed here, writes:

Here is an overview of the current state of the project, so that everyone can quickly see where there is room for improvement.

Background on the epidemic: COVID-19 has just passed 100,000 confirmed cases worldwide and is expected to continue to spread. Without strong control measures or a lucky turn of events (such as lower transmission during the summer months), a significant portion of humanity is going to be infected in the next few months (Marc Lipsitch from Harvard, one of the most respected scientists in the field, suggests that 20 to 50% of all adults may become infected, and pretty much everyone in the field agrees). Some countries have implemented strong control measures (China, Singapore) and are actually controlling the epidemic, so there is still hope of buying enough time to develop a vaccine, which would be the best scenario.

Background on mortality: Most reports on mortality (including the WHO's) consider only the crude case fatality ratio (CFR), i.e. the number of deaths observed divided by the number of cases observed up to a given date.

This estimate is biased in two ways:

– the number of deaths is underestimated because of the delay between disease onset and death (up to 2 months), which biases the ratio downward;

– the number of cases is underestimated because of asymptomatic and mild infections, which biases the ratio upward as a measure of the true mortality.

The WHO reports 3.6%; other estimates use other denominators, but there is a lack of clarity on this question. This is problematic because politicians and public health authorities need to decide on strong control measures, and the debate keeps getting polluted by contradictory claims (from “it’s just a flu” to panic-inducing claims).
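A minimal numerical sketch of the first bias, assuming a hypothetical exponentially growing series of cases, a hypothetical gamma-shaped onset-to-death delay truncated at 60 days, and a true CFR of 5% (all illustrative values, not estimates from the Chinese data). Expected deaths are generated deterministically. The crude ratio falls short of 5% because many recent cases have not yet had time to die, whereas dividing deaths by cases weighted by the probability that their outcome would already be observed (one common correction, not necessarily the one used in our model) recovers it.

```python
import numpy as np


def expected_deaths(cases, cfr, delay_pmf):
    """Expected deaths per day if each case dies with probability `cfr`
    after an onset-to-death delay with probability mass `delay_pmf`
    (index = delay in days). Deaths after the last observed day are dropped."""
    n = len(cases)
    deaths = np.zeros(n)
    for s, c in enumerate(cases):          # s: day of onset
        for d, p in enumerate(delay_pmf):  # d: delay in days
            if s + d < n:
                deaths[s + d] += cfr * c * p
    return deaths


def crude_cfr(cases, deaths):
    """Deaths observed so far divided by cases observed so far."""
    return deaths.sum() / cases.sum()


def delay_adjusted_cfr(cases, deaths, delay_pmf):
    """Deaths divided by cases weighted by the probability that their
    outcome would already be observed on the last day of data."""
    n = len(cases)
    cdf = np.cumsum(delay_pmf)
    # a case with onset on day s is resolved if its delay is <= n - 1 - s
    weights = np.array([cdf[min(n - 1 - s, len(cdf) - 1)] for s in range(n)])
    return deaths.sum() / (cases * weights).sum()


# Hypothetical inputs: 30 days of exponential growth, true CFR of 5%,
# and a gamma-shaped onset-to-death delay truncated at 60 days.
days = np.arange(30)
cases = 10 * np.exp(0.2 * days)
lags = np.arange(60)
delay_pmf = lags**2 * np.exp(-lags / 5.0)
delay_pmf = delay_pmf / delay_pmf.sum()

deaths = expected_deaths(cases, 0.05, delay_pmf)
print(f"crude CFR:          {crude_cfr(cases, deaths):.4f}")  # well below 0.05
print(f"delay-adjusted CFR: {delay_adjusted_cfr(cases, deaths, delay_pmf):.4f}")  # 0.0500
```

The faster the epidemic grows relative to the length of the delay, the larger the gap between the two numbers.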

Objective: To develop a model to estimate the true mortality, i.e. the proportion of infected people, with or without symptoms, who will die, stratified by age. The model will be applied to data from China first, then to other countries as data become available.

Impact: Obtaining a single, rock-solid estimate of the true mortality would be extremely important to support and direct the efforts of public health authorities.

Data: We started from data released by the Chinese CDC (enclosed), analysing data from Hubei province (where the epidemic started in December). We obtained the incidence of confirmed cases by day of disease onset (first day of symptoms, see Fig A below) and the age distribution of cases and deaths in China, which differs from the age distribution of the Chinese population (see Fig B).

This age shift in cases can have several explanations:

1) younger people have prior immunity to COVID-19 (unlikely, as it is a new disease)

2) younger people have cross-immunity from infection with other coronaviruses (there are very few data on this, and it is unclear why it would concern only young people)

3) younger people have fewer contacts with potential infectors (a possible but partial explanation, see Fig 1C)

4) younger people are infected just as often but have fewer symptoms, do not seek care and are not identified (more likely)

Analysis plan: In this analysis, we consider only explanations 3) and 4), and try to estimate the total size of the infected population. We also account for asymptomatic infections and for the delay to death, using external data. For that we need some kind of poststratification, backed by an epidemic model that generates infections by date of disease onset, which is important when accounting for the delay. All code and data are at https://github.com/jriou/covid_adjusted_cfr.
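To make the idea of an age-structured epidemic model concrete, here is a minimal deterministic, discrete-time SIR sketch in which the force of infection on each age group depends on a contact matrix (the mechanism behind explanation 3). The three age groups, population sizes, contact matrix, `beta` and `gamma` are all hypothetical. The sketch produces infections by date of infection; working by date of disease onset, as our model does, would require an additional incubation delay.

```python
import numpy as np


def age_sir(pop, contacts, beta, gamma, seed, days):
    """Daily new infections by age group, shape (days, n_groups), from a
    deterministic discrete-time SIR model with age-structured mixing.

    pop:      population size of each age group
    contacts: contacts[a, b] = mean daily contacts of a person in group a
              with members of group b
    beta:     probability of transmission per contact with an infectious person
    gamma:    daily probability that an infectious person recovers
    seed:     initial number of infectious people in each group
    """
    pop = np.asarray(pop, float)
    infectious = np.asarray(seed, float)
    susceptible = pop - infectious
    incidence = np.zeros((days, len(pop)))
    for t in range(days):
        # force of infection on group a: beta * sum_b contacts[a, b] * I_b / N_b
        foi = beta * contacts @ (infectious / pop)
        new = susceptible * (1.0 - np.exp(-foi))
        susceptible = susceptible - new
        infectious = infectious + new - gamma * infectious
        incidence[t] = new
    return incidence


# Hypothetical example with three age groups (young, middle-aged, old)
pop = np.array([3e6, 5e6, 2e6])
contacts = np.array([[6.0, 2.0, 0.5],
                     [2.0, 5.0, 1.0],
                     [0.5, 1.0, 2.0]])
incidence = age_sir(pop, contacts, beta=0.05, gamma=1 / 7, seed=[0, 10, 0], days=200)
attack_rate = incidence.sum(axis=0) / pop  # share of each group ever infected
print(attack_rate.round(3))
```

Because groups with fewer contacts face a lower force of infection, differences in the contact matrix alone already shift the age distribution of infections away from that of the population.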

Current state: We published preliminary results in a preprint (here), because we were comfortable with the results at that point and wanted to put the idea out. But there is still room for improvement. In that version (model10) we did not consider the different contact patterns by age class. Since then we have added the contact structure, as well as an alternative source for the symptomatic rate (with uncertainty). This is model12 on GitHub. Enclosed with this email (supplementary.pdf) you will find a better description of model12 (still a bit rough at this point). I don’t have the results of model12 yet (the run is almost finished; it has taken 3 days on the cluster at the University of Bern).

—————————————————–

Bottlenecks and room for improvement: The model relies on many assumptions, listed below. It would be good to relax some of them, or to find more data or ideas to circumvent them. Potential improvements are listed as well.

1) We assume that there is no prior immunity to COVID-19 due to cross-reaction with other coronaviruses. I did not find any good data on this, but an alternative strategy may be to estimate what kind of prior immunity would explain the age shift, and see whether it is plausible.
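A rough first pass at that alternative strategy, under strong simplifying assumptions: if exposure, symptoms and reporting were identical across ages and the oldest group were fully susceptible, each group's share of cases divided by its share of the population would be proportional to its susceptibility. The shares below are placeholders, not the Chinese CDC figures. This is only a first-order check, since in a transmission model attack rates are not proportional to susceptibility.

```python
import numpy as np


def implied_susceptibility(case_share, pop_share, ref=-1):
    """Relative susceptibility by age that would, on its own, reproduce the
    age distribution of cases, assuming identical exposure, symptoms and
    reporting in all groups and full susceptibility in group `ref`."""
    ratio = np.asarray(case_share, float) / np.asarray(pop_share, float)
    return ratio / ratio[ref]


# Placeholder shares for three age groups (young, middle-aged, old)
pop_share = [0.30, 0.50, 0.20]
case_share = [0.10, 0.55, 0.35]
susceptibility = implied_susceptibility(case_share, pop_share)
print(susceptibility.round(2))  # about 0.19, 0.63 and 1.0
```

If the implied susceptibility of younger groups turned out to be implausibly low given what is known about other coronaviruses, that would argue against prior immunity as the main explanation.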

2) We assume that about 80% of cases are symptomatic (from Bi et al., enclosed), with no trend by age. Only symptomatic people can be recognised as cases (with an age-dependent reporting rate) and can die (with an age-dependent mortality). This means we assume that the probability of “no symptoms at all” is the same in each age group, but that younger individuals who do have symptoms have milder ones, which leads to less reporting and fewer deaths. Better data on this would help.
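Under these assumptions the accounting can be written down directly: confirmed cases in age group a equal infections times 0.8 times the reporting rate ρ_a, so infections can be recovered by division, and the infection fatality ratio (IFR) is the delay-adjusted deaths divided by those infections (equivalently, the case-based ratio multiplied by 0.8 times ρ_a). This is a deterministic sketch with placeholder counts and reporting rates; our model estimates these quantities jointly, with uncertainty.

```python
import numpy as np

P_SYM = 0.8  # share of infections that are symptomatic, same at all ages


def infections_from_cases(confirmed, reporting_rate, p_sym=P_SYM):
    """Invert: confirmed_a = infections_a * p_sym * reporting_rate_a."""
    return np.asarray(confirmed, float) / (p_sym * np.asarray(reporting_rate, float))


def ifr_by_age(deaths, confirmed, reporting_rate, p_sym=P_SYM):
    """Infection fatality ratio: delay-adjusted deaths / estimated infections."""
    return np.asarray(deaths, float) / infections_from_cases(confirmed, reporting_rate, p_sym)


# Placeholder counts for three age groups (young, middle-aged, old)
confirmed = [100, 1000, 500]
deaths = [1, 20, 50]               # assumed already adjusted for the delay
reporting_rate = [0.2, 0.5, 1.0]   # oldest group fixed at 100%
print(infections_from_cases(confirmed, reporting_rate))  # roughly 625, 2500, 625
print(ifr_by_age(deaths, confirmed, reporting_rate))     # roughly 0.0016, 0.008, 0.08
```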

3) To make the model identifiable, we need to fix the reporting rate for the oldest age group at 100% (effectively an upper bound). This is not too far-fetched (we can imagine that in this context, older people with fever or cough get tested very quickly), but it is still an assumption.
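Why a constraint is needed can be shown with a toy version of the same structure (placeholder numbers): multiplying every reporting rate and every mortality among symptomatic cases by the same factor c, while dividing infections by c, leaves expected cases and deaths unchanged, so case and death counts alone cannot pin down the overall scale. Fixing the oldest group's reporting rate at 100% removes that freedom; because 100% is the maximum possible reporting rate, in this toy version it gives the fewest infections, and hence the highest IFR, for that group.

```python
import numpy as np

P_SYM = 0.8  # share of infections that are symptomatic


def expected_counts(infections, reporting_rate, mortality, p_sym=P_SYM):
    """Expected confirmed cases and deaths by age in a toy version of the
    model: only symptomatic infections can be reported (rate rho_a) or
    die (rate mu_a)."""
    symptomatic = np.asarray(infections, float) * p_sym
    return symptomatic * reporting_rate, symptomatic * mortality


infections = np.array([625.0, 2500.0, 625.0])
rho = np.array([0.2, 0.5, 1.0])     # reporting rate, oldest group at 100%
mu = np.array([0.002, 0.01, 0.1])   # mortality among symptomatic cases

# Rescale: twice as many infections, half the reporting and mortality rates
c = 0.5
cases_a, deaths_a = expected_counts(infections, rho, mu)
cases_b, deaths_b = expected_counts(infections / c, rho * c, mu * c)

print(np.allclose(cases_a, cases_b), np.allclose(deaths_a, deaths_b))  # True True
print(P_SYM * mu, P_SYM * mu * c)  # the two IFRs differ by the factor c
```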

4) Mortality is also higher in people with comorbidities (same Chinese CDC paper).

It would be nice to stratify by comorbidity as we did for the age groups, but I worry about computation time and identifiability, and we would need the appropriate data from China.

5) We include a time-dependent parameter for transmission (to capture the decrease in transmission after the control measures), but mortality is not time-dependent in the model. Nor does the model include hospital saturation, which could increase mortality at the peak of the epidemic. These are obvious limitations, but I’m not sure we can address them here.
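As an illustration of what a time-dependent transmission parameter does, here is a hypothetical sketch: a deterministic SIR model in which the transmission rate falls along a logistic curve around an assumed control date. The logistic form, the 80% reduction, day 30 and all other parameter values are assumptions chosen for illustration, not the parametrisation used in our model.

```python
import numpy as np


def beta_t(t, beta0, reduction, t_control, steepness=1.0):
    """Hypothetical transmission rate that falls smoothly (logistic curve)
    from beta0 to beta0 * (1 - reduction) around day t_control."""
    return beta0 * (1.0 - reduction / (1.0 + np.exp(-steepness * (t - t_control))))


def sir_incidence(n, i0, beta0, reduction, t_control, gamma, days):
    """Daily new infections from a deterministic discrete-time SIR model
    with time-varying transmission beta_t."""
    s, i = n - i0, float(i0)
    incidence = np.zeros(days)
    for t in range(days):
        new = s * (1.0 - np.exp(-beta_t(t, beta0, reduction, t_control) * i / n))
        s, i = s - new, i + new - gamma * i
        incidence[t] = new
    return incidence


# Hypothetical parameters: R0 = beta0 / gamma = 2.5 before control,
# 0.5 after an 80% reduction in transmission around day 30.
no_control = sir_incidence(1e7, 10, 0.5, 0.0, 30, 0.2, 120)
control = sir_incidence(1e7, 10, 0.5, 0.8, 30, 0.2, 120)
print(f"infections without control: {no_control.sum():,.0f}")
print(f"infections with control:    {control.sum():,.0f}")
print(f"peak day with control:      {control.argmax()}")
```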

6) It would be good to apply the model to other settings (South Korea, Japan, Iran, Italy), but I haven’t found the appropriate data yet (time series of cases and deaths plus age distributions). We could also apply our age-specific estimates to the age distributions of other countries around the world.
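Applying age-specific estimates to another country amounts to direct standardisation: weight each age group's IFR by that group's share of the country's population. The sketch assumes, importantly, that the probability of infection is the same in every age group, which the point about contacts by age class suggests is not exactly true; the IFR values and age structures are placeholders.

```python
import numpy as np


def standardized_ifr(ifr_by_age, pop_by_age):
    """Overall IFR for a population with the given age structure, assuming
    the same probability of infection in every age group."""
    weights = np.asarray(pop_by_age, float)
    return float(np.dot(ifr_by_age, weights / weights.sum()))


ifr = [0.0016, 0.008, 0.08]          # placeholder age-specific IFRs
younger_country = [0.5, 0.4, 0.1]    # placeholder population shares
older_country = [0.25, 0.45, 0.30]
print(round(standardized_ifr(ifr, younger_country), 4))  # 0.012
print(round(standardized_ifr(ifr, older_country), 4))    # 0.028
```

With identical age-specific IFRs, the older population's overall IFR is more than twice as high, purely because of its age structure.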

Again, the data, code, and research paper by Riou et al. are all posted, so anyone can get involved here.