“Estimating Covid-19 prevalence using Singapore and Taiwan”


Jacob Steinhardt writes:

I wanted to share some applied statistical modeling that you and your readers might enjoy. I took a break from machine learning research this past week to work on it, in particular trying to correct for underreporting due to insufficient testing in some countries. My overall conclusion is that in most European countries, backing out the number of cases from the mortality data is reasonably reliable, but there are other countries where it’s less reliable and the reported deaths may substantially underestimate the actual deaths.

Of course, my analysis also relies on assumptions, many of which are obviously incorrect. But it’s a different set of incorrect assumptions than taking the reported deaths as given, so together these can help start to paint a clearer picture. And hopefully more analyses and more data later will continue to improve our understanding.

The full blog post is here, and you can also find the underlying data here or even rawer data on github.

I haven’t read this in detail, but I’m forwarding it in case it interests some of you. My only quick comment on the analysis: I think you should just about never use the Poisson model. Always use an overdispersed Poisson. I also recommend fitting any model in Stan, as it’s flexible, so you can expand the model in various ways, include new data, etc., the usual story.
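
To make the overdispersion point concrete, here is a minimal simulation sketch in Python. It is not taken from Jacob's analysis. The mean, the dispersion parameter `phi`, and the helper names `simulate_counts` and `dispersion_ratio` are all made up for illustration. It compares plain Poisson counts with negative binomial counts (one common form of overdispersed Poisson, generated here as a gamma-Poisson mixture) that have the same mean but variance mean + mean²/phi.

```python
import numpy as np

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

def simulate_counts(mean, phi, size, rng):
    """Draw counts with the given mean.

    phi=None gives plain Poisson draws. Otherwise the draws are negative
    binomial (a gamma-Poisson mixture) with variance mean + mean**2 / phi.
    """
    if phi is None:
        return rng.poisson(mean, size=size)
    # Gamma-distributed rates with mean `mean` and shape `phi`.
    rates = rng.gamma(shape=phi, scale=mean / phi, size=size)
    return rng.poisson(rates)

rng = np.random.default_rng(0)
mean, phi, n = 20.0, 5.0, 100_000  # illustrative values, not from the post

poisson_counts = simulate_counts(mean, None, n, rng)
overdispersed_counts = simulate_counts(mean, phi, n, rng)

poisson_ratio = dispersion_ratio(poisson_counts)
overdispersed_ratio = dispersion_ratio(overdispersed_counts)
theoretical_ratio = 1 + mean / phi  # variance / mean for the negative binomial

print(f"Poisson variance/mean:       {poisson_ratio:.2f}")
print(f"Overdispersed variance/mean: {overdispersed_ratio:.2f}")
print(f"Theoretical overdispersed:   {theoretical_ratio:.2f}")
```

If real count data look like the second series, a Poisson model will report uncertainty intervals that are too narrow, because it forces the variance to equal the mean. The mean-and-`phi` parameterization used here matches Stan's `neg_binomial_2` distribution, so moving from a simulation like this to a fitted model should be a small step.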