Uros Seljak writes:

You may be interested in our Gaussian Process counterfactual analysis of Italy mortality data that we just posted.

Our results are in strong disagreement with the Stanford seropositive paper that appeared on Friday.

Their work was all over the news, but it is completely misleading and needs to be countered: they claim an infection fatality rate as low as 0.1%; we argue it cannot be less than 0.5% in Santa Clara or NYC. This is not just a bad flu season, as people from NYC can probably confirm.

From the paper by Modi, Boehm, Ferraro, Stein, and Seljak:

We find an excess mortality that is correlated in time with the COVID-19 reported death rate time series. Our analysis shows good agreement with reported COVID-19 mortality for age…

I have not read their article so you’ll have to make your own judgment on this—which I guess you should be doing, even if I *had* read the article!

In any case, this is all a tough subject to study, as we’re taking the ratios of numbers where both the numerator and denominator are uncertain.

Also, Seljak had a question about poststratification, arising from our discussion of that Stanford paper the other day. Seljak writes:

You say it is standard procedure, but I have my misgivings (though this is not my field). I understand that it makes sense when the data tell you that you have to do it, i.e., you have clear evidence that the infection rate depends on these covariates. But the null hypothesis in epidemiology models is that everyone is infected at the same rate regardless of sex, race, age, etc. Their data (50 positives) are too small to do a regression analysis and get any signal that violates the null hypothesis. If so, and given the null hypothesis, then there is no point in post-stratifying; one is just increasing the noise from importance weighting and not gaining anything. If so, the crude rate of 1.5% would be the infection rate (ignoring the other issues you talk about). So by post-stratifying they are rejecting this null hypothesis, i.e., their underlying assumption in post-stratifying is that the infection rate is not the same across these covariates. Say I know my sample is unbalanced on 10 things (age, zip code, sex, race, income, urban-suburban, it never ends). So how do I choose which to post-stratify on, and when does one stop? E.g., they did not do it for age, despite the large imbalance; how is this justified? Either you do it for all, and pay the price in large variance, or not at all?

What I am getting at is that post-stratification offers a lot of opportunity for p-hacking (which could be subconscious, of course): you stop when you like the answer, and until then you try this and that. It is easy to give an a posteriori story justifying it (say the answer was weird under age post-stratification, so we did not do it), but it is still p-hacking. Why is this not always a concern with these procedures, especially if the null hypothesis is saying there is no need to do anything? And in their case, this takes them from 1.5% to 2.8%.

My reply:

As the saying goes, not making a decision is itself a decision. It’s fine to take survey data and not poststratify; then instead of doing inference about the general population, you’re doing inference about people who are like those in your sample. For many purposes that can be fine. If you’re doing this, you’ll want to know what population you’re generalizing to. If you do want to make inferences about the general population, then I think there’s no choice but to poststratify (or to do some other statistical adjustment that is the equivalent of poststratification).
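To make the distinction concrete, here’s a minimal sketch of what poststratification does, using made-up strata and counts (not the actual study data): the crude rate weights each stratum by its sample share, while the poststratified rate reweights stratum-level rates by known population shares.

```python
# Minimal poststratification sketch with hypothetical numbers.
# Strata might be, e.g., sex-by-age cells; none of these counts are real.

sample_n   = {"A": 800, "B": 1500, "C": 1000}    # sample size per stratum
sample_pos = {"A": 20,  "B": 15,   "C": 15}      # positives per stratum
pop_share  = {"A": 0.40, "B": 0.25, "C": 0.35}   # known population proportions

# Crude rate: inference about "people like those in the sample".
crude = sum(sample_pos.values()) / sum(sample_n.values())

# Poststratified rate: stratum rates weighted by population shares,
# i.e., inference about the general population.
post = sum(pop_share[s] * sample_pos[s] / sample_n[s] for s in sample_n)

print(f"crude rate:          {crude:.4f}")   # 50/3300, about 0.0152
print(f"poststratified rate: {post:.4f}")
```

The two numbers differ whenever the sample composition differs from the population composition and the stratum rates are not all equal, which is exactly the situation where the adjustment matters.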

Regarding the point that 50 positive cases do not give you enough information to do a regression analysis: it’s actually 3300 cases, not 50 cases, as you poststratify on all the observations, not just the ones who test positive. And if you look at the summary data from that Santa Clara study, you’ll see some clear differences between sample and population. But, yes, even 3300 is not such a large number, so you’ll want to do regularization. That’s why we do *multilevel* regression and poststratification, not just regression and poststratification.
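Here’s a toy sketch of why the regularization matters. Small poststratification cells give noisy rate estimates; partial pooling shrinks each cell toward the overall rate before reweighting. The cell counts and the prior strength `k` below are hypothetical, and the shrinkage formula is a simple beta-binomial-style stand-in for a full multilevel model.

```python
# Hypothetical cells: (sample size, positives). Note the tiny last cell.
cells = {
    "cell1": (2000, 30),
    "cell2": (1000, 15),
    "cell3": (250, 4),
    "cell4": (50, 1),
}
pop_share = {"cell1": 0.30, "cell2": 0.25, "cell3": 0.25, "cell4": 0.20}

overall = sum(p for _, p in cells.values()) / sum(n for n, _ in cells.values())

k = 100  # prior "pseudo-sample size" controlling shrinkage (an assumption)

def shrunk_rate(n, pos):
    # Partial pooling: cells with small n get pulled toward the overall rate.
    return (pos + k * overall) / (n + k)

raw = sum(pop_share[c] * p / n for c, (n, p) in cells.items())
reg = sum(pop_share[c] * shrunk_rate(n, p) for c, (n, p) in cells.items())

print(f"raw poststratified:         {raw:.4f}")
print(f"regularized poststratified: {reg:.4f}")
```

The raw poststratified estimate leans heavily on the 50-person cell’s noisy rate; the regularized version sits between the raw estimate and the overall rate, which is the partial-pooling compromise.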

Regarding concerns about p-hacking: Yes, that’s an issue in the Santa Clara study. But I’d also be concerned if they’d done no adjustment at all. In general, I think the solution to this sort of p-hacking concern is to include more variables in the model and then to regularize. In this example, I’d say the same thing: poststratify not just on sex, ethnicity, and zip code but also on age, and use a multilevel model with partial pooling. I can’t quite tell how they did the poststratification in that study, but from the description in the paper I’m concerned that what they did was too noisy. Again, though, the problem is not with the poststratification but with the lack of regularization in the small-area estimates that go into the poststrat.
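As for the variance cost of weighting that Seljak raises, one way to quantify it is Kish’s effective sample size, n_eff = (Σw)² / Σw², which says how many equally-weighted observations carry the same information as your weighted sample. The weights below are made up for illustration.

```python
def kish_neff(weights):
    # Kish's effective sample size: (sum w)^2 / sum(w^2).
    # Equals len(weights) when all weights are equal; smaller otherwise.
    s = sum(weights)
    return s * s / sum(w * w for w in weights)

equal   = [1.0] * 3300                    # uniform weights: no loss
unequal = [0.5] * 1650 + [3.0] * 1650     # some strata heavily upweighted

print(kish_neff(equal))    # 3300.0
print(kish_neff(unequal))  # noticeably fewer effective observations
```

So the concern about "increasing the noise from importance weighting" is real and measurable; the argument for multilevel modeling is that regularizing the cell estimates keeps the weights from getting extreme in the first place.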