Emma Pierson writes:

My two sisters and I, with my friend Jacob Steinhardt, spent the last several days looking at the statistical methodology in a paper which has achieved a lot of press – Incarceration and Its Disseminations: COVID-19 Pandemic Lessons From Chicago’s Cook County Jail (results in supplement), published in Health Affairs. (Here’s the New York Times op-ed one of the authors published this weekend.) The central finding in the paper, quoting the abstract, is that community infections from people released from a single jail in Illinois in March are “associated with 15.7 percent of all documented novel coronavirus disease (COVID-19) cases in Illinois and 15.9 percent in Chicago as of April 19, 2020”. From the New York Times op-ed, “Roughly one in six of all cases in the city and state were linked to people who were jailed and released from this single jail”. On the basis of this claim, both the paper and the op-ed make a bunch of policy recommendations in the interest of public health – eg, reducing unnecessary arrests.

To be clear, we largely agree with these policy recommendations – and separate from this paper, there’s been a lot of good work documenting the dire COVID situation in jails and prisons. Mass incarceration was a public health emergency even before the COVID-19 pandemic, and tens of thousands of people have now contracted coronavirus in overcrowded jails and prisons. The Cook County Jail was identified as “the largest-known source of coronavirus infections in the U.S.” in April. Many incarcerated individuals have died after being denied appropriate medical care.

However, we also feel that the statistical methodology in the paper is sufficiently flawed that it does not provide strong evidence to back up its policy recommendations, or much evidence at all about the effect of jail releases on community infections.

We are concerned both about the statistical methods it uses and the effect sizes it estimates. We have shared our concerns with the first author, who helpfully shared his data and thoughts but did not persuade us that any of the concerns below are invalid. Given the high-profile nature of the paper, we thought the statistics community would benefit from open discussion. Depending on what you and your readers think, we may reach out to the journal editors as well.

The analysis relies on multivariate regression. It regresses COVID cases in each zip code on the number of inmates released from the Cook County Jail to that zip code, adjusting for variables which include the number or proportion of Black residents, poverty rate, public transit utilization rate, and population density. A number of these variables are highly correlated: for example, the correlation between the number of Black residents in a zip code and the number of inmates released in March is 0.86 in the full Illinois sample (and 0.84 in Chicago zip codes). The results in the paper testify to the dangers of multivariate regression on highly correlated variables: the signs on several regression coefficients (eg, public transit utilization) flip from positive in the bivariate regressions (Supplemental Exhibit 2) to negative in the multivariate regressions (Supplemental Exhibit 3). If the regression coefficients do in fact have causal interpretations, we should infer that if more people use public transit, COVID cases will decrease, which doesn’t seem plausible.
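A toy simulation shows how unstable a coefficient can be in this setting. The sketch below uses synthetic data, not the paper's: two predictors with correlation 0.86 (chosen to echo the figure quoted above), 50 observations per dataset, and a small positive true coefficient on the second predictor. All function names and parameter values are assumptions made for illustration.

```python
import random

def centered(v):
    """Subtract the mean so the intercept drops out of the normal equations."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def bivariate_slope(x, y):
    """OLS slope of y on a single predictor x (with intercept)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def multivariate_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 jointly (with intercept), via the 2x2 normal equations."""
    a, b, c = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate(n, r, beta1, beta2, rng):
    """Draw n rows with corr(x1, x2) = r and y = beta1*x1 + beta2*x2 + unit noise."""
    x1, x2, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u = z1
        v = r * z1 + (1 - r * r) ** 0.5 * z2
        x1.append(u)
        x2.append(v)
        y.append(beta1 * u + beta2 * v + rng.gauss(0, 1))
    return x1, x2, y

rng = random.Random(0)
n_sims = 1000
biv_negative = 0    # datasets where the bivariate slope on x2 is negative
multi_negative = 0  # datasets where the multivariate coefficient on x2 is negative
for _ in range(n_sims):
    x1, x2, y = simulate(n=50, r=0.86, beta1=1.0, beta2=0.2, rng=rng)
    if bivariate_slope(x2, y) < 0:
        biv_negative += 1
    if multivariate_slopes(x1, x2, y)[1] < 0:
        multi_negative += 1

print("bivariate slope on x2 negative in", biv_negative, "of", n_sims, "datasets")
print("multivariate coefficient on x2 negative in", multi_negative, "of", n_sims, "datasets")
```

In this setup the bivariate slope on x2 is essentially always positive, while the multivariate coefficient on x2 comes out negative in a substantial minority of datasets even though its true value is positive. A sign flip between the two kinds of regression is therefore weak evidence about the underlying relationship.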

Given the small samples (50 zip codes in Chicago, and 355 overall) and highly correlated variables, it is unsurprising that the results in the paper are not robust across alternate plausible specifications. Here are two examples of this.

First, we examined how the effect size estimate (ie, the coefficient on how many inmates were released in March) varied depending on which controls were included (assessing all subsets of the original controls used in the paper) and which subset of the dataset was used (Chicago or the full sample). We found that the primary effect estimate varied by a factor of nearly 5 across specifications, and that for some specifications, it was no longer statistically significant. Similarly, in Appendix Exhibit 2 (top table, Chicago estimate) the paper shows that including all zip codes rather than just those with at least 5 cases (which adds only 3 additional zip codes) renders the paper’s effect estimate no longer statistically significant.

Second, the results are not robust when fitting other standard regression models which account for overdispersion. As a robustness check, the paper fits a Poisson model to the case count data (Appendix Exhibit 4). Standard checks for overdispersion, like the Pearson χ² statistic from the Poisson regression, or fitting a quasipoisson model, imply the data is overdispersed. So we refit the Poisson model used in Appendix Exhibit 4 on the Chicago sample using three methods which are more robust to overdispersion: (1) we bootstrapped the confidence intervals on the regression estimate from the original Poisson model; (2) we fit a quasipoisson model, and (3) we fit a negative binomial model. All three of these methods yield confidence intervals which are much wider than the original Poisson confidence intervals, and which overlap zero. (For consistency with the original paper, we performed all these regressions using the same covariates used in Appendix Exhibit 4 for the original Poisson regression. We’re pretty sure that setup is non-standard, however, or at least it doesn’t seem to agree with the way you do it in your stop-and-frisk paper: it models cases as exponential in population, rather than using population as an offset term. So we also perform an alternate regression using a more standard covariate setup. Both methods yield similar conclusions.) Overall this analysis implies that, when overdispersion in the data is correctly modeled, the Chicago sample results are no longer statistically significant. Even in the full sample, the results are often not statistically significant depending on what specification is used.
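For readers unfamiliar with the overdispersion check, here is a minimal sketch on simulated counts (not the paper's data). It computes the Pearson χ² statistic divided by its degrees of freedom for an intercept-only Poisson fit, which should be near 1 when the Poisson assumption holds. The simplification to an intercept-only model, the sample size, and the gamma-Poisson parameters are all assumptions chosen for illustration.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def pearson_dispersion(counts):
    """Pearson chi-square over degrees of freedom for an intercept-only Poisson fit.

    With no covariates the fitted mean is just the sample mean.
    """
    n = len(counts)
    mu = sum(counts) / n
    chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return chi2 / (n - 1)

rng = random.Random(1)
n, mean, shape = 355, 20.0, 2.0  # hypothetical values, not taken from the paper

# Counts that really are Poisson.
poisson_counts = [poisson_draw(mean, rng) for _ in range(n)]

# Overdispersed counts: a gamma-distributed rate with the same mean,
# which gives a negative binomial marginal distribution.
overdispersed_counts = [
    poisson_draw(rng.gammavariate(shape, mean / shape), rng) for _ in range(n)
]

d_pois = pearson_dispersion(poisson_counts)
d_over = pearson_dispersion(overdispersed_counts)

# A quasipoisson fit scales Poisson standard errors by sqrt(dispersion).
se_inflation = math.sqrt(d_over)

print("dispersion, Poisson counts:      ", round(d_pois, 2))
print("dispersion, overdispersed counts:", round(d_over, 2))
print("quasipoisson SE inflation factor:", round(se_inflation, 2))
```

A dispersion well above 1 means that confidence intervals from a plain Poisson fit are too narrow. A quasipoisson fit widens them by the square root of the dispersion, which is one reason intervals that exclude zero under Poisson can overlap zero once overdispersion is modeled.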

To be clear, I would remain skeptical of the basic empirical strategy in the paper even if the sample were ten times as large, and none of the above applied. But I’m curious about your and your readers’ thoughts on this – as well as alternate strategies for estimating the number of community infections jail releases cause.

In addition to the methodological concerns summarized above, the effect sizes estimated in the paper seem somewhat unlikely. The paper estimates that each person released from the Cook County Jail results in approximately two additional reported cases in the zip code they are released to. Two facts make us question this finding. First, the CDC estimates that cases are underreported by a factor of ten, so to cause two reported infections, the average released person would have to cause twenty infections. Second, not everyone who left Cook County Jail was infected. Positive test rates at Cook County Jail are 11%; the overall fraction of inmates who were infected is likely lower, since it is probable that individuals with COVID-19 were more likely to be tested, but we use 11% as a reasonable upper bound on the fraction of released people who were infected. Combining these two facts, in order for the average person released to cause two reported cases, the average infected person released in March would have to cause nearly two hundred cases by April 19 (when the paper sources its case counts). This isn’t impossible: there’s a lot of uncertainty about the reproductive number and the degree of case underreporting, and not all detainees are included in the booking dataset. Still, we coded up a quick simulation based on estimates of the reproductive number in Illinois, and it seems somewhat unlikely.
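The back-of-the-envelope arithmetic in the paragraph above can be written out as follows. The three inputs are the figures quoted in the text; the variable names are ours.

```python
# Inputs quoted in the text above.
reported_cases_per_release = 2.0  # paper's approximate estimate
underreporting_factor = 10.0      # CDC estimate of true cases per reported case
infected_fraction_upper = 0.11    # jail positive test rate, used as an upper bound

# True infections the average released person would need to cause.
true_infections_per_release = reported_cases_per_release * underreporting_factor

# Infections the average *infected* released person would need to cause,
# if at most 11% of released people were infected.
infections_per_infected_release = true_infections_per_release / infected_fraction_upper

print(true_infections_per_release)             # 20.0
print(round(infections_per_infected_release))  # 182
```

Because 11% is treated as an upper bound on the infected fraction, about 182 is a lower bound on the required number of infections per infected person under these assumptions.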

My reply:

There are three things going on here: language, statistics, and policy.

*Language.* In the article, the authors say, “Although we cannot infer causality, it is possible that, as arrested individuals are exposed to high-risk spaces for infection in jails and then later released to their communities, the criminal justice system is turning them into potential disease vectors for their families, neighbors, and, ultimately, the general public.” And in the op-ed: “for each person cycled through Cook County Jail, our research shows that an additional 2.149 cases of Covid-19 appeared in their ZIP code within three to four weeks after the inmate’s discharge.” I appreciate their careful avoidance of causal language. But “2.149”? That’s ridiculous! Why not just say 2.14923892348901827390823? Even setting aside all identification issues, you could never estimate this parameter to three decimal places. Indeed, the precision of that number is immediately destroyed by the vague “three to four weeks” that follows it. You might as well say that a dollop of whipped cream weighs 2.149 grams.

But I don’t like this bit in the op-ed: “Roughly one in six of all cases in the city and state were linked to people who were jailed and released from this single jail, according to data through April 19.” All the analysis is at the aggregate level. Cases have not been “linked to people” at the jail at all!

I do not mean to single out the authors of this particular article. The real message is that it’s hard to be precise when describing what you’ve learned from data. These authors were so careful in almost everything they wrote, but even so, they slipped up at one point!

*Statistics.* The supplementary material has a scatterplot of the data from the 50 zip codes in Chicago that had 5 or more coronavirus cases during this period:

First off, I’m not clear why the “Inmates released” variable is sometimes negative. I guess that represents some sort of recoding or standardization, but in that case I’d prefer to see the raw variable in the graph.

As to the rest of the graphs: the key result is that the rate of coronavirus cases is correlated with the rate of inmate releases but *not* correlated with poverty rate. That seems kinda surprising.

So it seems that the authors of this paper did find something interesting. If I’d been writing the article, I would’ve framed it that way: We found this surprising pattern in the zip code data, and here are some possible explanations. Next logical step is to look at data from other cities. Also it seems that we should try to understand better the selection bias in who gets tested.

Unfortunately, the usual way to write a research article is to frame it in terms of a conclusion and then to defend that conclusion against all comers. Hence robustness checks and all the rest. That’s too bad. I’d rather frame this as an exploratory analysis than as an attempt at a definitive causal inference.

My take-home point is not that this article is bad, but rather that they saw something interesting that would be worth tracking down by comparing to other cities and by thinking about the selection bias in who got tested.

*Policy.* Unlike the journal Psychological Science, I support criminal justice reform. I’m in agreement with the op-ed that we should reduce the number of people in jail and prison. I’d say that whatever the story is regarding these coronavirus numbers.