Jake Hofman, Dan Goldstein, and Jessica Hullman write:

Scientists presenting experimental results often choose to display either inferential uncertainty (e.g., uncertainty in the estimate of a population mean) or outcome uncertainty (e.g., variation of outcomes around that mean). How does this choice impact readers’ beliefs about the size of treatment effects? We investigate this question in two experiments comparing 95% confidence intervals (means and standard errors) to 95% prediction intervals (means and standard deviations). The first experiment finds that participants are willing to pay more for and overestimate the effect of a treatment when shown confidence intervals relative to prediction intervals. The second experiment evaluates how alternative visualizations compare to standard visualizations for different effect sizes. We find that axis rescaling reduces error, but not as well as prediction intervals or animated hypothetical outcome plots (HOPs), and that depicting inferential uncertainty causes participants to underestimate variability in individual outcomes.

These results make sense. Sometimes I try to make this point by distinguishing between *uncertainty* and *variation*. I’ve always thought these two concepts were conceptually distinct (we can speak of uncertainty in the estimate of a population average, or variation across the population), but then I started quizzing students, and I learned that, to them, “uncertainty” and “variation” were not distinct concepts. Part of this is wording—there’s an idea that these two words are roughly synonyms—but I think part of it is that most people don’t think of these as being two different ideas. And if lots of students don’t get this distinction, it’s no surprise that researchers and consumers of research also get stuck on it.
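The distinction is easy to see in a quick simulation. Here’s a minimal sketch (hypothetical numbers, not from the paper): for the same data, the 95% confidence interval captures uncertainty about the mean and shrinks as the sample grows, while the 95% prediction interval captures variation in individual outcomes and does not.

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (25, 2500):
    # hypothetical outcomes with population mean 10 and sd 5
    y = rng.normal(loc=10, scale=5, size=n)
    mean, sd = y.mean(), y.std(ddof=1)
    se = sd / np.sqrt(n)

    # 95% confidence interval: inferential uncertainty about the mean
    ci = (mean - 1.96 * se, mean + 1.96 * se)
    # 95% prediction interval: variation of individual outcomes
    pi = (mean - 1.96 * sd, mean + 1.96 * sd)

    print(f"n={n}: CI width = {ci[1] - ci[0]:.2f}, "
          f"PI width = {pi[1] - pi[0]:.2f}")
```

Run it and the CI width collapses by a factor of 10 going from n=25 to n=2500, while the PI width stays roughly constant—which is exactly why a plot of means and standard errors can make readers underestimate how much individuals vary.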

I’m reminded of the example from a few months ago where someone published a paper including graphs that revealed the sensitivity of its headline conclusions to some implausible assumptions. The question then arose: had the paper not included the graph, maybe no one would’ve noticed the problem. I argued that, had the graph not been there, I would’ve wanted to see the data. But a lot of people would just accept the estimate and standard error and not want to know more.