I (inadvertently) misrepresented others’ research in a way that made my story sound better.

During a recent talk (I think it was this one on statistical visualization), I spent a few minutes discussing a political science experiment involving social stimuli and attitudes toward redistribution. I characterized the study as being problematic for various reasons (for background, see this post), and I remarked that you shouldn’t expect to learn much from a between-person study of 38 people in this context.

I was thinking more about this example the other day, so I went back to the original published paper to get more details on who those 38 people were. I found this table of results:

But that’s not 38 people in the active condition; it’s 38 clusters! Looking at the article more carefully, we see this:

The starting race and SES and starting petition were randomized each day, and the confederates rotated based on these starting conditions. In total, there are 74 date–time clusters across 15 days.

That’s 38 clusters in the active condition and 36 in the control.

And this, from the abstract:

Results from 2,591 solicitations . . .

Right there in the abstract! And I missed it. I thought it was N=38 (or maybe I was remembering N=36; I can't recall), but it was actually N=2591.

There’s a big difference between 38 and 2591. It’s almost as if I didn’t know what I was talking about.

But it's worse than that. I didn't just make a mistake (of two orders of magnitude!). I made a mistake that fit my story. My story was that the paper in question had problems (indeed, I'm skeptical of its claims for reasons discussed in the linked post, reasons that have nothing to do with sample size), and so it was all too easy for me to believe it had other problems.

It’s interesting to have caught myself making this mistake, and it’s easy to see how it can happen: if you get a false impression but that impression is consistent with something you already believe, you might not bother checking it, and then you mentally add it to your list of known facts.

This can also be taken as an argument against slide-free talks. I usually don't use slides when I give talks, I like it that way, and it can go really well; just come to the New York R conference some year and you'll see. But one advantage of slides is that they force you to write everything down, and if you have to write things down, you'll check the details rather than just going by your recollection. In this case I wouldn't've said the sample size was 38; I would've checked, I would've found the error, and my talk would've been stronger, as I could've made the relevant points more directly.

During the talk, when I came to that example, I said that I didn't remember all the details of the study and that I was just making a general point . . . but, hey, it was 2591, not 38. Jeez.