Hey! Participants in survey experiments aren’t paying attention.

Gaurav Sood writes:

Do survey respondents take into account the hypothesis they think the people fielding the survey hold when they respond? The answer, according to Mummolo and Peterson, is: not much.

Their paper also very likely provides the reason why: people don’t pay much attention. Figure 3 provides data on the manipulation checks, the proportion correctly guessing the hypothesis being tested. The change in that proportion between control and treatment ranges from -.05 to .25, with the bulk of the differences in the Qualtrics sample between 0 and .1. (In one condition, the authors even offer an additional 25 cents for giving an answer consistent with the hypothesis. And presumably, people need to know the hypothesis before they can answer in line with it.) The faint increase is especially noteworthy given that, on average, the proportion of people in the control group who guess the hypothesis correctly, without any guessing correction, is between .25 and .35 (see Appendix B).

So, the big thing we may have learned from the data is how little attention survey respondents pay. The numbers obtained here are similar to those in Appendix D of Jonathan Woon’s paper. The point is humbling and suggests that we need to: (a) invest more in measurement, and (b) collect yet larger samples, which is an expensive way to overcome measurement error. [A rough sketch of this arithmetic appears below.]

(The two fixes are points you have made before. I claim no credit. I wrote this because I don’t think I had fully grasped how much noise there is in online surveys. And I think it is likely useful to explore the consequences carefully.)

P.S. I think my Mturk paper gives one potential explanation for why things are so noisy: it is not easy to judge quality on surveys.

Here’s one relevant data point from the Turk paper: we got our estimates after recruiting workers “with a HIT completion rate of at least 95%.” This latter point also relates to a recent online paper on “reputation inflation.”
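To put a rough number on Sood’s point (b): if only a fraction of respondents actually process the treatment and the rest show no effect, the intent-to-treat effect shrinks by that fraction, so the sample size needed to keep the same power grows by the square of its inverse. Here is a back-of-the-envelope sketch in Python; the 0.2-standard-deviation effect and the attention rates are made-up illustrations, not numbers from the paper.

```python
from statistics import NormalDist

def n_per_arm(effect_sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect a difference in means of
    `effect_sd` standard deviations with a two-sided z-test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 2 * (z / effect_sd) ** 2

true_effect = 0.20  # hypothetical effect among attentive respondents, in sd units

for attend in (1.0, 0.5, 0.3):
    # If only a fraction `attend` processes the treatment and the rest show no
    # effect, the intent-to-treat effect is attend * true_effect, so the
    # required n grows by a factor of 1 / attend**2.
    print(f"attention = {attend:.0%}: n per arm ~ {n_per_arm(attend * true_effect):,.0f}")
```

Halving attention quadruples the required sample, which is why fix (b) gets expensive fast.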

So, if I’m understanding this correctly, Mummolo and Peterson are saying we don’t have to worry about demand effects in psychology experiments, but Sood is saying that this is just because the participants in the experiments aren’t really paying attention!

I wonder what this implies about my own research. Nothing good, I suppose.

P.S. Sood adds three things:

1. The numbers on compliance in M/P aren’t adjusted for guessing; some people doubtless just guessed the right answer. (We can back the correction out from the proportion answering incorrectly, after setting aside people who mark “don’t know.”) [A quick sketch of this correction appears after the list.]

2. This is how I [Sood] understand things: Experiments tell us the average treatment effect of what we manipulate. And the role of manipulation checks is to shed light on compliance.

If conveying experimenter demand clearly and loudly was a goal, then the experiments included probably failed at it. If the purpose was to learn whether clear but not very loud cues about “demand” matter (and for what it’s worth, I think that is a very reasonable goal; pushing further, in my mind, would have reduced the experiment to a tautology), the paper provides the answer. (But your reading is correct.)

3. The key point I took from the experiments, Woon’s paper, etc., was still just how little attention people pay in online surveys. And the compliance estimates in M/P tell us something about the amount of attention people pay, because compliance in their case amounts to reading something simple and brief that they quiz you on later.
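Here’s a minimal sketch of the guessing correction Sood mentions in point 1, under the classical formula-scoring assumption that respondents who don’t know the hypothesis, and don’t mark “don’t know,” guess uniformly at random among the response options. The proportions and the number of options below are hypothetical, not taken from M/P.

```python
def corrected_knowledge(p_correct, p_incorrect, n_options):
    """Back out the share of respondents who actually knew the hypothesis,
    given the observed shares answering correctly and incorrectly (both
    measured over all respondents, with "don't know"s counted in neither
    numerator), assuming non-knowers who answer guess uniformly among
    n_options choices."""
    return p_correct - p_incorrect / (n_options - 1)

# Hypothetical example: 30% answer correctly, 20% incorrectly, 50% say
# "don't know", and the manipulation check has 4 response options.
print(corrected_knowledge(0.30, 0.20, n_options=4))  # ~ 0.23
```

So an observed 30% correct could correspond to roughly 23% who genuinely recalled the hypothesis, with the rest getting there by luck.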

Tomorrow’s post: To do: Construct a build-your-own-relevant-statistics-class kit.