External vs. internal validity of causal inference from natural experiments: The example of charter school lottery studies

Alex Hoffman writes:

I recently was discussing/arguing about the value of charter school lottery studies. I suggested that their validity was questionable because of all the data that they ignore. (1) They ignore all charter schools (and their students) that are not so oversubscribed that they need to use lotteries for admission. (2) They ignore all the students at public schools who did not apply to a lottery.

The response I received was that they may lack external validity, but that’s just because the researcher focused so much on internal validity.

What do you think we should do with this kind of defense of supposedly policy-relevant research? Is there something I am missing? Is an admitted lack of external validity offset by stronger internal validity?

This strikes me as the same issue as the assessment industry's focus on maximizing reliability (alpha): it is so focused on alpha that it is not willing to give up a little of it in exchange for greater validity. Test makers don't sample from the entire construct/curriculum/standards because they can't fit enough items on a test if they do; as a result, the construct is predictably underrepresented.

I believe that the goal should be external validity. Sure, you need internal validity as an intermediate step, but internal validity should not be the goal in and of itself.

Am I missing something?

My reply: One way to think about it is that you can get estimated causal effects for everyone, but your estimates are most believable for the core group (in this case, the intersection of (a) oversubscribed charter schools and (b) students who applied for a lottery) and rely on increasingly strong assumptions as you move away from the core.

At this point, you have two choices: You can report your estimate for the core, which is narrow but less assumption-bound (less external validity, more internal validity), or you can construct estimates for everyone, and then you'll pay the price in internal validity. I think it's fine to pursue the latter strategy; you should just make your assumptions clear.

The thing you don't want to do is take the estimate for the core and report it as an estimate for the general population without modeling how the treatment effect might vary. In the example above, you'll want to think about how the effect of attending a charter school rather than a public school could differ (a) at charter schools that are not oversubscribed and (b) for students who did not apply for a lottery.
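As a rough illustration of the last two paragraphs, here is a minimal Python sketch with made-up numbers. Everything in it is hypothetical: the function names, the outcome values, the four population shares, and the two "shift" parameters. The lottery comparison identifies an effect only for the core group (strictly, the effect of being offered a seat, since the lottery randomizes offers). The effect for everyone else enters only through the assumed shifts, which are assumptions rather than estimates, and the sketch also assumes the two shifts add up for students in both non-core groups.

```python
from statistics import mean


def core_estimate(winners, losers):
    """Difference in mean outcomes between lottery winners and losers.

    Because the lottery randomizes offers, this estimates the effect of being
    offered a seat (intention to treat), but only for the core group:
    applicants to oversubscribed charter schools.
    """
    return mean(winners) - mean(losers)


def population_estimate(core_effect, shares, school_shift, student_shift):
    """Extrapolate the core estimate to the whole population under stated assumptions.

    shares maps (oversubscribed, applied) -> share of the target population.
    school_shift: ASSUMED change in the effect at schools that are not oversubscribed.
    student_shift: ASSUMED change in the effect for students who did not apply.
    The lottery data say nothing about either shift; they are modeling choices.
    The shifts are combined additively, which is another assumption.
    """
    weighted_sum = 0.0
    for (oversubscribed, applied), share in shares.items():
        effect = core_effect
        if not oversubscribed:
            effect += school_shift
        if not applied:
            effect += student_shift
        weighted_sum += share * effect
    # Normalize in case the shares do not sum exactly to 1.
    return weighted_sum / sum(shares.values())


if __name__ == "__main__":
    # Hypothetical standardized test-score outcomes for lottery applicants.
    winners = [0.30, 0.10, 0.25, 0.15]  # offered a seat by the lottery
    losers = [0.05, 0.00, 0.10, 0.05]   # not offered a seat
    core = core_estimate(winners, losers)

    # Hypothetical population shares for the four (oversubscribed, applied) groups.
    shares = {
        (True, True): 0.20,    # the core group
        (False, True): 0.10,   # applied, but school not oversubscribed
        (True, False): 0.35,   # did not apply
        (False, False): 0.35,  # neither
    }

    print("core estimate:", round(core, 3))
    # Zero shifts reproduce the naive practice of reporting the core estimate for everyone.
    for school_shift, student_shift in [(0.0, 0.0), (-0.05, -0.05), (-0.10, -0.05)]:
        estimate = population_estimate(core, shares, school_shift, student_shift)
        print(f"shifts ({school_shift}, {student_shift}): population estimate {estimate:.3f}")
```

Setting both shifts to zero is the "take the core estimate and report it for everyone" move; trying a range of shifts makes explicit how much the population-level answer depends on assumptions the lottery cannot check.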