Shoumitro Chatterjee, who sent me that paper we discussed yesterday, writes:

I [Chatterjee] recently finished my PhD in economics from Princeton and am starting as junior faculty at Penn State. I do applied work on development using observational and administrative data, and I have a few questions:

1. Is there a difference between multiple comparisons and multiple hypothesis testing?

2. Your examples of multiple comparisons are mostly from experimental settings. I use observational data in my work. Here is my concern: think of any large-scale household survey dataset, such as the Demographic and Health Surveys (DHS), the PSID, or the National Sample Surveys (NSS) of India. Many papers use the same data. Different papers test different hypotheses: paper A might run a Mincer regression using the NSS, and paper B could look at the effect of neighborhood disease environment on heights. Since paper A and paper B use the same underlying data set, are they also subject to the multiple comparisons problem? (May I also request that in your blog you sometimes mention examples from papers that use large sample surveys for exposition?)

3. I was writing a paper with my co-author using the DHS. We had an economic model in mind, with a few implications that we tested using the DHS. We could not reject the null, so we gave up on that idea. Next, we came up with another idea (using the DHS), again starting from an economic model that we wanted to test, and this time it “worked”. We learn by examining the data. I know you write about this in the statistical crisis in science paper. Your recommendation is to “more fully analyze existing data” and analyze all possible comparisons. Could you please elaborate on this, maybe with respect to my own paper?

We were interested in the effect of economic growth on fertility, and we were using the DHS. The main regression is therefore fertility on economic growth with a bunch of fixed effects. Should we have looked at the effect of economic growth on things other than fertility, such as education or children’s heights (even though we were not interested in those questions)? We did explore heterogeneity in the relationship between growth and fertility and found that deep recessions had a significant relationship to fertility but not booms. We also found that particular countries were driving the results. Finally, we looked at how long-term growth was related to long-term changes in fertility. What else could we have done?

First-year statistics as taught to econ graduates could be much better if it were taught with specific examples of applied work. Is there a book or set of lecture notes you’d recommend that teaches the important concepts with examples from applied work? Please let me know.

My reply:

1. I’m not really interested in multiple comparisons or multiple hypothesis testing. To put it another way: the classical multiple comparisons question is that you see a pattern in data and you want to know if it’s “statistically significant,” but you need to adjust for the fact that it’s only one of N possible comparisons you could’ve done. (The appropriate N is not the number of comparisons you *did*, it’s the number you *could have done* had the data been different; see further discussion of this point here.)

The whole deal with multiple comparisons is the selection problem. My solution to the selection problem is to include *all* possible comparisons of interest, ideally using a multilevel model (as discussed here) or, if you want to stick with classical approaches, a multiverse analysis.
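To make the multilevel idea a bit more concrete, here is a minimal sketch of partial pooling in Python. It is an empirical-Bayes shortcut under an assumed normal model, not a full Bayesian multilevel fit, and the estimates and standard errors are made-up numbers rather than results from any survey. The function name `partial_pool` is hypothetical.

```python
import statistics

def partial_pool(estimates, std_errors):
    """Shrink J noisy estimates toward their common mean.

    Assumed model (an illustration, not the method from any paper):
      y_j ~ Normal(theta_j, se_j^2),  theta_j ~ Normal(mu, tau^2).
    Standard errors are assumed to be strictly positive.
    """
    # Method-of-moments guess at the between-comparison variance tau^2,
    # floored at zero.
    tau2 = max(0.0, statistics.variance(estimates)
               - statistics.fmean([se ** 2 for se in std_errors]))
    # Precision-weighted estimate of the common mean mu.
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    mu = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Each estimate keeps a fraction tau2 / (tau2 + se^2) of its distance from mu.
    pooled = [mu + tau2 / (tau2 + se ** 2) * (y - mu)
              for y, se in zip(estimates, std_errors)]
    return mu, tau2, pooled

# Hypothetical estimates for five comparisons of interest.
estimates = [0.30, -0.05, 0.12, 0.45, 0.02]
std_errors = [0.10, 0.12, 0.08, 0.20, 0.09]

mu, tau2, pooled = partial_pool(estimates, std_errors)
for y, se, p in zip(estimates, std_errors, pooled):
    print(f"raw {y:+.2f} (se {se:.2f}) -> partially pooled {p:+.2f}")
```

In this sketch no comparison is dropped, but the noisiest estimate is shrunk by the largest fraction toward the common mean. That is the sense in which modeling all the comparisons together addresses selection.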

2. My colleagues and I do lots of multilevel analysis with big surveys; see for example here. Regarding your question about how to think about multiple analyses of the same dataset: I think the best approach would be to conduct a single analysis looking at all the comparisons of potential interest. I’m not saying that’s easy, but I think it’s the way to go. Here’s an econ paper by Rachael Meager that does Bayesian multilevel modeling. It’s not quite what you’re asking, but I think the same principles can apply to the sorts of problems you are interested in.

3. Regarding the questions for your own research: I’m not quite sure what I would’ve done—to answer that question would require some thought! When I see a crappy paper, it’s easy for me to think of a million things I could’ve done better. It’s more of a challenge to make useful contributions to existing careful work. I have lots of confidence that our methods could make a difference, but I guess it could take some effort.

Speaking generally, here are some tips:

(a) Forget about what’s “significant” and what’s not. Make a table/graph of estimates of everything you might care about. Where an estimate is, relative to some “significance” border, is pretty much irrelevant. A statement such as, “deep recessions had a significant relationship to fertility but not booms,” can mislead. Better to just estimate all these things and accept the uncertainty that results.
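One way to see why the recession/boom statement can mislead: one estimate can clear the “significance” border while another does not, even though the difference between the two is itself very uncertain. The sketch below uses hypothetical numbers (not taken from the fertility paper), a normal approximation, and the assumption that the two estimates are independent.

```python
import math

def interval(estimate, se, z=1.96):
    """Approximate 95% interval under a normal approximation."""
    return estimate - z * se, estimate + z * se

def difference(est_a, se_a, est_b, se_b):
    """Difference of two estimates, ASSUMING they are independent."""
    return est_a - est_b, math.sqrt(se_a ** 2 + se_b ** 2)

# Hypothetical (estimate, standard error) pairs, for illustration only.
effects = {
    "deep recessions": (-0.25, 0.10),
    "booms": (-0.10, 0.10),
}

# Report every estimate with its interval instead of a significant/not label.
for name, (est, se) in effects.items():
    lo, hi = interval(est, se)
    print(f"{name:16s} {est:+.2f}  [{lo:+.2f}, {hi:+.2f}]")

diff, diff_se = difference(*effects["deep recessions"], *effects["booms"])
lo, hi = interval(diff, diff_se)
print(f"{'difference':16s} {diff:+.2f}  [{lo:+.2f}, {hi:+.2f}]")
```

With these made-up numbers the recession estimate is more than two standard errors from zero and the boom estimate is not, yet the interval for their difference comfortably includes zero. Reporting all three rows with their intervals conveys this; a “significant vs. not” summary hides it.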

(b) Consider lots of interactions. Again, though, expect that most things will not be statistically significant (remember 16: the rule of thumb that you need 16 times the sample size to estimate an interaction than to estimate a main effect), but that doesn’t mean they’re not important. Instead of thinking of your study as establishing definitive truths, think of it as a step forward in our imperfect understanding.
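The “16” here presumably refers to the rule of thumb that estimating an interaction takes about 16 times the sample size needed for a main effect. A sketch of the arithmetic, under stated assumptions: a balanced two-by-two design, a common residual standard deviation `sigma`, and an interaction half the size of the main effect. The function names are hypothetical.

```python
import math

def se_main_effect(sigma, n):
    """SE of a difference in means between two groups of n/2 each."""
    return math.sqrt(2 * sigma ** 2 / (n / 2))

def se_interaction(sigma, n):
    """SE of a difference in differences across four cells of n/4 each."""
    return math.sqrt(4 * sigma ** 2 / (n / 4))

def sample_size_multiplier(effect_ratio):
    """Factor by which n must grow so the interaction has the same
    estimate-to-SE ratio as the main effect, where effect_ratio is
    (interaction size) / (main effect size)."""
    # At equal n, the interaction's SE is twice the main effect's SE.
    se_ratio = se_interaction(1.0, 1.0) / se_main_effect(1.0, 1.0)
    # SE scales as 1/sqrt(n), so the required factor is squared.
    return (se_ratio / effect_ratio) ** 2

# Ratio of standard errors at the same sample size.
print(se_interaction(1.0, 1000) / se_main_effect(1.0, 1000))
# Sample-size factor when the interaction is half the main effect.
print(sample_size_multiplier(0.5))
```

The factor of 16 comes from two doublings: the interaction’s standard error is twice as large at a given sample size, and the assumed interaction is half as large, so the signal-to-noise ratio is a quarter of the main effect’s, and recovering it takes 4 squared times the data. Under other assumptions about the interaction’s size the multiplier changes.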

4. Finally, are there books I can recommend with examples? I like Angrist and Pischke; I like my own book with Jennifer and Aki; I think there must be a lot more out there, maybe readers can help?

Chatterjee adds:

One quick clarification.

If a data set has N variables, then the number of possible pairwise comparisons is literally N choose 2, that is, N(N-1)/2. Is your suggestion that I include all of them? Some of them might not even make economic sense ex ante. If not, then the question is how to choose. Suppose I were interested in the relationship between income and infant mortality: how would I choose what other comparisons to include? The basis of that must be some economic theory, right?

My response:

I don’t think that all these comparisons are of interest, but in any case you can implicitly estimate all of them by estimating the vector of all N effects; see the examples in this article.
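As a small illustration of “implicitly estimating” the comparisons, reading a comparison as a difference between two estimated effects: once the vector of N effects is in hand, every one of the N choose 2 pairwise comparisons is just a function of it. The effect labels and values below are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical estimated effects for N = 4 groups (made-up numbers).
effects = {"A": 0.10, "B": 0.25, "C": -0.05, "D": 0.18}

# Every pairwise comparison is a difference of two entries of the vector.
comparisons = {
    (a, b): effects[a] - effects[b]
    for a, b in combinations(effects, 2)
}

# The number of pairs equals N choose 2.
print(len(comparisons), math.comb(len(effects), 2))
for (a, b), diff in comparisons.items():
    print(f"{a} - {b}: {diff:+.2f}")
```

This gives only point estimates of the differences. Their uncertainties depend on the joint uncertainty of the N effects (covariances or posterior draws), which is one reason to estimate the whole vector in a single model.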

As to the question of what comparisons to study, yes, I’d expect this to be guided by theory. In my response above, I was assuming that you already had theoretical reasons for studying the things you were looking at.