There’s a meta-principle of mathematics that goes as follows. Any system of logic can be written in various different ways that are mathematically equivalent but can have different real-world implications, for two reasons: first, because different formulations can be more directly applied in different settings or are just more understandable by different people; second, because different formulations more conveniently allow different generalizations.

Familiar examples from classical physics include Newton’s laws, conservation rules, the least-time principle, and Hamiltonian dynamics, which (if you’re careful with the details) represent different ways of saying the same thing, but which can be effective in different ways for problem-solving and understanding, and which lead to different generalizations. Similarly, quantum mechanics can be expressed using different formalisms: Schrödinger’s equation, Feynman path integrals, etc. I don’t remember my physics very well so I’m probably garbling some of the above details, but I think the general principle holds.

In Euclidean geometry, it has long been known that the famous five postulates can be reformulated in different ways, which can make a difference when we want to generalize to non-Euclidean systems: replacing the parallel postulate gives geometries where, for example, the sum of the angles of a triangle on a sphere is always greater than 180 degrees. Similarly, in analysis, the Bolzano-Weierstrass theorem can be taken as a postulate instead of a theorem if you want to do things that way.

With that history in mind, we realize that sometimes it can be helpful in understanding a problem to bring up a different formulation, even if mathematically equivalent to existing frameworks, because of the connections it makes to particular areas of applications, or because of how it conveniently generalizes.

**Causal inference**

And that brings us to the topic of causal inference, which can already be expressed in several mathematically equivalent ways: regression modeling, potential outcomes, or graphical models. See the book by Morgan and Winship for a review and discussion of all these frameworks.

Here I want to talk about another way of looking at causal inference, this time using multilevel modeling.

The basic idea is as follows: *causal* comparisons are *within* a person, but we can often only directly make *descriptive* comparisons *between* people. (More generally, we could replace “people” by “causal units of analysis,” which could be schools or cell cultures or countries or whatever.)

I was thinking about this when reading the discussion thread on this post, and also when attending a presentation recently which featured the estimation of a causal effect without, it seemed to me, a clear sense of what was being estimated.

In the above-linked thread, people were saying that predictive inference is different from causal inference, I was saying that causal inference *is* predictive inference, and everyone seemed to be talking past each other.

It struck me that some of the confusion was arising from a lack of clarity, not about causal inference but about predictive inference.

There are lots of (mathematically equivalent) definitions of causal inference, but what exactly is “predictive inference”? It depends on what is being predicted.

Causal inference is, I believe, unambiguously about comparisons within people (or, more generally, within units), but prediction can be about anything.

When correspondents on the thread were saying that predictive and causal inference are different, they were thinking about predictions between people. For example, if you measure (x_i, y_i) on a bunch of people, i=1,…,n, and then you run a regression of y on x, and then you use this to predict y_i for new people, that’s predictive inference between people and does not directly address any causal question—not without some strong assumptions. Not just distributional assumptions about p(x,y) in the population, but assumptions about variation in y *within* a person. I think that’s what Judea Pearl is talking about when he says that causal inference goes beyond statistics. All the statistics in the world on p(x,y) in the population—data, model, theory, whatever—isn’t enough to answer questions about variation in y within a person. It’s as if statistics is living on a flat surface, and causal inference is the third dimension. No amount of movement on the floor, no matter how efficient, will take you into the air. So I think that’s what Pearl is getting at. Econometricians are making a similar point with their notation, by looking at various approaches for estimating causal effects using regression, and pointing out that these all require assumptions about potential outcomes.
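To make the “flat surface” point concrete, here is a minimal simulated sketch. The setup and all numbers are invented for illustration: an unobserved trait drives both x and y, so the between-person regression slope mixes the within-person effect with confounding, and no amount of additional data fixes this.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: each person has an unobserved trait u that
# drives both their x and their y (between-person confounding).
u = rng.normal(size=n)
x = u + rng.normal(size=n)          # x correlates with the trait
effect_within = 1.0                  # true within-person causal effect
y = effect_within * x + 2.0 * u + rng.normal(size=n)

# Between-person least-squares regression slope of y on x:
slope_between = np.cov(x, y)[0, 1] / np.var(x)
print(slope_between)  # ≈ 2.0, not the within-person effect of 1.0
```

Making n larger only makes the estimate of the wrong quantity more precise; the gap between the slope and the within-person effect is a property of the population, not of the sample size.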

OK. Now suppose you have multiple observations on each person, or multiple potential observations. (For the purpose of probability modeling, it doesn’t matter if these additional measurements are observed or just latent or potentially observable.) Causal inference refers to particular statements about these potential observations, and causal questions about these multiple measurements per person can be addressed using statistical models.
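As a toy illustration (hypothetical design, made-up numbers): if each person is measured repeatedly under both conditions, the within-person differences give one effect estimate per person, and a simple empirical-Bayes shrinkage step stands in here for what a full multilevel model would do.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, reps = 200, 5

# Hypothetical within-person design: everyone is measured `reps` times
# under control and `reps` times under treatment.
alpha = rng.normal(size=(n_people, 1))            # person baselines
tau = rng.normal(0.5, 1.0, size=(n_people, 1))    # individual effects
sigma = 1.0                                        # measurement noise sd
y_c = alpha + sigma * rng.normal(size=(n_people, reps))
y_t = alpha + tau + sigma * rng.normal(size=(n_people, reps))

# Raw within-person effect estimates, one per person:
d = y_t.mean(axis=1) - y_c.mean(axis=1)

# Simple empirical-Bayes shrinkage (using the known noise sd for
# simplicity), a stand-in for a full multilevel model:
v = 2 * sigma**2 / reps                    # sampling variance of each d_i
tau_var = max(d.var() - v, 0.0)            # estimated between-person spread
w = tau_var / (tau_var + v)
tau_hat = d.mean() + w * (d - d.mean())    # pooled individual estimates
print(d.mean())  # ≈ 0.5, the average effect
```

The point is that the individual effects are estimated directly from within-person comparisons, with the multilevel structure pooling information across people rather than substituting for it.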

To put it another way, we reach the “third dimension” by considering within-person comparisons. All the between-person analysis in the world won’t take you to that third dimension, not without some strong assumptions. (Even a clean randomized experiment can only tell you about average effects, not anything about individual effects unless you’re willing to assume something about the distribution of these within-person comparisons.)
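Here is a small simulation of that parenthetical point (the potential-outcomes setup and all numbers are invented for illustration): a randomized experiment pins down the average effect, while the spread of individual effects stays invisible because we only ever see one potential outcome per person.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical potential outcomes: each person has y0 (control) and
# y1 (treatment); the individual effect tau_i = y1 - y0 varies a lot.
y0 = rng.normal(size=n)
tau = rng.normal(loc=0.5, scale=2.0, size=n)   # heterogeneous effects
y1 = y0 + tau

# Randomized experiment: only one potential outcome observed per person.
z = rng.integers(0, 2, size=n)
y_obs = np.where(z == 1, y1, y0)

# Difference in means recovers the *average* effect...
ate_hat = y_obs[z == 1].mean() - y_obs[z == 0].mean()
print(ate_hat)  # ≈ 0.5

# ...but the same observed data would be consistent with tau_i being
# constant at 0.5 or varying wildly, since tau_i itself is never seen.
```

That is the sense in which even a clean experiment only takes you partway: the joint distribution of (y0, y1) within a person is not identified from the marginals without further assumptions.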

This is one reason that Eric Loken and I say that, in a psychology experiment studying effects on individual people, you should do within-person comparisons as this is the only direct way to study causal effects. (As Rubin wrote, in causal inference, design trumps analysis.) If all you have are between-person comparisons, you have to make big assumptions to get into the air.

To slightly adapt Pearl’s framing, the distinction is not between “a statistical quantity” and “a causal quantity,” but rather between a between-person comparison and a within-person comparison. It just happens that in statistics we typically learn about causal inference for within-person treatments in the context of data that only allow between-person comparisons.

I don’t know that Pearl fully realizes the way in which, using multilevel modeling, we can combine within- and between-person inference and incorporate causal structures into statistical models. I think it’s been a mistake of the field of statistics (including my own textbooks) to present causal inference as if between-person comparisons from randomized experiments are a “gold standard” for within-person causal inference, and I can see why this would legitimately frustrate Pearl and others.