Causal inference in AI: Expressing potential outcomes in a graphical-modeling framework that can be fit using Stan

David Rohde writes:

We have been working on an idea that attempts to combine ideas from Bayesian approaches to causality, developed by you and your collaborators, with Pearl’s do calculus. The core idea is simple but, we think, powerful: it allows some problems that previously had known solutions only within the do calculus (in particular the front door rule) to be solved in the Bayesian framework.

In order to make the idea accessible we have produced a blog post (featuring animations), an online talk, and technical reports. All the material can be found here.

Currently we focus on examples and intuition; we are still working on proofs. Although we don’t emphasise it, our idea is quite compatible with probabilistic programming languages like Stan, where the probabilities of different outcomes under different counterfactual actions can be computed in the generated quantities block.
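For example, here is a minimal sketch of what we have in mind (the linear model and variable names are purely illustrative, not our actual setup): fit an outcome model, then simulate the outcome for a new unit under each counterfactual action in the generated quantities block.

```stan
// Illustrative only: a simple outcome model where generated quantities
// simulates the outcome of a new unit under each counterfactual action.
data {
  int<lower=0> N;
  vector[N] x;                        // covariate
  array[N] int<lower=0, upper=1> a;   // observed action (treatment)
  vector[N] y;                        // observed outcome
  real x_new;                         // covariate value for a new unit
}
parameters {
  real alpha;
  real b_x;
  real tau;             // effect of the action
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 5);
  b_x ~ normal(0, 5);
  tau ~ normal(0, 5);
  sigma ~ normal(0, 5);
  y ~ normal(alpha + b_x * x + tau * to_vector(a), sigma);
}
generated quantities {
  // Posterior predictive outcome for the new unit under each action.
  real y_under_a0 = normal_rng(alpha + b_x * x_new, sigma);
  real y_under_a1 = normal_rng(alpha + b_x * x_new + tau, sigma);
}
```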

I took a quick look. What they’re saying about causal inference being an example of Bayesian inference for latent variables makes sense; I think this is basically the perspective of Rubin (1974). I think this is a helpful way of thinking, so I’m glad to see it being expressed in a different language. I’d recommend adding Rubin (1974) to your list of references. This is also the way we discuss causal inference in our BDA book (in all editions, starting with the first edition in 1995), where we take some of Rubin’s notation and explicitly integrate it into a Bayesian framework. But the causal analyses we do in BDA are pretty simple; it seems like a great idea to express these general ideas in a more computing-friendly framework.
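To make that connection concrete, here’s a minimal Stan sketch of the latent-variable view (my own toy example, not taken from their reports or from BDA): each unit has two potential outcomes, only one is observed, and the missing one is imputed in the generated quantities block, under the assumption that the potential outcomes are conditionally independent given the parameters.

```stan
// Toy example: potential outcomes as latent variables.
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> z;  // treatment indicator
  vector[N] y;                       // observed outcome: y1 if z=1, else y0
}
parameters {
  real mu0;             // mean of the control potential outcome
  real mu1;             // mean of the treated potential outcome
  real<lower=0> sigma;
}
model {
  mu0 ~ normal(0, 10);
  mu1 ~ normal(0, 10);
  sigma ~ normal(0, 10);
  for (n in 1:N)
    y[n] ~ normal(z[n] == 1 ? mu1 : mu0, sigma);
}
generated quantities {
  // Impute each unit's unobserved potential outcome (assuming
  // conditional independence given the parameters), then compute
  // a finite-sample average treatment effect.
  real ate;
  {
    vector[N] y0;
    vector[N] y1;
    for (n in 1:N) {
      if (z[n] == 1) {
        y1[n] = y[n];
        y0[n] = normal_rng(mu0, sigma);
      } else {
        y0[n] = y[n];
        y1[n] = normal_rng(mu1, sigma);
      }
    }
    ate = mean(y1 - y0);
  }
}
```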

Regarding causal inference in Stan: I think that various groups have been implementing latent-variable and instrumental-variables models, following the ideas of Angrist, Imbens, and Rubin, but generalizing to allow prior information and varying treatment effects. It’s been a while since I’ve looked at the Bayesian instrumental-variables and latent-variables literature, but my recollection is that things could be improved using stronger priors: a lot of the pathological results that arise with weak instruments can be traced to bad limiting behavior under weak priors. These are the sorts of examples where full Bayesian inference can be worse than some sort of maximum likelihood or marginal maximum likelihood, because of problems with integrating over the distribution of a ratio whose denominator is of uncertain sign.
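For readers who haven’t seen these models, here’s a rough sketch of a Bayesian instrumental-variables model in Stan (the priors and names are mine, not any particular group’s implementation). The treatment effect is implicitly the ratio of the reduced-form effect of the instrument on the outcome to the first-stage coefficient, so when that denominator is weakly identified, flat priors let the posterior blow up; a proper prior on the first-stage coefficient is one way to regularize this.

```stan
// Rough sketch of a Bayesian IV model: a triangular system with
// correlated errors across the treatment and outcome equations.
data {
  int<lower=0> N;
  vector[N] z;   // instrument
  vector[N] t;   // treatment
  vector[N] y;   // outcome
}
parameters {
  real g0;
  real g_z;                    // first-stage coefficient (instrument strength)
  real b0;
  real b_t;                    // treatment effect of interest
  vector<lower=0>[2] sigma;
  cholesky_factor_corr[2] L;   // correlation of the two error terms
}
model {
  // Informative (stronger-than-flat) priors, especially on g_z: with a
  // weak instrument, flat priors produce the pathological behavior of a
  // ratio whose denominator has uncertain sign, as described above.
  g0 ~ normal(0, 5);
  g_z ~ normal(0, 1);
  b0 ~ normal(0, 5);
  b_t ~ normal(0, 1);
  sigma ~ normal(0, 2);
  L ~ lkj_corr_cholesky(2);
  {
    array[N] vector[2] w;
    array[N] vector[2] mu;
    for (n in 1:N) {
      w[n] = [t[n], y[n]]';
      mu[n] = [g0 + g_z * z[n], b0 + b_t * t[n]]';
    }
    w ~ multi_normal_cholesky(mu, diag_pre_multiply(sigma, L));
  }
}
```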

I guess what I’m saying is that I think there are some important open problems of statistical modeling here. Improvements in conceptualization and computation (such as may be demonstrated by the above-linked work) could be valuable in motivating researchers to push forward on the modeling as well.