Abuse of expectation notation

I’ve been reading a lot of statistical and computational literature, and it seems like expectation notation is abused as shorthand for integrals by decorating the expectation symbol with a subscripted distribution, like so:

$$\mathbb{E}_{p(\theta)}[f(\theta)] =_{\textrm{def}} \int f(\theta) \cdot p(\theta) \, \textrm{d}\theta.$$

This is super confusing, because expectations are properly defined as functions of random variables.

$$\mathbb{E}[f(A)] = \int f(a) \cdot p_A(a) \, \textrm{d}a.$$

The square-bracket convention itself reflects this: random variables are functions, and square brackets are conventionally used for functionals (that is, functions that take functions as arguments).
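
As a concrete check on the definition, here's a minimal Python sketch that computes $\mathbb{E}[f(A)]$ two ways, assuming for illustration that $A \sim \textrm{normal}(0, 1)$ and $f(a) = a^2$, so the answer is 1. One helper integrates $f(a) \cdot p_A(a)$ over the bound variable $a$ with the midpoint rule; the other simulates the random variable $A$ and averages $f$ over the draws. The helper names are hypothetical, not from any library.

```python
import math
import random


def normal_density(a, mu=0.0, sigma=1.0):
    """Density p_A(a) of A ~ normal(mu, sigma) (assumed for illustration)."""
    z = (a - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expectation_by_integral(f, density, lo=-10.0, hi=10.0, n=100_000):
    """Approximate int f(a) * p_A(a) da with the midpoint rule on [lo, hi]."""
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        a = lo + (i + 0.5) * width  # the bound variable a
        total += f(a) * density(a) * width
    return total


def expectation_by_simulation(f, draw, m=200_000, seed=1234):
    """Approximate E[f(A)] by averaging f over simulated draws of A."""
    rng = random.Random(seed)
    return sum(f(draw(rng)) for _ in range(m)) / m


def f(a):
    return a * a  # f(a) = a^2, so E[f(A)] = 1 when A ~ normal(0, 1)


by_integral = expectation_by_integral(f, normal_density)
by_simulation = expectation_by_simulation(f, lambda rng: rng.gauss(0.0, 1.0))
print(by_integral, by_simulation)  # both close to 1
```

The integral needs the density $p_A$ and a bound variable to integrate over, whereas the simulation needs only a way to draw $A$, which is why the expectation is naturally a function of the random variable.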

Expectation is an operator

With the proper notation, expectation is a linear operator on random variables, $\mathbb{E}: (\Omega \rightarrow \mathbb{R}) \rightarrow \mathbb{R}$, where $\Omega$ is the sample space and $\Omega \rightarrow \mathbb{R}$ is the type of a random variable. In the abused notation, expectation is not an operator because there’s no argument, just an expression $f(\theta)$ with an unbound variable $\theta$.
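
To make the type concrete, here's a minimal Python sketch over a finite sample space, assumed for illustration to be two fair coin flips, so the integral becomes a sum over outcomes $\omega \in \Omega$. The function `E` takes a random variable itself (a Python function from outcomes to numbers) as its argument, in contrast to the subscripted notation, which has no argument to pass. All names here are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Dict

Outcome = str
RandomVariable = Callable[[Outcome], Fraction]  # the type Omega -> R

# Finite sample space Omega of two fair coin flips, with probability measure P
# (assumed for illustration).
P: Dict[Outcome, Fraction] = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}


def E(X: RandomVariable) -> Fraction:
    """Expectation as an operator: maps a random variable X : Omega -> R to a number."""
    return sum((P[w] * X(w) for w in P), Fraction(0))


def heads(w: Outcome) -> Fraction:
    """Random variable: number of heads in outcome w."""
    return Fraction(w.count("H"))


def first_heads(w: Outcome) -> Fraction:
    """Random variable: indicator that the first flip is heads."""
    return Fraction(1 if w[0] == "H" else 0)


def combo(w: Outcome) -> Fraction:
    """Random variable 2 * heads + 3 * first_heads, defined pointwise on Omega."""
    return 2 * heads(w) + 3 * first_heads(w)


print(E(heads), E(first_heads), E(combo))  # prints: 1 1/2 7/2
```

Linearity shows up directly: `E(combo)` equals `2 * E(heads) + 3 * E(first_heads)`, because `E` acts on the random variable `combo` as a whole rather than on an expression with a free variable.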

In this post (and yesterday’s), I’ve been following standard notational conventions where capital letters like $A$ are random variables and the corresponding lowercase letters like $a$ are bound variables. Rather than using $p(\cdots)$ for every density, densities are subscripted with the random variable from which they were derived, so the density of random variable $A$ is written $p_A$.

Bayesian Data Analysis notation

Gelman et al.’s Bayesian Data Analysis book overloads notation, using lowercase $a$ for both the random variable $A$ and the bound variable $a$. This requires the reader to do a lot of sleuthing to figure out which variables are random and which are bound. It led to no end of confusion for me when I was first learning this material. It turns out that disambiguating a dense formula with ambiguous notation is easier when you already understand the result.

The overloaded notation from Bayesian Data Analysis is fine in most applied modeling work, but it makes it awkward to talk about random variables and bound variables simultaneously. For example, on page 20 of the third edition, Gelman et al. write (using $\textrm{E}$ for the expectation symbol, round parentheses instead of square brackets, and an italic derivative symbol),

$$\textrm{E}(u) = \int u \, p(u) \, du.$$

Here, the $u$ in $\textrm{E}(u)$ is understood as a random variable and the other uses of $u$ as bound variables. It’s even worse with the covariance definition,

$$\textrm{var}(u) = \int (u - \textrm{E}(u))(u - \textrm{E}(u))^{T} p(u) \, du,$$

where the $u$ in $\textrm{var}(u)$ and $\textrm{E}(u)$ are random variables, whereas the other uses of $u$ are bound variables.

Rewritten in more explicit notation that distinguishes random and bound variables, includes the multiplication operators, specifies the range of integration, disambiguates the density symbol, and sets the derivative symbol in roman rather than italics, these become

$$\mathbb{E}[U] = \int_{\mathbb{R}^N} u \cdot p_U(u) \, \textrm{d}u.$$

$$\textrm{var}[U] = \int_{\mathbb{R}^N} (u - \mathbb{E}[U]) \cdot (u - \mathbb{E}[U])^{\top} \cdot p_U(u) \, \textrm{d}u.$$

This lets us clearly write the variance as an expectation,

$$\textrm{var}[U] = \mathbb{E}\!\left[(U - \mathbb{E}[U]) \cdot (U - \mathbb{E}[U])^{\top}\right],$$

which would look as follows in Bayesian Data Analysis notation,

$$\textrm{var}(u) = \textrm{E}((u - \textrm{E}(u))(u - \textrm{E}(u))^{T}).$$
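
Since variance is itself an expectation, it can be estimated the same way as any other expectation: simulate $U$ and average. Here's a minimal Python sketch, assuming for illustration that $U = (Z_1, Z_1 + Z_2)$ with $Z_1, Z_2$ independent standard normals, so that $\mathbb{E}[U] = (0, 0)$ and $\textrm{var}[U] = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$. It plugs the estimated mean into $\mathbb{E}[(U - \mathbb{E}[U]) \cdot (U - \mathbb{E}[U])^{\top}]$ and averages the outer products. The helper names are hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def draw_U(rng: random.Random) -> Vector:
    """One draw of U = (Z1, Z1 + Z2), Z1 and Z2 independent standard normals (assumed)."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [z1, z1 + z2]


def mean(draws: List[Vector]) -> Vector:
    """Monte Carlo estimate of E[U]."""
    n = len(draws)
    return [sum(u[i] for u in draws) / n for i in range(len(draws[0]))]


def covariance(draws: List[Vector]) -> Matrix:
    """Plug-in Monte Carlo estimate of var[U] = E[(U - E[U]) (U - E[U])^T]."""
    mu = mean(draws)
    d, n = len(mu), len(draws)
    cov = [[0.0] * d for _ in range(d)]
    for u in draws:
        dev = [u[i] - mu[i] for i in range(d)]  # U - E[U] for this draw
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] / n  # average of outer products
    return cov


rng = random.Random(2024)
draws = [draw_U(rng) for _ in range(100_000)]
mu_hat = mean(draws)
cov_hat = covariance(draws)
print(mu_hat)   # close to [0, 0]
print(cov_hat)  # close to [[1, 1], [1, 2]]
```

Dividing by the number of draws rather than one less gives a slightly biased plug-in estimate, which is negligible at this sample size.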

Conditional expectations and posteriors

The problem’s even more prevalent with posteriors or other conditional expectations, which I often see written using the notation

$$\mathbb{E}_{p(\theta \, \mid \, y)}[f(\theta)]$$

for what I would write using conditional expectation notation as

$$\mathbb{E}[f(\Theta) \mid Y = y].$$

As before, this uses random variable notation inside the expectation and bound variable notation outside, with $Y = y$ indicating that the random variable $Y$ takes on the value $y$.
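
For a concrete posterior, here's a minimal Python sketch of $\mathbb{E}[f(\Theta) \mid Y = y]$, assuming for illustration a uniform prior $\Theta \sim \textrm{uniform}(0, 1)$ and a binomial likelihood $Y \sim \textrm{binomial}(N, \Theta)$ (with $N$ written `n` in the code). It computes the posterior expectation as a ratio of two integrals over the bound variable $\theta$ using the midpoint rule, then compares the posterior mean against $(y + 1) / (N + 2)$, the mean of the resulting $\textrm{beta}(y + 1, N - y + 1)$ posterior. The helper names are hypothetical.

```python
import math


def binomial_pmf(y: int, n: int, theta: float) -> float:
    """p_{Y|Theta}(y | theta) for Y ~ binomial(n, theta) (assumed likelihood)."""
    return math.comb(n, y) * theta**y * (1.0 - theta) ** (n - y)


def conditional_expectation(f, y: int, n: int, grid: int = 100_000) -> float:
    """Approximate E[f(Theta) | Y = y] under a uniform(0, 1) prior on Theta.

    Midpoint-rule ratio of
        int f(theta) * p_Theta(theta) * p_{Y|Theta}(y | theta) d theta
    to
        int p_Theta(theta) * p_{Y|Theta}(y | theta) d theta,
    where the uniform prior density p_Theta(theta) is 1 on (0, 1).
    """
    width = 1.0 / grid
    numerator = denominator = 0.0
    for i in range(grid):
        theta = (i + 0.5) * width  # the bound variable theta
        weight = binomial_pmf(y, n, theta)  # joint density, since the prior density is 1
        numerator += f(theta) * weight
        denominator += weight
    return numerator / denominator  # the grid width cancels in the ratio


y, n = 7, 10
posterior_mean = conditional_expectation(lambda theta: theta, y, n)
closed_form = (y + 1) / (n + 2)
print(posterior_mean, closed_form)
```

The observed value `y` enters only through the likelihood, while `theta` is the bound variable of integration, mirroring the split between $Y = y$ and $\Theta$ in $\mathbb{E}[f(\Theta) \mid Y = y]$.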