More on absolute error vs. relative error in Monte Carlo methods

This came up again in a discussion with someone asking whether we can use Stan to evaluate arbitrary integrals. The integral I was given was the following:

$\displaystyle \alpha = \int_{y \in R\textrm{-ball}} \textrm{multinormal}(y \mid 0, \sigma^2 \textrm{I}) \, \textrm{d}y,$

where the R-ball is the ball of radius R centered at the origin, assumed to be in D dimensions so that $y \in \mathbb{R}^D$.
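Before getting to sampling, note that this particular integral has a closed form we can use as a check: if $y \sim \textrm{multinormal}(0, \sigma^2 \textrm{I})$, then $\lVert y \rVert^2 / \sigma^2$ has a chi-square distribution with D degrees of freedom, so $\alpha = \Pr[\chi^2_D \leq (R/\sigma)^2]$. Here's a two-line check in Python (the values of D, R, and sigma are just examples of mine, not from the discussion):

from scipy.stats import chi2

D, R, sigma = 10, 5.0, 1.0                 # example values
alpha = chi2.cdf((R / sigma)**2, df=D)     # Pr[chi^2_D <= (R/sigma)^2]
print(alpha)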

(MC)MC approach

The textbook Monte Carlo approach (Markov chain or plain old) to evaluating such an integral is to evaluate

$\displaystyle \frac{1}{M} \sum_{m=1}^M \textrm{multinormal}(y^{(m)} \mid 0, \sigma^2 \textrm{I}),$

where the marginal distribution of each y^{(m)} is

$\displaystyle y^{(m)} \sim \textrm{uniform}(R\textrm{-ball}).$

Because the uniform density on the R-ball is one over the ball's volume, this average converges to $\alpha$ divided by the volume $\pi^{D/2} R^D / \Gamma(D/2 + 1)$; multiplying by that known constant recovers $\alpha$. The constant doesn't affect the error analysis below, so I'll ignore it.

I provide a Stan program for doing this below.

If you want to do this without Stan, you can rejection sample points in the ball very easily in low dimensions. In higher dimensions, you can generate a random angle and radius (the random radius needs to be drawn non-uniformly in dimensions higher than 1 to account for increasing area/volume as the radius increases). Stan could’ve also used an angle/radius parameterization, but I’m too lazy to work out the trig.
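Here's a minimal NumPy sketch of the angle/radius approach (the function name sample_ball and all settings are my own, not from the post): a uniform direction comes from normalizing a standard normal draw, and the radius is drawn as $R \cdot u^{1/D}$ for uniform $u$, which accounts for volume growing like $r^D$.

import numpy as np

def sample_ball(n, D, R, rng):
    """Draw n points uniformly from the R-ball in D dimensions."""
    # uniform direction: normalize standard normal draws
    z = rng.standard_normal((n, D))
    direction = z / np.linalg.norm(z, axis=1, keepdims=True)
    # radius: Pr[radius <= r] = (r/R)^D, so invert a uniform draw
    radius = R * rng.uniform(size=(n, 1)) ** (1.0 / D)
    return direction * radius

rng = np.random.default_rng(1234)
y = sample_ball(10_000, 3, 2.0, rng)
print(np.linalg.norm(y, axis=1).max())  # all norms should be <= 2.0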

Most draws can contribute zero

If the volume of the R-ball is large compared to the volume containing most of the probability mass of $\textrm{normal}(0, \sigma)$, then most of the draws from the R-ball will have roughly zero density under $\textrm{normal}(0, \sigma)$. The difference in volume grows exponentially in the dimension.

Absolute error

Even if the volume of the R-ball is large compared to the volume containing most of the mass of $\textrm{normal}(0, \sigma)$, the (MCMC) central limit theorem still controls error. It's just that the error it controls is squared error, so the standard error shrinks as $1/\sqrt{M}$ in absolute terms. Thus if we measure absolute error, we're fine. We'll get an estimate near zero that's correct to several decimal places.

Relative error

If the true value of the integral is 0.01 or something like that, then to get 10% relative error, we need our estimate to fall within 0.001 of the true answer. An estimate of 0 doesn't do that.

What we have to do with a naive application of Monte Carlo methods is make a whole lot of draws to get an estimate between 0.009 and 0.011.
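To make this concrete, here's a NumPy sketch of the plain Monte Carlo estimator (the dimension, radius, scale, and number of draws are arbitrary choices of mine). It compares the Monte Carlo average of the density to the chi-square ground truth from earlier, scaled by the ball's volume; the absolute error comes out tiny even when the relative error is large.

import numpy as np
from scipy.special import gammaln
from scipy.stats import chi2

D, R, sigma, M = 10, 5.0, 1.0, 100_000     # example values
rng = np.random.default_rng(1234)

# draw M points uniformly from the R-ball: direction times radius
z = rng.standard_normal((M, D))
y = z / np.linalg.norm(z, axis=1, keepdims=True)
y *= R * rng.uniform(size=(M, 1)) ** (1.0 / D)

# Monte Carlo average of the multinormal(0, sigma^2 I) density
log_p = (-0.5 * np.sum((y / sigma)**2, axis=1)
         - 0.5 * D * np.log(2 * np.pi * sigma**2))
estimate = np.exp(log_p).mean()

# ground truth: alpha = Pr[chi^2_D <= (R/sigma)^2], and the average
# density over the ball is alpha divided by the ball's volume
log_vol = 0.5 * D * np.log(np.pi) + D * np.log(R) - gammaln(D / 2 + 1)
truth = chi2.cdf((R / sigma)**2, df=D) / np.exp(log_vol)

print(f"estimate: {estimate:.2e}  truth: {truth:.2e}")
print(f"absolute error: {abs(estimate - truth):.2e}")
print(f"relative error: {abs(estimate - truth) / truth:.1%}")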

Relation to discrete sampling

This is related to the discrete sampling problem we ran into when trying to estimate the probability of winning the lottery by buying tickets and computing a Monte Carlo estimate. See my previous post, Monte Carlo and winning the lottery. The setup was very hard for readers to swallow when I first posted about it (my experience is that statisticians don’t like thought experiments or simplified hello world examples).

Stan program

The complexity comes from the constraining transform to the R-ball and the corresponding Jacobian adjustment to make the distribution in the ball uniform.

For the transform, I used a stick-breaking procedure very similar to how Stan transforms simplex variables and the rows of a correlation matrix Cholesky factor. The transform requires a Jacobian adjustment, which is calculated on the fly. An alternative that would work purely through a transform of uniform draws would be to draw an angle (uniformly) and a radius (non-uniformly, to account for volume increasing with distance from the origin); that would probably be more efficient.
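Spelled out (this is my notation, not from the original post), the transform and its log Jacobian determinant are

$\displaystyle \theta_n = \alpha_n \sqrt{1 - \sum_{k<n} \theta_k^2}, \qquad \log\left|\det J\right| = \sum_{n=1}^N \log \frac{\partial \theta_n}{\partial \alpha_n} = \frac{1}{2} \sum_{n=1}^N \log\Bigl(1 - \sum_{k<n} \theta_k^2\Bigr),$

where the determinant reduces to the product of the diagonal because each $\theta_n$ depends only on $\alpha_1, \ldots, \alpha_n$, making the Jacobian triangular.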

/**
 * The posterior mean of E in this program computes
 *
 *   INTEGRAL_{theta in R-ball in N dimensions} normal(theta | 0, sigma) d.theta
 *
 * divided by the volume of the R-ball (a known constant).
 */
data {
  int<lower=1> N;
  real<lower=0> R;
  real<lower=0> sigma;
}
parameters {
  // each alpha[n] must lie in (-1, 1) so the stick-breaking sum of
  // squares stays below 1 and sqrt(1 - sum_sq) is well defined
  vector<lower=-1, upper=1>[N] alpha;
}
transformed parameters {
  // transform alpha to the unit ball
  vector[N] theta;

  // log_abs_det_J accumulates the change-of-variables correction
  real log_abs_det_J = 0;
  {
    real sum_sq = 0;
    for (n in 1:N) {
      real mult = sqrt(1 - sum_sq);
      theta[n] = alpha[n] * mult;
      sum_sq += theta[n]^2;
      log_abs_det_J += log(mult);
    }      
  }
  
  // scale from unit ball to R-ball;  no Jacobian because R is const.
  theta *= R;
}  
model {
  // the Jacobian's triangular, so the log determinant is the sum of
  // the log of the absolute derivatives of the diagonal of J, i.e.,
  // sum(log(d.theta[n] / d.alpha[n])), as computed above

  target += log_abs_det_J;

  // theta is now uniform on R-ball
}
generated quantities {
  // normal_lpdf sums over the elements of theta, so E is the joint
  // multivariate normal density at theta; the posterior mean of E is
  //    (1 / vol(R-ball)) * INTEGRAL_{y in R-ball} normal(y | 0, sigma) d.y
  //    =approx= 1/M SUM_{m in 1:M}  normal(theta(m) | 0, sigma)
  //             where theta(m) drawn uniformly from R-ball
  real E = exp(normal_lpdf(theta | 0, sigma));
}

As noted in the program's inline documentation, the posterior mean of E is the integral of interest divided by the known volume of the R-ball, so multiplying the posterior mean by that volume yields the value of the integral.
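For completeness, here's a minimal sketch of running the program from Python with CmdStanPy (the file name ball_integral.stan, the data values, and the sampler settings are all assumptions of mine):

from cmdstanpy import CmdStanModel
import numpy as np
from scipy.special import gammaln

# assumes the Stan program above is saved as ball_integral.stan
model = CmdStanModel(stan_file='ball_integral.stan')
data = {'N': 10, 'R': 5.0, 'sigma': 1.0}   # example values
fit = model.sample(data=data, chains=4, iter_sampling=2500, seed=1234)

# posterior mean of E estimates the average density over the R-ball;
# multiply by the ball's volume to recover the integral
log_vol = (0.5 * data['N'] * np.log(np.pi) + data['N'] * np.log(data['R'])
           - gammaln(data['N'] / 2 + 1))
alpha_hat = fit.stan_variable('E').mean() * np.exp(log_vol)
print(alpha_hat)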