# My review of Ian Stewart’s review of my review of his book

A few months ago I was asked to review Do Dice Play God?, the latest book by mathematician and mathematics writer Ian Stewart.

Here are some excerpts from my review:

My favorite aspect of the book is the connections it makes in a sweeping voyage from familiar (to me) paradoxes, through modeling in human affairs, up to modern ideas in coding and much more. We get a sense of the different “ages of uncertainty”, as Stewart puts it.

But not all the examples work so well. The book’s main weakness, from my perspective, is its assumption that mathematical models apply directly to real life, without recognition of how messy real data are. That is something I’m particularly aware of, because it is the business of my field — applied statistics.

For example, after a discussion of uncertainty, surveys and random sampling, Stewart writes, “Exit polls, where people are asked who they voted for soon after they cast their vote, are often very accurate, giving the correct result long before the official vote count reveals it.” This is incorrect. Raw exit polls are not directly useful. Before they are shared with the public, the data need to be adjusted for non-response, to match voter demographics and election outcomes. The raw results are never even reported. The true value of the exit poll is not that it can provide an accurate early vote tally, but that it gives a sense of who voted for which parties once the election is over.

It is also disappointing to see Stewart trotting out familiar misconceptions of hypothesis testing . . . Here’s how Stewart puts it in the context of an otherwise characteristically clearly described example of counts of births of boys and girls: “The upshot here is that p = 0.05, so there’s only a 5% probability that such extreme values arise by chance”; thus, “we’re 95% confident that the null hypothesis is wrong, and we accept the alternative hypothesis”. . . .

As I recall the baseball analyst Bill James writing somewhere, the alternative to good statistics is not no statistics: it’s bad statistics. We must design our surveys, our clinical trials and our meteorological studies with an eye to eliminating potential biases, and we must adjust the resulting data to make up for the biases that remain. . . . One thing I like about Stewart’s book is that he faces some of these challenges directly. . . .

I believe that a key future development in the science of uncertainty will be tools to ensure that the adjustments we need to make to data are more transparent and easily understood. And we will develop this understanding, in part, through mathematical and historical examples of the sort discussed in this stimulating book.

As you can see from the above excerpts, my review is negative in some of the specifics but positive in general. Stewart had some interesting things to say but, when he moved away from physics and pure mathematics to applied statistics, he got some details wrong.
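To make the hypothesis-testing point concrete, here is a minimal sketch of why a p-value near 0.05 does not translate into being "95% confident that the null hypothesis is wrong." The counts (60 boys in 100 births) are hypothetical and are not the numbers from Stewart's example, and the prior share of true nulls and the power are pure assumptions chosen for illustration. The p-value is the probability of data this extreme given the null; the probability of the null given a significant result is a different quantity that depends on those extra assumptions.

```python
from math import comb


def two_sided_binomial_p(k, n, p0=0.5):
    """Exact two-sided p-value: the probability, under the null proportion p0,
    of a count at least as far from its expected value n * p0 as k is."""
    expected = n * p0
    observed_gap = abs(k - expected)
    return sum(
        comb(n, i) * p0**i * (1 - p0) ** (n - i)
        for i in range(n + 1)
        if abs(i - expected) >= observed_gap
    )


def prob_null_given_significant(prior_null, alpha, power):
    """Bayes' rule: share of 'significant' results for which the null is true.

    prior_null, alpha and power are illustrative assumptions, not estimates."""
    false_positives = prior_null * alpha
    true_positives = (1 - prior_null) * power
    return false_positives / (false_positives + true_positives)


# Hypothetical data: 60 boys out of 100 births, null hypothesis p0 = 0.5.
p_value = two_sided_binomial_p(60, 100)

# Assumed scenario: 90% of tested nulls are true, tests have 50% power.
p_null = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.5)

print(f"P(data this extreme | null) = {p_value:.3f}")
print(f"P(null | significant result) = {p_null:.3f}")
```

Under these made-up inputs the p-value comes out at about 0.057, while the probability that the null is true given a significant result is about 0.47, nowhere near 5%. Change the assumed prior or power and the second number moves; the first does not, which is the whole problem with transposing them.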

A month or so after my review appeared, Stewart replied in the same journal. His reply is short so I’ll just quote the whole thing:

In his review of my book Do Dice Play God?, Andrew Gelman focuses on sections covering his own field of applied statistics (Nature 569, 628–629; 2019). However, those sections form parts of just two of 18 chapters. Readers might have been better served had he described the book’s central topics — such as quantum uncertainty, to which the title of the book alludes.

Gelman accuses me of “transposing the probabilities” when discussing P values and of erroneously stating that a confidence interval indicates “the level of confidence in the results”. The phrase ‘95% confident’, to which the reviewer objects, should be read in context. The first mention (page 166) follows a discussion that ends “there’s only a 5% probability that such extreme values arise by chance. We therefore … reject the null hypothesis at the 95% level”. The offending sentence is a simplified summary of something that has already been explained correctly. My discussion of confidence intervals has a reference to endnote 57 on page 274, which gives a more technical description and makes essentially the same point as the reviewer.

I also disagree with Gelman’s claim that I overlook the messiness of real data. I describe a typical medical study and explain how logistic and Cox regression address issues with real data (see pages 169–173). An endnote mentions the Kaplan-Meier estimator. The same passage deals with practical and ethical issues in medical studies.

Here’s my summary of what Stewart said:

1. My review focuses on my own areas of expertise, which only represent a small subset of what the book is about.

2. His technically erroneous statements about hypothesis testing should be understood in context.

3. He doesn’t mention the bit about polling. Maybe he agrees he made a mistake there but he doesn’t want to talk about it, or maybe he didn’t want to look into polling too deeply, or maybe he thinks the details of exit polls don’t really matter.
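On the polling point, a toy calculation shows why the details of exit polls can matter. This is only a sketch of one kind of adjustment, reweighting respondents to match voter demographics. Every number in it is invented, the two age groups are a hypothetical simplification, and it is not a description of how any actual exit poll is processed (real adjustments also account for non-response and are matched to election outcomes).

```python
def raw_and_adjusted_support(cell_support, sample_share, population_share):
    """Compare the raw sample estimate with a demographically reweighted one.

    cell_support:     candidate's support within each group of respondents
    sample_share:     each group's share of the exit-poll sample
    population_share: each group's share of actual voters
    """
    raw = sum(sample_share[g] * cell_support[g] for g in cell_support)
    adjusted = sum(population_share[g] * cell_support[g] for g in cell_support)
    return raw, adjusted


# Invented numbers: younger voters are more willing to answer the poll,
# so they are overrepresented in the raw sample.
cell_support = {"under_45": 0.60, "45_and_over": 0.40}
sample_share = {"under_45": 0.70, "45_and_over": 0.30}
population_share = {"under_45": 0.45, "45_and_over": 0.55}

raw, adjusted = raw_and_adjusted_support(cell_support, sample_share, population_share)
print(f"raw estimate: {raw:.2f}, adjusted estimate: {adjusted:.2f}")
```

With these made-up inputs the raw sample puts the candidate at 54% and the reweighted estimate puts the same candidate at 49%, so the unadjusted poll would call the race the wrong way.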