Thinking about election forecast uncertainty

Some twitter action

Elliott Morris, my collaborator (with Merlin Heidemanns) on the Economist election forecast, pointed me to some thoughtful criticisms of our model from Nate Silver. There’s some discussion on twitter, but in general I don’t find twitter to be a good place for careful discussion, so I’m continuing the conversation here.

Nate writes:

Offered as an illustration of the limits of “fundamentals”-based election models:

The “Time for Change model”, as specified here, predicts that Trump will lose the popular vote by ~36 points (not a typo) based on that 2Q GDP print.

On the other hand, if you ground your model in Real Disposable Income like some others do (e.g. the “Bread and Peace” model) you may have Trump winning in an epic landslide. Possibly the same if you use 3Q GDP (forecasted to be +15% annualized) instead of 2Q.

It certainly helps to use a wider array of indicators over a longer time frame (that’s what we do) but the notion that you can make a highly *precise* forecast from fundamentals alone in *this* economy just doesn’t pass the smell test.

Looking at these “fundamentals” models (something I spent a ton of time in 2012) was really a seminal moment for me in learning about p-hacking and the replication crisis. It was amazing to me how poorly they performed on actual, not-known-in-advance data.

Of course, if you’re constructing such a model *this year*, you’ll choose variables in way that just so happens to come up with a reasonable-looking prediction (i.e. it has Trump losing, but not by 36 points). But that’s no longer an economic model; it’s just your personal prior.

Literally that’s what people (e.g. The Economist’s model) is doing. They’re just “adjust[ing]” their “economic index” in arbitrary ways so that it doesn’t produce a crazy-looking number for 2020.

To support this last statement, Nate quotes Elliott, who wrote this as part of the documentation for our model:

The 2020 election presents an unusual difficulty, because the recession caused by the coronavirus is both far graver than any other post-war downturn, and also more likely to turn into a rapid recovery once lockdowns lift. History provides no guide as to how voters will respond to these extreme and potentially fast-reversing economic conditions. As a result, we have adjusted this economic index to pull values that are unprecedentedly high or low partway towards the limits of the data on which our model was trained. As of June 2020, this means that we are treating the recession caused by the coronavirus pandemic as roughly 40% worse than the Great Recession of 2008-09, rather than two to three times worse.

This is actually the part of our model that I didn’t work on, but that’s fine. The quote seems reasonable to me.
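To make the idea of pulling extreme values partway toward the limits of the data concrete, here is a minimal sketch using a tanh-shaped squashing function. It is an illustration only, not the transformation used in the Economist model: the function name `squash`, the standardized scale, and the limits of -1.5 and 1.5 are all assumptions made up for the example.

```python
import math

def squash(x, lo, hi):
    """Smoothly compress x into the open interval (lo, hi).

    Values near the midpoint pass through almost unchanged; values far
    outside the range are pulled back toward the nearer limit.
    """
    center = (lo + hi) / 2.0
    half_width = (hi - lo) / 2.0
    return center + half_width * math.tanh((x - center) / half_width)

# Hypothetical standardized index: ordinary years sit well inside (-1.5, 1.5).
for raw in (0.1, -1.0, -2.5, -5.0):
    print(f"raw {raw:+.1f} -> adjusted {squash(raw, -1.5, 1.5):+.2f}")
```

The point of a curve like this is that a reading two or three times worse than anything in the training data still moves the index in the right direction, but by much less than a linear term would.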

Nate continues:

Maybe these adjustments are reasonable. But what you *can’t* do is say none of the data is really salient to 2020, so much so that you have to make some ad-hoc, one-off adjustments, and *then* claim Biden is 99% to win the popular vote because the historical data proves it.

Elliott replies:

We take annual growth from many economic indicators same as you, and then we scale it with a sigmoid function so that readings outside huge recessions don’t cause massive implausible projections. That’s empirically better than using the linear growth term.

We also find that there’s a strong relationship between uncertainty and the economic index, so we’re putting much less weight on it right now than we normally would. That all sounds imminently sensible to me!

Honestly is sounds like the main difference between what you’ve done in the past and what we’re doing now is that we’re shrinking the predictions in really shitty econ times toward the prior for a really bad recession — a choice which I will happily defend.

And they get into some hypothetical bets. Here’s Nate:

On the off-chance our respective employers would allow it, which they almost certainly wouldn’t in my case, could I get some Trump wins the popular vote action from you at 100:1? Or even say 40:1?

Elliott:

25:1 sure

Nate:

So that’s your personal belief? That starts to get a lot different from 100:1.

As of today, our model gives Biden a 91% chance of winning the electoral vote and a 99% chance of winning the popular vote. That 99% is rounded to the nearest percentage point, so it doesn’t exactly represent 99-to-1 odds, but Nate has a point that, in any case, this is far from 25-to-1. Indeed, we’ve been a bit uncomfortable about this 99-to-1 thing for a while. Our model as written (and programmed) is missing some sources of uncertainty, and we’ve been reprogramming it to do better. It takes a while to work all these things out, and we hope to have a fixed version soon. The results won’t change much, but they’ll change some.
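For readers who want to check the arithmetic connecting probabilities and betting odds, here is a small sketch. The helper names `prob_to_odds` and `odds_to_prob` are made up for this example; the only inputs are the numbers already mentioned above (99%, 25-to-1, and rounding to the nearest percentage point).

```python
def prob_to_odds(p):
    """Convert a probability p into odds in favor, expressed as 'x to 1'."""
    return p / (1.0 - p)

def odds_to_prob(odds_for):
    """Convert odds in favor ('odds_for to 1') into a probability."""
    return odds_for / (odds_for + 1.0)

# A displayed 99% could be anything from 98.5% up to 99.5% before rounding.
for p in (0.985, 0.99, 0.995):
    print(f"probability {p:.3f} -> about {prob_to_odds(p):.0f} to 1")

# Offering 25 to 1 corresponds to this implied probability.
print(f"25 to 1 -> probability {odds_to_prob(25):.3f}")
```

Even at the low end of the rounding range, the implied odds are well above 25 to 1, which is the gap Nate was pointing to.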

Why won’t the results change much? As we’ve discussed before, our fundamentals model is predicting Biden to get about 54% of the two-party vote, and Biden’s at about 54% in the polls, so that gives us a forecast for his ultimate two-party vote share of . . . about 54%. Better accounting for uncertainty in the model won’t do much to the point forecast of the national vote, nor will it do much to the individual state estimates (which are based on a combination of the relative positions of the states in the 2016 election, some adjustments, and the 2020 state polls), but it can change the uncertainty in our forecast.
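The logic of that paragraph can be sketched as a precision-weighted average of two normal estimates. This is a toy version, not the actual model, which is considerably more involved: the function name `combine` and all of the standard deviations below are hypothetical numbers chosen only to show the pattern.

```python
def combine(mean_a, sd_a, mean_b, sd_b):
    """Combine two independent normal estimates by precision weighting."""
    w_a = 1.0 / sd_a ** 2
    w_b = 1.0 / sd_b ** 2
    mean = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    sd = (w_a + w_b) ** -0.5
    return mean, sd

# Fundamentals and polls both point to about 54%; the sds are made up.
narrow = combine(54.0, 3.0, 54.0, 2.0)
# Same point estimates, but with more uncertainty assigned to the fundamentals.
wider = combine(54.0, 5.0, 54.0, 2.0)

print("fundamentals sd 3:", narrow)
print("fundamentals sd 5:", wider)
```

When the two inputs agree, the combined mean stays at 54% no matter how the weights shift; only the combined standard deviation moves.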

Unpacking the criticisms

As noted above, I think Nate makes some good points, so let me go through and elaborate on them.

1. Any method for forecasting the national election will be fragile and strongly dependent on untestable assumptions.

Yup. There have only been 58 U.S. presidential elections so far, and nobody would think that we could learn much from the performance of Martin Van Buren etc. When I started looking at election forecasting, back in the late 80s, we were only counting elections since 1948. We’ve had a few more elections since then, but on the other hand the elections of the 1950s are seeming more and more irrelevant when trying to understand modern polarized politics.

Nate gives some examples of election forecasting models that have seemed reasonable in the past but which would yield implausible predictions if applied unthinkingly to the current election, and he says he uses a wider array of indicators over a longer time frame, which is what we do too. From a statistical perspective, we have lots of reasonable potential predictors, and the right thing to do when making a forecast is not to choose one or two predictors, but to include them all, using regularization to get a stable forecast.
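As a rough illustration of what regularization buys you with many correlated predictors and few elections, here is a sketch using ridge regression on simulated data. Ridge is just one form of regularization, picked here for simplicity, and is not necessarily what any real forecasting model uses. Everything in the block is an assumption: the data are simulated, the functions `ridge_fit`, `simulate`, and `norm` are hypothetical helpers, and the penalty of 5.0 is arbitrary.

```python
import random

def ridge_fit(X, y, lam):
    """Return b solving (X'X + lam*I) b = X'y by Gauss-Jordan elimination.

    No intercept is fit, so X and y are assumed to be roughly centered.
    Setting lam = 0 gives ordinary least squares.
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X + lam*I | X'y].
    A = []
    for j in range(p):
        row = [sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
               for k in range(p)]
        row.append(sum(X[i][j] * y[i] for i in range(n)))
        A.append(row)
    for col in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[j][p] / A[j][j] for j in range(p)]

def simulate(n_elections=18, n_predictors=6, seed=0):
    """Simulate a few elections with several noisy readings of one 'economy' factor."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_elections):
        economy = rng.gauss(0, 1)
        X.append([economy + 0.3 * rng.gauss(0, 1) for _ in range(n_predictors)])
        y.append(economy + rng.gauss(0, 1))
    return X, y

def norm(b):
    """Euclidean length of a coefficient vector."""
    return sum(v * v for v in b) ** 0.5

X, y = simulate()
ols = ridge_fit(X, y, lam=0.0)
ridge = ridge_fit(X, y, lam=5.0)
print("least squares coefficients:", [round(v, 2) for v in ols])
print("ridge coefficients:        ", [round(v, 2) for v in ridge])
print("coefficient norms:", round(norm(ols), 2), "vs", round(norm(ridge), 2))
```

With correlated predictors and a small sample, the unpenalized coefficients tend to be large and offsetting; the penalty shrinks them toward a more stable set that spreads the weight across the predictors.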

There is value in looking at models with just one predictor (that’s what we do in the election example in Regression and Other Stories) because it helps us understand the problem better, but when it’s time to lay your money down and make a forecast, you want to use as much information as you can.

Another twist, which we include in our model and discuss in our writeup, is that partisan polarization has increased in recent decades, hence we’d expect the impact of the economy on the election to be less now than it was 40 years ago.

But no matter how you slice it, our model, like any other, has lots of researcher degrees of freedom, and I agree with Nate that you can’t make a highly precise forecast from fundamentals alone.

To flip it around, though, you have to do something. Our fundamentals-based model forecasts Biden with 54% of the two-party vote. We gave this forecast some uncertainty, but maybe not enough, which is why we’ve been revising the model. I think this is a matter of details, not a fundamental disagreement with Nate or anyone else. It would be a mistake for us or anyone to claim that their fundamentals-based model can make a highly precise vote forecast right now.

There’s one place where I think Nate was confused in his criticism, and that’s where he wrote that we “say none of the data is really salient to 2020, so much so that you have to make some ad-hoc, one-off adjustments, and then claim Biden is 99% to win the popular vote because the historical data proves it.” First, we never said the data aren’t salient to 2020! We do think that the economy should be less predictive than in earlier decades, but that’s not the same as setting a coefficient to zero. Second, our 99% does not come from the forecasting model alone! That 99% is coming from the forecast plus the polls.

Again, we do think that our forecast interval was too narrow and that our analysis did not fully account for forecasting uncertainty, and that should end up lowering the 99% somewhat. That’s one reason I think Nate’s basically in agreement with us. He appropriately reacted to this 99% number and then I think slightly misunderstood what we were doing and attributed it all to the fundamentals-based forecast. And, to be fair, the fundamentals-based forecast is the one part of our model that’s a black box without shared code.

I just want to say, again, that we need some kind of forecast. We could just say we know nothing and we’ll center it at 50%, but that doesn’t seem right either, given that Trump only received 49% last time and now he’s so unpopular and the economy is falling apart. But then again he’s an incumbent. . . . But then again the Democrats have outpolled the Republicans in most of the recent national elections. . . . Basically, what we’re doing when we have such conversations is that we’re reconstructing a forecasting model. If you want, you can say you know nothing and you’ll start with a forecast of 50% +/- 10%. But I think that’s a mistake. I wouldn’t bet on that either!

2. How to think about that 99%?

I wouldn’t be inclined to bet 99-1 on Biden winning the national vote, and apparently Elliott wouldn’t either? So what are we doing with this forecast?

There are a few answers here.

The first answer is that this is what our model produces, and we can’t very well tinker with our model every time it produces a number that seems wrong to us. It could be that the model’s doing the right thing and our intuition is wrong.

The second answer is that maybe the model does have a problem, and this implausible-seeming probability is a signal that we should try to figure out what’s wrong. That’s the strategy we recommend in Chapter 6 of our book, Bayesian Data Analysis: use your model to make lots of predictions, then look hard at those predictions and reconsider your modeling choices when the predictions don’t make sense. If a prediction “doesn’t make sense,” that implies you have prior or external information not already included (or not appropriately included) in the model, and you can do better.

This indeed is what we have done. We noticed these extreme predictions a while ago and got worried, and we’re now in the middle of improving our model to better account for uncertainty. We’ve kept our old model up in the meantime, because we don’t think our probabilities are going to change much (again, with Biden at 54% in the forecast and 54% in the polls, there’s not so much room for movement), but we expect the new model will have a bit more posterior uncertainty.

At this point you can laugh at us for being Bayesian and having arbitrary choices, but, again, all forecasting methods will have arbitrary choices. There’s no way around it. This is life.

But there’s one more thing I haven’t gotten to, and that’s the difficulty of evaluating 99-to-1 odds. How to even think about this? Even if I fully believed the 99-to-1 odds, it’s not like I’m planning to lay down $1000 for the chance of winning $10. That wouldn’t be much of a fun bet.

I think we can move the discussion forward by using a trick from the judgment and decision making literature in psychology and moving the probabilities to 50%.

Here’s how it goes. Our forecast right now for Biden’s share of the two-party vote is 54.2% with a standard error of about 1.5%. The 50% interval is roughly +/- 2/3 of a standard error, hence [53.2%, 55.2%].
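The interval above can be checked under a normal approximation, which is an assumption made here for convenience (the actual forecast distribution need not be exactly normal). The function name `central_interval` is made up for this sketch; the inputs 54.2 and 1.5 are the numbers quoted above.

```python
from statistics import NormalDist

def central_interval(mean, sd, level=0.5):
    """Central interval containing `level` of a normal(mean, sd) distribution."""
    # For a 50% interval this multiplier is the 75th percentile of the
    # standard normal, which is close to 2/3.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * sd, mean + z * sd

lo, hi = central_interval(54.2, 1.5, level=0.5)
print(f"50% interval: [{lo:.1f}%, {hi:.1f}%]")
```

The same function with `level=0.95` gives the more familiar interval of roughly plus or minus two standard errors.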

Do we think there’s a 50% chance that Biden will get between 53.2% and 55.2% of the two-party vote?

The range seems reasonable: given how things are going, it does seem that a share of less than 53.2% for Biden would be surprisingly low, and a share of more than 55.2% would be surprisingly high. Polarization and all that. Still and all, [53.2%, 55.2%] seems a bit narrow for a 50% interval. If I were given the opportunity to bet even money on the inside or the outside of that interval, I’d choose the outside.

So, yeah, I think the uncertainty bounds at our site are too narrow. I find it easier to have this conversation about the 50% interval than about the 99% interval.

That said, there are the fundamentals and the polls. So I don’t think the interval is so bad as all that. It just needs to be a bit wider.

At this point, we’re going in circles, interrogating our intuitions about what we would bet, etc., and it’s time to go back to our model and see if there are some sources of uncertainty that we’ve understated. We did that, and we’re going through and fixing it now. In retrospect we should’ve figured this out earlier, but, yeah, in real life we learn as we go.
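To get a feel for how much a wider interval would matter for the headline number, here is a sketch of how the probability of Biden exceeding 50% of the two-party vote responds to the forecast standard deviation. It again assumes a normal distribution centered at 54.2%, so it will not reproduce the probabilities on our site exactly; the function name `prob_above` and the alternative standard deviations are hypothetical.

```python
from statistics import NormalDist

def prob_above(threshold, mean, sd):
    """Probability that a normal(mean, sd) quantity exceeds threshold."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)

# 1.5 is the standard error quoted above; the larger values are made up
# to show the effect of widening the forecast.
for sd in (1.5, 2.0, 2.5, 3.0):
    p = prob_above(50.0, 54.2, sd)
    print(f"sd {sd:.1f}: P(share > 50%) = {p:.3f}")
```

The point forecast is the same in every row; modest changes in the assumed uncertainty are enough to move the tail probability from near-certainty to something closer to the odds people say they would actually bet at.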

Criticism is good

Let me conclude by agreeing with another of Nate’s points:

These are sharp, substantive critiques from someone who has spent more than 12 years now thinking deeply about this stuff. These models are in the public domain and my followers learn something when I make these critiques.

One of the reasons we post our data and code is so that we can get this sort of outside criticism. It’s good to get feedback from Nate and others. I wasn’t thrilled when Nate dissed MRP while not seeming to understand what MRP actually does, but his comments on our forecasting model are on point.

I still like what we’re doing. Our method is not perfect, it has arbitrary elements, but ultimately I don’t see any way around that. I guess you can call me a Bayesian. But there’s nothing wrong with Nate or anyone else reminding the world that our model has forking paths and researcher degrees of freedom. I wouldn’t want people to think our method is better than it really is.

Again, I like that Nate is publicly saying what he doesn’t like about our method. I’d prefer even more if he’d do this on a blog with a good comments section, as I feel that it’s hard to have a sustained discussion on twitter. But posting on twitter is better than nothing. Nate and others are free to comment here—but I don’t think Nate would have anything to disagree with in this particular post! He might think it’s kinda funny that we’re altering our model midstream, but that’s just the way things go in this world when our attention is divided among so many projects and we’re still learning things every day.

Another way to put it is that, given our understanding of politics, I’d be surprised if we could realistically predict the national vote to within +/- 1 percentage point, more than three months before the election. Forget about the details of our model, the crashing economy, forking paths, overfitting, polling problems, the president’s latest statements on twitter, etc. Just speaking generally, there’s uncertainty about the future, and a 50% interval that’s +/- 1 percentage point seems too narrow. If we’re making such an apparently overly precise prediction, it makes sense for outsiders such as Nate to point this out, and it makes sense for us to go into our model, figure out where this precision is coming from, and fix it. Which we’re doing.