Wednesday, November 4, 2020

Don’t kid yourself. The polls messed up

To continue our post-voting, pre-vote-counting assessment (see also here and here), I want to separate two issues which can get conflated:

1. Problems with poll-based election forecasts.

2. Problems with the pre-election polls.

The forecasts were off. We were forecasting Biden to get 54.4% of the two-party vote and it seems that he only got 52% or so. We forecasted Biden at 356 electoral votes and it seems that he’ll only end up with 280 or so. We had uncertainty intervals, and it looks like the outcome will fall within those intervals, but, still, we can’t be so happy about having issued that 96% win probability. Our model messed up.

But, here’s the thing. Suppose we’d included wider uncertainty intervals so the outcome was, say, within the 50% predictive interval. Fine. If we’d given Biden a 75% chance of winning and then he wins by a narrow margin, the forecast would look just fine and I’d be happier with our model. But the polls would still have messed up, it’s just that we would’ve better included the possibility of messing up in our model.
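To see how much the headline win probability depends on the assumed forecast uncertainty, here is a minimal sketch, ignoring the electoral college and treating the national two-party share as a single normal draw. The 54.4% mean is the forecast above; the two standard deviations are illustrative values chosen to roughly reproduce a 96% and a 75% win probability, not parameters from our actual model.

```python
# Toy normal model for the national two-party share; this ignores the
# electoral college and is not the actual forecast model.
from scipy.stats import norm

mean = 0.544  # forecast Biden two-party vote share (the 54.4% above)

for sd in (0.025, 0.065):  # illustrative forecast SDs, not fitted values
    p_win = 1 - norm.cdf(0.50, loc=mean, scale=sd)       # P(share > 50%)
    lo, hi = norm.ppf([0.25, 0.75], loc=mean, scale=sd)  # 50% interval
    print(f"sd = {sd:.3f}: P(win) = {p_win:.2f}, "
          f"50% interval = ({lo:.3f}, {hi:.3f})")
```

With sd = 0.025 you get roughly the 96% figure and a 50% interval of about (0.527, 0.561), which misses the observed 52%; with sd = 0.065 you get roughly 75% and an interval of about (0.500, 0.588), which covers it. That’s the situation described above: the wider forecast looks fine, but only because it budgeted for the polls being off.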

To put it another way: a statement such as “The polls messed up” is not just a statement about the polls, it’s a statement about how the polls are interpreted. More realistic modeling of the polls can correct for biases and add uncertainty for the biases we can’t correct. But when the vast majority of polls in Florida showed Biden in the lead, and then Biden lost there by a few percentage points, that’s a polling error. As the saying goes, “The prior can often only be understood in the context of the likelihood.”

P.S. To address a couple of issues that came up in comments:

– We can’t really say how much the polls messed up in any state until all the votes are counted.

– The polls messed up more in some states than others. Florida is the clearest example of where the polls got it wrong.

– If you include a large enough term for unmodeled nonsampling error, you can say that none of the polls messed up. But my point is that, once you need to assign some large nonsampling error term, you’re already admitting there’s a problem.

– Saying that the polls messed up does not excuse in any way the fact that our model messed up. A key job of the model is to account for potential problems in the polls!

Ultimately, there’s no precise line separating problems in polls from problems in poll-based forecasts. For a simplified example, suppose that all problems would be fixed by adding 2 percentage points to the Republican share for each poll. If the pollsters did this before releasing their numbers, we’d say the polls are fine, no problem at all. A bias corrected is no bias at all. But if the bias is just sitting there and it needs to be corrected later, then that’s a problem, one whose scope is reduced but not eliminated by adding an error term allowing the polls to be off by a couple points in either direction.
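To make that simplified example concrete, here is a minimal sketch with made-up numbers: twenty fake polls with a constant 2-point pro-Biden bias, handled either by subtracting the bias out or by sweeping it into a wider nonsampling error term.

```python
# Toy illustration of bias correction vs. error-term inflation.
# The polls, the 2-point bias, and the error SDs are all made up.
import numpy as np

rng = np.random.default_rng(0)
true_share = 0.52   # Biden's true two-party share in this toy example
bias = 0.02         # every poll overstates Biden by 2 points
polls = true_share + bias + rng.normal(0, 0.01, size=20)

avg = polls.mean()

# Option 1: subtract the known bias; the point estimate is on target.
corrected = avg - bias

# Option 2: leave the bias in but add a nonsampling error term, so the
# interval widens enough to cover the truth without hitting it.
sampling_sd = polls.std(ddof=1) / np.sqrt(len(polls))
nonsampling_sd = 0.02  # assumed, big enough to absorb a possible bias
total_sd = np.hypot(sampling_sd, nonsampling_sd)

print(f"raw poll average:  {avg:.3f}")
print(f"bias-corrected:    {corrected:.3f}  (close to {true_share})")
print(f"raw +/- 2 SD:      ({avg - 2 * total_sd:.3f}, {avg + 2 * total_sd:.3f})")
```

Both versions cover the outcome, but only the first fixes the 2-point miss; the second just owns up to it. That’s the sense in which the error term reduces the problem’s scope without eliminating it.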

To put it another way, we already knew that polls “mess up” in the sense of having systematic errors that vary from election to election. Much of poll-based election forecasting has to do with making adjustments and adding uncertainties so that this doesn’t mess up the forecast.


