I got into poll aggregation in 2004 to reduce endless chatter about outlier polls. Hmmm, how’d that work out…
— Sam Wang (@SamWangPhD) November 2, 2014
At least one journalist is chattering about whether there’s a late break in polls for Republicans…based on one data point, which is probably statistical noise. Some people are hopeless. Then again, several polls today have pushed the Meta-Margin almost as far as it’s been toward Republicans this campaign season.
He is missing a far more important point: final election results can differ from midterm polls across the board, in the same direction. This last-minute polling bias is typically 2-3 percentage points, five times larger than the bias in presidential years. The direction of the bias is unpredictable. (Read this for a review of the subject.) This is why I care about the exact margins for front-runners McConnell (R-KY) and Shaheen (D-NH). Their states are early-reporting. From them we can make a rough estimate of nationwide polling bias, as follows.
Even compared with the last week of polling, final results can differ by several percentage points. Even at the last minute, the bias (or bonus, if you look at it from the point of view of final results) can be rather large. Candidates could still win if they trailed by a margin of less than 3 percentage points in the week before the election. Here are the details for 2010 and 2012:
(click the image to see data going back to 2004)
Here, negative numbers (in red) indicate a GOP final win or polling lead; positive numbers (in blue) indicate a Democratic final win or polling lead. In 2010, the error was enough to flip the result to the opposing candidate in Colorado and Nevada. In 2012, no such errors occurred. The last column is the “bonus”: how much either party overperformed polls on Election Day.
Democrats, do not be fooled by this sea of blue. The error can go in either direction, as I wrote on Friday. It is not correlated with which party had a wave that year. It might have to do with late shifts in voter opinion.
Statistically, we know that this is a correlated error. Here is why. The median year-by-year error, which measures systematic error (i.e. nationwide error), is much larger than the standard error of the mean, which is much less than 1 percentage point. If the errors in individual races were independent of one another, they would mostly cancel, and the yearly median would be about as small as that standard error. This shows that the overall polling bias bounces around substantially from election to election. Furthermore, midterms vs. presidential years differ (using a one-tailed t-test, p=0.03). In short, midterms are weird, and there may well be an unpredictable overall error this year. There are six Senate races whose medians are within two percentage points. Republicans could win all six, and Democrats could win all six. Based on past midterm polling, both of these outcomes are within the range of possibility.
Here is the nationwide bonus compared with polls in Senate races that were ultimately decided by 10 percentage points or less (for data going back to 1990, read this).
Let’s call this last-minute bonus “Delta.” It is the opposite of the bias I am writing about: the bias is how far polls were off from the final result, and Delta is how far the final result moved away from polls. I am proposing to estimate Delta as early as possible on Election Night. In Kentucky and New Hampshire, which end voting very early, I hope to get some indication of what Delta will be.
For example, if McConnell wins by only 4 percentage points, he is underperforming polls by 2 points – and so might Republican candidates in other states. If he wins by 9 points, Republican candidates in other states might also do better than expected. Conversely, if Shaheen wins by only 1 point, this would indicate trouble for other Democrats. And if Shaheen wins by 5 points or more, that is good for Democrats nationwide.
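The arithmetic in the McConnell example can be written out as a small Python sketch. It uses the sign convention from the table above (negative margins are Republican, positive are Democratic). The 6-point polling lead for McConnell is not stated directly; it is inferred from the statement that a 4-point win would underperform polls by 2 points. The function names, and the choice to average across early states, are assumptions made for illustration, not a description of how the estimate will actually be computed on Election Night.

```python
from statistics import mean

def delta(final_margin, poll_margin):
    """Election Day bonus relative to polls, in percentage points.

    Margins follow the table's convention: negative = Republican lead,
    positive = Democratic lead. A positive Delta therefore means
    Democrats beat their polls; a negative Delta means Republicans did.
    """
    return final_margin - poll_margin

def estimate_delta(early_states):
    """Rough nationwide Delta: the average bonus over early-reporting states.

    early_states maps a state label to (final_margin, poll_margin).
    Averaging is an illustrative assumption, not the author's stated method.
    """
    return mean(delta(final, poll) for final, poll in early_states.values())

# McConnell's polling lead, inferred from the text (R+6 -> -6.0).
MCCONNELL_POLL = -6.0

# He wins by 4: Republicans underperform polls by 2 points.
print(delta(-4.0, MCCONNELL_POLL))   # 2.0

# He wins by 9: Republicans overperform polls by 3 points.
print(delta(-9.0, MCCONNELL_POLL))   # -3.0

# With only Kentucky reported, the rough nationwide estimate is that one value.
print(estimate_delta({"KY": (-4.0, MCCONNELL_POLL)}))  # 2.0
```

Once New Hampshire reports, Shaheen's final and polled margins could be added to the same dictionary. Her polling margin is not given in the text, so it is left out here rather than guessed.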
This calculation of Delta might give an early-evening indication for four close races that should eventually be resolved by the end of the night: Iowa, Colorado, Kansas, and North Carolina. It will be of less help for Alaska, which is slow to report, and Georgia, which might go to a January runoff.