Princeton Election Consortium

Innovations in democracy since 2004


Tiebreaking, secret sauce, and the case of the missing cell phones

November 21st, 2008, 9:25pm by Sam Wang

On Election Eve I gave predictions that came very close to the final outcomes. However, I made an assumption about missed cell phone users. Today I will show you that the same prediction arises without this assumption if we use the tie-breaking “secret sauce” I mentioned on Election Eve. What’s left is just polls, with no add-ons at all.

The secret sauce is variance minimization. Variance minimization (VM) allows us to use polls older than the last seven days of polling, the rule used for the daily snapshot. VM is also what I used to predict individual state outcomes. It can even identify the last game-shifting event: the first Obama-McCain debate on Sept. 26th. After that, the rest was noise.

The basics. The underlying concept is that variation in individual measurements arises from two sources: (1) repeated sampling from a fixed distribution, and (2) changes in the distribution itself. On average, sampling from a fixed distribution gives the same standard deviation regardless of how many samples are taken. If the distribution shifts or changes shape, the standard deviation is likely to change as well.
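A quick simulation illustrates the two sources (this is my own sketch, not the Consortium's code): draws from a fixed distribution keep roughly the same standard deviation, while a mid-series shift in the distribution inflates it.

```python
import random
import statistics

random.seed(0)

# Source 1: repeated sampling from one fixed distribution (mean 0, sd 1).
fixed = [random.gauss(0, 1) for _ in range(400)]

# Source 2: the distribution itself shifts halfway through (mean 0 -> 3).
shifted = [random.gauss(0, 1) for _ in range(200)] + \
          [random.gauss(3, 1) for _ in range(200)]

sd_fixed = statistics.stdev(fixed)      # stays near 1
sd_shifted = statistics.stdev(shifted)  # inflated by the shift in the mean

print(round(sd_fixed, 2), round(sd_shifted, 2))
```

The shifted series has a larger standard deviation even though each half, taken alone, is as tight as the fixed series.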

Therefore if we plot the standard deviation of the Median EV Estimator over various time windows from Date X until Election Eve, allowing X to vary, the standard deviation should stay approximately constant for periods during which the race did not shift – but rise steadily in periods of true change. The plot looks like this.

Until October 2nd, the standard deviation of the EV Estimator decreased steadily. Then it reached a plateau. This suggests that after the 2nd, the EV Estimator was sampling repeatedly from the same parent distribution.
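A minimal version of this diagnostic (function name and daily values are my own synthetic stand-ins, not the site's data): for each candidate start date X, compute the standard deviation of the median-EV series from day X through the final day, then look for where that curve plateaus.

```python
import statistics

def sd_from_start(series):
    """For each start index X, SD of series[X:] (window from day X to the end).

    Windows shorter than 2 points are skipped, since stdev needs two values.
    """
    return {x: statistics.stdev(series[x:]) for x in range(len(series) - 1)}

# Synthetic daily median-EV values: a genuine shift early on, then a stable plateau.
ev = [300, 310, 325, 340, 352, 360, 364, 363, 365, 364, 366, 364]

sds = sd_from_start(ev)
# Windows starting after the shift have small, similar SDs; windows reaching
# back before the shift have much larger SDs.
print(round(sds[0], 1), round(sds[6], 1))
```

Reading the curve from right to left, the point where the SD starts climbing marks the last true change in the race.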

The last game-shifting event: Debate #1. For the snapshot, the averaging rule was to use all polls with a median date within 7 days of the most recent poll. On October 2nd this would have included polls taken on or after September 25th, approximately coincident with the first Obama-McCain debate (September 26th). So it appears that in terms of the Electoral College, October events did not shift the race. This includes the second and third debates, the VP debate, and Joe the Plumber.
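The snapshot's inclusion rule is simple enough to state in code (a sketch; the tuple format is my own):

```python
from datetime import date, timedelta

def snapshot_polls(polls):
    """Keep polls whose median field date is within 7 days of the newest poll.

    `polls` is a list of (median_date, margin) tuples.
    """
    newest = max(d for d, _ in polls)
    cutoff = newest - timedelta(days=7)
    return [(d, m) for d, m in polls if d >= cutoff]

polls = [
    (date(2008, 10, 2), 6.0),   # newest poll; window reaches back to Sept 25
    (date(2008, 9, 28), 5.0),   # included
    (date(2008, 9, 25), 4.0),   # included: exactly 7 days before Oct 2
    (date(2008, 9, 20), 1.0),   # excluded
]
print(len(snapshot_polls(polls)))  # 3
```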

This is not to say that nothing happened. The Popular Meta-Margin did rise steadily during the first half of October:
History of Popular Meta-Margin for Obama since April 1
The rising Meta-Margin was a consolidation of the electoral standings, as opposed to a shift in the race.

The unchanging nature of the race suggests that the best EV estimator would use all polls with a median date of September 27th or after. An estimator over this longer time window gives a probability distribution that looks like this.

This distribution’s median and mode are both 364 EV (not 367 because of microscopic differences in the probabilities of safe states). This is the same as my Election Eve prediction.

The average is 365.2 EV. Although this number is pleasingly close to the final outcome of 365 EV, it is a coincidence since we did not model the Nebraska 2nd Congressional District, which gave 1 EV to Obama.

Polls over periods as short as 2 weeks also give values of 364 or 367 EV. This is important because the graph above also shows a possible drop in standard deviation around October 23rd – one week after the last debate. Unfortunately, 1 week was probably too short, since it gave 352 EV, which does not match results on any longer time scale.

Note the remarkable spikiness of the distribution. This arises from the fact that only three states were uncertain: Indiana (11 EV), Missouri (11 EV), and North Dakota (3 EV). Since Indiana and Missouri have an equal number of electoral votes the distribution is “degenerate,” with six peaks instead of eight. 96% of the distribution is contained in just six outcomes: 353 (12%), 356 (12%), 364 (24%), 367 (24%), 375 (12%), and 378 (12%) EV. If we could find a more determinate probability of a Democratic win in North Dakota, there would be only three peaks (353, 364, 375) corresponding to how many 11-EV states were won by each candidate. Because of degeneracy, a guess of 364 EV would have a 48% probability of being correct. This leads us to…
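To see where the six peaks come from, here is an idealized reconstruction: a floor of 353 safe Obama EV plus three independent toss-ups (IN 11, MO 11, ND 3), each treated as exactly 50-50. (The floor and the 50-50 assumption are mine for illustration; the post's 12%/24% figures come from the full model, which also carries small probabilities for other states.)

```python
from itertools import product

SAFE_EV = 353                      # idealized Obama floor (assumption)
tossups = [("IN", 11), ("MO", 11), ("ND", 3)]

dist = {}
for wins in product([0, 1], repeat=len(tossups)):
    # Each of the 2^3 outcomes is equally likely under the 50-50 assumption.
    ev = SAFE_EV + sum(v for w, (_, v) in zip(wins, tossups) if w)
    dist[ev] = dist.get(ev, 0) + 0.5 ** len(tossups)

for ev in sorted(dist):
    print(ev, f"{dist[ev]:.1%}")
```

Six peaks, not eight: because IN and MO carry the same 11 EV, the outcomes "IN only" and "MO only" land on the same total (364), and likewise 367 — the degeneracy described above.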

Maximizing the confidence of individual state predictions. VM can be applied to individual states. This is how I arrived at my final predictions for Indiana, Missouri, North Carolina, and North Dakota. The plot of standard deviation against number of polls for Missouri looks like this:

Assuming that we are using more than the last 7 days of polling (8 polls), this plot suggests that we should average 15 polls (oldest poll 10/17-20; mean margin McCain +0.3%). Taking a similar approach gives 11 polls in Indiana (10/23-24; McCain +1.9%), 11 polls in North Carolina (10/27-29; Obama +0.5%), and 5 polls in North Dakota (10/6-8; McCain +0.6%). The sign of the mean predicted the correct winner in 3 out of 4 states. The error was Indiana, where the final margin was Obama +0.9%. Although 3 out of 4 is not quite statistically significant (P=0.06), it’s enough to justify trying VM again for future tiebreaking.
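One plausible way to formalize the pick (my own sketch, not the author's exact procedure): starting from the 7-day baseline, extend the window one older poll at a time as long as the standard deviation stays near its baseline value, and stop at the first jump.

```python
import statistics

def polls_to_average(margins, baseline=8, tolerance=1.25):
    """Return N, the number of most-recent polls to average.

    `margins` is newest-first. Starting from `baseline` polls, keep adding
    older polls while the SD stays within `tolerance` times the baseline SD.
    Both parameter values are illustrative choices, not the post's.
    """
    base_sd = statistics.stdev(margins[:baseline])
    n = baseline
    while n < len(margins) and statistics.stdev(margins[:n + 1]) <= tolerance * base_sd:
        n += 1
    return n

# Synthetic Missouri-like margins (newest first): a stable stretch of
# near-tied polls, then a regime change further back.
margins = [0.3, -0.5, 1.0, 0.0, -1.0, 0.5, -0.3, 0.8,   # roughly the last 7 days
           0.2, -0.6, 0.9, -0.2, 0.4, -0.8, 0.6,        # older but consistent
           6.0, 7.0, 8.0]                               # pre-shift polls
print(polls_to_average(margins))  # 15
```

On this made-up series the rule reaches back 15 polls and stops at the regime change, echoing the 15-poll window chosen for Missouri.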

The North Dakota prediction was reasonable for a second reason. Polls with an Obama lead were commissioned by Democratic-leaning organizations (DailyKos/Research 2000 and the United Transportation Union). But since variation in pollster reliability carries a bias of unknown size, I was glad to avoid using that information.

What about the cell-phone effect? An assumed cell-phone effect appears to be unnecessary. At this point we can estimate the size of the net bias, which includes the cell phone effect and other offsets (such as the fabled Bradley effect). Based on 68% confidence intervals of the median EV estimator, the net bias was [-0.6%, +0.4%], where positive values indicate a hidden bonus for Obama. Therefore I suggest that the cell-phone effect and other offsets sum to less than a percentage point.

Conclusions on the uses of variance minimization (VM). VM provides a reasoned approach to a problem faced by all electoral hobbyists, identifying how much polling data to use. Applied to the 2008 race, VM suggests that the sum of net polling biases, including the cell phone effect, is indistinguishable from zero. By removing the need for a cell-phone correction, a VM-based approach reconciles the logic behind my national popular-vote and electoral-vote predictions.

Finally, the big lesson: polls taken in groups don’t need corrections. Let’s repeat that a few times…

Tags: 2008 Election

10 Comments so far ↓

  • Observer

    Don’t be too hard on yourself, Sam. You’re only 0 for 2 on eschewing corrections. Another big chance in only four years.

  • Independent

    Very impressive. There is real added value here.

  • AAF

    Very interesting.

    A few questions:

    1. The X axis on the first graph, labeled date, is in numbers — are those the numbered-day of the year? It’s a bit confusing because they also happen to be roughly the range of EV’s we were moving through during that period.

    2. To my untrained eye, this looks like the kind of analysis that can be used predictively, before an election. Is that right? Or is it really only useful after the fact as a post-mortem analysis?

    3. Did you back-test it against 2004 and whatever other samples you have?

    4. The graph you show for Missouri is interesting — is it a snapshot, analyzing on election day how standard deviation is affected by how far back you go in including polls, or is it showing how the standard deviation would have looked over time if on each day during the campaign you had included X days of polls?– and did you do the same thing for the national race?

  • mddemocrat

You are well on your way to debunking all theories of “tinkering” with poll results…thanks again for sharing!

  • BCC

    Great analysis.

    Why don’t you give Nate Silver a call, get him to promise to post the transcript, and then hurl insults at him? That would help raise your visibility.

    I’m kidding, of course- you are the anti-Ziegler.

  • Sam Wang

    BCC – I’m not sure he would want our first extended conversation on this subject to be one that turned public. On the other hand, this year there is more interest in this kind of discussion than I would have imagined likely.

    AAF – I’ll think about how to label these graphs more clearly.
1. Yes. That’s why I added “Oct. 1” and “Nov. 1.”
    2. Yes, it’s useful prospectively, and perhaps also to identify campaign turning points.
    3. Not yet, though it’s a good idea.
    4. This graph shows the SD of the last N polls, where N is on the horizontal axis. I could have plotted date horizontally, but it seems clearer to me in its present form.

  • BCC

    Do *you* have a transcript from said conversation? Nerd (and I use the term positively) wars can be entertaining.

  • William

    Why wouldn’t he want your first extended conversation on this subject turned public?

  • Sam Wang

    BCC – Certainly not the Ziegler conversation, though I really didn’t follow that flap. Nerd wars: to paraphrase Jon Stewart, do you really want me to be your dancing monkey?

  • Ken

    “Finally, the big lesson: polls taken in groups don’t need corrections. Let’s repeat that a few times…”

    I would rephrase as, “Reasonably well performed polls taken in groups don’t need corrections.”

Presumably the polls that provided the data set were good (or at least balanced for error) and therefore a meta-analysis provided the necessary power to discern reality. A meta-analysis won’t correct for situations where the pollsters are consistently off. Nonetheless, there’s no question that poll aggregation will at the very least minimize built-in errors.

Also, a meta-analysis given the current polling group could still end up being wrong in situations where there’s still not enough statistical power to detect a difference or deal with voting problems (such as Florida 2000).
