Princeton Election Consortium

A first draft of electoral history. Since 2004

Presidential prediction (Take 2)

August 8th, 2012, 9:00am by Sam Wang


Thank you for the feedback on this prediction, which uses polls only and no econometric variables. Because it is based on direct measurements of opinion, I contend that it provides a starting place for other, more assumption-based models.

Arguments were made for making an adjustment to the November re-elect probability for President Obama, which I initially gave as 91% (1-sigma 285-339 EV, 2-sigma 257-358 EV).  August 8 update: The use of a longer-tailed distribution to allow for black-swan events brings the probability down to 87%. My analysis follows below.

In the previous post, I used highly detailed EV trajectories that I calculated in 2004 and 2008 to estimate a range for where 2012 might go. Several of you wanted to know if my choice for the Meta-margin SD (2.2%) was supported by pre-2004 races. The answer: yes, to the extent that the data allow us to see.

Looking at Gallup national data back to 1972, in 5 out of 6 re-election races, the leader from the post-primaries to early August went on to win in November. The exception was 2004, when Kerry led Bush by an average of 1%.

Variability for those years is challenging to estimate because in each case Gallup had fewer than a tenth as many polls as I had for 2004 and 2008. For example, in 2004 Gallup national data was more variable (Gallup standard deviation=4.9%) than indicated by the Meta-analysis SD that year (MMSD~2%). Overall, Gallup re-election race data have standard deviation values (2.9 to 4.9%) similar to Gallup 2004/2008 (4.9% and 2.4%). In other words, 2008 and 2004 do seem to resemble past races. Therefore my estimate of MMSD=2.2% is provisionally OK.

In a second concern, you pointed out that voter sentiment is likely to have a fat-tailed distribution rather than fall on a Gaussian (normal) curve. Evidence for this comes from Ronald Reagan’s 1980 advance from a 3-point lead in June-August to an eventual 10-point victory:

[Figure: Gallup 1980]

Note that in this case, the outcome did not change. Reagan’s lead only widened. (Update: a sharper view, consistent with the above graph, has been posted by John Sides. If you look at his snapshot,  the 1980 campaign is no longer an outlier with respect to MMSD. Maybe the fat tails aren’t so fat after all!)

The other five races looked more like this:

[Figure: Gallup 1996]

As a rough estimate, the likelihood of a fat-tail outcome is apparently 1/6. If we think of it as having a 50-50 chance of flipping the race (i.e. it could also widen Obama’s lead), the probability of a race-flipping event is 1/6 × 1/2 = 1/12 ≈ 8%, giving a re-elect probability of 92%. (Update: this might be modeled by using a t-distribution. For an MMSD of 2.2%, the re-elect probability becomes 87%. Assuming MMSD=3.0% would lead to 80%.)
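The arithmetic in the paragraph above can be sketched in a few lines of Python. The rough estimate is simply 1 minus (1/6 × 1/2). For the t-distribution update, the post does not state the Meta-margin or the degrees of freedom, so this sketch assumes a Meta-margin of 3.0% and 3 degrees of freedom. Both values are guesses for illustration only, and the function names are hypothetical.

```python
import math


def t_cdf_3df(x: float) -> float:
    """Closed-form CDF of Student's t with 3 degrees of freedom."""
    u = x / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Rough estimate from the text: 1 race in 6 shows a fat-tail move,
# and such a move is taken to be equally likely to help or hurt.
p_fat_tail = 1 / 6
p_flip = 0.5 * p_fat_tail
rough_reelect = 1 - p_flip

# ASSUMPTION: the post does not give the Meta-margin; 3.0% is hypothetical.
ASSUMED_META_MARGIN = 3.0


def reelect_probability(mmsd: float, cdf=t_cdf_3df) -> float:
    """Probability that the lead survives, given the Meta-margin SD (in %)."""
    return cdf(ASSUMED_META_MARGIN / mmsd)


print(f"rough estimate: {rough_reelect:.1%}")
for mmsd in (2.2, 3.0):
    t_prob = reelect_probability(mmsd)
    n_prob = reelect_probability(mmsd, cdf=normal_cdf)
    print(f"MMSD={mmsd}%: t(3 df)={t_prob:.1%}, normal={n_prob:.1%}")
```

With those assumed inputs, the t-based values round to the 87% and 80% quoted above, and the normal-curve value for MMSD=2.2% rounds to 91%. That agreement suggests, but does not confirm, that the inputs are close to the ones actually used.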

Finally, a few of you expressed a desire to see econometric variables added. To re-emphasize, I have two reasons that I will not do so. First, to my knowledge it is not definitively demonstrated that those variables add new information beyond what is already present in polls. Second, it is ineluctably true that they will add uncertainty. For these reasons I will remain exclusively poll-based.

Now, a comment to modelers: This prediction could be used as a Bayesian prior to which the econometric variables are added. For example, in past years, how have 3rd quarter unemployment and August-to-November poll movement been related, and with what distribution? Then multiply that by my prior.

In summary, for the reasons above, I’ll revise the initial estimate very modestly. Look for its inclusion soon in the right-column chart.

This mostly wraps up the long-term Presidential outlook. To continue the analogy to weather, the National Weather Service issues short-term and long-term forecasts. At some point one wants to know the short-term outlook. I estimate that the polls-alone measures are good for about 3 weeks of short-term prediction. Look for that on this page sometime after Labor Day.

Tags: 2012 Election · President

9 Comments so far

  • Bill N

    I have read the posts here on possibly adding predictors, such as economic variables, to your prediction equation. I have thought a bit about this and have a couple of thoughts. I need to say up front however that I am not exactly clear on the prediction formula you are using, and have filled in my ignorance with the assumption that it is some form of regression equation. I also think I read a post that said Nate Silver uses regression equations.

    If this is the case, I have three concerns about adding new variables. The first is aggregation bias. For example, an economic variable such as unemployment measured at the national level would mean something different than unemployment at the state level. A second issue would be measurement error in the independent variables (IVs). The effect of measurement error is rather clear when you have a single IV, but essentially unpredictable when you have two or more IVs measured with error. A final concern is the omitted variables problem. My research has been in the social and behavioral sciences but has not included any work predicting the outcomes of elections. These issues would be of concern to me in the regression models I work with. I am interested to see if you think there would be similar concerns in predicting the outcome of an election with multiple IVs in a prediction equation.

    Thanks for your response and for doing this site! It is awesome!

    • Sam Wang

      Actually, the point here is to avoid doing any regressions, and to avoid adding variables. Read the post carefully, especially the part where I point out that adding economic variables might not add information, but would certainly add noise. The prediction here is based on polls only, and is therefore more transparent. The outcome is a re-election probability of 84-91%.

      I am still thinking about the assumptions that I have made. Concerns are detailed lovingly in the comment threads, which are constructive conversations with readers.

  • Sam Wang

    Attention geeks: seems to me that the correct approach is not to assume a wrong value for MMSD, but to use a distribution that has fatter tails.

    I am currently considering using the t-distribution for convenience. For instance, the t-distribution for a 2.0-sigma lead and 3 degrees of freedom gives tcdf(2.0,3) = 0.93, a 93% win probability. The normal distribution gives 97.7%.
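A minimal sketch of that comparison, using only the Python standard library. Here `t_cdf` is a hypothetical helper that integrates the t density numerically with Simpson's rule; it stands in for the `tcdf` call quoted above and is not the implementation used on this site. The extra degrees-of-freedom values are included only to show how the tails thin out toward the normal curve.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    if x == 0:
        return 0.5
    upper = abs(x)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


lead_in_sigma = 2.0
for df in (1, 3, 10, 30):
    print(f"t, {df:>2} df: {t_cdf(lead_in_sigma, df):.3f}")
print(f"normal:   {normal_cdf(lead_in_sigma):.3f}")
```

Fewer degrees of freedom mean fatter tails, so the same 2.0-sigma lead translates into a lower win probability.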

    Comment please?

  • wheelers cat

    I’m totally geeked out. I had a class in weather modelling in undergrad and the prof introduced Catastrophe Theory as part of the curriculum.
    If you want to stay with your weather forecasting framework that might be an interesting line of investigation.
    I agree that adding economic variables is just adding noise.
    What demographic that is Obama or lean Obama is going to flip to Romney if the economy gets worse?
    I can’t think of one.
    And there just aren’t enough ‘undecideds’ left in the swing states to change the EV map.

  • Rick L

    Very interesting calculations. I hope you’ll pardon a simple question: suppose Obama’s odds of winning state A is Pa, and his odds of winning state B is Pb. Because of errors, bias and changes in opinion between polling times and election time, the true odds are Pa+dPa and Pb+dPb. Does your model take into consideration the fact that dPa and dPb may be correlated?

  • Gerry

    Not being much of a geek or statistician you all are a little over my head. When you are talking about economics are you referring to Citizens United as that seems as though it could be a big variable in this race? Lots of new money…doesn’t always make a difference but…

  • Jeffrey Milstein, Ph.D.

    How have you taken into account measurement of racial bias? Polls would tend to understate the degree of racial bias in a Presidential election. The 2008 election is the only precedent in which an Afro-American has been a candidate, whereas all polling and voting data prior to 2008 did not include an Afro-American candidate.

    • Sam Wang

      This idea has been around for some time. It is called the Bradley effect, and has been studied in detail by Daniel Hopkins. Basically, it used to occur but it went away. The 2008 election is but the most recent example, in which Obama’s performance in opinion polls was the same as on Election Day.
