Our Presidential predictor isn’t quite ready, but Andrew Sullivan’s link motivates me to put up a provisional draft on how I’m thinking about it.

For the impatient, here is the punchline: the long-range prediction is a November Obama win with 10-1 odds. (Update: other assumptions could take it down somewhat.)

And now the explanation, with caveats (which will be updated over the coming days).

My overarching goal is to draw upon a better-established field, weather forecasting, for a conceptual framework. This post gives a method for a long-range forecast (i.e. July/August to November). At a future date I will address the issue of short-range forecasting, which seems to be possible with an outlook of about 1 month.

As I wrote yesterday, in attempting to make a prediction it is often very hard to know whether information has already been accounted for. Does adding third-quarter unemployment improve predictive power, or is it already reflected in polls? Either way, adding such a variable always adds uncertainty of its own. Therefore any complex model is potentially *less* useful than its simpler cousin.

As a demonstration of the power of simplicity: using *state polls alone*, the Princeton Election Consortium’s simple algorithm did very well in 2004 and 2008, with errors of <5 EV, equivalent to <0.5% in popular vote margin. Can this be harnessed for predictive purposes? To quote Miss Sarah, you betcha.

First, let’s examine the 2008 EV history:

It bounces around, but note that from mid-June to the start of September, it showed a comfortable lead for then-Senator Obama, and spent most of its time within several dozen EV of the final outcome. Under the assumption that the EV estimator is a day-to-day gauge of the race, a long-range prediction can then be made.

On any date in Summer and Fall of 2004 and 2008, a good estimator of the November outcome was what had happened so far: a Bush-Kerry dead heat and an Obama lead. Similarly, today our best estimate of Election Day performance is the average of June and July 2012. Right now that’s approximately Obama 315 EV, Meta-margin of +3.0%.

How to estimate the June-to-October variability? Again, use the past. The standard deviation of the 2008 EV estimator was 28.9 EV. In 2004, the standard deviation in the Bush-Kerry race was 29.0 EV. Let us assume that 2012 is similar.

A long-range prediction of the November outcome can then be made by building a confidence interval around the 315-EV midpoint. This is best done in units of the Meta-margin, defined as how much popular-vote swing would tie the race. In 2008 it looked like this:

Why use these units? Because the Meta-margin is roughly distributed like a Gaussian. (For tech-weenies: MATLAB kurtosis = 2.73.) In 2008, the standard deviation of the Meta-margin was 2.2%. **Therefore our November prediction is Obama +3.0 +/- 2.2% (1 sigma).**
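To make the bolded prediction concrete, here is a minimal sketch of the 1-sigma and 2-sigma bands in Meta-margin units. It uses only the two figures quoted above (+3.0% and 2.2%); the helper name `band` is made up for illustration, and the Gaussian coverage figures (roughly 68% and 95%) are the usual textbook values.

```python
# Long-range forecast in Meta-margin units, using the figures quoted in the post.
center = 3.0   # Obama Meta-margin, percent (June/July 2012 average)
sigma = 2.2    # standard deviation of the 2008 Meta-margin, percent

def band(center, sigma, k):
    """Return the (low, high) interval reaching k standard deviations each way."""
    return (center - k * sigma, center + k * sigma)

one_sigma = band(center, sigma, 1)   # roughly 68% coverage for a Gaussian
two_sigma = band(center, sigma, 2)   # roughly 95% coverage for a Gaussian

print("1-sigma: %+.1f%% to %+.1f%%" % one_sigma)
print("2-sigma: %+.1f%% to %+.1f%%" % two_sigma)
```

The 2-sigma band dips below zero, meaning that a Romney lead is inside the 95% range even though an Obama lead is the central estimate.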

Now we must map it to units of electoral votes. For now I’ll use the 2008 relationship (the 2012 relationship won’t be representative until late in the season):

That gives a 68% (1-sigma) confidence interval of 285-339 EV shown in the red band at the top of this post, and a 95% (2-sigma) confidence interval of 257-358 EV shown in the yellow band. (Note the resemblance to a hurricane strike zone!)

One can also derive a win probability. An average Meta-margin of +3.0% with an SD of 2.2% gives a lead of 3.0/2.2 = 1.36 sigma. Plugged into the Gaussian distribution function (MATLAB: normcdf(1.36,0,1)), this gives a win probability for Obama of 91%, or as I said the other day, 10-1 odds.
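For readers without MATLAB, the same calculation can be sketched in Python using only the standard library. The function name `normal_cdf` is my own; it plays the role of normcdf(z,0,1) by way of the error function.

```python
import math

def normal_cdf(z):
    """Standard normal CDF; a stdlib stand-in for MATLAB's normcdf(z, 0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.36                      # lead in sigma units, 3.0 / 2.2 rounded as in the post
p_win = normal_cdf(z)         # probability that the November Meta-margin is above zero
odds = p_win / (1.0 - p_win)  # odds in favor, expressed as "odds to 1"

print("win probability: %.1f%%" % (100 * p_win))
print("odds: about %.1f to 1" % odds)
```

The probability comes out a little above 91%, which is where the round figure of 10-1 odds comes from.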

This argument is open to discussion, and I’ll stop now. Have at it in comments.

Adam Billet// Aug 3, 2012 at 7:03 pm
I have been following your analysis since 2004 and find it extremely enlightening. I was concerned it wasn’t going to be here for this cycle. I was very pleased when you resumed it.

badni// Aug 3, 2012 at 7:11 pm
How isn’t this extrapolating from just two data points? Don’t you have to look at many elections to see how volatile they tend to be, or how many times a candidate with a certain lead ended up winning?

Matt McIrvin// Aug 3, 2012 at 7:40 pm
I guess the relevant assumption is that the Meta-Margin will wander around randomly with some distribution resembling a Gaussian, and Election Day just samples that random distribution at some moment.

In, say, October, the autocorrelations in time would become important, and the uncertainty ought to go down.

Sam Wang// Aug 3, 2012 at 7:44 pm
Exactly, Matt. This is my plan.

Badni, I think you are underestimating what is done here. There are two time series, not two time points. They are fairly well-behaved and are assumed to have the following properties: (a) on Election Eve, they end up at the final outcome; (b) they are sampled from a well-behaved distribution, a Gaussian; and (c) they have a width parameter. One could refine (b) and (c), but the result won’t change much. One could imagine an even more black-swan-like event. But like the Spanish Inquisition, nobody expects those, let alone models them.
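Properties (b) and (c) can be illustrated with a small simulation. This is only a sketch under the stated assumptions: the Election Day Meta-margin is treated as a single draw from a Gaussian centered on +3.0% with an SD of 2.2%, and the function name, seed, and number of draws are arbitrary choices of mine.

```python
import random

def simulate_win_fraction(center, sigma, n_draws, seed=0):
    """Fraction of simulated Election Days on which the Meta-margin is positive."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(1 for _ in range(n_draws) if rng.gauss(center, sigma) > 0.0)
    return wins / n_draws

frac = simulate_win_fraction(center=3.0, sigma=2.2, n_draws=200000)
print("simulated win fraction: %.3f" % frac)
```

With this many draws the simulated fraction should land close to the 91% obtained from the Gaussian distribution function.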

One could mine pre-2004 polling data (probably a minor issue, but worth looking into). The 2004 and 2008 races have events that were considered large at the time: the Swift Boat campaign, Sarah Palin, and major economic collapse. They are the only races with such dense polling data.

What I am giving is an

true probability, as opposed to some more elaborate calculations that give that appearance. The probability here could be used for long-term planning!

Billy// Aug 3, 2012 at 9:00 pm
Sam, why do you use historical SDs for the metamargin when you could simply use the accumulated SD since June 2012? In this way the prediction should get more accurate as polling data accumulates towards Election Day (just like a weather forecast). When you use the historical SD, you could argue that any aggregate data from previous cycles are appropriate. Of course, my point would be moot if you had data showing that the historical SDs for the meta-margin were pretty much the same, but you had only shown it for 2008.

Thanks for an interesting article! I find your blog to be much more “rigorous” than other sites, even though you don’t post as often.

Sam Wang// Aug 3, 2012 at 10:11 pm
I was tempted by that. However, variability for 2012 does not encompass the variability of a whole campaign season. Badni and you have me thinking about extracting information from pre-2004 campaigns, based on national polling data.

Henry// Aug 4, 2012 at 12:28 am
If you’re right, I can make something like a 50% expected return by November on Intrade. Even given fees, opportunity cost and the risk of Intrade accounts being frozen, that’s such a phenomenal return that I should invest a large proportion of my net worth in it. Would you recommend I do so? Are you doing so yourself? If not, why?

Sam Wang// Aug 4, 2012 at 5:53 am
See my previous essay on this topic. Basically, it seems like a good bet. However, is it ever advisable to “invest” a large proportion of one’s net worth in a gambling site?

Bill N// Aug 4, 2012 at 9:27 am
I am just thinking off the top of my head here. The 2004 and 2008 time series can be conceptualized as a sample of 2 from a population of such time series. The 2012 series is a still unfolding third such time series. Might the cross correlation (it would be a lag 4-year cross correlation?), together with the autocorrelations in these time series give information to help refine the estimate of the variability (your b and c above)? You may have in a way already addressed this in your post where you comment that the estimate of the variability (the width parameter) is not likely to change to a practically significant degree.

The other thing I am thinking, and unfortunately this moves I think in the direction of what Nate Silver tries to do, is to adjust your estimate of variability for systematic effects such as the new voter ID laws in some states and the efforts, such as in Ohio, to limit voting hours in some counties, and not in others. The problem with this is that you have to make certain assumptions when you model such effects, and this can lead you into some of the same problems you mentioned concerning Nate’s modeling.

By the way, I have never heard of the black swan event before, and I am assuming this is a metaphor for a relatively rare, unusual event which is for all practical purposes unpredictable.

Sam Wang// Aug 4, 2012 at 11:39 am
There is the possibility of modeling detailed effects such as voter ID, but the problem is one of verification. Which effects are largest? Which effects are already contained implicitly or explicitly in polling data? The parsimonious approach is to avoid such details. And with a very few exceptions, I am unaware of empirical evidence that such corrections matter. If one does want to do something more complicated, as Nate Silver does for a living, the best way would be to do it in a way that allowed performance to be assessed afterward.

In regard to historical data, the problem is a simple one: estimate the likely SD from July to October in units of major-candidate popular margin. This can be obtained from Gallup national tracking data. Looking over that, it appears that some years are more variable, for example when neither candidate is up for re-election.

I agree that time-series information from past years is useful for constraining what may happen this fall. The autocorrelation of the EV estimator (or the Meta-margin) can be used for short-term prediction. This is my current plan for September/October.

Olav Grinde// Aug 4, 2012 at 11:32 am
This is most enlightening! Your comparison between recent elections and 2012 enabled me to get more of a grasp on what is going on here. Although I did undergraduate studies in mathematics and liked “clean” probability theory, I must confess I always had difficulty with statistics.

I have a question: would it be worthwhile to do a historical study of past presidential elections, using your methods on the state polls that were then carried out? And moreover, if you have done so, what do such studies show?

Olav Grinde// Aug 4, 2012 at 11:44 am
Bill N addresses the integrity of the election, and mentions two systematic efforts to undermine it by swinging the voting results in a “desired” direction.

It strikes me that there are many other factors that have the potential of undermining election integrity, and I would be most interested if you were to address some of these.

TheLastBrainLeft// Aug 4, 2012 at 12:39 pm
Obama is going to lose Ohio, Florida and Virginia. If there’s a path to 270 without these states, I’d be shocked.

LondonYoung// Aug 4, 2012 at 12:52 pm
Nice.

The one thing I would add is that in real world experience, outcome distributions have fat tails. So, you might say that Obama’s chances are capped at 91% right now, and his actual win probability is lower due to the weaknesses of the model.

Note that intrade is 97.4/97.8 on “one of Obama or Romney to win”. If nothing else, presidents and candidates have taken a lot of lead over the years – and intrade bettors recognize this.

So, Obama should be greater than 50% (since he is favored) and less than 91%. So, the next step is how much to haircut the 91% in the direction of 50/50 … So, Prof. Wang, any thoughts on the next order correction?

Sam Wang// Aug 4, 2012 at 3:59 pm
LondonYoung, a better estimate of SD might come from examining pre-2004 races, some of which are rather variable. This leads to a sum of normal distributions. This leads to…t-distribution, I hope, since I can type tcdf() easily.

Ethan Straffin…yes, Silver’s distribution is always smooth-looking. It comes from having lots of probabilities between 20% and 80% in states that are not actually in play. This is the correlated-error thing people talk about. It’s part of what I refer to when I point out the conceptual error of “counting uncertainty twice.” Anyway, all that false uncertainty leads to a far larger number of likely outcomes.

Regarding specific outcomes, I don’t have anything further to say about that, since I am not much into listing permutations – there’s a reason I automated this! (Rachel Findley explains more below.)

Sam Wang// Aug 4, 2012 at 5:19 pm
Gallup has several time series of Presidential tracking polls (see here and here).

The SD of the Democratic-Republican margin for re-election races is 4.9% (2004), 2.9% (1996), 4.3% (1984), and 4.0% (1980). For races with two new candidates it’s 2.4% (2008), 5.8% (2000), 4.9% (1992), and 7.7% (1988).

Two thoughts:

(1) By this measure, 2008 and 2004 were not exceptional years. But boy, GHW Bush made quite a comeback in 1988.

(2) I wonder if re-election races are less variable in general, for instance because one candidate is more of a known quantity (the referendum-on-the-incumbent hypothesis).
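As a rough check on thought (2), the eight SDs quoted above can simply be averaged by group. This is a back-of-envelope sketch, not a significance test, and four races per group is very little data. The grouping follows the comment as written; because 1992 is counted among the re-election races elsewhere in this thread, the sketch also shows the averages with 1992 moved over.

```python
from statistics import mean

# SD of the Democratic-Republican margin in Gallup tracking (percent), as quoted above.
reelection = {2004: 4.9, 1996: 2.9, 1984: 4.3, 1980: 4.0}
two_new = {2008: 2.4, 2000: 5.8, 1992: 4.9, 1988: 7.7}

mean_reelection = mean(reelection.values())
mean_two_new = mean(two_new.values())
print("as listed:  re-election %.2f%%, two new candidates %.2f%%"
      % (mean_reelection, mean_two_new))

# Alternative grouping: treat 1992 as a re-election race instead.
alt_reelection = dict(reelection)
alt_reelection[1992] = two_new[1992]
alt_two_new = {year: sd for year, sd in two_new.items() if year != 1992}

alt_mean_reelection = mean(alt_reelection.values())
alt_mean_two_new = mean(alt_two_new.values())
print("1992 moved: re-election %.2f%%, two new candidates %.2f%%"
      % (alt_mean_reelection, alt_mean_two_new))
```

Under either grouping the re-election races average about one percentage point less variability, which is in the direction of the hypothesis but hardly conclusive.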

LondonYoung: In some sense, fat tails are like asking if this year has a high SD, which we won’t know until afterward. Perhaps a t-distribution with a low-ish number of degrees of freedom. normcdf(1.36,0,1)=0.9131 and tcdf(1.36,8)=0.8945. That’s a decrease in the win probability from 91%…to 89%. That is a fairly minor adjustment.
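The Gaussian-versus-t comparison can be reproduced without MATLAB. The sketch below uses only the Python standard library; `normal_cdf` and `t_cdf` are my own helper names standing in for normcdf and tcdf, and the t-distribution CDF is obtained by numerically integrating the density with Simpson's rule rather than by any library routine.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

def t_cdf(x, df, steps=2000):
    """Student's t CDF: 0.5 plus the Simpson's-rule integral of the density over [0, x]."""
    # The density is symmetric about zero, so P(T <= x) = 0.5 + integral from 0 to x.
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0

lead = 1.36  # Obama lead in sigma units
p_gauss = normal_cdf(lead)
p_t8 = t_cdf(lead, 8)
print("Gaussian: %.4f   t with 8 d.f.: %.4f" % (p_gauss, p_t8))
```

The fatter tails of the t-distribution cost about two percentage points of win probability at this lead.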

LondonYoung// Aug 4, 2012 at 12:58 pm
Oh, and LastBrain – go to 270towin and set OH, FL, VA *and* NC to Romney – all others tossups to Obama. Obama wins, and the map wouldn’t shock me as a final outcome.

Olav Grinde// Aug 4, 2012 at 1:15 pm
LondonYoung, that’s a fascinating scenario, and a striking rebuttal of TheLastBrainLeft’s point.

Ethan Straffin// Aug 4, 2012 at 3:17 pm
Hi Sam…thanks for a most interesting site! I was wondering if you could comment briefly on the shape of the EV histogram, which looks strikingly different from (say) Nate Silver’s graph over at 538.com. Both graphs have their highest spike where you’d expect at 332 (in which Obama wins all states that currently lean toward him, including the tightest races in FL and VA), with lower spikes at 319 (Obama loses VA) and 347 (Obama adds NC). But beyond that, your graph has some features that puzzle me. The height of the spikes at 313 and 326 might be justified if Obama’s on shakier ground in Iowa than Nate thinks, but it seems bizarre to me that there are no significant spikes at all below 313: not even at 303 (Obama wins VA but loses FL) or 290 (Obama loses both VA and FL). These are the third and sixth highest spikes on Nate’s graph, and they’re clearly plausible scenarios, so I’m wondering what’s up.

Rachel Findley// Aug 4, 2012 at 5:32 pm
Ethan, the spikes are where (a) there’s a pretty high probability of getting an EV total in the general ballpark and (b) there are several ways of getting that EV total. For example, there are 3 ways for Obama to get 332 EV using 270towin’s “tossup” states: Romney gets NC; Romney gets CO and IA; Romney gets CO and NV. The specific locations of the spikes reflect the lumpiness of the electoral college winner-take-all system; the prominence of a general region reflects the likelihood of a candidate winning about that many votes. 217, 221, 284, and 290 are all a lot more likely than their neighbors–but the probability of each is less than 1% so they don’t show up on the chart. The whole point of using a calculation is that the number of possibilities is too dazzling to go through one by one.

LondonYoung// Aug 4, 2012 at 6:12 pm
Intuitively, I think 89% is just not a reasonable re-election probability at this point in time. But I don’t have a quantitative handle to offer on why (other than my respect for intrade). t-distributions just don’t have fat enough tails, and I also think there needs to be some “reversion to 50/50” component as well …

Sam Wang// Aug 4, 2012 at 8:43 pm
LondonYoung – I don’t respect InTrade, quantitatively speaking. They get the sign correct most of the time. Their prices are monotonic with actual probabilities. See this analysis from a few years ago. I would understand your skepticism better if it were rendered as an argument regarding the shape and tail structure of the prior distribution. For example, if you imagine the SD of the Meta-margin is 4% instead of 2.2%, then tcdf(3.0/4,8)=0.76. However, this is getting into the territory of building a calculation to suit one’s intuitions.

Ethan Straffin, please read the FAQ. Your intuition is wrong. Your example lists 10 states. Those states have 2^10=1024 possible ways they can fall. In this case, your scenario covers 0.1% of the total probability, which contributes to probability at 303 EV exactly, not near 303.

Ethan Straffin// Aug 4, 2012 at 6:22 pm
Sam, Rachel…thanks, I get all that, but I’m still not convinced that it explains the discrepancies. Try this simple experiment: start with the default toss-up state map on 270towin, and give NV, CO, IA, WI, OH, PA, NH, and VA to Obama, and NC and FL to Romney. This gives Obama 303 EVs and is clearly a highly plausible scenario: in fact, it differs from what both models agree is the MOST likely scenario only in the outcome for Florida, the swingiest state in the country! Yet there is NO spike on Sam’s histogram anywhere near 303. I simply can’t see how this is possible.
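The counting argument in this exchange can be checked by brute force. The sketch below enumerates every win/loss combination of the ten tossup states named in the comment above. Two inputs are assumptions rather than figures from the thread: the electoral-vote counts are the 2012 apportionment values, and the base of 217 is inferred as 347 (all tossups to Obama) minus the 130 EV at stake. No probabilities are attached, so this counts combinations only and says nothing about how likely each one is.

```python
from itertools import product

# Electoral votes under the 2012 apportionment (supplied here, not taken from the thread).
TOSSUPS = {"NV": 6, "CO": 9, "IA": 6, "WI": 10, "OH": 18,
           "PA": 20, "NH": 4, "VA": 13, "NC": 15, "FL": 29}
OBAMA_BASE = 217  # inferred: Obama's total if Romney takes every tossup state

def count_ways(tossups, base):
    """Map each Obama EV total to the list of Romney-state sets that produce it."""
    names = list(tossups)
    ways = {}
    for picks in product([True, False], repeat=len(names)):  # True = Obama wins the state
        total = base + sum(tossups[n] for n, won in zip(names, picks) if won)
        romney = tuple(n for n, won in zip(names, picks) if not won)
        ways.setdefault(total, []).append(romney)
    return ways

ways = count_ways(TOSSUPS, OBAMA_BASE)
n_outcomes = sum(len(sets) for sets in ways.values())
print("combinations:", n_outcomes)  # 2**10 = 1024
for romney in ways[332]:
    print("Obama 332 when Romney takes:", ", ".join(romney))
print("NC + FL to Romney gives 303:", ("NC", "FL") in ways[303])
print("share of one combination: %.2f%%" % (100.0 / n_outcomes))
```

A single combination is one out of 1024, or about 0.1% if all combinations were weighted equally. That equal weighting is a simplification; in the real calculation each combination is weighted by the state win probabilities.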

LondonYoung// Aug 5, 2012 at 9:04 am
“this is getting into the territory of building a calculation to suit one’s intuitions” – this is a useful statement! Yes, that is exactly what I want to do. On the one hand, I might call this “step one” of the scientific method – forming the hypothesis. On the other hand, I have spent long years conducting arbitrage of one thing against another with little thought for the overall picture. So, here I really care about the overall picture, and may be misleading myself.

Thus, I am tempted to look at your intrade data from last cycle and model it. By eyeball, I start by saying there is a 5% chance for either side of a disruptive event that will clinch the election. The “interior range” looks like 50% + 3*(Poll Lead). So, I am tempted to use the intrade data to say there is a 10% chance of an “October surprise” and a 90% chance of stochastic drift. This would pull your 89% Obama win prob down to about 84%.

The homework would be to see if 10% of elections are indeed disrupted in this way … But, thinking back to arbitrage, it seems that pairs trading approaches would be much safer on intrade than stacking up directional bets …. Kinda thinking aloud here …

badni// Aug 5, 2012 at 10:18 am
I think your assessment of the SD of Gallup time series in various elections addresses my question, about data points. My point about data points was that you were looking at time series of points where there were only a grand total of two data points of the two most important variables: (matchup, and year (implicitly, the big events, economy, etc., of that year)). I think if your comparison of the prior Gallup years gives you at least a rough idea that the volatility of races with incumbents is similar, that pretty much addresses my issue.

But does it address this question: In all races where a candidate led by a certain margin at a certain date (or averaged a certain lead over the most recent 14 days, or something like that), what percent of those races did the leader eventually go on to win? I think you probably have to either cancel out convention bounces, or only calculate for dates at least a few weeks before and at least a few weeks after, or do a fairly long trailing average margin, since conventions clearly throw off the predictive value of any given day’s polls.

In any case, I am sure you would need to analyze that in a more sophisticated way, since on any given date there is a vanishingly small sample of races where a lead was precisely X, but I think the point is correct: The only thing that matters in the end is win/loss.

As a layman, my question is always this: “my guy is up (down) X% with Y days to go. Is he safe (toast)?”

Sam Wang// Aug 5, 2012 at 12:12 pm
LondonYoung– Yes, one could empirically fit to InTrade data. But pollsters measure opinion directly; everything else is an indirect measure. I could publish on the discrepancy if I cared to. So I find your point of view ironic, considering what you and I do for a living!

It is easier to be skeptical/critical when the outcome is not to one’s liking. I fell prey to this in 2004, and learned from that to stick close to credible data. The sign of the outcome (D vs. R) mainly affects the gusto with which I address the task.

badni– Thank you. Perhaps I could apply the reasoning here to past data, and see if the summed probabilities added up to a correct number of hits. For example, if win probabilities in 1964/1972/1976/1984/1992/1996 (all re-election races) added up to around 500%, then the calculation would have predicted 5 out of 6 wins. However, data were sparser then. I am not sure whether they are good enough for this purpose.

Because I view the Presidential race as mostly decided, I hope to polish up this predictor…then eventually address Senate/House. Not my strong suit, but far more important for both sides.

Sam Wang// Aug 5, 2012 at 11:32 pm
Looking harder at the Gallup data back to 1968, evidence for fat-tailed distributions comes from Ronald Reagan’s 1980 advance from a June/July tie to a 10-point victory. The other six re-election races had the same June/July leader and November winner.

The SD is challenging to estimate from a handful of polls, all from one organization. For example, Gallup in 2004 is much more variable (SD=4.9%) than indicated by the Meta-analysis (SD~2%). For purposes of a win probability calculation, imagine that SD=3%. This gives re-elect = 84% (and SD = 4% gives re-elect = 77%).
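The sensitivity to the assumed SD is easy to tabulate. This sketch holds the +3.0% Meta-margin lead fixed and uses a plain Gaussian, which appears to be what the 84% and 77% figures above correspond to; the function names are my own.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def win_probability(lead, sd):
    """Probability that a Gaussian centered on `lead` with width `sd` ends above zero."""
    return normal_cdf(lead / sd)

lead = 3.0  # current Meta-margin, percent
results = {sd: win_probability(lead, sd) for sd in (2.2, 3.0, 4.0)}
for sd, p in results.items():
    print("SD = %.1f%%  ->  re-elect = %.0f%%" % (sd, 100 * p))
```

Nearly doubling the assumed SD, from 2.2% to 4%, still leaves the incumbent a better than three-in-four favorite.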

Thank you for prodding me to look into this.

LondonYoung// Aug 5, 2012 at 2:39 pm
I agree with what you say, except I would add the following: what voters say when answering pollsters right now is only a part of what scientists can say about what the voters will do in November.

Atlanta’s capitulation to Sherman was a rather foreseeable probability in 1864, but the actual news of the fact seems to have wildly changed all perceptions of Lincoln’s reelection chances.

To base a probability of reelection only on one set of the data is to assign zero value to all other data. I understand the “once bitten twice shy” attitude you have to adding in more data, but there is more data …

Sam Wang// Aug 5, 2012 at 5:49 pm
If I had a measure of how much genuinely new information was in these other variables, I would use the information with pleasure. From that point of view, what I have published in this post would be a prior, which could then be modified by other information. If I had a graduate student in this area, we could kill that problem pretty well.

Your suggestion would be very effective for individual local races, as well as predicting who will control the House and Senate in 2013. As you point out, this would have practical ramifications for real-life policies.

Brad S// Aug 5, 2012 at 11:32 pm
>Obama is going to lose Ohio, Florida and Virginia. If there’s a path to 270 without these states, I’d be shocked.

270towin.com has Florida & Ohio leaning fairly blue, currently. Virginia seems to be a dead heat. Giving VA to Romney, I still have the race at: Obama 315-219.

Last time, just after Palin was announced & Obama’s odds dropped to 50-50, I put a significant amount on Obama via InTrade, which gave me a nice double. This time, I’ve wired several times more than in 2008 & am currently strongly bet on Obama again.

Two months ago, when things were much less obvious than today, I carefully looked at all the top-line state polling data on a couple of different sites, colored in my own 270-to-win map as conservatively as possible & determined that “maybe” the race could end up tied, with Romney winning in the House of Representatives.

I started betting on Romney to win, but as the polling data came in & as I looked carefully at Romney’s weaker speaking ability vs. the President’s, I quickly reversed my bet & am pretty comfortable now being “long” Obama in the high 50’s.

>If you’re right, I can make something like a 50% expected return by November on Intrade. Even given fees, opportunity cost and the risk of Intrade accounts being frozen, that’s such a phenomenal return that I should invest a large proportion of my net worth in it. Would you recommend I do so?

Not that anyone is asking me, but I would definitely *not* recommend betting heavily (vs one’s disposable income) on this election. The odds are currently well under 2-1 & there is in fact significant “tail risk” due to economic factors beyond anyone’s control.

I didn’t comment on Sam’s recent InTrade posting, but here’s some free advice from s/o who invests for a living & has taken home decent money from InTrade: it’s a good venue for long-shots only. The recent Supreme Court ACA decision was a case in point: 90% to near-zero overnight.

As in all speculation, one really should be getting extremely well-paid for the risk, b/c a lot of the time one’s opinions & thinking are wrong.

A good example of a political “value bet” that I like is Rubio for Republican VP, currently just over 10-1. Might be a loser, but it’s not a ridiculous bet that Romney would choose a “diversity” running mate just as McCain did. I placed a pretty aggressive bet, but still a fraction of my bet on Obama. While 10-1 is much more of a “phenomenal return” than 2-1, it’s commensurately more uncertain.

Matt McIrvin// Aug 6, 2012 at 4:14 am
I was just looking at the 1980 Gallup data the other day. It’s a bit strange: Reagan pulls ahead at the very end, but it just doesn’t look like a landslide at all, not like 1984. It looks as if their numbers for both Carter and John Anderson were on the high side, and the discrepancies all added up to a margin for Reagan that was large enough for a big electoral win. But of course there wasn’t the sheer amount of polling back then that there is now.

Sam Wang// Aug 6, 2012 at 8:21 am
Matt– One interesting aspect of the 1980 campaign is that the one and only debate occurred quite late, on October 28th. So that last-minute surge coincides with the debate.

wheelers cat// Aug 6, 2012 at 8:01 am
Brad S

Mitt might ask him, but I am 95% certain Rubio will refuse.

He has a much better shot at the top of the ticket in 2016….plus there is the two mormon ticket thing.

Anyone without organic conservative tendency disadvantage understands that Obama is going to win by this point.

Henri Lafantunette// Aug 7, 2012 at 11:43 am
I totally agree with wheelers cat.

wheelers cat// Aug 8, 2012 at 12:53 am
merci henri

the other datum is Rubio is only 35. No way he accepts.

I think Romneys other two bignamefactor choices are Christie and Ryan.

Christie is grossly fat. unelectable in the visual age.

So I like Ryan. Not Jindal, looks too foreign.

http://www.nationalreview.com/blogs/print/313326

Bill N// Aug 8, 2012 at 11:27 am
If Ryan is the VP choice, I think it will function to focus attention more than it is now on the Ryan budget and its associated cuts.