Our Presidential predictor isn’t quite ready, but Andrew Sullivan’s link motivates me to put up a provisional draft on how I’m thinking about it.
And now the explanation, with caveats (which will be updated over the coming days).
My overarching goal is to draw upon a better-established field, weather forecasting, for a conceptual framework. This post gives a method for a long-range forecast (i.e. July/August to November). At a future date I will address the issue of short-range forecasting, which seems to be possible with an outlook of about 1 month.
As I wrote yesterday, in attempting to make a prediction it is often very hard to know whether information has already been accounted for. Does adding third-quarter unemployment improve predictive power, or is it already reflected in polls? Either way, every added variable brings its own uncertainty. Therefore any complex model is potentially less useful than its simpler cousin.
As a demonstration of the power of simplicity: using state polls alone, the Princeton Election Consortium’s simple algorithm did very well in 2004 and 2008, with errors of <5 EV, equivalent to <0.5% in popular vote margin. Can this be harnessed for predictive purposes? To quote Miss Sarah, you betcha.
First, let’s examine the 2008 EV history:
It bounces around, but note that from mid-June to the start of September, it showed a comfortable lead for then-Senator Obama, and spent most of its time within several dozen EV of the final outcome. Under the assumption that the EV estimator is a day-to-day gauge of the race, a long-range prediction can then be made.
On any date in Summer and Fall of 2004 and 2008, a good estimator of the November outcome was what had happened so far: a Bush-Kerry dead heat in 2004 and an Obama lead in 2008. Similarly, today our best estimate of Election Day performance is the average of June and July 2012. Right now that’s approximately Obama 315 EV, Meta-margin of +3.0%.
How to estimate the June-to-October variability? Again, use the past. The standard deviation of the 2008 EV estimator was 28.9 EV. In 2004, the standard deviation in the Bush-Kerry race was 29.0 EV. Let us assume that 2012 is similar.
A long-range predictor of the November outcome can then be constructed by building a confidence interval around the 315-EV midpoint. This is best done in units of the Meta-margin, defined as how much popular-vote swing would tie the race. In 2008 it looked like this:
Why use these units? Because the Meta-margin is roughly distributed like a Gaussian. (For tech-weenies: MATLAB kurtosis = 2.73, where an exact Gaussian gives 3.) In 2008, the standard deviation of the Meta-margin was 2.2%. Therefore our November prediction is Obama +3.0 +/- 2.2% (1 sigma).
That gives a 68% (1-sigma) confidence interval of 285-339 EV shown in the red band at the top of this post, and a 95% (2-sigma) confidence interval of 257-358 EV shown in the yellow band. (Note the resemblance to a hurricane strike zone!)
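For readers who want to see the arithmetic, here is a minimal sketch of the same bands in Meta-margin units, using only the two numbers quoted above (+3.0% average, 2.2% SD). The function name `sigma_band` is mine, for illustration only. Turning these Meta-margin bands into EV ranges needs the state-poll calculation itself, which this sketch does not attempt.

```python
# Sketch: 1-sigma and 2-sigma bands in Meta-margin units.
# Both inputs are the values quoted in the post; nothing here is fitted.
MEAN_MM = 3.0  # June-July 2012 average Meta-margin, in percent
SD_MM = 2.2    # 2008 standard deviation of the Meta-margin, in percent


def sigma_band(mean, sd, n_sigma):
    """Return (low, high) for mean +/- n_sigma * sd."""
    return (mean - n_sigma * sd, mean + n_sigma * sd)


one_sigma = sigma_band(MEAN_MM, SD_MM, 1)  # roughly the 68% band
two_sigma = sigma_band(MEAN_MM, SD_MM, 2)  # roughly the 95% band

print("1-sigma band: %+.1f%% to %+.1f%%" % one_sigma)
print("2-sigma band: %+.1f%% to %+.1f%%" % two_sigma)
```

Note that the 2-sigma band dips below zero, which matches the yellow band extending below a winning EV total.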
One can also derive a win probability. An average Meta-margin of +3.0% with an SD of 2.2% gives a lead of 3.0/2.2 = 1.36 sigma. Plugged into the Gaussian distribution function (MATLAB: normcdf(1.36,0,1)), this gives a win probability for Obama of 91%, or as I said the other day, 10-1 odds.
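If you don't have MATLAB handy, the same calculation can be sketched in Python with the standard library. This assumes, as the post does, that the November Meta-margin is Gaussian with mean +3.0% and SD 2.2%; the variable names are mine.

```python
from statistics import NormalDist

# Inputs quoted in the post.
mean_mm = 3.0  # average Meta-margin, in percent
sd_mm = 2.2    # standard deviation of the Meta-margin, in percent

# Lead expressed in standard deviations (about 1.36 sigma).
z = mean_mm / sd_mm

# Standard normal CDF, the counterpart of MATLAB's normcdf(z, 0, 1).
p_win = NormalDist(mu=0.0, sigma=1.0).cdf(z)

# Convert the probability to odds in favor.
odds = p_win / (1.0 - p_win)

print("lead: %.2f sigma" % z)
print("win probability: %.0f%%" % (100 * p_win))
print("odds: about %.0f to 1" % odds)
```

The probability is the chance that a Gaussian draw lands above a Meta-margin of zero, i.e. on the winning side of a tied race.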
This argument is open to discussion, and I’ll stop now. Have at it in comments.