Princeton Election Consortium

A first draft of electoral history. Since 2004

A brief history of Presidential prediction

November 1st, 2008, 10:37pm by Sam Wang


Most of you are interested in predictions of Tuesday’s outcome based on state polls. This is an area that started in 2000, acquired cult status in 2004 (WSJ link and alternate), and went fully mainstream in 2008. But before that, political scientists took their stab at the prediction problem. How do all the methods compare?

Here’s the upshot: In the spring and summer, political scientists generate the best predictions. Starting around Labor Day, state poll aggregation becomes the way to go. Online hobbyists like Andrew Tanenbaum (electoral-vote.com), Nate Silver (FiveThirtyEight), and I all use the same data, so our estimates tend to be quite similar. Small differences arise because we use different definitions of which polls are fresh, as well as some other assumptions. On Wednesday morning, after the votes are in, we’ll find out which of us is closest.

And now, a brief history of prediction.

Models by political scientists. For years, political scientists have been interested in using economic variables and incumbent-president approval ratings to predict Presidential election winners (for one review see this paper, especially page 424). They focus on the popular vote, which is a good but not perfect predictor of the winner, a lesson learned in 2000. That year they overestimated Gore’s popular win percentage. Some models also incorporate national polling data, which improves performance after Labor Day (no surprise there).

Such models do well during spring and summer months of the election year. This one, by Wlezien and Erikson, has a hit rate of about 75%:

That’s a time when state polls alone aren’t predictive, as my analysis from 2004 showed:

Median EV estimator from 2004 race

In 2004, both state polls and political scientists succeeded in their own way. The Meta-Analysis indicated an Electoral College outcome of Bush 286 EV, Kerry 252 EV, the correct result. On the political scientist side, see the last paragraph on page 750 of Wlezien and Erikson’s paper, which predicted a narrow Bush win. The two approaches converged nicely. This year, political scientists’ models predict a large Democratic victory (for a summary of many of them, read PollyVote).

My synthesis of this is that we can learn the natural “set point” of a Presidential campaign from political scientists’ models. The current projection of Obama 364 EV, McCain 174 EV may not be far from that point.

In this context, what polls show us is movement toward and away from the set point. These movements can be driven by campaign events as well as other events. Readers of the Princeton Election Consortium know the big events: the “Celebrity” ad, McCain’s houses gaffe, the Palin VP selection (and subsequent flameout), and the first debate.

Poll meta-analysis. Now let’s turn to state poll meta-analysis. Many of you are familiar with my nitpicking over Nate Silver’s methods. But at core, we are doing the same thing: (1) estimating state-by-state win probabilities, then (2) combining all the probabilities to generate a distribution of possible outcomes. To my knowledge the main pioneers in this activity were me and Andrea Moro, as well as a few others.
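The two steps can be sketched in a few lines of Python. This is only an illustration, not the code behind PEC or FiveThirtyEight: the three states and their win probabilities are invented, the function names are hypothetical, and states are treated as independent, which is a simplifying assumption.

```python
def ev_distribution(states):
    """Return a list where entry k is the probability of winning exactly k EV.

    `states` is a list of (electoral_votes, win_probability) pairs.
    States are treated as independent (a simplifying assumption).
    """
    dist = [1.0]  # before any state is counted, 0 EV has probability 1
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)  # candidate loses this state
            new[k + ev] += prob * p     # candidate wins this state
        dist = new
    return dist


def median_ev(dist):
    """Smallest EV total whose cumulative probability reaches 50%."""
    cumulative = 0.0
    for k, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= 0.5:
            return k
    return len(dist) - 1


# Hypothetical three-state race; the numbers are invented for illustration.
example = [(27, 0.9), (11, 0.5), (3, 0.2)]
dist = ev_distribution(example)
print("median EV:", median_ev(dist))
```

Building the distribution one state at a time keeps the work proportional to the number of states times the number of possible EV totals, so there is no need to enumerate every combination of state outcomes.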

One question comes up: can we improve on state polls alone? If only state polls are used to generate win probabilities, then my approach is more or less optimal. Beyond that, I would define improvements as additional modeling steps that (a) increase the probability of getting the winner right, or (b) increase the accuracy of the EV total or vote share.

Nate Silver is attempting to make such improvements. We don’t know yet whether his many assumptions (which keep changing – here’s today’s installment) constitute any improvement over “naked” polls. He claimed success in primaries, but he hasn’t been tested in a general election. This Tuesday’s election will serve as that test. For that matter, he could try his hand at the 2004 data – the acid test there would be whether he could get Wisconsin right.

Although I’ve made much of the differences between our methods, what’s interesting is that his added assumptions lead to few significant differences. For example, consider our state-by-state Obama win probabilities, which are quite similar.

Comparison of FiveThirtyEight and PEC win probabilities, 1 Nov 2008


For all intents and purposes, these probabilities are identical (r=0.989). Discrepancies may arise from the fact that he accepts polls over a much larger time window, or from a national poll-based correction he applies. The fancier stuff probably doesn’t matter much. Here are the biggest differences:

State            Electoral votes   PEC win %   538 win %
Florida          27                98%         64%
Indiana          11                50%         34%
Missouri         11                50%         43%
North Carolina   15                77%         66%
North Dakota     3                 50%         24%

Of these differences, Florida has the largest effect on the expected outcome.
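One way to check this is to weight each probability gap by the state’s electoral votes. The sketch below assumes that “effect on the expected outcome” means EV × (PEC probability − 538 probability); that reading, and the helper name `expected_ev_gap`, are assumptions for illustration. The figures come from the table above.

```python
# (state, electoral votes, PEC win %, 538 win %), taken from the table above
rows = [
    ("Florida", 27, 98, 64),
    ("Indiana", 11, 50, 34),
    ("Missouri", 11, 50, 43),
    ("North Carolina", 15, 77, 66),
    ("North Dakota", 3, 50, 24),
]


def expected_ev_gap(ev, pec_pct, fte_pct):
    """Difference in expected EV for one state: EV * (p_PEC - p_538)."""
    return ev * (pec_pct - fte_pct) / 100.0


gaps = {state: expected_ev_gap(ev, pec, fte) for state, ev, pec, fte in rows}
largest = max(gaps, key=gaps.get)

# Print the states from largest to smallest gap in expected EV.
for state, gap in sorted(gaps.items(), key=lambda item: -item[1]):
    print(f"{state}: {gap:.2f} EV")
```

Under this reading Florida accounts for roughly 9 expected EV of difference, and no other state on the list accounts for more than 2.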

Anyway, I think all that extra math is like watching the chef at Benihana. The knives fly around to impress and scare the customers, but the product tastes the same as other restaurants, and it’s okay as long as no damage is done.

-Sam

P.S. Since electronic markets derive their predictive value from polling data, I’ll cover them separately. But not tonight.

P.P.S. Is Benihana still around?

Tags: 2008 Election

33 Comments so far ↓

  • Evans

    Benihana may have been replaced by any number of other, similar restaurants, but I doubt the state-by-state method will be supplanted until we change the electoral college method of electing a president.

    I’ve noticed that pretty much all these polling websites are saying the same thing, so I mostly go one place or another for the level of commentary. As each of you comes up with a statistical answer… it’s going to be tough to say which method is better when it could be just luck. Even on aggregate, in one election, you still have a small sample size of 50 states, and a smaller sample, about 15, that anybody even cares to look at the polls about. So, in those 15 states, you have both presidential and senate races pretty well polled (and all people really care about predicting well). Is 30 tests enough to declare a winner?

  • Takuan

    I can’t comment much on the actual substance of your post, as it’s quite beyond me (but interesting nonetheless!), but I can at least tell you that I did visit a Benihana in my area a few years back and can verify that they are still around.

    Thank you for the fantastic information. Even if I don’t truly understand it, I love to read this site.

  • Marc

    If there is no significant difference in the results, then simplest method should be chosen…

  • Sam Wang

    Evans – I agree that a comparison will be hard. In any event, the overarching message of the Meta-Analysis is that we can estimate the overall EV outcome even without being certain of individual states. So I am more confident of getting closest on the EV total than on getting individual states correct.

  • The Liberal Crab

    Sam, good thoughts. My only problem (and maybe I don’t read the commentary enough) is that everything is purely based on the stats, without qualitative analysis to go with it.

    This occurred to me as I was analyzing some of the data from Mark, Nate, and yourself. I have spent a lot of years analyzing EVM for Navy programs (including multi-billion dollar aircraft carriers). As a consultant, I tried to bring commentary and a qualitative analysis to the strict quantitative results usually provided to our clients. What I was able to do was improve on the forecasts – because numbers are lagging indicators. This technique was able, time after time, to project the eventual future success (or failure) of the program.

    When Mark and Nate (I didn’t read much from you) were claiming that the tightening a week ago was just statistical noise – I was already railing on the fact they were missing what was going on (sometimes relying too much on numbers does that). Three days later – everyone was claiming there was tightening.

    I already posted what I think Tuesday will look like. We’ll see how well this EVM technique carries over to polling (BTW, also Six Sigma greenbelt…so do have some stats background, but nothing like you all have).

  • Sam Wang

    Liberal Crab – Yes, commentary’s good. I haven’t been doing so much of that. I’d better step up the pace!

    In fact, I believe the tightening is just statistical noise. The current median national margin is Obama leading by 7.0 +/- 0.9%, same as it ever was, same as it ever was…

  • Frank

    Sam, Pollster.com lists eight national polls ending 10/31/08. Their median margin is +7D as you report, but if you compare each margin with the same pollster’s previous margin, the differences (latest minus previous) are 0, 0, 0, 1, 2, 2, 2, 3. There seems to be a widening on average. What’s going on here?

  • George

    There is a political science professor — Allan Lichtman — who wrote the book “13 Keys To The White House” in 1996.

    Due to the declining fortunes of the Bush Administration on a declining home front economy and a bad foreign wars environment, he predicted way back in February 2006 that any Democratic prez nominee would be elected to the White House in November 2008.

  • George

    There are other political science / economic professors who try to do prez predictions.

    One is Ray Fair at Yale who predicts Obama will get about 52% of the vote — this was before the mid-Sept. economic melt-down. Obama’s % vote may be greater now.

    Another is Professor Douglas Hibbs of Sweden who has a “Bread and Peace” model. The 5,000 war dead from Iraq and the declining growth of per capita income means the Republican Party will suffer in the popular vote totals.

  • Paul

    Liberal Crab & Sam: I like commentary too, and I think that’s the best part of 538.com. I totally agree with Sam’s Benihana analogy, but I love the writing on 538. It captures a lot of the subjective sense of this race, the campaigns and their ground games, the different parts of the country, and the nature of polling.

    In short: my award goes to 538 in the commentary category, PEC in the statistical analysis category.

  • gprimos1

    I was thinking that one way to compare models is to see who has the tightest CI that contains the final EV total. Although they are not explicitly listed, Nate’s model obviously has much larger CIs and probably will on election eve. So if the final EV falls within Dr Wang’s CI, we can conclude that, yes, Nate = Benihana. If the opposite happens though, then what kind of food does that make this model?

  • mark

    gprimos1,

    I suggest a sum of squared differences. That is how my office will be running our EV Office Pool :)

    Thanks again for a wonderful resource, Sam.

    Mark

  • Sam Wang

    Frank – That’s a good way to look at it. Your numbers give a swing of 1.5 +/- 0.5%, basically over three days. Hmmm, could be.

    Paul – Truthfully, I am now realizing that it was quite brilliant of him to marry the analysis with color commentary. It’s the kind of thing that made this site cult-y in 2004. He took it to the next level.

    gprimos1 – If we do worse, it might make us the equivalent of supermarket sushi.

    However, I am thinking the difference will be quite small: 5-10 EV error in the total count, missing North Dakota’s win probability, that kind of thing.

  • George

    Regardless of whose site is most accurate … yours is the one that sold me two books … yours and “Your Inner Fish.”

    So you win on that count, regardless of the EVs on Tuesday

  • DFS

    I have to say I find this site the most “comforting” (although I still get neurotic about movements in the meta-margin and electoral vote projection). It’s because you don’t post individual poll results, thereby allowing me to avoid the panic over the one or two polls that look “bad” for Obama. Every day I swear this is the only site I am going to view, and almost every day I end up regretting breaking my vow.

    Here’s hoping your model is spot on.

  • Sam Wang

    In a deep sense, the Meta-analysis cannot be wrong by definition: it’s a simple form of data reduction that reduces noise more than any other source on the Intertubes. It contains all state polls, which I supplement with an occasional comment on national polls.

    The one potential problem is the possibility that overall, polls may be systematically different from the actual result. This has been an object of speculations such as the cell-phone problem and the Bradley effect. See left sidebar for my responses to those. In any event, nobody has a good empirical answer to correcting polls. So logically, you should read this site (or perhaps other poll aggregators) and stop looking at individual polls!

    Now, as a neuroscientist, I realize that logic alone does not feed our heads.

  • The Liberal Crab

    Yeah, Nate definitely adds some commentary to it. But what I’d like to see, rather than commentary on where we are, is you guys using your skills to project what the future will bring by going more in depth on the leading edge of an indicator, before there is sufficient info to see a trend. It’s difficult and reputations can suffer if you are wrong. But, I believe you all are smart enough that you’d be right more than you’d be wrong. That would separate the analysis from all the other sites… let’s face it, everyone is saying basically the same thing.

    Thanks for taking the time to consider my comments.

  • Scott

    @Sam

    Have you taken a look at this method?

    http://stochasticdemocracy.blogspot.com/

    It uses something similar to Kalman filtering for predictions. The state-by-state probabilities are all in line with both yours and 538’s.

  • Sam Wang

    Liberal Crab – It depends on what you mean. As early as summer, the political scientists’ set-point model tells us where things are likely to be headed. I think this can help us put campaign events in context.

    If you look through my comments, I have identified several plateaus and turning points on the first day that they occurred. This comes from watching changes in slope and noise fluctuations in the Median EV Estimator, as well as campaign events.

    Sadly, however, I am about to run out of opportunities to do this for four years – and for an open race with no incumbent, for eight years!

    Scott – I wrote about Stochastic Democracy just the other day. The work is a little rough around the edges, but that kid knows some useful math. I think that some combination of his tricks and mine would be optimal. The difficulty with any filter is identifying an appropriate half-life. Currently 7 days looks about right.

    As I have indicated, poll meta-analyzers had better obtain similar results since we are all using the same data!

  • The Liberal Crab

    Sam – you are right, as I see when I go back and read things more closely. I do think you guys are the future of polling. Keep tweaking. I see no reason to follow the individual polls when you guys are aggregating. The fact that, as you say, you all come out with similar results speaks to the fact no one is out of whack. Of course, this is easier with a blowout election. :)

    Let’s hope we have to wait for eight years before really testing!

    BTW, you and Nate are definitely (from what I can tell) supporters of the Dems. Are there any similar Repubs doing this analysis?

  • Sam Wang

    Liberal Crab – see my 1,000,000th view post. I link to a veteran Republican poll aggregator there. Aggregators are listed by partisan leaning at 3BlueDudes.com.

  • David Shor

    Dr. Wang,

    I don’t think Half-Life would be the right term, since polls don’t “decay” exponentially. So rates of decay are not constant.

    If polls follow a non-mean-reverting random walk (or at least one with high persistence), then the “decay” of a poll’s weight approximately follows an inverse square relationship over time (http://stochasticdemocracy.blogspot.com/2008/08/time-discounting-of-polls.html).

    But if the “half-life” is defined as “time it takes for a poll’s standard deviation to double”, then under currently estimated parameters, a national poll with 1000 respondents would have a “half life” of about 8 days.

  • blair alef

    Prediction vs. snapshot. Nate’s (prediction)model works better a few months out from the election when there is less state polling and his sub-model (538 regression) of voter preferences by demographic and psychographic status is weighted more heavily. It was this particular regression that allowed him to predict the results of some primary elections that went counter to the polling at the time.

    He missed one very good opportunity to truly predict the race in September by not including an adjustment for convention bounce. He had identified it, and it was discussed extensively on his site at the time, but it was excluded from the model due to overwhelming popular opposition to the idea. If he had included this adjustment he would have been predicting Obama EV’s above 338 consistently since June. By not including this adjustment – a quantifiable statistical factor based on twenty years of Presidential election data – his prediction model was irrelevant for that month.

    Certain other elements of his prediction model penalize the leader more and more the closer it gets to election day (based on historical data of races tightening in the last few weeks of an election). That’s why his predictions have been more conservative in the last few weeks versus the PEC snapshots.

    I think four years from now 538 will be a great site to watch in spring and summer, PEC will be the right place to be after the conventions.

  • Mark S

    I was really glad to see this state-by-state comparison with 538.

    The most noteworthy difference is that 538 has always estimated much higher uncertainty than here. Until now the larger uncertainty on 538 occurred because theirs was a projection for the future (November 4) rather than an estimate for today. Now the uncertainty estimates should be converging – but 538 still gives McCain a 3% chance of winning. Here, McCain’s chances are in the snowball-in-hell range.

    It may be difficult to evaluate these small probabilities – I would guess that polling history cannot distinguish between 3% and never never never (except by assuming the ubiquitous Gaussian distribution).

    However: It is part of human nature to be overconfident and to underestimate uncertainty. This is especially true for a very unlikely event – like a space shuttle disaster, or a nuclear accident, or a McCain-Palin victory. This is why I instinctively side with the highest (within reason) uncertainty estimate.

  • David Shor

    Blair,

    The convention bounce adjustment was quite questionable.

    It is a statistical sin to run a complex functional regression through so few datapoints, and he made no attempt to quantify uncertainty in the coefficients of his formula. (I’m avoiding the fact that he assumed that the convention bounce would apply equally to every state, when it turned out that McCain’s bounce was concentrated in red states)

    Moreover, the Republican convention was interrupted by a hurricane, and his vice presidential pick attracted unprecedented attention from the media. This was different than anything that had happened before, and it would have been incorrect to apply the formula at that point.

    Nate’s model was not “irrelevant” during that month; his model was pretty close to the optimal prediction at the time. McCain didn’t fall in the polls because his convention bounce faded; he fell because his vice presidential pick became a laughing stock as the economy collapsed. No model could have predicted that.

  • dw

    As someone who uses PECOTA every year in my roto league draft, your state win probabilities plot vis-a-vis 538 doesn’t surprise me one bit. His individual player predictions are often very conservative. (I don’t think he’s had a player predicted with more than 40 homers in the last 5 years, e.g.) OTOH, in staying conservative he tends to not oversell, where other player prediction systems often do. And even with a conservative player prediction system you can still end up with good team predictions.

    I tend to look at 538 as an attempt to make PECOTA into a state prediction system. PEC isn’t that; it’s the equivalent of taking all the past stats of all the players in the AL and determining the pennant winner. Of course, the AL has thousands of events (games) determining the pennant, where an election is, in the end, one single event with multiple events leading up to it.

    I’m not sure you two are trying to do the same thing. On the surface, it looks like you are, but it seems like you’re closer to that “single event” model, where Nate is using multiple events to aggregate a single model. I don’t think that’s a bad thing, but I tend to look at you and 538 and Pollster and electoral-vote and try to see where the trend is across the board. After all, while PECOTA may be my primary data source for my draft, it’s not my only data source.

  • Eddie

    I think Mark S hits it on the head.

  • FrankS

    One difference (at least as I understand it) is that the fivethirtyeight approach models the similarity of states to determine how states that have not been polled recently are likely to have changed over time. This does not matter much right now because of the density of data points but would matter considerably if you had less data (or randomly sampled the data points).

    There are many possible models that could be developed. It is definitely an art rather than a science … the dreaded “diapers and beer” story from data mining shows that intuitions are not always right. Whether the combination of location, demographics, and other (e.g. “Starbucks to Walmart ratio”) components do well remains to be seen. Again, Nate would have to rerun with downsampled information to see if the detailed state model is helping or not.

  • Magic Dog

    “On line hobbyists.” I liked that. Who says a boring old political scientist can’t get a jab in when the time is right?

  • Magic Dog

    By the way, I don’t think 538.com (or this site) has been any better, or even substantially different from, the sites that simply take the latest polls and tell you what the results would be today. They’re both a lot of fun, and that’s enough for me.

    My own prediction method is one that I heard of almost 20 years ago. Take the change in the national unemployment rate in the second quarter of an election year, i.e., June unemployment v March unemployment.

    If it stays level or goes up, the incumbent party loses. If it goes down, the incumbent party wins. This method has worked in every election since 1948, except for 1956, when Eisenhower won in spite of a rise in unemployment that spring. I think the reason it didn’t work that year is because, in most years, a change in Q2 unemployment is part of a continuing trend, but in 1956 the change was small to begin with and then was reversed.

    In the years (1960, 1968, 2000) when unemployment was flat in Q2, the fall election was narrowly decided against the incumbent. In two of those three years (1960 and 2000), the outcome was so narrow that the legitimacy of the results was in dispute.

    Sam, you’ll need to give it over to the political scientists (as opposed to the mere hobbyists) for thorough study, but the magnitude of the change in Q2 unemployment doesn’t seem to correlate with the margin of victory. Only the direction is forecast.

    In Q2 of 2008, the unemployment rate rose by 0.4%, from 5.1% to 5.5%. I was watching this closely, and when the statistics came out in July, I sent all my friends an e-mail explaining the unemployment change metric and suggesting that they get ready for President Obama.

  • Magic Dog

    p.s.: By “incumbent” above, I mean the nominee of the incumbent party.

  • Magic Dog

    p.p.s.: Sorry for too many comments, but you don’t have an edit function. The fact that I predicted an Obama victory in July has not kept me from pure obsession this year. I am always interested in politics, but the 2008 election has been an outlier. I can’t do anything else. Wednesday will be for gloating, but what on earth will I do on Thursday?
