As I’ve written many times, you should never get too concerned with a single poll. But what about the flip side – can you trust aggregates of polls? The answer is yes. In Presidential and Congressional elections since 2000, the general approach I advocate has an exceptionally good track record. Simple meta-analysis of polls should do at least as well as more elaborate models and can also outperform electronic markets. Read on.
Of the evidence below, the 2004 results were amply documented on this website at the time. The rest is based on publicly available data and is easily confirmed.
The evidence from 2004. On Election Eve, the 2004 Meta-Analysis indicated an outcome of Bush 286 EV, Kerry 252 EV – the exact outcome. The exact match was to some extent a lucky hit. But even at the level of single states, the only incorrect call was Wisconsin, which was won by one percentage point. As you can see from the 2004 history (here plotted using the 2008 averaging rule), the race was stable for most of October, and the outcome should have been no surprise.
Further lessons from 2004. At the time, I made additional assumptions about undecided voters splitting unequally in favor of Kerry and a difference in turnout from pollster expectations. These assumptions were wrong. This is why I am so critical of speculations that go beyond data. As I have previously written, there is very good evidence that the “Bradley effect” dissipated in the mid-1990s. The possibility of a “reverse Bradley effect” is unfounded. The omission of cell-phone-only voters may undersample support for Obama, a fact documented by the Pew Center.
2000: The origin of the Meta-Analysis. In 2000, I monitored Presidential polls using a proto-aggregation site run by Ryan Lizza at The New Republic. It became clear that the election would hinge on Florida. As we know, that turned out to be correct. This was a different picture than the one painted by national opinion polls, which suggested a Bush popular-vote win that never materialized.
2006: Congressional midterm elections. At the level of the Senate and House, high-quality meta-analysis requires more data than are typically available in single races. So the thing to do is to combine data from all states.
In the Senate, polling data suggested a 50-50 chance of a Democratic takeover, which occurred. At the time, Intrade showed the probability as only 25%. Public commentators were, for the most part, caught flat-footed.
The House was harder to predict because many competitive districts had only one poll, or no poll at all. However, it was still possible to aggregate all available polls by simply counting every lead, no matter how small, as a win for that side. Pollster.com's aggregated House polling data showed 231 D, 197 R. The remaining steps are to split equally the seven districts with no polls, and to use binomial statistics to place a confidence interval. This led to a prediction of 234.5 +/- 3.0 D, 200.5 +/- 3.0 R, a Democratic gain of 30-35 seats. The actual result was 233 D, 202 R, a gain of 31 seats, well within error.
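The arithmetic above can be sketched in a few lines. This is a hedged reconstruction, not the original calculation: the post does not say how many races were treated as coin flips for the binomial error bar, so the `tossups=36` figure below is an assumption chosen to illustrate how an error bar of about 3 seats can arise.

```python
import math

def house_projection(polled_d, polled_r, unpolled, tossups):
    """Project House seats: count every polled lead as a win,
    split unpolled districts equally, and use binomial statistics
    for the error bar. 'tossups' (the number of effectively 50-50
    races) is an assumed parameter, not given in the post."""
    total = 435
    assert polled_d + polled_r + unpolled == total
    mean_d = polled_d + unpolled / 2  # split no-poll districts 50-50
    # Standard deviation of n independent 50-50 races: sqrt(n * 0.5 * 0.5)
    sd = math.sqrt(tossups * 0.25)
    return mean_d, total - mean_d, sd

# 2006 inputs from the post: 231 D leads, 197 R leads, 7 unpolled districts.
d, r, sd = house_projection(231, 197, 7, tossups=36)
print(d, r, sd)  # 234.5 200.5 3.0
```

Splitting the seven unpolled districts gives 234.5 D / 200.5 R, and treating roughly three dozen races as coin flips yields a standard deviation near 3 seats, matching the quoted prediction.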
Similar documentation can be found at Andy Tanenbaum’s electoral-vote.com. Tanenbaum is the pioneer of poll aggregation and is worth reading on this subject. He has analyzed data since 2000. I will provide a link when I find it.
The only remaining question is…what about this year? Answers on Monday.