November 1, 2012 by Sam Wang

(original version published on temporary site; comment thread)

In this year’s race, national polls show a tie at the moment, while state polls show a decisive Obama advantage. Here I suggest that the difference may arise from the fact that the same systematic pollster errors can have different ultimate effects, depending on whether they occur in national vs. state surveys. Based on past elections, national poll aggregates differ from election results by as much as 2.5%. During the same period, state-poll aggregation has been considerably more accurate. The core reason is this: even if state polls have the same accuracy as national polls, races at that level are usually decided by larger margins, leaving room for aggregation to remove the effect of the error. For this reason, I suggest that the Meta-Analysis of state polls provides a more accurate poll-based prediction of next Tuesday’s outcome than national polls.
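The core claim (that the same systematic error flips the projected winner far more often in a near-tied race than in a race decided by a wider margin) can be illustrated with a toy simulation. Every number below, from the 2-point systematic error to the margins, is hypothetical:

```python
import random

random.seed(0)

# Apply the same systematic polling error to a near-tied race and to a
# race with a wider true margin, then count how often the apparent
# winner differs from the true winner.
def flip_rate(true_margin, sys_error=2.0, noise_sd=1.0, trials=10_000):
    flips = 0
    for _ in range(trials):
        observed = true_margin + sys_error + random.gauss(0, noise_sd)
        if (observed > 0) != (true_margin > 0):
            flips += 1
    return flips / trials

national = flip_rate(true_margin=-0.1)  # near-tied popular vote
state = flip_rate(true_margin=-5.0)     # state decided by 5 points
```

Under these assumptions the flip rate is well over 90% for the tied race but under 1% for the 5-point race, which is why aggregation over wider-margin state races is more forgiving of a shared systematic error.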

In the Wall Street Journal (October 31, 2012), Karl Rove surprises basically nobody by predicting a Romney win. His reason? He cites a Romney lead in some national polls. This has become a rallying cry for the right. But is ‘the math’ correct?

Here at the Princeton Election Consortium, the Meta-Analysis points toward an Obama electoral victory. The median outcome is Obama 308 EV, Romney 230 EV, with a Meta-Margin of Obama +2.4 +/- 0.5%. To put it into plain English: if state polls are on the whole as accurate as they have been in past elections, then Obama will win.

However, national polls give a different result. National polls since October 14th give a tied median, ‘Obamney’ +0.0 +/- 0.3% (n=44 polls, median +/- estimated SEM). Indeed, the discrepancy with the Meta-Analysis has been over 2.0% all season.
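As a concrete sketch of a "median +/- estimated SEM" calculation: the snippet below uses a MAD-based robust SEM, one standard choice, but the post does not spell out PEC's exact estimator, and the poll margins are made up.

```python
import math
from statistics import median

def median_and_sem(margins):
    """Median of poll margins plus a robust SEM estimate.

    The SEM is estimated from the median absolute deviation (MAD),
    rescaled to a normal-equivalent sigma (x1.4826), then divided by
    sqrt(n). Treat this as an illustration, not PEC's exact formula.
    """
    med = median(margins)
    mad = median(abs(m - med) for m in margins)
    return med, 1.4826 * mad / math.sqrt(len(margins))

# Hypothetical Obama-minus-Romney margins (%) from 10 national polls:
polls = [1, -1, 0, 2, -2, 0, 1, -1, 0, 0]
med, sem = median_and_sem(polls)
```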

What is going on? Nate Silver chewed it over yesterday. Let’s go through some possible reasons using PEC’s approaches.

Do differences in national and state poll methods account for the discrepancy? If we only accept polls from organizations that survey both the national race and individual states, we will have an apples-to-apples comparison. The result is the same: a national poll median of Obamney +0.0 +/- 0.6% (n=10 pollsters, 1 poll per organization). Dropping automated phone polls (PPP, Rasmussen, Gravis) gives Obama +0.5%, still not enough to account for the difference. Answer: no.

Are state polls slow to catch up? State polls take 10-12 days to reach a new steady state, even when the change occurs in one day, like Romney’s 5-point bounce after Debate #1. Could it be that they have not caught up with national polls? This is unlikely for two reasons:

  • In national polls, the race has been stable for the last two weeks – long enough for state polls to catch up.
  • The Meta-Analysis is moving toward Obama – opposite to the direction expected.

Answer: no.

Are there hidden advantages in non-swing states? Unlike state polls that influence the Meta-Analysis, national polls sample non-swing states. Could Romney have exceptional support in red states — or make the race close in blue states? Using state polling margins (and filling in a few missing values using 2008 returns), an average weighted by 2008 turnout gives Obama +2.1 +/- 0.6%. Sean Trende of RCP has done a similar calculation. That number is basically the same as the Meta-Margin. Answer: no.
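The turnout-weighted average described above amounts to a few lines of arithmetic. The state margins and 2008 turnout totals below are placeholders, not the post's actual inputs:

```python
# Reweight hypothetical state poll margins by 2008 turnout to get a
# national popular-vote estimate.
state_margin = {"OH": 2.0, "FL": 0.5, "TX": -14.0, "CA": 15.0}  # Obama - Romney, %
turnout_2008 = {"OH": 5.7, "FL": 8.4, "TX": 8.1, "CA": 13.6}    # votes, millions

total_votes = sum(turnout_2008.values())
national_margin = sum(
    state_margin[s] * turnout_2008[s] for s in state_margin
) / total_votes
```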

How is the track record of national polls? Here is a comparison of poll margins and final results.

Year | Final polling median | Actual result | Discrepancy
2008 | Obama +7.0 +/- 0.9% (n=15) | Obama +7.3% | 0.3% (0.3 sigma)
2004 | Bush +1.0 +/- 0.5% (n=13) | Bush +2.4% | 1.4% (2.8 sigma)
2000 | Bush +2.0 +/- 0.9% (n=15) | Gore +0.5% | 2.5% (2.7 sigma)

For a bell-shaped curve, the average absolute error should be about 0.8 sigma. Here it is much larger: 1.9 sigma. Aha…here may be our culprit.
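To make the arithmetic explicit, here is the mean absolute z-score from the table above compared with the value expected for normally distributed errors, sqrt(2/pi) ≈ 0.8:

```python
import math

# The table's discrepancies, expressed in sigma units:
z_scores = [0.3, 2.8, 2.7]                # 2008, 2004, 2000
observed = sum(z_scores) / len(z_scores)  # about 1.9 sigma

# If the errors were purely normal sampling noise, the expected
# absolute z-score would be sqrt(2/pi), about 0.8 sigma.
expected = math.sqrt(2 / math.pi)
```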

Evidently, national polls have systematic problems. Answer: national polls do about 2.5x worse at predicting the popular vote outcome than expected if the wisdom of crowds of pollsters were perfect.

How is the track record of state polls? In terms of predicting both state-by-state and overall electoral outcomes, state polls do extremely well. In 2008, I correctly identified the leader in 49 out of 51 races. I called two races (Indiana and Missouri) tossups, and those races had margins within 1%. In addition, the 2004 EV median precisely matched the final outcome. In other words, state polls get it 98-100% correct. Answer: pretty darned good.

But if state polls use the same methods, why would they do better than national polls? Well, state polls have three advantages.

  1. Most state races, even in swing states, are decided by margins of 2% or greater. So an error that makes a big difference in national polls doesn’t matter nearly as much for state polls.
  2. State polls target more homogeneous populations, which poses fewer technical problems for the pollster. For this reason, the systematic error might be smaller.
  3. In critical swing states they are done more frequently. This focuses the data where information is most needed.

As for why the weighted sum of state polls gives a result that differs from national polls, the only reason I can think of is (2) above: state polls might be technically easier to conduct and weight. Still mulling that one.

BOTTOM LINE: Even if national and state polls have the same flaws, they are consistent with one another. Because state poll aggregation is so powerful, the result based on state polls is likely to be more accurate. That is what I would call The Math.


Hudson Reeve says:

Sam, maybe you have posted about this elsewhere, but what is your reaction to the prediction by the University of Colorado political scientists Bickers and Berry (based mostly on economic indicators rather than poll data) of an easy Romney win?

Ken Dogson says:

Come Wednesday we shall know whose model lacks predictive power.

JamesInCA says:

He has:

Reason says:

This was written about 3 weeks ago and does not include the recently improved economic data. Not sure this is even close to accurate.

Daniel Cooper says:

Hey Sam, I used to believe in Zogby. Not so much now. He’s using the logic that independents will break in one direction or another even though evidence suggests they break evenly. What are your thoughts on Zogby? He seems to be like Rasmussen, suddenly changing his tune at the last minute just so he doesn’t look bad. Is he just another pundit suffering from Zo-mentum?

Craig says:

Zogby was once (late 1990s/early 2000s) an excellent pollster. But his 2006/2008 Zogby Interactive work was, without a doubt, the worst two cycles I have ever seen a major pollster experience. They were off by huge, laughable margins, and he seemed blind to his work’s flaws – right up until his accounts cratered and he had to sell his brand name to cut his losses.
Since then JZ Analytics seems to have abandoned Zogby’s old methodology, and their Ohio polling contradicts his previous remarks.

Trim says:

Hello Dr. Wang,
A couple of quick questions. One of the criticisms I’ve seen from conservatives about state polls (besides the ‘over-sampling’ of Dems) is that many pollsters are overestimating turnout for Obama by relying on 2008 numbers. They argue that since 2008 was a record turnout year for Dems, it’s unlikely Obama would get similar results in 2012, and that turnout would instead be more akin to 2000 or 2004. What are your thoughts on this?
Also, is it possible that 2000 was an outlier for the discrepancy between the popular vote and national polls? If I recall, Bush’s arrest for drunk driving was revealed just before the election, which could have skewed the results. How do the results look when you incorporate national polls from ’96, ’92, and ’88?

JamesInCA says:

As to the first question, I think those two criticisms are really the same thing. The underlying assumption is that pollsters are somehow putting too many Democrats into the sample, be that by weighting, calling too many Democrats, or using a Democratic-leaning likely voter screen of some kind.
The one thing I can see that might be different between the “over-sampling” and “bad turnout model” critiques is an assumption that minority-voter turnout could be over-estimated this year, if based on 2008 results. More bluntly, people may mean that fewer black voters will turn out this year because Obama isn’t fresh and new this time around. We won’t know until after Tuesday, but my guess is this criticism is incorrect.

Craig says:

1996 and 1992 were actually worse than 2000. 2004 and 2008 have been absolute banner years for pollsters compared to the 90s.

Trim says:

Are they the same thing? I thought the criticism was of the likely voter models for those polls showing Obama with a consistent lead: the polls that showed Obama’s best numbers were the ones with the least conservative likely-voter assessment. Pollsters like Rasmussen have a much more conservative way of determining who qualifies as a likely voter.
The argument goes that since Democratic enthusiasm isn’t quite as high as in ’08, it’s unlikely that as many Dems will actually vote. So it’s not so much that they have been polling more Dems (after all, the sampling should be random) but that they give too much weight to Dems.

JR says:

Hmm. Very thoughtful and complete, Professor Wang. You’ve given me a lot to think about.
A lot to think about because I’m a market researcher, so we do things like this pretty much all the time. I have to admit that when we want a snapshot of what a population thinks, we definitely conduct a survey of a randomized sample of that population, applying weights as needed (e.g., race/ethnicity, age, gender, etc.).
What you’re saying – and I think you’re right – is that if we instead conducted several BIG surveys of African-Americans, then another for Whites, then another for Asian-Americans, etc., and then weight them all together … that we would come up with a better estimate of the population parameter. (I hope my example makes sense.)
In the real world, of course, this is never done because it’s far cheaper just to do one big survey of everyone. But, your point is that the many, many state surveys are in effect giving us all the “sub-surveys” I talk about.
I think you’re right, but I’ll mull it. As to your final conundrum (Why don’t we get the same national number when we combine the state estimates?), could it be the n issue I’m talking about? Across all the surveys in Ohio, for example, we’ve now surveyed thousands of Ohioans. There ain’t no national survey that’s going to capture that many Ohioans, so its implied “sub-estimate” of Ohio will be less precise, and then that imprecision gets magnified across the 50 states. Just a thought.

Hudson Reeve says:

The perfect answer, scientifically. But those professors brag that they predicted every election outcome since 1980. I would like to hear Sam’s take on why, for this election, their model’s predictions are likely to diverge from the actual outcome…

JamesInCA says:

If, in fact, the election result is different from their model’s prediction, it is really they who will have to explain the difference. They are positing an underlying mechanism by which voters choose whom to vote for. If this election diverges significantly, they’ll have to consider what this says about their model.
Prof. Wang, on the other hand, is not proposing a mechanism by which people choose, but a method of estimating and predicting those choices. Likewise, if the election results diverge significantly from his model’s prediction, he’ll want to analyze and explain why the polls provided a systematically incorrect picture of voter sentiment and behavior.

JR says:

I have a separate request / suggestion, more about your final predictions and how you’ll present them.
I strongly suspect that you’re going to predict each state, right? And maybe with a “% likelihood” for each? That would be the best way for us (and everyone) to gauge “how you did,” with the hope that any incorrect guesses will be in the mid-range of likelihood (e.g., 51% that Obama takes VA, 49% Romney takes it).
I say this because I have to admit that I’m not a fan of the overall Probability of Obama re-election statistic. I don’t like it at 538, either. I totally get that it’s a cool way to track the race, and yes, it’s sexy. But, from a scientific perspective, it’s just not falsifiable nor testable. If Obama does indeed win, was that with 98% probability (PEC) or 85% probability (538)? The only way to know would be to track your probabilities over MANY elections, which we’ll never be able to do.
So, I am far more interested in your state-by-state predictions, including Senate. I suspect this is what you’re planning anyway, but just wanted to put in my vote.

Sam Wang says:

Yes, I resisted the % win figure for some time. But then I caved to pressure. You shame me.
I wrote about how to test predictions a week ago. See if you like that.

Reason says:

I am still concerned about CO. I checked the early voting totals and it shows R’s with a 2.3% lead. In 2008, D’s had a 1.8% lead at this time. Could the polls showing O ahead be that off?

Matt McIrvin says:

Colorado looks like a case where the polling is coming out strangely bimodal, but the +O and +R polls are approximately equal in number. So median-based averaging might give a misleading picture there, with lots of noisy jumping between red and blue. Pollster’s mean has the state on a knife edge.

Ms. Jay Sheckley says:

She has gone pale

Analytical says:

@Reason… Not to worry. This election is over. Please read Jim Messina’s memo today. R has been out organized by O in battleground states. EVs 332 + for O. This was really never close except in media.

Reason says:

@Analytical, I appreciate that. However, if you look at the GMU early voting stats, CO is still R +2.3. Those are hard numbers. So compared to 2008, O is running a 4.1% deficit now. There is still time to overcome, I realize. Time will tell. And I do not want people to think it is over. That creates complacency. People need to treat this like it is on edge and vote.

Reason says:

Also, do you have a link to that memo?
