Comparisons among aggregators and modelers

November 4th, 2012, 6:08pm by Sam Wang

I get asked a lot about why the various aggregators differ from one another. After all, we all start with the same polling information. Today I will give a general sketch of how and why we differ – and what I view as the strengths and weaknesses. I’ll restrict myself to organizations that I am more familiar with. It’s Sunday, so not that much math.

The two big distinctions to make among the various approaches are:

(1) Do we use polls only, or do we bring in predictive indicators (e.g. economic variables)? There are many “pure aggregators”:,,,and  RealClearPolitics. If what you want is polls only, any of these sites are good. The Princeton Election Consortium uses polls only, and gets them from A few online prognosticators bring in economic variables too, most prominently FiveThirtyEight and Votamatic.

(2) Do we take a snapshot of current conditions only, or do we attempt a future prediction? I’ll review three organizations that have been making predictions all season: FiveThirtyEight, Votamatic, and the Princeton Election Consortium.

I’m leaving out political scientists who use predictors only, such as Alan Abramowitz, Ray Fair, and the University of Colorado people. As I’ve written, I categorize their models as tools to test ideas about how voting preferences are shaped. All of them do well at “post-dicting” past events. They might get the next election right, but if they don’t…so what? Make another model. This activity is research, with emphasis on the “search.” There’s nothing at all wrong with it. But it’s most useful before the election season starts. In the storm, you want the person with the instruments, not the person with the almanac.

I’ll go through the various models, going gradually farther away from polling data.

Polling data:, All of these sites present polls with relatively little additional processing., by Andrew Tanenbaum, exemplifies the first wave of aggregators. He gives simple tabulations of state polls, with the electoral vote total determined by the most recent poll. To reduce poll-to-poll fluctuation, RealClearPolitics adds simple averaging. They also leave out partisan pollsters. uses more sophisticated smoothing methods, and has remarkably good user tools to allow the construction of customized graphs. In all cases, the electoral vote count is a simple total, assigning each state one possible outcome. This is the mode of the distribution.

Pro: Gives a quick look at the race with a minimum of filtering. Averaging gives a sharper picture of any individual state.
Con 1: The total electoral estimate still fluctuates because all states are reduced to a single combination of outcomes, which usually corresponds to the single highest point on the histograms here at PEC or at FiveThirtyEight.
Con 2: The use of averaging allows an extreme outlier poll to pull the average disproportionately in one direction. This can be an issue where polls are sparse. Smoothing is not optimal for revealing sudden shifts such as the effect of Debate #1.
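Con 2, and the median-based alternative that PEC uses, can be seen with a handful of numbers. The margins below are hypothetical and chosen only to show how a single extreme poll pulls the mean but barely moves the median.

```python
from statistics import mean, median

# Hypothetical recent margins (candidate A minus B, in points) in a sparsely
# polled state; the last poll is an extreme outlier.
margins = [2.0, 3.0, 1.5, 2.5, -6.0]

print(f"mean:   {mean(margins):+.1f}")    # +0.6, dragged toward the outlier
print(f"median: {median(margins):+.1f}")  # +2.0, unaffected by it
```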

Snapshot plus state-polls-only based prediction: Princeton Election Consortium. Like the sites listed above, we also offer a snapshot, shown in the topline of this website. The history of the snapshot is plotted in the right column. However, our methods wring considerably more information from the data. Our fundamental output is two core numbers: the EV Estimator and the Popular-Vote Meta-Margin. They are very high-resolution measurements at a single point in time. Think of them as an electoral snapshot or an electoral “thermometer.”

How we get these numbers requires a little explanation. Briefly, in each state we do more than ask “who’s ahead?” Instead we calculate a win probability from the median of recent polls, an outlier-rejecting approach. Using a simple math trick, we then take these 51 probabilities to calculate the exact distribution of all outcomes (2.3 quadrillion). The middle of the distribution is the EV Estimator.
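The two steps above can be sketched in Python. This is a minimal illustration and not PEC's actual code. It assumes each state's win probability comes from a normal approximation centered on the median poll margin, with a hypothetical fixed spread `sigma`. To get the exact distribution, it uses polynomial multiplication: each state is convolved in one at a time. This is one standard way to cover every win/lose combination without listing all 2^51 of them. The state names and poll numbers are made up.

```python
from statistics import NormalDist, median

def win_probability(margins, sigma=3.0):
    """P(candidate A carries the state), assuming the true margin is
    Normal(median of recent polls, sigma). sigma is a hypothetical spread."""
    return NormalDist().cdf(median(margins) / sigma)

def ev_distribution(states):
    """states: list of (electoral_votes, p_win).
    Returns dist with dist[k] = P(A wins exactly k electoral votes).
    Each state multiplies the running polynomial by (1 - p) + p * x**ev,
    which sums over every win/lose combination without enumerating them."""
    dist = [1.0]
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # A loses this state
            new[k + ev] += prob * p    # A wins this state
        dist = new
    return dist

def median_ev(dist):
    """Smallest k with P(EV <= k) >= 0.5: the middle of the distribution."""
    cumulative = 0.0
    for k, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= 0.5:
            return k
    return len(dist) - 1

# Hypothetical toy "election" with three states: (EV, recent poll margins for A)
polls = {
    "State X": (20, [2.0, 3.0, 1.0]),
    "State Y": (10, [-1.0, 0.5, -2.0]),
    "State Z": (5, [4.0, 6.0, 5.0]),
}
states = [(ev, win_probability(m)) for ev, m in polls.values()]
dist = ev_distribution(states)
print("median EV for A:", median_ev(dist))
```

With all 51 contests, the same loop just builds a list of 539 entries (0 to 538 EV).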

The Meta-Margin takes advantage of the fact that the EV Estimator calculation is very fast (much less than 1 second to run). It relies on a core tool, the bias variable b. It is easy to shift all polls over by a fixed amount b. There are several reasons to care about this: (i) polls in different states tend to move together – correlated variation; and (ii) polls may all be biased by some amount, which can also be simulated by varying b. The Meta-Margin is defined as the value of b that would make the Electoral College a perfect tossup. It’s just like a margin, which is why it’s in units like Obama +2.6%.
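Because the EV calculation is fast, a Meta-Margin like this can be found by simple bisection on b. The sketch below is hypothetical and makes several assumptions:

- It reuses a toy normal approximation (margins in points, a fixed `sigma`).
- It subtracts b from every state's median margin, so a positive b moves all states toward candidate B.
- It defines "tossup" as a 50% chance that A reaches the majority threshold `needed`.

PEC's exact tossup criterion may differ.

```python
from statistics import NormalDist

def prob_reaching(states, needed, b, sigma=3.0):
    """P(A wins at least `needed` EV) after subtracting b points from every
    state's median margin. states: list of (electoral_votes, median_margin)."""
    dist = [1.0]
    for ev, margin in states:
        p = NormalDist().cdf((margin - b) / sigma)  # toy normal approximation
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)
            new[k + ev] += prob * p
        dist = new
    return sum(dist[needed:])

def meta_margin(states, needed, sigma=3.0, lo=-20.0, hi=20.0, tol=1e-6):
    """Bisect for the shift b at which A's chance of reaching `needed` EV is 50%.
    The chance falls as b grows, so keep the half where it crosses 0.5."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_reaching(states, needed, mid, sigma) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical toy map: (EV, median margin for A in points); 18 of 35 EV wins.
toy_states = [(20, 2.0), (10, -1.0), (5, 5.0)]
mm = meta_margin(toy_states, needed=18)
print(f"Meta-Margin: A {mm:+.1f}")
```

In this toy map A cannot reach 18 EV without the 20-EV state, so the Meta-Margin equals that state's median margin, +2.0.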

Finally, we also use b to game out future scenarios and make a prediction. If we think polls can move by up to 1% in the future, then we can add up all the possibilities from b=-1% to b=+1%. The red and yellow strike zones (which are almost gone as of today) are calculated this way. Based on past elections, we can estimate what b might be.
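"Adding up all the possibilities" over a range of b can be sketched as an average of the win probability across that range. This sketch assumes every b between -1 and +1 point is equally likely, which is a simplification; PEC's actual weighting of future movement may differ. The toy map and normal approximation are hypothetical, as before.

```python
from statistics import NormalDist

def prob_reaching(states, needed, b, sigma=3.0):
    """P(A wins at least `needed` EV) with every median margin shifted by -b."""
    dist = [1.0]
    for ev, margin in states:
        p = NormalDist().cdf((margin - b) / sigma)  # toy normal approximation
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)
            new[k + ev] += prob * p
        dist = new
    return sum(dist[needed:])

def predicted_win_probability(states, needed, b_range=1.0, steps=201, sigma=3.0):
    """Average A's win probability over `steps` (>= 2) evenly spaced values of b
    from -b_range to +b_range: an assumed uniform spread of future movement."""
    bs = [-b_range + 2 * b_range * i / (steps - 1) for i in range(steps)]
    return sum(prob_reaching(states, needed, b, sigma) for b in bs) / steps

toy_states = [(20, 2.0), (10, -1.0), (5, 5.0)]  # hypothetical (EV, margin)
snapshot = prob_reaching(toy_states, 18, 0.0)
prediction = predicted_win_probability(toy_states, 18, b_range=1.0)
print(f"snapshot: {snapshot:.3f}  prediction over b in [-1, +1]: {prediction:.3f}")
```

In this toy case the averaged value comes out slightly below the b = 0 snapshot. That happens because the leader's win-probability curve flattens as the lead grows, so swings against the leader cost more than swings in his favor add.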

The reason I am going on about b is that it is my way of thinking about the contrasts between the Princeton Election Consortium and FiveThirtyEight. In some sense, my assumption that b has a narrow range accounts for why the two sites give different re-elect probabilities for President Obama.

Pros: Makes near-maximal use of existing state polls, whose track record using PEC’s methods is excellent. Uses medians to reject outliers. Converts Electoral College mechanisms to a Popular Vote margin, an intuitive quantity. The low noise allows accurate identification of swings in the race.
Cons: Doesn’t use national polls. Doesn’t correct for house effects. Assumes that state polls are, as a group, unbiased (though this does have support from 2004-2008).

Hybrid model: FiveThirtyEight. In addition to state polls, Nate Silver uses other variables – national polls and econometric indicators – to infer a likely election result. He used this approach to predict winners in the 2008 Democratic primaries, in that case including demographics and more. This let him fill in gaps where polling data were missing.

For his Presidential model, he takes several approaches. One type of variable is econometric indicators, which informed his calculation earlier in the season. His current calculation uses national and state polls, with fuzz factors to account for the possibility that these polls could contain systematic errors.

I’ll be brief and stay out of the weeds. He takes a very conservative approach to estimating win probabilities, in the sense that he builds in ways that effectively reduce the certainty of any particular outcome. In addition to being conservative about single-state probabilities, it appears that he puts a lot of credence into the possibility that national and state polls could be off by a substantial amount. Recently he said that this cautious approach accounted for much of the 16% probability of Romney winning the election.

Let me express this idea in terms of my bias variable b. The 16-percent idea is approximately equivalent to saying that there might be overall (i.e. all pollsters combined) systematic errors in national and state polls that could drive b as high as 5% in either direction (a 95% confidence interval), given today’s Meta-Margin. However, as I pointed out the other day, b doesn’t affect state outcomes very much in most cases, since these races, even in swing states, are usually determined by a larger margin. Also, based on my analysis in 2004-2008 (where data are abundant), b for state polls is smaller on average, 1-2%. It’s larger for national polls.
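Under the simplest assumptions, the 16% figure and the ±5% range line up. This is a hypothetical back-of-the-envelope check. It assumes the overall bias b is normally distributed with mean zero, and that Romney wins exactly when b exceeds the Meta-Margin in his direction. The Meta-Margin is taken here as 2.6 points, the example value given for it earlier.

```python
from statistics import NormalDist

meta_margin = 2.6   # assumed Obama lead in points (the example value from the text)
p_romney = 0.16     # FiveThirtyEight's Romney win probability cited above

# If b ~ Normal(0, sigma), Romney wins when b < -meta_margin:
# P(b < -meta_margin) = p_romney  =>  meta_margin / sigma = z_(1 - p_romney)
z = NormalDist().inv_cdf(1 - p_romney)   # roughly 0.99
sigma = meta_margin / z
half_width_95 = NormalDist().inv_cdf(0.975) * sigma

print(f"sigma = {sigma:.2f} points; 95% range of b = +/-{half_width_95:.1f} points")
```

With these assumptions σ comes out near 2.6 points, so a 95% interval for b is roughly ±5 points.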

Eventually, I believe that a suitable way to measure b in past elections is to perform aggregation separately on state polls and national polls, then compare actual national popular vote and EV with national-poll margin and the Meta-Margin that I have defined. This might be hard to do for earlier elections, where polling was sparser.

Pros: Takes into account national polling data; corrects for individual pollster biases; takes a conservative approach to the uncertainties.
Cons: Likely to overcount uncertainties (look at the error bars). The use of national polls may reduce accuracy of state-level Electoral College outcome. Uses econometric variables even after direct measurements (polls) are available. Time resolution not as good as a pure-state-polls approach.

Polls with a predictive prior: Votamatic. Drew Linzer’s project is a fresh approach to the problem of combining econometric variables with polling data. In his case, he uses an econometric model for long-term prediction to set up a “prior” expectation of how the race will unfold, then uses this to guide the interpretation of polling data.

As you can see, the model fluctuates hardly at all. It seems to really have an affinity for Obama 332 EV, Romney 206. At some level this is a good feature: if a prediction is accurate, it shouldn’t vary much. However, I am a bit concerned because this suggests that the prior is drawn very restrictively. In other words, it is set to ignore or shape incoming polling data. The predictive value of such a model depends a lot on the validity of the prior.
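The simplest Bayesian update shows why the validity of the prior matters so much. It combines a normal prior with normal poll evidence, weighting each by its precision (1/variance). This is a toy one-state, one-moment calculation and not Votamatic's model. The prior mean, its spreads, and the poll numbers are all hypothetical.

```python
def normal_update(prior_mean, prior_sd, poll_mean, poll_se):
    """Posterior mean and sd for a normal prior combined with a normal
    measurement, each weighted by its precision (1 / variance)."""
    w_prior = 1.0 / prior_sd ** 2
    w_poll = 1.0 / poll_se ** 2
    mean = (w_prior * prior_mean + w_poll * poll_mean) / (w_prior + w_poll)
    sd = (w_prior + w_poll) ** -0.5
    return mean, sd

prior_mean = 52.2                # hypothetical econometric prior, % of two-party vote
poll_mean, poll_se = 51.0, 1.0   # hypothetical poll average and its standard error

for prior_sd in (0.5, 3.0):      # a restrictive prior vs. a loose one
    mean, sd = normal_update(prior_mean, prior_sd, poll_mean, poll_se)
    print(f"prior sd {prior_sd}: posterior {mean:.2f} +/- {sd:.2f}")
```

With the tight prior (sd 0.5) the posterior stays near 52, whatever the polls say. With the loose one (sd 3.0) it moves most of the way to the polls. A tightly drawn prior produces exactly the kind of stable, hard-to-budge estimate described above.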

My own inclination for a model like this is to use it to fill in “missing data” problems. Many states are underpolled, such as Texas or Vermont. A strong prior can give us expectations for what would happen there. Although those outcomes are not in doubt, the vote share is not known. This would be a good test. Senate and House races are another missing-data problem, the House being a significant prediction prize.

Pros: Very stable prediction; keeps prior and polling data separate; high level of analytical rigor.
Cons: Dependent on the validity of the prior; doesn’t reveal much about the dynamics of the race.


As you can see, these models each have their own uses. To my own taste, I’d use them as follows.

Seeing polling data for one’s own back-of-the-envelope analysis:,, or RCP.
Sharp snapshot of the race as it evolves over time: Princeton Election Consortium.
Least-confident “conservative” prediction with all conceivable rational caveats: FiveThirtyEight.
Stable, model-driven view: Votamatic.

I should also note that all of these sites have their own flavor of commentary. Drew Linzer has done fascinating recent work examining individual pollsters and looking for “skew” or “bias.” gives a very good daily survey of the scene at all levels, and highlights polls of particular interest. And of course FiveThirtyEight’s Nate Silver made his name in part through the data-driven play-by-play commentary he offered in 2008.

It is certainly possible that I have not put my finger on key differences between these approaches. I imagine many of you are big fans of the other sites, and can offer alternative interpretations or corrections in comments.

  • wheelers cat

    Check out the sample size on this baby.

    • David Mann

      Dat sample size!

    • ChrisD

      I think they went that large so they could also do a couple dozen state polls.’s new poll box now has 18 state polls by YouGov.

    • orchidmantis

      So, we here in the Northeast do not care for Mr. Romney. In fact, Obama is more popular in the south than Romney is here. Despite his allegedly being a popular northeastern governor.

      Also: Man, can you see the southern strategy slowly folding in on itself in that last page.

    • Joel

      Sweet poll; interestingly, it undersamples Obama’s vote share in the 2008 electorate.

    • Craig

      @ Joel:

      That’s very strange – postelection polls nearly always overestimate the winner’s vote share.

    • Some Body

      @ Joel and Craig – if the poll assumes less success in Dem turnout than in 2008, i.e. more Obama 08 voters staying home, then this makes good sense. Plus, like any poll, it has its sampling issues to face.

    • sandenberg


      Or perhaps some people do not want to admit they voted for Obama? Hard for me to believe, but conceivable in some quarters.

    • Frances Smith

      I went to the survey and calculated the percentages of the white, Black, and Latino voters. There was no listing for other, such as Asian or Native American. Here’s the ethnic percentages: White 77.81%, Black 13.68%, and Hispanic/Latino 8.5%.

      All indications are that the Latino/Hispanic vote will be well above 8.5% and it is likely that the white vote will be about 74%. And apparently they did not survey any other groups of voters. So how can this poll produce a valid result?

    • Steve16748

      Thanks for the link and your insightful comments all season.

  • JaredL

    “Instead we calculate a win probability from the median of recent polls”

    I’m assuming you get this using some kind of Bayesian method? If so, what did you use to get the priors for different states? If not, how do you convert margin into probability?

  • Bill N

    Approximate 99% confidence interval for the difference between Obama and Romney (2% in favor of Obama) in this large YouGov poll, assuming it is based on a simple random sample, is 0.66% to 3.3% for Obama.
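Figures like these can be checked with a short formula. For a single simple random sample, the variance of the estimated lead d = p1 - p2 is (p1 + p2 - d^2)/n. It is larger than either share's variance alone because the two shares are negatively correlated. The shares and sample size below are hypothetical placeholders, not the YouGov poll's actual figures.

```python
from math import sqrt
from statistics import NormalDist

def lead_ci(p1, p2, n, level=0.99):
    """Normal-approximation CI for the lead p1 - p2 from one simple random
    sample of size n, using Var(p1_hat - p2_hat) = (p1 + p2 - (p1 - p2)**2) / n."""
    d = p1 - p2
    se = sqrt((p1 + p2 - d ** 2) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se

# Hypothetical inputs: shares of 49% vs 47% and a sample of 30,000.
low, high = lead_ci(0.49, 0.47, 30_000)
print(f"99% CI for the lead: {100 * low:.2f} to {100 * high:.2f} points")
```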

  • Vicki Vance

    Sorry to be less mathematical than the people who comment here, but…I am just trying to decide if I need to take Tuesday off and drive to NC (from very red SC) and help get out the vote for the President. The question is….do I need to do that? If Romney wins and I didn’t do everything I could to stop that, then I will have to kill myself and I hate when that happens. :) Please offer any advice….

    • Keith

      Yes! Please do!

    • Howard Roark

      Vicki Vance: Mathematically speaking, NC has a 0.4% chance of being the tipping point state that decides the election, according to 538. Emotionally speaking, you should definitely be in NC doing GOTV :)

    • Ms. Jay Sheckley

      Please, Vicki Vance, do. That’s brilliant. North Carolina is, and has been, teetering on the brink, strobing red white and blue. You are needed! The continued existence of FEMA means so much to the long-term economic well-being of North Carolina. It’s possible you yourself can make the difference. PLEASE!!!

    • Smug-in-NC


      Don’t bother. NC is very very unlikely to be a tipping point state.

    • Mich

      If you look over at Daily Kos, they track the early voting numbers in NC carefully. They are showing that Obama’s ground game there may have made it much closer than in the polls. Plus any polls <2% when aggregated could still go the other way (see Indiana in 2008) – 2% seems to be the magic aggregated poll number. So I'd say GOTV!

    • E L

      @ Vicki Vance: The larger Obama’s EV margin the more political capital he’ll have: 1. Bush tax cuts 2. Sequester 3. Debt ceiling. They’ll all be on his plate in the first month. Do it, please.

    • Mich

      Cab political also has some great commentary on early voting numbers:

    • Muhahahahaz

      I would say go for it! Obama doesn’t necessarily need NC, but he does have a 40% chance of winning there currently. Your support can make a difference!

    • Peter D

      Vicki, NC is good. It’s not extremely likely that NC will decide the election, but it might.

      VA would be almost 3 times as impactful if you can make it! See the power of your vote at right.

    • Ms. Jay Sheckley

      I’ve been playing with the interactive maps for for months and I disagree vehemently with smug-in-nc.

      Though of course Obama has a good chance of winning without NC, there are some crazy other factors going on. NC is _needed_.

      And, if Obama were able to win without her but won her too, it could make an important difference to the Congress / Senate races, which affect the Supreme Court and everything else, and to the sense that the win is not a fluke, which could affect the national mood for years.

      But we CANNOT say we don’t need North Carolina. Ohio polling may not come in for 3 weeks. PLEASE go. If you Paypal, I think I can kick in $10 for gasoline and $2 for coffee. PLEASE GO! Call the campaign there and ask.

    • DJG

      Well, if Prof. Wang is correct, you don’t have much to worry about anyway. As someone else mentioned, Nate Silver has NC at .4% as a tipping point state. So, if O wins it, it will be a delicious cherry on top of an already baked cake. However, I do question the single minded obsession most people have with the Presidential race. Folks, there is a contest for the House as well, and if the One With Tears In His Eyes remains speaker, we can kiss a second term Obama agenda goodbye. Go to NC, but GOTV in the NC-07 Congressional race!

    • Ms. Jay Sheckley

      Vicki, we have plenty to worry about. It was just revealed that Mitt didn’t pay taxes for 9 years. So he is seizing on a minor gaffe: Obama was _defending_ Mitt from the crowd booing “Mitt & the Obstructionists” [not a band] when Obama said, “No no no. Don’t boo. Just vote. Voting is the best revenge.” That’s a spin on “living well is the best revenge.” But Mitt is everywhere today, plus on Fox, saying Obama said “Vote for revenge.” Untrue, but how to get the word out? We MUST get our voters out, please. This is a crucial time, and Romney is ahead in North Carolina, but not so much that you can’t make a difference. Please!

    • sandenberg

      I was going to volunteer in Colorado but the part near me is mostly small towns and I doubted how useful a carpetbagger would be for canvassing.

    • wheelers cat

      Ms. Jay, I prefer Francis Bacon.
      “Revenge is a kind of wild justice.”
      That’s what I voted for: Justice.

    • David

      Do it!

    • Joel


      Here’s how I would put things:

      I live in very blue Seattle, Washington. On the surface of things, my contributions to the local GOTV efforts won’t really impact any national campaigns (save for WA-1, but that’s a different story). However, just the act of getting people involved in the democratic process has lasting ramifications. This isn’t just about turnout for this election, but for the next one. If you can go out there and put a friendly face on our cause, I say, do it! Keeping in mind that the polls are driven by these very efforts.

    • Collacch

      Yes. I am a California resident who voted early and is now in Nevada until the election is over. Poll averages don’t win elections. Votes do. Polls are always wrong by some amount with some level of confidence. Part of this error reflects differences in actual turnout vs assumed turnout.

    • Ms. Jay Sheckley

      Vicki, NC just went red. They need you

    • Ken Dogson

      Stay home all day and call CO and WI.

    • John Thorn

      I am doing all sorts of canvassing in Ohio because I feel the same way. But I live in Ohio. My advice is for you to do it. It will do something for you and definitely help you to live with yourself. Remember: We should all want to do what we can to prevent the lying, bullying, weathervane from gaining power he will certainly mishandle as well as use for his supporters’ misinformed and delusionary vision for our future.

    • Prairie Pundit

      @Some body. First: I like your username. Second: I’m as guilty as the next non-neuroscientist in publicly claiming that I come to this site to calm my nerves. Perhaps we PEC-as-Xanax junkies could do a better job of explaining why we find this site so soothing. It’s not because PEC strongly predicts, in this election cycle, an outcome in line with my own desires. It’s because PEC’s elegantly simple model seems to do a very good job of eliminating the noise and spin and exaggerated uncertainty of other sources of information. Like many readers, I would still frequent this site even if Sam’s math told me bad news – just as, if I were diagnosed with a lethal form of cancer, I would rather have the counsel of an oncologist who tells it to me straight, rather than one who slaps me on the back and urges me to carry on with life as usual. Even in the face of bad news, there’s something comforting about the truth. Peace, buddy.

    • Julia

      I’ve worked as an election judge my entire adult life and I don’t know how people can go to work as if it’s a normal day. Voting is so key, even before bringing issues into it, that for someone who’s into politics enough to be on this website and considering that drive, why WOULDN’T you?

      Even if you KNEW NC would tip towards Obama without you, where would you rather be when you find that out? I’d rather be focused on the election the whole day, keeping my anxiety aimed at the fundamental building block of the system (the integrity of the vote or GOTV), than refreshing news sites at work and second-guessing myself.

  • Osso


    I just read your article and my brain hurts….

    Here is an easier way:

    “New Pew poll shows Obama in lead as he and Mitt Romney make final push toward Election Day”

  • Ken

    Gallup may release one last poll tomorrow after being down for Sandy. Any predictions? I recall they were last at +5% for Romney….

  • Rob H

    Nice piece, I think that all of us poll watchers pretty much scour the aggregating sites doing exactly what you just explained so clearly. Using all of these methods to cobble together an ever more precise view of the election. One day when I have a couple weeks I would love to aggregate the aggregates.

  • janjanjan

    I frequent and enjoy all these sites. is my favorite of the pure aggregators. The commentary, although limited, is interesting and I routinely follow the links. Votamatic is frustrating since I really don’t believe the race has been as static as projected. 538 is great, although sometimes leaves me with much anxiety. This site relieves anxiety with its precision and almost absolute certainty. Plus, I can get new numbers throughout the day, which feels like all the moving parts are being monitored. So, just call me an election prediction junkie!

    • BrianTH

      The thing is, there is going to be just one actual outcome, and it will remain an open question whether all the apparent “snapshot” shifts that happened along the way actually mattered, or whether instead the way that the process is structured tends to force a convergence on a particular predictable outcome by the end.

    • Some Body

      With all due respect, I think you like this site for the worst possible reason. You like it because it currently reports the highest certainty of all aggregators of the outcome you want to see happen. Had Wang’s model produced the best probabilities for Romney, you’d get much more anxious. But this is just wishful thinking.

      If you’re so anxious, use your time and energy on actual support for your candidate. Go canvas or do GOTV, or whatever. Don’t use PEC as Prozac; it’s not what it was designed for!

  • RC

    Now that is a sample size! My favorite portion is the symmetry of voter ID with D/I/R (please let that quiet the skewers). Notice the crossovers? There is almost an equal portion of Dems going Romney as there is Repubs going BO. By the way, if the population of a state has more Dems than Repubs most polls will show D+ due to the natural bias of the state. It also seems to show exactly what is being seen with state polls and national polls since the storm. Romney should be concerned, and Obama shouldn’t be popping open the bubbly just yet.

  • Tapen Sinha

    I get the difference. What I don’t get is why Betfair and Intrade differ. The difference is so large that even allowing for transactions costs, there is arbitrage opportunity. You can make riskless profit. Really! Why aren’t the punters doing just that and making the difference disappear? These markets are rather deep – sufficient to make such transactions possible. Now, why don’t *I* do it? If I had $50,000 spare, I would. It would be the best return you can earn in two days. Something in the order of 30 percent on your money.


    • wheelers cat

      Tapen, so good to see you!
      check here in the comments for many arguments about betting.
      it seems non-cost viable on the logistics to me.

      As for Betfair vs Intrade…Betfair is both much much larger and very British. I imagine the Brits simply cannot conceive of many americans voting for #romneyshambles #AmericanBorat.

      Maybe you can answer my question, Tapen– why would anyone believe that the Market would inform the Math? Markets are subject to Irrational Exuberance and trader manipulation.
      Unlike the Math.

    • RocketDoctor

      Very good analysis on DKos about why Intrade should be ignored.

    • Joel

      Well, the legality of these markets is something of a gray area in many parts of the United States. While the risk of running afoul of the law is admittedly very small, the cost of that outcome greatly exceeds any potential gains to be made there, IMHO.

  • Some Body

    Very clear and useful. Thank you!

  • njn

    What about :) I can tell it’s a rubbish site, but I’d love to read an analysis of exactly why it is.

    • 538 Refugee

      He makes stuff up.

    • 538 Refugee

      I won’t dignify the site with a link, but seriously, just go there and follow the link to their FAQ. It points to a couple of articles, probably written by the site owner, that explain his methodology. Basically, all polls have to be based on Rasmussen. Not sure what he is gonna do now that Rasmussen is trending to Obama though. Makes for an amusing read though.

    • Some Body

      Hmmm…. Let’s see–

      *Pure rubbish:* This site does the opposite of aggregating polls to produce a prediction–it has a prediction and reverse-engineers the poll results as they come in to fit it.

      *Pros:* Offers great entertainment and many delightful moments of laughing your a$$ out.

      *Cons:* None, so long as you don’t actually expect this site to provide any information.

    • Craig

      The funny thing is that unskewedpolls does the same thing they accuse pollsters of doing – rigging the results to show what they want, using party ID. No serious pollster does that – in fact, everybody but Rasmussen and Susquehanna avoids using party ID altogether.

    • BNW

      @Some body – That is a hilarious and spot-on review of !

  • GBIllinois

    I’m a Poll junkie too, but nowhere else can I find a group of commenters like the bunch who frequent this site. I love you all.

  • BrianTH

    Very helpful summary.

    I will note I think something like Silver’s approach may still be helpful for filling in missing state data. That is not particularly useful about now, when all the interesting states tend to have a lot of data (which in turn helps explain why Silver’s results converge with simpler aggregations/predictions in the final phase). But in the earlier stage of a campaign, it could have practical applications, say as resources are being allocated.

    • Craig

      Very true – for rarely polled states, Silver (and Drew Linzer at Votamatic) do a good job of filling in the blanks.

      I’m not sure if this is his method, but Linzer could measure the deviation from the prior in the heavily polled states, and use that to estimate the deviation in the lightly polled states.

    • orchidmantis

      A lot of his reputation came from calling primary states using demographic data, based on how similar regions (affluent suburbs, etc) had voted.

  • Commentor


    Based upon your description, Drew Linzer’s model would appear to be a Bayesian model, except that you would expect the posteriors to eventually overwhelm the prior.

    Do you think that this is an indication of a lack of “Bayesiness” or that maybe Drew just happened upon a prior that was very close to the actual value?

    • Craig

      Drew uses the Douglas Hibbs model, which has above average accuracy – but still worse than polling. (Econometric models have a poor record in general.) That’s why the prior recedes over time.

    • Drew

      Actually I use the Abramowitz Time-for-Change model for my prior, assuming uniform swing from 2008. It’s starting me off at Obama with 52.2% of the two-party vote. This is a stronger position than it looks like he’ll actually end up at, but all in all, not a bad guess. Over time, the effects of that prior have worn off, and my forecasts now are based almost entirely on the current polling data.

      Anyway, the updating is definitely happening. Where you can see it most clearly isn’t in the overall EV estimate, but at the state level, where the forecasts have gradually evolved (and, hopefully, improved) over time.

    • Craig

      Huh. Mea culpa.

    • Craig

      I can attest that the state-level forecasts change even when the EV prediction stays the same – Drew’s graphs are awesome, strongly recommended.

  • Khan

    Yay math, so much more simple than agenda-driven hackery.

  • dsm

    Thanks very much for the explanations.

    What could the election results tell us about the relative merits of the various approaches?

  • Olav Grinde

    Question: Why are exit polls not necessarily accurate?

    • Some Body

      Answer: because they are polls, just like other polls are (and there are going to be fewer of them this time around too).

      Exit polls do eliminate two possible sources of error and uncertainty in polls: people can no longer change their mind between the poll and the election and the pollster no longer needs a likely voter screen.

      But they are still polls, with random samples, and these samples, in addition, are not always selected in the most professional fashion.

    • Craigo

      I can’t find a public source for it at the moment, but there’s an excellent paper by Traugott, Highton, and Brady which reviews the 2004 exit poll issues. It’s a great introduction to the process.

    • JamesInCA

      @Some Body – an exit poll’s sample may be random for a single poll site, but its sample of the electorate is not random because they do not sample at every polling place. Each voter does not have an equal chance of appearing in the sample; thus it is not random.

    • Some Body

      @James in CA: Yep, that’s part of what I meant when I said “not always selected in the most professional fashion”. They’re also not all that random for each polling place.

  • Tractarian

    You said that Nate Silver recently intimated that almost all of Romney’s 16% is based on the “possibility that national and state polls could be off by a substantial amount”

    That is wrong. (Not that your link to a Daily Kos diary helps.) What Silver actually said is that Mitt’s 16% chance is largely based on the idea that the state polls are systematically biased against him. In other words, his 16% is based on the state polls being wrong, and the national polls being right. Which is pretty much the opposite of what you said.

    By the way, we are right now seeing the convergence of state polls and national polls that many Romney supporters said would eventually happen. It’s not in the direction that they hoped for, though. The last 15 national polls on Obama 9, tied 6, Romney… 0.

  • SRC

    Has anyone attempted to estimate the likely effects of voter suppression efforts (I am thinking particularly of the very long lines being reported for heavily Democratic precincts in Ohio and Florida)? Is it more on the order of a tenth of a percent, or a full percentage point, for individual states? How close would the election have to be for this to potentially alter who wins the electoral college?

    • JamesInCA

      Until you know how the effects are manifested through Election Day, how could you possibly estimate it at any useful level of certainty?

  • Craig

    For what it’s worth, the national polling seems to be moving (slowly) towards the state polling. Of those conducted at least partially in November, nine have Obama leading by 1-3 points, 4 show a tie race. Romney no longer leads in any national poll.

    This does not include the recent Gallup poll, O+3.

  • Steve Marks

    Here is another difference among aggregators, and one that is important to me. Some aggregators give estimates for the electoral vote, some give estimates plus a confidence interval, but Princeton and Fivethirtyeight both give probabilities, which for me is the most interesting statistic. What is the probability that a candidate will win? Isn’t that what we really want to know? Realclearpolitics doesn’t seem to do this. Nor does Votamatic, if I am reading it correctly.

    • Drew

      But note that these probabilities are all model-dependent. What do you want to assume about the reliability and accuracy of the data? There’s no “right” answer. My model does produce a posterior probability that Obama will win over 270 EVs, conditional on my prior, the data, and the model specification. It’s very high, over 95%. The reason I don’t put this on the site is because it’s too much to explain the conditional nature of it.

      In some ways, Nate does his readers a disservice by treating his outcome probabilities as “true” — in the same way that flipping a coin has a 50% chance of coming up heads — rather than based on a model that he’s invented (and never fully made clear).

    • Brash Equilibrium

      Wait a second, Drew. The 50% chance that a flipped coin comes up heads is also model-dependent. The model is that the coin has two sides and that it is fair. We could flip the coin a bunch of times and keep updating our beliefs based on the strength of the new evidence. Eventually we’d flip the coin enough times to detect even the slightest amount of bias.

      The lesson here is that all estimates of uncertainty emerge from a model. A model’s estimated probability of a candidate winning is valuable information, just as valuable as the credible interval you graphically report at Votamatic. It gives us a measure of our uncertainty in the outcome that matters to voters, which is not the exact number of electoral votes a candidate gets. I don’t think explaining this to your readers would be any more difficult than explaining anything else that underlies your model, or any of the other big-name election prediction models. So I don’t think that’s a good reason for not reporting it. Neither is the argument that probability estimates are model-driven. After all, so is everything else about the model’s output!
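Brash Equilibrium's coin example can be made concrete with a Beta-Binomial model, the textbook conjugate update for a coin's heads rate. A minimal sketch, assuming a uniform Beta(1, 1) prior and a hypothetical coin that lands heads 52% of the time; the function names are illustrative. The posterior standard deviation shrinks roughly as one over the square root of the number of flips, which is why enough flips eventually expose even a slight bias.

```python
import math
import random

def update_beta(alpha, beta, heads, tails):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    gives a Beta(alpha + heads, beta + tails) posterior."""
    return alpha + heads, beta + tails

def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    n = alpha + beta
    mean = alpha / n
    sd = math.sqrt(alpha * beta / (n * n * (n + 1)))
    return mean, sd

rng = random.Random(2012)
true_p = 0.52          # hypothetical, slightly biased coin
alpha, beta = 1, 1     # uniform prior: every heads rate equally plausible
for flips in (10, 100, 1_000, 10_000, 100_000):
    heads = sum(rng.random() < true_p for _ in range(flips))
    alpha, beta = update_beta(alpha, beta, heads, flips - heads)
    mean, sd = beta_mean_sd(alpha, beta)
    print(f"{alpha + beta - 2:>6} flips: posterior mean {mean:.4f}, sd {sd:.4f}")
```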

    • Drew

      Yes, you’re right, technically. I meant dependent on choices about model specification in a more application-specific way, since we can’t actually run the election an infinite number of times under every different set of conditions and calculate the proportion of those times each outcome is observed. Different people make different assumptions to fill in for the sparse data. The probabilities can’t be interpreted separately from those assumptions.

      Anyway, I’ll be doing a full accounting of my model’s posterior probability estimates after the election’s done, just like I did with the 2008 data in my paper.

    • Brash Equilibrium

      And I look forward to reading about it. I just wish I could have read about it during the race, as well.

    • Steve Marks

      Drew- When you draw a 95% confidence interval around your estimate of EVs and that interval is completely above 270, are you not saying that the probability of Obama winning is over 95%? Or rather over 97.5%, since we are only interested in a one-tailed test? If so, why not just come out and say it, as do Princeton (Sam Wang) and Fivethirtyeight (Nate Silver)?
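Steve Marks's one-tailed reasoning holds for any equal-tailed interval: a central 95% interval leaves 2.5% of the probability below its lower end, so if that end sits at or above 270, the probability of falling short of 270 is at most 2.5%. The sketch below checks this under an assumed normal approximation to the EV distribution, with a hypothetical mean and standard deviation; it is not Votamatic's actual posterior.

```python
from statistics import NormalDist

def central_interval(mean, sd, level=0.95):
    """Equal-tailed interval of a normal distribution at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return mean - z * sd, mean + z * sd

def prob_at_least(mean, sd, threshold=270):
    """P(EV >= threshold) under the normal approximation."""
    return 1 - NormalDist(mean, sd).cdf(threshold)

# Hypothetical EV distribution: mean 320, standard deviation 20.
low, high = central_interval(320, 20)
print(f"95% interval: {low:.1f} to {high:.1f}")
print(f"P(EV >= 270): {prob_at_least(320, 20):.4f}")
```

When the lower end of the interval lands exactly on the threshold, the one-tailed probability is exactly 97.5%; anything above it only raises that number.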

  • JB

    Thanks very much for such a lucid explanation of the different approaches the various sites highlighted take. While I’m no math whiz, understanding the components of a probability estimate certainly helps me evaluate new data as it becomes available. And keeps me sane. Great work!

  • Jack Rems


    Your overviews just get better and better.

    But I think you should add a footnote or the usual link on the words “simple math trick.” I’m too used to seeing those words in spams.

  • irked

    Sam –

    Why don’t electoral maps accurately reflect Nebraska? Omaha (or, as I like to call it, Obamaha) counts for one electoral vote. Why isn’t the state divided into, say, five equal bars, with one of those bars blue? (Or a circle that is one-fifth the size, also in blue?) This drives me nuts!

    Since yours is the only one that accurately reflects the states in general, changing their size to match their electoral votes, I was hoping yours would also show Omaha’s one electoral vote.

    Thank you for your time and your site.

    • Craig

      Nobody does polling for the NE districts, so poll-based methods can’t assign them a probability without incorporating new information.

    • Muhahahahaz

      “Obamaha” only has about a 13% chance of going Obama this time around.

      In any case, it won’t make a difference. :-P

      Obama 332!

    • Muhahahahaz

      Yeah… that too. There are absolutely *no* polls for the individual districts, although Nate’s model still shows 13% for Obama somehow. :-P

    • irked

      Thanks everyone! I realize it’s like looking at the tree rather than the forest, but even if you look at 2008 maps when Omaha went blue, no map accounts for that. I think once I saw RCP had a tiny dot to show it, but really it should have been a 1/5-the-size dot, ya know?

      Ok, just venting has left me “not so irked” :)

  • Paul G

    I still don’t really understand why you assume that state polls are unbiased. It may have support from 2004-2008 but two cycles doesn’t seem like a sufficiently large dataset.

    An unrelated question, I am a bit curious about the polling in the “terrible candidate” Senate races. For example, one of the very few recent independent polls of the Indiana Senate race found Donnelly leading Mourdock 47-36, with a massive 17% undecided or third party. How likely is it that most of these 17% are actually Mourdock voters who don’t feel particularly disposed to announce their support at this juncture but will pull the lever for him in the end? Although if Donnelly is really at 47% he doesn’t need that many more…

    Similarly with the Akin race, I am wondering if the polling might be off due to people not wanting to announce their support for such an ignominious character, and instead claiming to be undecided, third party, or even McCaskill.

    In general, has there been any research into whether highly unpopular candidates in partisan states can sometimes overperform weak poll numbers through such an effect?

    • Some Body

      You may be on to something. The famous “Bradley effect” in the US and “Shy Tory effect” in the UK are supposed to function in a similar fashion (and I’m sure there have been quite a few studies of both, not that I can cite any). But this time it may in fact be even stronger.

    • Craig

      The evidence that the Bradley effect ever existed is weak at best. It was observed in only a handful of races, some similar races showed no effect or even its reverse, the same error occurs just as often in races without a black candidate, and even in observed cases nobody has isolated race as the cause (Brown’s support in the Senate was overestimated the same year, but nobody mentions that).

    • Commentor

      It seems to me that there is a basis to argue that polls from 2004-2008 are the only valid data set considering the proliferation of polling. The number of polls should improve the results when aggregated and the individual polls may be better due to competition among pollsters. Further, in addition to the presidential race, you also have the state by state races. In any event, the underlying dataset is even smaller if you want similar levels of polling.

    • Ken Dogson

      We’ll call it the “Asshole Effect” should either of those two POSs win.

    • Some Body

      @Ken – well coined!

  • Blocksteel

    Sam, you may have commented on them before and I missed it, but how do you feel about DeSart and Holbrook?

  • jefflz

    Aggregating the aggregators (including RCP’s no-toss-up map), the simple average EV count over the seven aggregators mentioned is 299. I can live with that. With all this information to the contrary, it is appalling that the media foists the “too-close-to-call” story on the public.

  • Ked

    I was going to object to your putting Pollster in with the “just-the-polls” group, at least since their mid-September change in methodology. I took the time to review the article they wrote when introducing it, and I guess, in a pure sense, you’re right. The only “external” data they’re bringing in is historical regional correlation in poll movement. I’m not very fond of the model they’re using; I think it handles FL, NC, and maybe VA and CO poorly, given the changing demographics in each. You can go back to the simple smoothed-graph mode within each state if you want to. You can’t, however, undo the new model at the national map level.

    Still, I probably wouldn’t classify them in quite the same group as EV and RCP. They’re not using simple math, they are using modeling, and they are expressing results in terms of confidence levels.

    RCP is its own class of mess, of course, with the editorial hand firmly planted on the scales, but I will give them credit for getting the idea early.

  • Nathan Duke

    The real reason Romney is in Pennsylvania today?

  • Turgid Jacobian


    I wonder why you don’t do a cume? I know the info is encoded in your distribution, I just can’t read the small font well…


  • Amitabh Lath

    Sam, good overview of the “numbers guys” (as opposed to the “gut feeling” guys who seem to be heaping scorn upon you).

    You left out one important distinction between your site and others (esp. Nate Silver’s). You publish your code. Anyone who can run MATLAB can get your files and run it, tweak it, etc.

    Silver could be more transparent and publish the formulas he uses.

    I see this as one of the main differences between a bookmaker and an academic.

    • E L

      I may be wrong on this, but I think Nate is concerned with someone using his formulas and competing. He makes money from his formula. Princeton pays Sam. PEC is a hobby for him (Some hobby!)

    • Amitabh Lath

      E L: that’s pretty much what I meant when I said Nate Silver is a bookmaker and Sam Wang is an academic.

      Both are ancient traditions (I suspect the former profession is older) and have their own ethos.

      I suspect Nate’s ultimate goal is to build a reputation as an oracle (he’s already well on his way) and get paid to apply his skills in other prediction formats.

      I suspect Sam’s ultimate goal is to get this whole thing published in some respectable anonymous peer-reviewed journal.

    • wheelers cat

      Amitabh, Dr Wang’s tradition is far older.
      Academics started out as priests.

    • Amitabh Lath

      W Cat: Priests probably came with agriculture and human settlements. Gambling probably predates Australopithecus.

      Also, probability theory was developed by a bunch of gamblers. It’s a wonder anything ever got published.

      Being of Sam’s tribe, I suspect he has a couple of dozen other things to get off his desk before he can get to writing this up in publishable form.

      Sam, I would be happy to proofread the manuscript, if you get one going.

    • Sam Wang

      You are very kind, Amit. Thank you.

      It is true that at one point I thought it would be spiffy to have a paper out of this. But then I started thinking about what good it would do me…not that much. I have other papers screaming for my attention, plus the eternal grant-writing that comes with being an experimentalist.

      I thought of sending something to a political science journal. I am not sure that political scientists like the kind of thing I do. It’s been sniffed at for lacking “theory.” Which was the point of the exercise, at some level.

    • Amitabh Lath

      Sam, I know where these sorts of interdisciplinary things fall in academic life. May all your grants get stellar reviews (fund this urgently).

      What you say about Political Science journals not wanting to publish makes me worry about Drew Linzer. His site says he is an assistant professor, I presume that means not tenured yet.

      He has been sitting on Obama 332 like a rock for ages. If Obama actually gets 332 EV, they should tenure him no questions asked.

      But since he brings no theory, they probably won’t.

    • Amitabh Lath

      By “they probably won’t” I meant they won’t *automatically* promote Drew Linzer if Obama gets 332 EV.

      I am pretty sure Drew Linzer will get tenure, given his stature in the public sphere.

    • E L

      @Sam Wang: Reminds me of the French ambassador who said when France withdrew from NATO “I oppose NATO. It works well in practice but does not work in theory.”

  • securecare

    Something I ran across yesterday that might be of interest to those into “the math thing” going forward, for future use. And even if you don’t care about the math/science, you might want to watch the three movies that show what they are suggesting.

    “Cause or Correlation?

    Three centuries ago, Bishop Berkeley’s 1710 classic “A treatise on the nature of human knowledge,” first spelled out the “correlation vs. causation” dilemma. Sugihara et al. (p. 496, published online 20 September) present an approach to this conundrum, and extend current discussions about causation to dynamical systems with weak to moderate coupling (such as ecosystems). The resulting method, convergent cross mapping can detect causal linkages between time series.”

    Science Vol 338 26 October 2012 p 439

    “Identifying causal networks is important for effective policy and management recommendations on climate, epidemiology, financial regulation, and much else. We introduce a method, based on nonlinear state space reconstruction, that can distinguish causality from correlation. It extends to nonseparable weakly connected dynamic systems (cases not covered by the current Granger causality paradigm). The approach is illustrated both by simple models (where, in contrast to the real world, we know the underlying equations/relations and so can check the validity of our method) and by application to real ecological systems, including the controversial sardine-anchovy-temperature problem.”

    Science Vol 338 26 October 2012 p 496

  • Anon

    Is there any concern that state polling in Ohio is less accurate historically than other states (or the average of all states), given its importance in this election? I was looking at 538 and noted that state polling in Ohio was off by about 1% in 1992, 5% in 2000, 2% in 2004 (Silver didn’t post any polling data from 1996, and it looks like polling in 2008 was dead-on).

    • Ohio Voter

      There have been so many polls in Ohio that I tend to think that if there were accuracy problems, the sheer n size would negate those issues.

    • Joel

      More polling tends to lead to more accuracy, of course. Or what @Ohio Voter said.
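Whether more polls fix accuracy problems depends on the kind of problem. A rough way to see it, assuming each poll's error is a systematic bias shared by every poll plus independent random noise: the root-mean-square error of an average of n polls is sqrt(bias^2 + sd^2 / n), which falls toward the size of the bias as n grows rather than toward zero. The 3-point per-poll noise and 2-point bias below are hypothetical.

```python
import math

def average_rms_error(bias, poll_sd, n_polls):
    """RMS error (in points) of an average of n polls.

    Assumes every poll shares the same systematic bias and also has
    independent random error with standard deviation poll_sd.
    """
    return math.sqrt(bias**2 + poll_sd**2 / n_polls)

# Hypothetical: 3-point random error per poll, 2-point shared bias.
for n in (1, 10, 100, 1000):
    print(n, round(average_rms_error(2.0, 3.0, n), 2))
```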

    • David Mann

      Why would the polling be less accurate because the state is especially important? I don’t see the connection.

    • Anon

      I’m not suggesting that the polling would be less accurate because of its importance. But it would seem logical to me, however, that the historical accuracy of state polling in Ohio is a more important consideration in evaluating uncertainty in the projections because of Ohio’s importance in the present election. That said, I don’t have a background in statistics and didn’t particularly excel at it in college. Hence, my question. The replies are appreciated.

    • Olav Grinde

      @Anon: Unless, of course, the Ohio polling was reasonably accurate in all three elections — and the vote count was off by about 1% in 1992, 5% in 2000, 2% in 2004… ;)

      The one I’m concerned about is 2012.
      And, no, at the outset I do not trust Jon Husted & Co.

    • David Mann


      Oh, I misread what you wrote. Oops.

  • Ohio Voter

    I feel like PEC and 538 are centering onto the most likely scenario, with Obama taking just over 300 EVs.

  • Joel

    I think links are setting off the spam filter, so I’ll just throw one more aggregator into the mix.

    Darryl @ horsesass dot org

    On casual examination, looks like he uses:
    Unadjusted state polling data alone (like electoral-vote)
    Monte carlo simulations (like fivethirtyeight)

    And ends up with prediction confidences that rival Sam’s, although his method is a bit more opaque.
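The approach Joel describes, unadjusted state polls fed into a Monte Carlo simulation, can be sketched in a few lines. This is not Darryl's code. It assumes each state's outcome is drawn independently from a normal distribution centered on its poll margin with a hypothetical 3-point error, and the safe-state base is a hypothetical round number. The swing-state margins echo the late snapshot of Sam's table that ChrisD posts in this thread; the electoral vote counts are the 2012 allocations.

```python
import random

SAFE_OBAMA_EV = 200  # hypothetical base of states not simulated here
STATES = {  # state: (electoral votes, poll margin, Obama minus Romney)
    "NV": (6, 5.0), "IA": (6, 3.0), "VA": (13, 2.5), "PA": (20, 4.5),
    "WI": (10, 4.5), "NM": (5, 6.0), "FL": (29, -0.5), "NC": (15, -2.0),
}

def simulate(states, safe_ev, error_sd=3.0, n_sims=20_000, seed=1):
    """Return a list of simulated Obama electoral vote totals.

    Each state's true margin is drawn independently from
    Normal(poll margin, error_sd); Obama takes the state if it is positive.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        ev = safe_ev
        for votes, margin in states.values():
            if rng.gauss(margin, error_sd) > 0:
                ev += votes
        totals.append(ev)
    return totals

totals = simulate(STATES, SAFE_OBAMA_EV)
win_prob = sum(t >= 270 for t in totals) / len(totals)
median_ev = sorted(totals)[len(totals) // 2]
print(f"P(EV >= 270) = {win_prob:.3f}, median EV = {median_ev}")
```

Treating state errors as independent makes a simulation like this overconfident; a more realistic version would add an error term shared by all states.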

  • Les Honig

    Can one of you experts (maybe even Sam) explain the sudden drastic drop in EV’s at 8pm from 318 to 303? I thought the polling I was seeing today was quite good…What happened????

    • James Moore

      I’m guessing that the poll drop is that some of the organizations that stopped during the hurricane were very pro-Romney. They stopped, so they’re not in the calculations for a few days, so you got a spike for Obama. When they start, you see a big drop.

    • Ohio Voter

      NC is back to Romney territory. But Obama has a solid base of 303 EVs. NC would just be icing on the cake.

    • Joel

      Obama has solidified his lead to get to 270 (ergo the rise in win%) while slipping in the polls in NC and FLA (ergo the drop in EV totals). I still wonder if NC can be had, based on the early voting numbers. FLA is even more likely to be a tossup, but I think it leans Romney this time around. Worth going for it, though!

    • Froggy

      Florida and North Carolina. The 15-EV shift is half of Florida and all of North Carolina. Florida shifted from being a little on the O side to a little R leaning. North Carolina shifted from being a little R to a lot R.

    • Dean

      I see that Florida switched to a toss up favoring Romney. I like what I see in the north for Obama, from Iowa eastward, and Nevada. If Obama holds the north and Nevada, he’s in at 281 EV’s.

      I would love to see Obama get Virginia and/or Colorado. I actually really want Virginia, to pluck one state out of Romney territory. Of course, absent Florida, the best scenario is what’s happening here right now, Obama at 303 EV’s.

    • ChrisD

      YouGov dumped a bunch of state polls late this afternoon. Here are all the changes in Sam’s table from the 5pm update:

      NV: O+4 => O+5
      IA: O+4 => O+3
      VA: O+3 => O+2.5
      PA: O+3 => O+4.5
      WI: O+5 => O+4.5
      NM: O+8 => O+6
      FL: tied => R+0.5
      NC: R+0.5 => R+2
      WA: O+14 => n/a
      MN: n/a => O+7

  • Les Honig

    I am still confused. I thought we were using state polls here, not national ones, and the national ones that stopped haven’t started up yet. Gallup is issuing one in the morning covering the last four days. Did you mean that some state polls that hadn’t been reporting have started up again? Most of the ones I see on battleground states today, like PPP and others, still show Obama in the lead. Were there new ones during the last three hours that showed big drop-offs for Obama? The map shows the same blue areas as earlier today.

  • Les Honig

    Ah, thank you so much… I was typing my last comment when you guys responded. That makes it a lot clearer. I will still keep the faith!

  • Westy

    Yeah, Obama was never going to win NC – though he could very well win FL. The smart money seems to be on Obama winning somewhere between 290 and 303.

  • maye

    Vicki: Head the other direction and help out in Northern Virginia, where GOTV will make or break that result.

    • David Mann

      That’s not the other direction. She said she’s coming from South Carolina.

  • Steve

    Obama will carry Florida. People aren’t standing in 5 hour lines for nothing.

  • wheelers cat

    I just want to express my gratitude and say how cool it is that Dr. Linzer is answering questions here.
    I learned a lot from his site while Dr. Wang was gone.

  • Brash Equilibrium

    You seem to think that each of these models is good for different tasks. Does that mean that you question the validity of using some kind of model averaging to combine them?

  • 538 Clarification

    “Cons: Uses econometric variables even after direct measurements (polls) are available.”

    1. The econometric variables are weighted less for states with more polls: the more polls there are, the less weight assigned to the econometric variables.

    2. From June to the present, the weight assigned to econometric variables is decreased as election day gets closer. At present, two days before the election, they have almost no weight whatsoever.

    • Sam Wang

      Perhaps I was unclear. I think that to get a better read on polling, the appropriate weight should have been zero at all times, including the start.

    • 538 Clarification


      I should have been clearer as well.

      3. The 538 “Now Cast” does not use the econometric variables at any point. As you said, that gives the better read on polling.

      4. Using the econometric variables – decreasing their weight down to zero as election day comes closer – is used for predictive purposes in the 538 “Forecast,” the prediction of what the results will be on Nov. 6. As election day gets closer and closer, and the weighting decreases toward zero, the “Forecast” numbers get closer and closer to the “Now Cast” numbers, until they’re identical on election day.
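538 Clarification describes the Forecast converging to the Now Cast as the weight on econometric variables falls to zero by election day, and notes that the exact schedule is not spelled out. A minimal sketch of that kind of blending, assuming a linear decay from a hypothetical starting weight of 0.5 on a hypothetical June 1 start date; neither the shape nor the numbers are 538's.

```python
from datetime import date

START = date(2012, 6, 1)      # hypothetical start of the forecast season
ELECTION = date(2012, 11, 6)  # election day

def econ_weight(today, w0=0.5):
    """Hypothetical linear decay: w0 at START (and before), 0 on ELECTION day."""
    total = (ELECTION - START).days
    remaining = max((ELECTION - today).days, 0)
    return w0 * min(remaining / total, 1.0)

def forecast_margin(poll_margin, econ_margin, today):
    """Blend a polls-only margin with an econometric margin."""
    w = econ_weight(today)
    return w * econ_margin + (1 - w) * poll_margin

for d in (date(2012, 6, 1), date(2012, 9, 1), date(2012, 11, 4), ELECTION):
    print(d, round(econ_weight(d), 3), round(forecast_margin(2.0, -1.0, d), 2))
```

The decay schedule here is chosen by hand rather than derived from the data, which is the point Brash Equilibrium raises next.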

    • Brash Equilibrium

      What is the theoretical foundation behind the time-dependent weight given to the econometric variables, and its specific functional form? For Drew Linzer, the influence of econometric variables decreases over time because he is updating a well-defined prior probability distribution. For fivethirtyeight, it’s….?

    • 538 Clarification

      @Brash – I’m not qualified to answer that question in technical terms.

      In lay terms, when 538 “Forecasting” begins in June, the econometric variables are used in a long-term prediction that shapes the interpretation of polling data, similar to what Sam stated about Votamatic.

      In lay terms, the weighting is gradually decreased, as the length of the term for the prediction becomes shorter, so that the econometric predictive factors shape the interpretation of the polling data less and less. That is, so that the polling data stands more and more on its own, until it stands entirely on its own, for the same reasons that Sam uses polling data alone to create a snapshot.

      I could say more, but those are the basics of my lay understanding of the model. As for the technical details about exactly how the weighting is mathematically decreased over time, and decreased in states where there is more polling data available, I don’t know.

    • Brash Equilibrium

      Yeah. I guess what I’m wondering is, why should we believe that the econometric model performs worse later in the campaign than it does at the beginning? And why do we think the econometric model does better than the polls at the beginning of the election? We don’t know. What Linzer’s model says, reasonably I think, is that the econometric model is a candidate for our prior beliefs. Well, at least the prior beliefs of people who put stock in the Time for Change model. Those prior beliefs might not change one iota if the incoming poll evidence doesn’t differ much from them. What I wonder is whether Nate Silver’s weighting is, under some circumstances, equivalent to Bayesian updating, and whether those circumstances are very special. What I *know* is that there is no good justification for simply assuming that economic variables are more important at the beginning of an election than at the end.

    • Brash Equilibrium

      Ugh, forgive the many typos in my last comment.
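Brash Equilibrium's question about weighting versus Bayesian updating has a clean answer in at least one special case. If the prior on a state margin is normal and the polls are independent normal measurements with known variance, Bayesian updating reduces to a weighted average in which the prior's weight is its precision divided by the total precision. That weight shrinks as polls accumulate, not as the calendar advances. A minimal sketch under those assumptions; the numbers are hypothetical, and this is not a description of Linzer's or Silver's actual model.

```python
def normal_update(prior_mean, prior_sd, polls, poll_sd):
    """Conjugate normal update of a margin, given independent polls
    that each have known standard deviation poll_sd.

    Returns (posterior_mean, posterior_sd, weight_on_prior).
    """
    prior_prec = 1 / prior_sd**2
    data_prec = len(polls) / poll_sd**2
    post_prec = prior_prec + data_prec
    poll_mean = sum(polls) / len(polls) if polls else 0.0
    w_prior = prior_prec / post_prec
    post_mean = w_prior * prior_mean + (1 - w_prior) * poll_mean
    return post_mean, post_prec ** -0.5, w_prior

# Hypothetical: economic prior says R+1 (sd 3); each poll says O+2 (sd 4).
for n in (0, 1, 4, 16, 64):
    mean, sd, w = normal_update(-1.0, 3.0, [2.0] * n, 4.0)
    print(f"{n:>2} polls: weight on prior {w:.2f}, margin {mean:+.2f} +/- {sd:.2f}")
```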

    • BNW

      @ Brash – I checked out your website: nice work!

      In a sense, I see 538’s use of econometric variables in the “Forecast” – the prediction of what the outcome will be on Nov. 6 – as a way of filling in for “missing data,” or more accurately, limited polling information.

      In June, there’s a lot of polling information that we don’t have yet — namely, all the polls that will be conducted between June and Nov. 6. By Nov. 6, there’s no more missing polling information of the sort.

      The “Now Cast” – which is also done from June to Nov. 6 – is different: it’s a prediction of what the outcome will be if the election were held today. It doesn’t use economic variables. So, in a sense, Silver agrees with those who do not use economic variables for predicting what the outcome would be if the election were held today. So, by Nov. 6, he’s no longer using economic variables.

      Do you see that reasoning as flawed?

  • Tapen Sinha

    @“Wheelers cat” (AKA Kat)

    Thanks for that link.

    I am not trying to see if Intrade or Betfair is right or wrong. They may be right or they may be wrong. I am trying to understand the arbitrage opportunity (i.e., making RISKLESS money).
    If you bet, say, 64.6 on Intrade, you get 100 if Obama wins. On the other hand, if you bet 23.5 in the Betfair market, you get 100 if Romney wins. So your outlay is 64.6 + 23.5 = 76.1, and you get 100 with probability one. Even if the transaction cost in each market is 10 percent, you still get 3.9 percent w.p.1. Actually, the transaction cost is much lower. You can make close to 20% on your return with zero risk. That should drive people in droves to make money. If I had $25,000 in idle money right now, I would do that in a heartbeat. I only have about $10,000, and I have to pay property taxes with that money very soon. There is one clear risk: one bet is in USD and the other is in GBP. But that is easily hedged.


    • Froggy

      Tapen, I’d advise you to check your math before you bet the farm — 64.6 + 23.5 = 88.1. Once you factor in transaction costs (and the time and effort needed to pull off the whole thing), I’m not sure there’s that much profit in doing this.
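Froggy's correction is easy to check. A minimal sketch, assuming a hypothetical fee model in which each market charges a flat percentage on top of the stake (real exchange fees were structured differently) and ignoring currency and counterparty risk:

```python
def arbitrage(price_obama, price_romney, payout=100.0, fee_rate=0.0):
    """Back both outcomes of a binary contract that pays `payout`.

    Hypothetical fee model: fee_rate is charged on top of each stake.
    Returns (outlay, guaranteed profit, return on outlay).
    """
    outlay = (price_obama + price_romney) * (1 + fee_rate)
    profit = payout - outlay
    return outlay, profit, profit / outlay

for fee in (0.0, 0.10):
    outlay, profit, ret = arbitrage(64.6, 23.5, fee_rate=fee)
    print(f"fees {fee:.0%}: outlay {outlay:.2f}, profit {profit:.2f}, return {ret:.1%}")
```

With the corrected outlay of 88.1, the fee-free return is about 13.5%; at the 10% fee level Tapen mentions, it drops to about 3%.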

    • Ms. Jay Sheckley

      I don’t gamble, but last night on uh FacebooT a frenemy was giving me such a hard time about PEC and calling me “faithbased” and he wouldn’t even say if he disagreed that Obama would win. So I finally said, “You want to make a bet?” And he scurried into the underbrush.

      But that was simply saying I was willing to pay for him to say what he believed instead of trolling.

      Tapen: Rule of thumb I’ve heard from pro gamblers (who by the way say they are working, not gambling, and it’s dull and true): Never bet what you can’t afford to lose. Makes perfect sense. I bear that in mind when I loan books.

  • Joel

    Here is a binary result where we could see a difference between a PEC-style system (granted, Sam doesn’t project individual races but the totals will still be a little different) and the 538 method:

    Tester v. Rehberg (Tester +1.2 on Pollster, Silver has Rehberg 7-3 favorite, based on fundamentals)

    I suppose you could throw in Berg v. Heitkamp in there, but the publicly available polling is low quality in my opinion.

    • Brash Equilibrium

      You can also calculate the odds from the PEC-style method. Simply download the histogram.csv data from their “For the geeks” page, sum the probabilities of 270 or more votes, sum the probabilities of 269 or fewer votes, and divide the former by the latter. Voila. Odds. It’s just a probability divided by its complement.
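Brash Equilibrium's recipe translates directly into code. The layout of histogram.csv is not described here, so the loader assumes a hypothetical headerless file with one `ev,probability` row per electoral vote total; check the real file before relying on it. Following the recipe, a 269-269 tie is counted on the losing side.

```python
import csv
import io

def load_histogram(f):
    """Hypothetical format: headerless rows of 'ev,probability'."""
    return {int(row[0]): float(row[1]) for row in csv.reader(f) if row}

def win_probability_and_odds(hist, threshold=270):
    """Return P(EV >= threshold) and odds = P(win) / P(not win)."""
    p_win = sum(p for ev, p in hist.items() if ev >= threshold)
    p_lose = sum(p for ev, p in hist.items() if ev < threshold)
    odds = p_win / p_lose if p_lose > 0 else float("inf")
    return p_win, odds

# Toy histogram for illustration only, not real PEC output.
toy = io.StringIO("260,0.1\n280,0.3\n300,0.6\n")
p, odds = win_probability_and_odds(load_histogram(toy))
print(f"P(win) = {p:.2f}, odds = {odds:.1f} to 1")
```

To use a downloaded file instead of the toy data, open it with `open("histogram.csv", newline="")` and pass the file object to `load_histogram`, after confirming its column layout.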

