Princeton Election Consortium

A first draft of electoral history. Since 2004

The Presidential Meta-Analysis for 2016

June 30th, 2016, 9:13am by Sam Wang


As we have done since 2004, we are taking a polls-only approach to give a daily snapshot of the race – as well as a November prediction. This approach has an effective precision of a few tenths of a percentage point of public opinion, and performs very well as both a tracker and a forecast. Currently, the probability of a Hillary Clinton victory in November is 85 percent, based on polls alone.

Today, I give a brief tour of the computational approach.

The Meta-Analysis starts with a Python script that downloads recent state polls from the Huffington Post’s Pollster operation. Thanks to Natalie Jackson, the HuffPollster team, and dozens of pollsters for this stream of information, which forms the foundation of the calculation.

Where polls are not available, we use the election result from 2012. As I have written, this year’s Clinton-versus-Trump state polls are strongly correlated with 2012’s Obama-versus-Romney polls. Because no realignment is evident, past results are a good predictor of the likely outcome this year. At the moment, no more than fourteen states are genuinely in play.

State polls are converted to a win probability by calculating (a) the poll median and (b) the confidence with which this median is known, as measured using the estimated standard error of the mean (SEM). Because the estimated SEM is calculated from the polling data themselves, it includes pollster-to-pollster “house effects” as part of the variation. These numbers are passed to a MATLAB program for the rest of the calculation.
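
As a rough illustration of step (a) and step (b), here is a minimal sketch in Python. It is not the site's script: the function name, the use of the plain sample standard deviation to estimate the SEM, and the example margins are all assumptions made for illustration.

```python
import math
import statistics

def summarize_state_polls(margins):
    """Return (median margin, estimated SEM) for one state's recent polls.

    margins: Clinton-minus-Trump margins in percentage points, one per poll.
    """
    if len(margins) < 2:
        raise ValueError("need at least two polls to estimate a spread")
    median = statistics.median(margins)
    # Poll-to-poll spread. Because it is measured across pollsters, it
    # includes house effects as well as sampling noise.
    spread = statistics.stdev(margins)
    # Assumption: SEM estimated as sample standard deviation / sqrt(n).
    sem = spread / math.sqrt(len(margins))
    return median, sem

# Hypothetical margins, invented for illustration only.
example_margins = [4.0, 7.0, 2.0, 5.0, 6.0]
median, sem = summarize_state_polls(example_margins)
print(f"median = {median:+.1f} points, estimated SEM = {sem:.2f} points")
```

Using the median rather than the mean keeps a single outlying poll from dragging the state estimate around.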

These two numbers are converted to a win probability using the t-distribution, a method that allows for the possibility of outlier events. Those probabilities are diagrammed above in the cartogram, whose areas are proportional to each state’s electoral votes (EV). Note that the cartogram is principally for display purposes, and its EV totals reflect only one combination of all possible outcomes, which number in the quadrillions.
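
The conversion can be sketched as follows, in Python rather than MATLAB. The post does not state how many degrees of freedom are used, so the choice of 3 here is an assumption, picked because that case has a simple closed-form cumulative distribution. The function names and example numbers are hypothetical.

```python
import math

def t_cdf_3df(t):
    """CDF of Student's t-distribution with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi

def win_probability(median_margin, sem):
    """Probability that the true margin is above zero, given median and SEM."""
    if sem <= 0:
        raise ValueError("SEM must be positive")
    # Heavy tails of the t-distribution leave room for outlier events.
    return t_cdf_3df(median_margin / sem)

# Hypothetical state: median lead of 2 points, SEM of 1.5 points.
print(f"win probability = {win_probability(2.0, 1.5):.2f}")
```

Compared with a normal distribution, the heavier tails keep a modest lead from being reported as a near-certainty.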

To get a full snapshot of every possible outcome, these probabilities are compounded to generate a probability distribution of all possible outcomes:
Today's electoral vote histogram
The central dark blue bars represent the 95% confidence interval. The tails are plotted in green.
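
One way to compound the probabilities without enumerating quadrillions of combinations is to build the distribution one state at a time. The sketch below does this in Python and treats the states as independent, which is an assumption of the sketch. The three states and their probabilities are invented.

```python
def ev_distribution(states):
    """states: list of (electoral_votes, win_probability) pairs.

    Returns a list dist where dist[k] is the probability of winning
    exactly k electoral votes, assuming states are independent.
    """
    dist = [1.0]  # before any state is counted, 0 EV with certainty
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)   # state lost: total unchanged
            new[k + ev] += mass * p      # state won: total rises by ev
        dist = new
    return dist

# Hypothetical three-state "country" with 35 electoral votes in total.
toy_states = [(10, 0.9), (20, 0.5), (5, 0.2)]
dist = ev_distribution(toy_states)
majority = sum(ev for ev, _ in toy_states) // 2 + 1
print(f"P(at least {majority} EV) = {sum(dist[majority:]):.2f}")
```

The work grows with the number of states times the number of electoral votes, so all 51 contests can be handled exactly in a fraction of a second.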

At nearly all times, this snapshot shows a near-certain win for one candidate or the other. Because this probability is usually greater than 99%, we do not report it.
Instead, in the banner at the top of this website, we report the probability of a final election win. That quantity includes the possibility of movement between the time the polls were taken and Election Day, in November.

The Meta-Margin and November predictions

A key parameter in PEC’s calculation is the Meta-Margin. This is the amount by which the two-candidate margin in state polls would have to change in order to create a perfect electoral tie. For example, today the Meta-Margin is Clinton +3.86%. This means that Donald Trump’s election odds would be perfectly even if he picked up a net of about 3.9 percentage points among currently-undecided voters – or if about 1.9 percentage points of Clinton supporters switched sides, since each switcher moves the margin by two points.
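
The definition can be made concrete with a small search: shift every state margin by the same amount and look for the shift at which the two candidates are equally likely to win. The Python sketch below is one way to do that under stated assumptions: a t-distribution with 3 degrees of freedom, independent states, a bisection search, and three invented states. None of these details is confirmed by the post.

```python
import math

def t_cdf_3df(t):
    """CDF of Student's t-distribution with 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi

def clinton_win_probability(states, shift):
    """states: list of (ev, median_margin, sem); shift is subtracted from each margin."""
    total_ev = sum(ev for ev, _, _ in states)
    dist = [1.0]
    for ev, margin, sem in states:
        p = t_cdf_3df((margin - shift) / sem)
        new = [0.0] * (len(dist) + ev)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)
            new[k + ev] += mass * p
        dist = new
    win = sum(dist[total_ev // 2 + 1:])
    # An exact tie is counted as half a win (only possible if total_ev is even).
    tie = dist[total_ev // 2] if total_ev % 2 == 0 else 0.0
    return win + 0.5 * tie

def meta_margin(states, lo=-50.0, hi=50.0, tol=1e-6):
    """Uniform shift, in points, at which the win probability crosses 50%."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if clinton_win_probability(states, mid) > 0.5:
            lo = mid   # still favored: a larger shift is needed
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical states: (electoral votes, median margin, SEM).
toy_states = [(10, 1.0, 1.0), (10, 4.0, 1.0), (10, 7.0, 1.0)]
print(f"Meta-Margin = {meta_margin(toy_states):+.2f} points")
```

In this toy example the answer is set by the middle state, which illustrates why the Meta-Margin tends to track the states nearest the tipping point instead of a simple national average.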

Because the Meta-Margin is in units of percentage, it is a nice quantity to work with. If we have expectations about how much the Meta-Margin may move, we can predict what should happen in November.

To estimate how much future movement may occur, we make two calculations of probability, both of which appear in the banner at the top of this website:

  1. “random drift”: This calculation assumes random drift in either direction by an amount that matches past patterns in polling from 1952 to 2012;
  2. “Bayesian”: In this approach, the up-and-down variation in the Clinton-Trump polling margin so far in 2016 is used to establish a “prior”, i.e. the expected range of future movement. Variation to date is used to estimate the midpoint of the likely range of future values. The width of that range is then set conservatively, at approximately 7 percentage points, in order not to constrain the November possibilities too much. This assumed range is more than twice as broad as the observed variation in 2004-2012, cycles that were very stable compared with elections since 1952. Commentators say anything can happen, so bring it on.

At the moment, we are using national polls to set the prior. Once we have enough Meta-Margin history, we will use that.
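
To show how an expectation about movement turns into a November probability, here is a minimal Python sketch of the “random drift” idea. It assumes the drift between now and Election Day is normally distributed around zero. The drift width of 4 points is a placeholder chosen for illustration, not a value taken from the 1952-2012 record, and the function name is hypothetical.

```python
from statistics import NormalDist

def drift_win_probability(meta_margin, drift_sigma):
    """Probability that the margin is still positive after random drift.

    meta_margin: current Meta-Margin in percentage points.
    drift_sigma: assumed standard deviation of future movement, in points.
    """
    november = NormalDist(mu=meta_margin, sigma=drift_sigma)
    return 1.0 - november.cdf(0.0)

# Today's Meta-Margin from the post; the drift width is a hypothetical value.
p = drift_win_probability(3.86, drift_sigma=4.0)
print(f"random-drift win probability = {p:.0%}")
```

A wider assumed drift pulls the probability toward 50%, which is why the answer depends as much on the expected movement as on the current lead.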

History of electoral votes for Obama
History of Popular Meta-Margin for Obama

The black curves indicate the median of the electoral vote estimator (top graph) and the Popular Vote Meta-Margin (bottom). For the EV estimator, the gray shaded region indicates the calculated 95% confidence interval. This confidence interval includes sampling error, variation in biases among pollsters, and changes in opinion during the period when the polls were taken. Because pollster biases tend to cancel one another on average, the true 95% confidence interval is smaller, typically less than +/-10 EV.

The graph’s calculations are explained here. Briefly, the red and yellow zones show a prediction range that combines random drift from current polls with a Bayesian prior. This Bayesian prior is calculated from the assumption that the average Clinton-Trump margin in national polls since January gives the center of the likely range of election outcomes. The prior has a Gaussian range with a sigma of +/-7%, consistent with 1952-2012 but larger than the amount of movement in the 2004-2012 election cycles. In other words, the prior is set to allow anything reasonable to happen.

The red zone is a “strike zone” showing the 68% confidence interval of probable outcomes. The yellow zone is a “watch zone” that shows a combination of the 95% random-movement confidence interval and the 95% gray-zone confidence interval. The November outcome is nearly certain to be within this range.
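
One simple way to combine drift from current polls with a Gaussian prior is to treat both as normal distributions and multiply them, as sketched below in Python. This is an illustration of the general idea, not the site's code. The sigma of 7 comes from the text above, while the prior center of +5 and the drift width of 4 points are invented placeholders.

```python
import math
from statistics import NormalDist

def combine_drift_with_prior(current_margin, drift_sigma, prior_center, prior_sigma=7.0):
    """Precision-weighted product of two Gaussians, returned as a NormalDist."""
    w_drift = 1.0 / drift_sigma ** 2
    w_prior = 1.0 / prior_sigma ** 2
    variance = 1.0 / (w_drift + w_prior)
    mean = variance * (w_drift * current_margin + w_prior * prior_center)
    return NormalDist(mean, math.sqrt(variance))

# Meta-Margin from the post; drift_sigma and prior_center are hypothetical.
november = combine_drift_with_prior(3.86, drift_sigma=4.0, prior_center=5.0)

# Central 68% interval, analogous to the red "strike zone".
strike_zone = (november.inv_cdf(0.16), november.inv_cdf(0.84))
win_probability = 1.0 - november.cdf(0.0)
print(f"strike zone: {strike_zone[0]:+.1f} to {strike_zone[1]:+.1f} points")
print(f"win probability: {win_probability:.0%}")
```

Because the prior is broad, it nudges the estimate only slightly toward the prior center and leaves most of the weight on current polls.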

Tags: 2016 Election · President

49 Comments so far ↓

  • Matt McIrvin

    That crazy Kansas result mostly points out the need for more red-state polls in this cycle. (My impression is that Zogby is not well-regarded these days.)

    • Sam Wang

      I agree, it is weird. But chill – it will all settle out in time.

    • Matt McIrvin

      Hey, I’d personally find it excellent if Kansas swung Democratic over Trump. I just don’t believe it.

    • Todd S. Horowitz

      Kansas is in the middle of a major financial crisis created by Brownback’s conservative economic policies; they might be much more receptive to a Democratic candidate this year.

  • Dell Martin

    Don’t predictions based on polling assume that the two candidates/parties have equal or equivalent “ground games” in place in most states, particularly battleground states? I ask this since it seems that Trump’s complete lack of local organizing at this late date is unprecedented and may make a major difference in actual voter turnout.

  • Doug Johnson Hatlem

    How many state polls does HuffPo have to report before you switch over to state polling rather than 2012? I am looking at Colorado in particular.

  • Matt McIrvin

    Man, what is up with Huffington Post’s general-election popular-vote model? It seems numerically unstable–the history of it keeps writhing about retroactively as new polls come in.

    • bks

      They switched over to an inferior flashy interface to make it seem more interactive. In the process they broke some of the actual graphing and robbed us of a way to cache custom graphs. Whether that explains the phenomenon you’re seeing, I don’t know.

    • Matt McIrvin

      They seem to have two different models–a simple one that works fine and is used for all the custom graphs, and a much fancier one that automagically kicks in when you use the default settings. The fancier one seems weirdly broken, though; I wish you could just turn it off.

      I know that for a while it was subtly broken because they were accidentally adding together some partisan breakdown figures with the main result before doing the curve fitting, but they say they fixed that and it still keeps changing in weird ways.

  • Amitabh Lath

    Regarding the projection to November, June is a cruel month. One hypothesis is that different campaigns consolidate their party factions at different times, and that feeds into the large variations in the pre-convention months. Post convention the parties are all one big happy family and the polls become more predictive.

    One twist in 2016, one (or the other) party may not quite consolidate, or consolidate slowly in the post-convention months. In that case, the historical precedent might not hold.

  • Rob in CT

    It’s interesting that while Clinton seems to have a strong lead, the senate snapshot would produce a 51-49 GOP win (or hold, rather, as that’s a 3-seat loss). They lose seats, but not enough of them to flip control.

    Or is there less polling on the senate races at this point?

  • Olav Grinde

    Sam, do you distinguish between Trump–Clinton polls and polls that take into account Jill Stein and Gary Johnson? If not, why not?

    Are your Meta-Margin and probabilities based only on the former, or both types of polls?

    Are there any reasons for believing / not believing Stein and Johnson will have an impact on the race?

  • Deb

    Random clarification for newbies. Can you explain the difference between using the huff pollster polls and rcp?

    • Zach

      I believe RCP doesn’t use every poll (I think they filter out partisan polling firms). So HP’s data is more comprehensive.

  • Mark F.

    I’m thinking that Trump will need to win FL, OH, NC and PA to win. Lose any of those and Clinton wins. What do you folks think?

    • Josh

      If Trump wins PA it’s pretty likely he’ll also have won FL and OH and NC…and IA, and VA, and NH, etc.

    • Richard Vance

      Like last go round you will know how the night will end when PA reports.

    • Sam Wang

      For me, NH. Provides a calibration for the whole analysis.

    • Matt McIrvin

      @Josh: That would have been true in 2008 or 2012, but this year, PA is running closer than all of those other states (except NC)–maybe because of coal country. It’s one of the subtle geographic changes that has happened relative to past cycles.

    • Josh

      @Matt Perhaps PA is a bit more competitive this year than in 2012 but there’s not a lot of evidence for that so far: both RCP and HuffPost show just two Clinton-Trump polls in PA have been published in the last few weeks. Until we see a solid amount of polling coming out of PA showing the race closer than the national average I’m going to stick with the 2008/2012 assumption that PA is a few points left of the country as a whole–definitely “in play” but not a true tossup.

    • Matt McIrvin

      @Sam: And NH reports early, too, since it’s a tiny Eastern state.

    • Sam Wang

      Exactly the point. If NH comes in close to pre-election polls, then evidently polls are accurate. We already know the entire pre-election snapshot, to be precise. The single real-life measurement then anchors the entire calculation. On Election Night in 2012, I basically stopped watching returns once NH came in. And tuned in to Fox to see what Megyn Kelly and Karl Rove would say.

    • Matt McIrvin

      I think you’re basically right–while Trump could theoretically win without those states, if he doesn’t get them, he’s probably not going to.

      I think if Hillary wins PA it’s clearly all over. If Trump wins PA, it’s not all over–but if Hillary wins Florida, yeah, it’s not merely over, it’s really most sincerely over. And if the race goes the way it’s been going, Florida may not be called as late as it was the last few cycles.

  • Brad Davis

    Are the individual pollsters using likely voter models? And if so, how much could this affect the accuracy of these predictions? Could we get an estimate of the size of this effect by looking at the demographics of his supporters compared to those of typical GOP voters? Is this much ado about nothing?

    • Sam Wang

      1) not yet, mostly
      2) probably not much – those models are of surprisingly little consequence
      3) in my view, yes

    • Michael Coppola

      I believe Rasmussen is the only pollster using a likely voter screen at this point in the cycle. I also believe that it is their LV model that led them to miss badly in 2008 and 2012. They just released a Trump +4 LV poll today, which may indicate that they have not revised their LV model and that they are likely to be wrong again.

    • Brad Davis

      Thanks for replying Sam. Have any of the individual pollsters who use likely voter models to affect their predictions ever released their original data without applying the LV correction and with the LV correction?

      In general my feeling is these kinds of approaches usually just trade off variation for bias.

    • Kevin O'C

      Actually, I believe you’ll find Bloomberg and the recently released Loras poll from Iowa also use “likely voter” screens. The well-regarded Marquette poll uses likely voters as well, but they also report the registered voter results.

      Generally, most pollsters don’t trust likely voter screens this far ahead of an election. You’ll see more 30 days before the election. The huge variance between Rasmussen and Bloomberg is one problem with likely voter screens this far in advance. The registered voter samples are more consistent.

  • JB

    Any possibility of making the python script available?

  • Olav Grinde

    I am still trying to fully understand the Meta-Margin.

    You wrote (my emphasis): “…today the Meta-Margin is Clinton +3.86%. This means that Donald Trump would have to improve or outperform, on average, by nearly four percent in order to have an even-odds chance of winning the election.”

    Does this mean that the actual margin between Hillary and That Donald is, roughly speaking, twice as great as the Meta-Margin?

    • Sam Wang

      try the rewording

    • Olav Grinde

      Yes, very clear now – thank you!

      For some reason I didn’t interpret the Meta-Margin as votes Trump would have to gain from undecided voters.

      I mistakenly thought he had to gain it from Clinton (meaning of course that she would have to lose a corresponding number of voters). Hence my erroneous thought/question about the actual margin being roughly twice the Meta-Margin… :)

  • Amit

    I am a bit confused by the description of meta margin.

    I always understood that the meta margin captures the overall movement required in polls so as to effect an EV tie. this need not be in each state, but is the weighted average of win probability and state EVs.

    However, the description above states that DT would need to make that up in EACH state, which may not be the case. For instance, he could instead outperform only in key battleground states and even under perform in other states (not likely, i know), but still win.

    My question is – is my interpretation correct in that the meta margin represents the path of least resistance for DT to create a tie, or does it represent the movement required in EACH state?

  • Matt McIrvin

    …Also, the Meta-Margin continues to be 2 or 3 points smaller than Clinton’s lead in national polls, which Sean Trende recently argued is actually broadly comparable to her averaged lead in state polls.

    Is Clinton at a two- or three-point disadvantage from the Electoral College? It’s kind of surprising, if so; my impression had been that any EV vs. PV advantage this year was more likely to be very small and in the other direction.

    • Sam Wang

      Could be, but I suspect it’s just that state polls have to catch up. It’s hard for the Electoral College to impose that much of a disadvantage.

      That said, Trump is running weakly in deep-red states, less so in swing states. The fact that he is competitive in Pennsylvania could be enough to account for the gap.

  • Al Nigrin

    Woo Hoo! The PEC Meta-Analysis is back! Thanks Sam!

  • Matt McIrvin

    538 has started their general-election analysis too. It’s interesting to compare the two: 538’s “now-cast” currently seems much more favorable to Clinton than this map, with Clinton leading in Kansas and Arizona. I assume that’s the effect of using medians (an easy way to deemphasize really weird outliers) vs. whatever weighted-mean stew 538 uses. And then in their November projections they fuzz out the probabilities somehow to make everything look more normal.

    Their top-line November prediction is pretty much in line with PEC’s, though.

    • Sam Wang

      They just use polls, like us. Our database, from HuffPollster, lacks that Kansas poll. On my to-do list is sticking it in manually.

    • 538 Refugee

      538 has Clinton with an 80.2% chance at this point. I can see the decimal point adding ‘cred points’ but I don’t know enough about statistics to know if it is mathematically justified. At one point I would have trusted Silver to respect the concept of significant figures. It feels wrong to claim tenths of a percent five months out though.

    • Sam Wang

      Reporting the tenths place like that is b.s.

      Near the midpoint, a 1% change in probability is equivalent to about 0.003-0.007 percentage point of margin in popular opinion. This is why I round probabilities to the nearest 5%. It is also why I report Meta-Margin to less than a tenth of a percentage point. The two are comparable.

    • Brad Davis

      It seems like a lot of people confuse ‘accuracy’ for ‘precision’, but I guess that’s why they call them ‘cred points’.

    • Hanif Samad

      That’s a very interesting difference between PEC’s snapshot and 538’s now-cast: Sam’s EV probability distribution is much spikier than Nate’s. It suggests that states are much more closely correlated in PEC than in 538, which is what we’ve always expected (e.g. Trump is unlikely to win Pennsylvania while losing Ohio, making certain EV combinations much more likely). Nate is being too clever by half by explicitly modelling *error* from interstate correlations using demographic and regional factors since there is no disciplined way to do this (are polls underestimating WWC turnout? Hispanics? Both?). By accounting for everything it says nothing and just ends up making certain combinations of EV look unreasonably likely.

      This goes to the heart of modelling philosophy: how much should you say about the election knowing what we know now? The regional correlations ‘baked’ into state polls should form a very strong prior for the shape of the final EV distribution and absent any evidence, it is presumptuous to integrate over permutations of systematic error instead of just saying the aggregate of polls might have systematic error. If one still wishes to place bets on his or her demographic polling error theory of choice, what would be far more informative are scenarios: from this prior, what would happen if polls in the Northeast are systematically underestimating WWC turnout? Hispanics in the Southwest? Overestimating black turnout in the South? And so on.

    • Matt McIrvin

      538 has always been like that: even back in 2008, the EV distribution from their model had way more spread than Sam’s. They hedge their bets in various more or less arbitrary ways.

      I prefer Sam’s approach: he only really computes a detailed map for the “now-cast” and sticks to simple numbers for the November projection. It’s a better indication of what we really know and don’t know, while simultaneously being more certain about the things we actually can be certain about (e.g. who is leading now).

    • Matt McIrvin

      …And in the end, after all that effort, the “how probable is a Clinton win?” number they extract is pretty much the same as Sam’s or, for that matter, the betting markets’, to the precision that these things can even be said.