Princeton Election Consortium

A first draft of electoral history. Since 2004

February national polls are the best you get until August

May 22nd, 2016, 12:00pm by Sam Wang


Tuesday 5/24, 8:20am: There is an interesting discussion in the comments.

Some media types are going around with their hair on fire over two unfavorable polls for Hillary Clinton in which she lags Donald Trump. In response, writing in the NYT, Norm Ornstein and Alan Abramowitz are trying to convince you that these polls mean nothing. Nothing, I tell you! Don’t Panic!!!

In a deep sense, they’re right. As I wrote the other day, opinion can move a lot between now and Election Day. And it is inappropriate to trumpet a single poll showing an exceptional result, which is what the news channels do.

However, do not throw out the baby with the bathwater. In fact, we can learn quite a lot from polls by extracting as much value as possible from them. This can be tricky because right around now, national polls are the least informative they are going to be in 2016. To put it another way, polls will be more informative one month from now – and they were also more informative a month ago. How can this be, and what do we really know about the Clinton/Trump November win probability?

Elections scholar Christopher Wlezien very kindly sent me the data that he and Robert Erikson used to construct the graphs in The Timeline Of Presidential Elections: 1952-2008. Adding in 2012 data, I took time series from 16 Presidential campaigns and calculated the standard deviation of the total movement as a function of time. This is a measure of uncertainty about November based on polls for a given day. This graph shows the ±1 standard deviation interval in red:

(Note that in my previous post I plotted the standard deviation in the Democratic vote share. However, the appropriate standard deviation to use is the standard deviation of the Democratic-Republican margin, which is twice as large. This is why I had to revise the win probability. PEC regrets the error.)
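For readers who want to try this with the spreadsheet posted in the comments, here is a minimal Python sketch of the calculation. It assumes polls are stored as Democratic two-party vote share keyed by days before the election, that "movement" means the final margin minus the poll margin, and that the sample standard deviation is the right summary. The function names and the three toy campaigns are made up. It also makes the factor of two explicit: since the margin is D − (100 − D) = 2D − 100, any spread in vote share doubles in the margin.

```python
import statistics

def share_to_margin(dem_share):
    """Democratic two-party vote share (%) -> D-R margin (points): D - (100 - D)."""
    return 2 * dem_share - 100

def movement_sd_by_day(polls, finals):
    """
    SD across campaigns of the change from a given day's poll margin to the final margin.

    polls:  {campaign: {days_before_election: dem_two_party_share}}
    finals: {campaign: final dem_two_party_share}
    Only days that appear in every campaign's series are used.
    """
    common_days = set.intersection(*(set(series) for series in polls.values()))
    result = {}
    for day in sorted(common_days, reverse=True):
        moves = [share_to_margin(finals[c]) - share_to_margin(polls[c][day]) for c in polls]
        result[day] = statistics.stdev(moves)
    return result

# Toy data, not real campaigns.
polls = {
    "A": {300: 52.0, 100: 50.0},
    "B": {300: 47.0, 100: 49.5},
    "C": {300: 55.0, 100: 53.0},
}
finals = {"A": 51.0, "B": 48.0, "C": 54.0}
print(movement_sd_by_day(polls, finals))
```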

This year, January 1st was 312 days before the election. At earlier dates, the standard deviation is between 14 and 22 percentage points. You can see the variation across 16 Presidential campaigns in the gray traces. So polls before the new year really are quite uninformative.

Now look at later dates: the gray curves converge. Consequently, the standard deviation declines and reaches a local minimum at 270 days before the election, in mid-February, close to the start of primary season. So before the primaries start, February is a time when national polls tell us a fair amount about the final outcome.

But wait! After that, the standard deviation creeps upward. The election is 169 days from now, and in about a week the standard deviation hits its maximum value for 2016. Truly, now is the single worst time to be paying attention to fresh polling data. I don’t know why this is. It could be because typically, one or both parties are still going through an active nomination contest – as Hillary Clinton and Bernie Sanders are doing now.

Amusingly, national polls won’t reach their February levels of accuracy until August. The Clinton-Trump margin in February was Clinton +5.0%. So how about we just use that until after the conventions? Can you wait?

No? Okay, let’s do something else. There are currently 88 national polls for 2016. We can weight these to create the best possible estimate for the November Clinton-Trump margin. For the weight, use 1/sigma for the corresponding date on the graph above. For independent observations, weighting by reliability in this way is the standard approach (the textbook optimum, inverse-variance weighting, would use 1/sigma²). Applied to past elections, it favors the November winner in 14 out of 16 elections (missing Reagan in 1980 and Bush in 2000), an accuracy rate of 87.5%. This year, it gives us a weighted-average margin of Clinton +6.5%.
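Here is a minimal Python sketch of that weighting, assuming each poll is reduced to a (days before election, Clinton-minus-Trump margin) pair and that sigma can be looked up for any day. The two-level sigma curve and the two polls are invented for illustration; the real curve is the red one in the graph. The power parameter is an addition of mine, so you can compare 1/sigma weights with inverse-variance (1/sigma²) weights.

```python
def weighted_margin(polls, sigma_for_day, power=1):
    """
    Weighted average of Clinton-minus-Trump margins.

    polls: list of (days_before_election, margin_in_points)
    sigma_for_day: function returning the historical SD (points) for that many days out
    power=1 gives 1/sigma weights; power=2 gives inverse-variance weights.
    """
    weights = [1.0 / sigma_for_day(day) ** power for day, _ in polls]
    return sum(w * margin for w, (_, margin) in zip(weights, polls)) / sum(weights)

# Hypothetical inputs: a made-up two-level sigma curve and two made-up polls.
def toy_sigma(day):
    return 10.0 if day > 200 else 14.0

polls = [(250, 5.0), (170, 2.0)]
print(weighted_margin(polls, toy_sigma))           # about 3.75
print(weighted_margin(polls, toy_sigma, power=2))  # leans harder on the day-250 poll
```

With equal sigmas this reduces to a plain average; the more a date's sigma exceeds the others, the less its polls count.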

In short, we have a situation in which today’s snapshot (Clinton +2.7%) shows a close race with a definitive Clinton lead (93% probability according to HuffPollster), and the November outlook shows a larger average expected lead (Clinton +6.5%), but a lower win probability* of 70% – the same as what I wrote the other day.

>>>

Here are some caveats and consequences that come to mind:

1) My analysis today implies that the current movement in polls is transient. If uncertainty is larger now, this suggests that there is some natural set point for the Clinton-Trump contest – one where we had a clearer picture a few months ago than we would by watching today’s news.

My general sense of the current state of the race is that Democrats are still in the midst of their nomination process, while Republicans are coming together around their nominee. Either of these dynamics would be enough for polls to become less accurate – and to favor the candidate whose nomination is settled. If true, then we might expect numbers to move back toward Clinton after the June 7th primaries. Also possible, though less likely, is continued movement toward Trump.

2) It seems to me that during periods of increasing uncertainty, it is best to incorporate older polls, on the grounds that these data points add information and decrease uncertainty. Conversely, starting at 160 days before the election (early June), I should switch to a rolling time window, since at this point polls are becoming increasingly predictive.

3) Now is a time to pay attention to non-poll-based methods. As longtime readers know, I am generally against mixing up polls and “fundamentals”-based models. But now is a reasonable time to consider them.

However, there are surprisingly few models worth looking at. Models are subject to conceptual and technical errors, and very few fundamentals-based models have well-understood error properties. One exception: Lauderdale and Linzer did a particularly good job in 2012. At that time, they estimated that their model’s national vote share had a 95% confidence interval of +/-7%. In the units I plotted in the red curve above (+/- 1 sigma in two-candidate margin), this probably corresponds to about +/-7%. If true, that approach would be better than polls from now through August. However, to my knowledge, Linzer (who now does analysis at Daily Kos Elections) has not come out with a public calculation this year. And so I wait.

4) (added Tuesday May 24th, 9:00am): In comments, Amit Lath points out that the 2000-2012 campaigns were less volatile than 1952-1996, so maybe those should be the baseline. As it turns out, that does not affect the November prediction very much. See my response in the comments.
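The unit conversion in caveat 3 is easy to check. A small Python sketch, assuming the Lauderdale-Linzer interval is a normal-theory 95% interval (±1.96 sigma) on the Democratic two-party vote share, so that the margin’s sigma is twice the share’s sigma. The function name is hypothetical.

```python
def margin_sigma_from_share_ci95(half_width):
    """Convert a 95% CI half-width on two-party vote share into 1-sigma on the D-R margin."""
    share_sigma = half_width / 1.96  # assumes a normal error distribution
    return 2 * share_sigma           # margin = 2 * share - 100, so sigma doubles

print(round(margin_sigma_from_share_ci95(7.0), 1))
```

That lands at roughly ±7 points of margin, in line with the estimate in caveat 3.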

*To calculate a probability, note that the weighted-average value of sigma during the time period of January 1 to now is 11.1%. The probability is calculated in MATLAB as prob=tcdf(clinton_trump_margin/11.1,3). In Excel: =1-TDIST(clinton_trump_margin/11.1,3,1).
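For anyone without MATLAB or Excel, here is a sketch of the same calculation in plain Python. It uses the closed-form CDF of Student’s t with 3 degrees of freedom (which, as far as I know, matches scipy.stats.t.cdf(x, 3) if you have SciPy installed). The function names are just illustrative.

```python
import math

def t3_cdf(x):
    """CDF of Student's t distribution with 3 degrees of freedom (closed form)."""
    theta = math.atan(x / math.sqrt(3))
    return 0.5 + (theta + math.sin(theta) * math.cos(theta)) / math.pi

def win_probability(clinton_trump_margin, sigma=11.1):
    """Equivalent to MATLAB tcdf(margin/11.1, 3)."""
    return t3_cdf(clinton_trump_margin / sigma)

print(f"{win_probability(6.5):.0%}")  # 70%
```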

Tags: 2016 Election · President

73 Comments so far

  • Tyler

    If you look at the trends of the polls, everything is pointing to only older women and minorities supporting Hillary by November. Meanwhile young people and college educated whites are finally catching onto Trump and Never Trump is dying out on the right. So 70% for Hillary is ludicrous. Try looking through the polls a little bit.

    That graph is a good summation of “data journalism” this election. It tells us nothing, and every other variable is off the table.

  • Amitabh Lath

    The NY Times is late to the party with a May 25 article on the Erikson and Wlezien data. The upshot of this Upshot is: it’s too early to tell. Nice plots, although only going back to 1980.

  • RodCrosby

    Just a thought, Sam. What happens if you only include “open” election years in the analysis: 1952, 1960, 1968, 1988, 2000, 2008?

    • Sam Wang

      Rod, I am not your personal data analyst. I posted the data elsewhere on this thread so that you could do this kind of thing yourself. It is not hard!

      Anyway, here is your answer. Somewhat higher uncertainty than I said before (SD in two-party margin of 12 percentage points) until early August, at which point it ramps down to the usual 2 percentage points. No bias toward either party after March 15th.

  • Marc

    If you look on the HuffPo polling side, the uncertainty of the Clinton and Trump polling averages has about tripled in the past two weeks. Hide the dots for the individual polls to see this more clearly. The uncertainty for Sanders and Trump has increased, but not nearly as much. Is there an explanation for this recent large increase in uncertainty? Has this happened in past years?

    • Sam Wang

      No, the up-and-down range of data for any given period of time looks the same to me. Zoom in to Feb 1-now.

      If the natural pollster-to-pollster range is combined with the fact that there has been movement, then it looks larger. But that is because movement and variability are combined. This is not surprising.
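A toy illustration of that point in Python, with invented numbers: the same pollster-to-pollster scatter looks wider once a real trend is mixed in, even though the individual polls are no noisier.

```python
import statistics

# Made-up pollster-to-pollster scatter around the true value (points).
scatter = [1.5, -1.0, 0.5, -1.5, 1.0, -0.5]

# Same scatter, with and without a real 3-point slide over the period.
steady = [6.0 + s for s in scatter]
moving = [6.0 - 0.6 * i + s for i, s in enumerate(scatter)]

print("spread with no movement:  ", round(statistics.stdev(steady), 2))
print("spread with real movement:", round(statistics.stdev(moving), 2))
```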

  • whirlaway

    “My general sense of the current state of the race is that Democrats are still in the midst of their nomination process, while Republicans are coming together around their nominee. Either of these dynamics would be enough for polls to become less accurate …”

    But what is the explanation for the fact that Sanders still has the same double-digit leads over Trump while it is Clinton’s lead over Trump (which used to be in double digits as well) that is collapsing?

    • Sam Wang

      This seems fairly obvious to me. And your facts are inaccurate.

      In both the Sanders/Trump matchup and the Clinton/Trump matchup, Trump has risen about 2 percentage points since April 18. Sanders has fallen by about 2 points, while Clinton has fallen by about 4 points. Clinton is mainly attacking Trump. Sanders is attacking Clinton. Trump is attacking Clinton. Since Sanders is not taking many attacks, he is suffering less.

    • Prehistorian

      What I find odd about these comparisons between Clinton vs. Trump and Sanders vs. Trump polls is the lack of discussion of the background to them. From the polling data I have seen, a Clinton vs. Trump contest is, by now, thought by c. 90% of those polled to relate to a real choice they will have to make. Only c. 10% of those polled think they will need to make a Sanders vs. Trump choice in November. That makes the Sanders vs. Trump question a highly hypothetical one, and psychologists seem to agree that hypothetical questions are not answered in the same way as questions perceived to be real.

      I wonder if Sam has a view on this?

  • Rose Bud

    “And it is inappropriate to trumpet a single poll”.

    Pun intended?

  • Joel Garten

    There is talk on this site about the media or members of either political party going crazy over just a couple of polls, which is on display almost daily. I would be interested to know if the same type of thing happens in the world of science, where people are generally meant to be more rational-minded and understand how statistics work. I have heard about very pitched emotional battles cropping up in science.

  • Jim H

    Any thoughts on the net impact of Libertarian candidates to D/R vote share?

  • BillSct

    These days when I sense my hair is about to burst into flame from watching too much TV news, I go over to HuffPollster and look at the Democratic and Republican party Favorable/Unfavorable ratings.

    Currently they are:
    Democratic Party 46% Favorable/48% Unfavorable
    Republican Party 32% Favorable/61% Unfavorable

  • G Dogg

    We have reached the tipping point where we can no longer read the usual media suspects & expect to get anything remotely corresponding to reality: “Hillary Has Blown It,” “Why Trump Can Win,” etc. So it is time to turn off those sites & dwell with geniuses like you, Sam, and Mr. Silver. Thank you thank you thank you for this invaluable media service you perform! Signed, a fellow academic.

  • A New Jersey Farmer

    But Sam, isn’t this election, like every one before it, different when it comes to polling? The media say so.

    The breathless recap of every poll between now and the close of the Democratic Convention will be the least enjoyable part of the process. Come August 1–the feast.

  • deb

    Does not the quality of the poll matter? At least two of these (FOX, RAS) had Romney winning in 2012 up until the very last week, no?

    • Sam Wang

      I do not think poll quality matters. You should never focus on single polls anyway! And if you aggregate them, then again poll quality does not matter.

      Also, it is almost always possible to find excuses to discount data that you find disagreeable. That’s called motivated reasoning. Don’t do that!!!

    • 538 Refugee

      Dishonest/questionable polling should be held accountable. I could envision a nice companion site to this one for that. It is one point I agree with Nate Silver on.

      1/3 of Latinos supporting Trump? Black voters staying home in numbers greater than even in the midterms? Large numbers of white people rising from the grave to vote? Well, OK, there is precedent for the last one.

    • Amitabh Lath

      I agree, weeding “bad” polls is almost certainly going to introduce bias. Using past performance to do the filtering is problematic because those firms have probably implemented changes to their demographic models. If only there were some way to mitigate extreme outliers. Has anyone thought of using poll medians? :)

    • Joseph

      I’m wondering if the bulge in the standard deviation has to do with thumbs being put on scales as the presumptive nominees become solidified. There’s not much doubt in my mind that this is consciously done, but poll manipulation would have to start petering out as we got closer to the actual election if manipulative pollsters expect to be taken seriously next time around.

    • 538 Refugee

      Amit, I don’t mean a grading system like 538’s. Silver claims to have uncovered evidence of downright fraud: impossible numbers and such. With the number of folks publishing data, it isn’t hard to glean information and resell/repackage other people’s work while making just a few changes.

      Another interesting side benefit: you could maybe see at a glance where polls rank in their demographic assumptions. That could be useful, and you would still be basing things off of data. I still look at the underlying numbers from time to time, but doing it for all polls would be daunting. A few volunteers filling in spreadsheet-type forms could provide some useful insight.

    • Matt McIrvin

      Poll quality matters beyond a certain point. Say that tomorrow James O’Keefe or Roger Stone invented 100 sock-puppet polling organizations with different names that put out entirely fraudulent tracking polls once a day (perhaps fairly realistic-looking ones but with a +3 point systematic bias for Trump). No aggregation process blind to the quality of the polls could deal with that.
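Amitabh’s median suggestion and Matt’s sock-puppet scenario point at two different failure modes. Here is a toy Python illustration with invented Clinton-minus-Trump margins (none of these are real polls): a median shrugs off a single wild outlier, but a flood of polls all shaded the same way drags the median along with everything else.

```python
import statistics

honest = [4.0, 5.0, 3.0, 6.0, 2.0, 4.5, 3.5]  # hypothetical honest polls centered near +4
with_outlier = honest + [-10.0]               # one wild outlier
flooded = honest + [1.0] * 100                # 100 sock-puppet polls shaded 3 points toward Trump

for label, polls in [("honest polls", honest),
                     ("one wild outlier", with_outlier),
                     ("100 sock puppets", flooded)]:
    print(f"{label:>18}: mean {statistics.mean(polls):5.2f}, median {statistics.median(polls):5.2f}")
```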

  • Ken

    Is the trend for the standard deviation to increase from ca. 270 to 160 days statistically different from a flat line, i.e., no change?

    • Sam Wang

      Those two timepoints’ variances are not different by an F-test. One could look for heterogeneity over that time period with a fancier test, too. However, I think this point probably does not matter that much, since one could also weight with a flat line and get a similar answer. It seems sufficient to just go ahead and weight by 1/sigma.
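The two standard deviations aren’t posted here, so this is only a sketch of how such a variance-ratio F-test could be run in plain Python. Assumptions: each SD is computed across the same 16 campaigns (15 degrees of freedom apiece), and the F density is integrated numerically with Simpson’s rule to avoid needing SciPy. The SDs in the example are hypothetical. Strictly, the F-test assumes independent samples, which two timepoints from the same campaigns are not, so treat the p-value as rough.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom, for x > 0."""
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_num = 0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                     - (d1 + d2) * math.log(d1 * x + d2))
    return math.exp(log_num - log_beta) / x

def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x) via composite Simpson's rule (steps must be even)."""
    if d1 <= 2:
        raise ValueError("this sketch assumes d1 > 2, so the density is 0 at x = 0")
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(x, d1, d2)  # the x = 0 endpoint contributes 0 when d1 > 2
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def variance_ratio_test(sd_a, sd_b, n_a, n_b):
    """Two-sided F-test for equal variances; returns (F statistic, p-value)."""
    f = (sd_a / sd_b) ** 2
    cdf = f_cdf(f, n_a - 1, n_b - 1)
    return f, 2 * min(cdf, 1 - cdf)

# Hypothetical SDs (points) at two timepoints, 16 campaigns each.
f, p = variance_ratio_test(11.0, 9.0, 16, 16)
print(f"F = {f:.2f}, two-sided p = {p:.2f}")
```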

    • MarkS

      To me, the wording of the post doesn’t convey the impression that the trend of the standard deviation is indistinguishable from flat: “Truly, now is the single worst time to be paying attention to fresh polling data” and so on.

    • Sam Wang

      In that date range the overall SD vs. time relationship is in fact correlated, r=+0.84. This is statistically significant (5 d.f. because of data smoothing, p<0.05).

      Now get off my lawn!
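The correlation figure can be checked with the usual t-test for a correlation, t = r·√df / √(1 − r²), assuming the 5 d.f. quoted play the role of n − 2. A plain-Python sketch; the helper uses the closed-form t distribution that exists for odd degrees of freedom, and the function names are made up.

```python
import math

def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t, using the closed form available for odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    theta = math.atan(abs(t) / math.sqrt(df))
    c = math.cos(theta)
    # series = cos(theta) + (2/3)cos^3(theta) + (2*4)/(3*5)cos^5(theta) + ... up to cos^(df-2)
    series, term = 0.0, c
    for k in range((df - 1) // 2):
        series += term
        term *= c * c * (2 * k + 2) / (2 * k + 3)
    return 1 - (2 / math.pi) * (theta + math.sin(theta) * series)

def correlation_p_value(r, df):
    """Test a Pearson correlation against zero; df = n - 2."""
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, t_two_sided_p(t, df)

t, p = correlation_p_value(0.84, 5)
print(f"t = {t:.2f}, p = {p:.3f}")
```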

  • Amitabh Lath

    Standard deviation getting larger as we get closer to the election? This is truly counterintuitive, and therefore fascinating. I can’t even think of a possible hypothesis.

    (Summer vacations?…nah, I got nothing).

    Anyway, whatever the cause, as we discussed previously, I believe this is a mid-20th century phenomenon and post-2000 elections probably do not show this effect.

    • Sharon Machlis

      My guess (and it is purely hypothesis):

      1) There are still a fair number of voters not paying incredibly strong attention until after Labor Day. As a result, multi-day media coverage of party conventions may cause some voters without hardened opinions to lean toward one candidate and then the other, but that’s transitory and their “real” opinions form (or re-form) after both conventions.

      2) What Sam Wang said about the nomination contests still underway makes sense as a contributing factor. I recall a fair number of Hillary supporters vowing not to support Obama in ’08 (remember PUMA?) but when faced with the reality of the general-election matchup, many reconsidered. It takes some time for losing candidates’ supporters to get over the loss and move on. (Not all will, but some do.)

    • emmy

      I dunno, I went back and looked at the RCP polling average for McCain/Obama during May 2008, when the Dem nomination was still considered pretty contested, and Obama had a wider lead against McCain.

      I really don’t know if this election’s underlying fundamentals are really “typical” compared to recent election cycles. But soon enough we’ll know if this is closer to a 1980 scenario or if it’s just bog-standard.

    • Amitabh Lath

      The Wlezien data for 2000 and beyond shows very little deviance. Either polling got more accurate in the 21st century or people made up their minds and stuck to it.

      If we take just the 21st century polls, then the assumption would be that the predictive power of May polls is greater, and Clinton may indeed be within a few points of Trump.

    • Sam Wang

      But Amit, don’t we still have to include pre-May surveys?

    • Amitabh Lath

      It’s a very small point, that the error envelope in May about the November result seems to change slowly with time.

      If you binned the Wlezien data by decade, the envelope denoted by the red would shrink from the 1960s to the ’90s and ’00s. But of course statistics suffers.

    • hardheaded_liberal

      Amit, have you (or anyone) posted your analysis that shows the result in post-2000 election years? I would really be interested in seeing if/where there is a point in recent election years that begins an era when all the s.d.’s of the average polling margins steadily converge.

      Possible factors contributing to a change in fluctuating variability could include (1) the 24/7 cable news cycle, and/or (2) the much earlier start to the campaign season, especially the earlier start in exposing the candidates to the public through the party debates.

      Without some concrete criterion that plausibly explains a change over time in the variability of polling averages, I am concerned that looking at trends in variability over just the last three election cycles (two of which included incumbents running for re-election) does not provide a sufficiently broad base of data to justify limiting the analysis of the average margins to the 21st-century election years.

      On the other hand, if there are some consistent characteristics that differentiate years with highly fluctuating margins from other years (e.g., 1992 with the strong Perot candidacy), or if the highly fluctuating margins are found only in two or three years where the same plausible explanatory characteristics were in play (maybe 1988 and 1992?) (& the trends are consistent in all other years), such factors might suggest that the fluctuating margins are outliers, even in the 20th century elections included in the data set. (Can’t see any justification for this approach if the fluctuations were common in many years, though.)

  • Bill

    Sam,

    Thank you for this analysis. I have not been particularly concerned about the recent polls, but I have been concerned about some of what Bernie Sanders and some of his supporters have been doing of late. My concern is that such behavior can hurt Democratic chances. I have been wondering if a newer analysis by Sam would show changes from the analysis he did recently. Looks like not much change.

    One thing I have found interesting is the piece written by Josh Marshall at Talking Points Memo. He published a set of emails he received 8 years ago at this same point in the Obama / HRC primary. The similarity between these emails and current comments at such sites as DailyKos and Talking Points Memo concerning the HRC / Bernie Sanders primary is somewhat reassuring. Things may not be that much different this go-around from last time. You can see these emails at: http://talkingpointsmemo.com/edblog/eight-years-ago .

    • Robert Johnson

      I have been a big fan of Sam’s analysis since the web site started. It is the most worthwhile reading on elections and polling that exists on the internet.

      I would be very interested in any statistically meaningful assessment Sam could make of the 2008 Clinton people who said they would never vote for Obama vs the 2016 Sanders people who say they will never vote for Clinton.

      FWIW, I am one of the latter, and I was never one of the former for either candidate in 2008, although I definitely preferred what I thought of as the higher-beta Obama to the lower-beta Clinton. (My thought in 2008 was that she would be an OK president, not a disaster but not transformative; Obama was more of an unknown, both to voters and to himself, and after Bush I was willing to roll the dice with someone who had a very similar expected value to Hillary Clinton but a much higher variance: he could be terrible b/c he was so new to national politics, or he could be great because he was not rooted in national politics.)

  • Some Body

    OK. Let me play devil’s advocate for a moment: the RCP average of national polls right now is Trump +0.2 (with three recent polls showing leads for Trump, vs. two for Clinton). When you decide these results are not to be trusted, aren’t you “acting like a pundit” too? Aren’t you doing the same as everyone who dismissed Trump in the primary did in August or so? (Recall that they also, rightly, claimed polls, especially national polls, were historically not very predictive at that stage).

    • tony smith

      RCP doesn’t use all of the polls. It eliminates many. Huff Pollster and 538, however, use a more generous supply of polls. Also, using statistics is FAR from what any “pundit” does. If you are equating the nuance in this post with anything among the punditry, you are being disingenuous.

    • Josh

      RCP filters polls. Check a site like HuffPost which uses all polls–they show HRC with a small but clear lead.

    • Matt McIrvin

      In August, Trump’s lead over his closest primary competitors was already enormous: 15 points or more and rising like a rocket. The pundits were dismissing his chances even in early 2016, when he was 20 points up. It wasn’t a Trump +0.2 kind of situation.

    • Some Body

      Folks, I think you are missing my point.

      Nobody has all the polls listed, and they all skip some, but RCP actually avoids the weighting HuffPost and 538 do. In any case, this is irrelevant. The current snapshot does have the race as more or less a tie.

      A claim along the lines of “This result (which, incidentally, we really really don’t like to believe is true) should be discounted because (some manipulation on historical data, which is always sensitive to a relatively arbitrary choice of initial assumptions—see, e.g., Amitabh’s point in comments to this post about 21st century vs. mid-20th century elections)” does smack of motivated reasoning.

    • Josh

      I think you are missing Sam’s point.

      It’s not that polls “aren’t to be trusted”. It’s that polling accuracy with regard to the ultimate outcome of a race is at its lowest at precisely this moment. The amount of confidence we can and should have in polling right now is quite low relative to polling that will be done in 3-4 months or, fascinatingly, polling that was done 3-4 months ago.

      If I understand correctly, you’ve picked out five polls (of the however many polls that have come out over the last week or two), read that three of them have Trump up and two have Hillary up, and concluded that the race is effectively tied, or that Trump even has a small lead. This is problematic for several reasons:

      1) You only looked at a subset of all current polling.

      2) You only looked at polls done within [x] number of days of today, where x is a number that you chose arbitrarily.

      3) Polling done roughly 6 months before an election is, with respect to long-term averages, less valuable than at any other time within an election year.

      So you’re free to draw any conclusions you like, but the conclusion you’ve drawn here is flimsy.

    • Some Body

      But Josh,

      1. The result with May polls being less predictive than February polls is not robust. Amitabh Lath looked at the numbers for only recent campaigns, rather than since 1952, and got the opposite result. That means you shouldn’t put too much trust in the validity of this generalization.

      2. I looked at the current RCP average and at what it happened to show. People started picking on RCP. That’s really irrelevant. So suppose a “better” average, or a median, would show Clinton up by 0.5% instead, or whatever. You always have to make arbitrary methodological choices in averaging polls, so there’s a range of possible end results. Either way, current polling shows Trump doing much better than we lulled ourselves into believing he could, and here we are, again, telling ourselves the story we want to hear, about why these polls, that are better for Trump, don’t really count as much as those back in February, which were better for Clinton.

      3. Again, pick a different cut-off point for the polls, or for the past campaigns in Sam’s analysis, and you can quantitatively “justify” any result you like. You need to have cut-off points, and they are mostly an arbitrary choice. So when Sam publishes a post that just so happens to give us the news we anxiously crave to hear, I’d bend over backwards before accepting the result. So far, I must say, I remain unconvinced.

    • Amitabh Lath

      Let’s be careful, we didn’t arrive at any conclusion about the Wlezien data. Yes, there is less variability recently, but until and unless we know what the mechanism is that causes the larger uncertainty starting about 200 days before the election, we cannot say that mechanism is or is not in play in this election.

      All we can say is that it (whatever “it” turns out to be) did not seem to be a big effect in the 2000, 2004, and 2008 elections. Probably. Or maybe it was and got cancelled out by some other effect. For all we know 2016 is when this effect comes roaring back.

      https://xkcd.com/683/

    • Josh

      @Some Body

      Meh. Sam’s doing math, getting results, and theorizing based on those results. You can disagree or not, but what you can’t do–at least, what you seem to not be able to do–is to explain why the math and its resulting conclusions are intrinsically invalid.

    • Some Body

      No, Josh, you’re still not getting it. Sam’s math is valid, of course, but you can make a different calculation, also valid, and justify a totally different story about the race. That’s what it means that the result is not robust. For all we know, as Amitabh wrote, the effect Sam writes about may come back with a vengeance in 2016, or may disappear entirely, and this is all equally consistent with a close call, a Clinton landslide and a Trump landslide.

      So in the end, what happens is that we are telling ourselves the story we want to hear, whether it is true or not. There’s always math to back up any story we like. Math is not magic, and the fact that someone uses math doesn’t make what he’s telling you true.

      Unfortunately, Sam does have a bit of a tendency toward wishful thinking sometimes. I still remember the gleeful posts about how Republicans are screwed for the 2014 midterms because of the government shutdown a year prior (all backed by good math, of course). We all remember how that turned out.

    • Sam Wang

      Wow, getting lively down here. Here’s what I think (tl;dr: Josh has captured my reasoning correctly; Amit’s suggestion to use 2000-2012 as our baseline does not change the November outcome prediction that much):

      Generally, I do not think I am telling a story that I want to hear. If we are being frank, what I wanted was a considerably higher degree of certainty. That was not the case. I also think a result of “Clinton is favored to win, only moderately so” is an obvious conclusion that a rational person could reach after looking at all the data. However, that is not an approach taken by the news media, so I decided to write it up.

      I do agree that it is really hard to overcome one’s own biases. I’d like to think I can learn from past mistakes.

      “We all remember how [2014] turned out”: Yeah, the shutdown was exciting, but I think your recollection is wrong. That was 13 months before the election, so it was obviously too early to be predictive. Also, “I still remember the gleeful posts”? I am so pleased to find a domain in which I can write about standard deviations and be called gleeful. Anyway, in October 2013 I estimated that the shutdown bounce would last 2 to 6 months, which turned out to be correct. More broadly, looking back at 2014 it is clear that I was pretty open to good news for Democrats (I consider my worst error to have occurred in September 2014). But the assumptions and data were always out on the table for people to see – just as is the case this year.

      Referring back to Amit’s point about 2000-2012 races being more stable than 1952-1996, let’s explore the numbers. In 2000-2012, from February to August the D-R margin varied with an SD of 4 percentage points. During that time the D-R margin was also about 3 percentage points too favorable to Democrats (though I am interested to note that the bias was lowest in February-March and September-October). We combine those two kinds of error to get a one-sigma range of 1% too Republican to +7% too Democratic. The 1952-2012 data don’t show that much accuracy until 50 days before the election. So yeah, 2000-2012 were really stable.

      Over the last few weeks, people have gotten excited about swings that amount to a range between Clinton +2% (now) and Clinton +11% (April). That is a 9-point swing, somewhat larger than 2008 Obama v. McCain, which showed a 6-point swing over the same period. We can’t rule out yet the possibility that 2016 will be as low-volatility as 2000-2012.

      OK, so let’s use the variability in 2000-2012 as our baseline assumption (including the excessively pro-Democratic lean in pre-September polls). In that case the one-sigma range for the November outcome becomes Clinton +3.5% with an SD of 4%. That gives her a win probability of 78%. Which is basically the same result as the 70% I gave in this essay. I should also say that contra Some Body, a Trump landslide does not seem particularly likely. If he wins, it should be close. I think this is a fairly robust result.
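Here is a sketch of the arithmetic in the last two paragraphs, in Python. It assumes the starting point is the post’s weighted average of Clinton +6.5%, that the 3-point pro-Democratic lean is simply subtracted, and that the win probability uses the same t distribution with 3 d.f. as the footnote to the post. Under those assumptions it reproduces the 78% figure. The variable names are illustrative.

```python
import math

def t3_cdf(x):
    """CDF of Student's t distribution with 3 degrees of freedom (closed form)."""
    theta = math.atan(x / math.sqrt(3))
    return 0.5 + (theta + math.sin(theta) * math.cos(theta)) / math.pi

weighted_margin = 6.5  # Clinton +6.5 from the post (assumed starting point)
pro_dem_lean = 3.0     # pre-September 2000-2012 polls leaned about 3 points Democratic
sd_2000_2012 = 4.0     # Feb-Aug SD of the D-R margin in 2000-2012

# One-sigma range of poll error: 1 point too Republican to 7 points too Democratic.
error_range = (pro_dem_lean - sd_2000_2012, pro_dem_lean + sd_2000_2012)  # (-1.0, 7.0)
adjusted = weighted_margin - pro_dem_lean                                  # 3.5
prob = t3_cdf(adjusted / sd_2000_2012)
print(error_range, adjusted, f"{prob:.0%}")
```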

      However, assuming 2000-2012 as a baseline gets into low-N statistics here, which makes me uncomfortable from an analytical standpoint. I think it’s pretty clear that the GOP is having a weird crisis this year. Will their turmoil break the two-party stalemate in some way this year? Will opinion be any more movable? That is actually what Trump should want. It seems possible…so I went with what I thought would be a more conservative (statistically speaking) approach – one that gives slightly more hope to Republicans.

      If anyone wants the dataset to play with, here it is. Everything is Democratic two-party vote share except for the actual election results, way at the bottom of the spreadsheet.
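The spreadsheet is in Democratic two-party vote share, while the analysis above works in D-R margin. In a two-party frame the margin is D − R = 2D − 100, so a 1-point move in share is a 2-point move in margin, which is why the margin SD is twice the share SD. A minimal sketch of the movement-SD calculation, assuming a hypothetical layout of one (poll share on a given day, final share) pair per campaign; the function names and the toy numbers are illustrative only, not taken from the dataset.

```python
import statistics

def share_to_margin(dem_share: float) -> float:
    """Democratic two-party vote share (%) -> D-R margin (points): D - (100 - D)."""
    return 2.0 * dem_share - 100.0

def movement_sd(pairs: list[tuple[float, float]]) -> float:
    """Sample SD, across campaigns, of the change in D-R margin from a given
    polling day to Election Day.

    Each pair is (Democratic two-party share in polls that day,
    Democratic two-party share in the actual result). Hypothetical layout.
    """
    moves = [share_to_margin(final) - share_to_margin(poll) for poll, final in pairs]
    return statistics.stdev(moves)

# Made-up toy campaigns, for illustration only (not from the dataset).
toy = [(52.0, 51.0), (48.0, 50.0), (55.0, 54.0)]
print(movement_sd(toy))  # twice the SD of the same moves in share units
```

Repeating this for each day before the election would trace out an SD-versus-time curve like the one in the post.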

      All comments are going to moderation for a little while. A saucer to cool things.

    • Some Body

      Thanks for the detailed reply!

      Just to be clear, I definitely don’t think a Trump landslide is likely, but wouldn’t want to rule out (yet) a scenario in which his surge in the polls continues, and does not reverse later on. Intuitively, I find your (and many others’) account of the current bump in terms of the Dem race not yet being over to be the most compelling, but then, maybe there’s some other factor in play, which nobody has identified yet?

      More generally, I’m not sure how well generalizations across past campaigns will apply to this year at all, so for me any attempt to quantify probabilities based on the past comes with a big asterisk, at least until we have good evidence that this campaign has started to behave “normally” again.

    • Sam Wang

      I agree that it is a weird political year, but opinion can still be measured, just as one can still measure temperature in a hurricane. Poll analysts get accused of being fairly bloodless and not being concerned with the substance of campaigns…but this year, that is a good thing. However strange our politics has gotten, one thing we have left is polling. In the primaries, polls did as well as they have ever done – they allowed the recognition of Trump as a contender.

      Past campaigns can give guidance on how much, and how fast, opinion can move. In election years that movement is mostly slow, and there are limits to how far it goes. 1952-1996 showed a lot of movement; 2000-2012 showed less. The conservative option for data analysis is to assume that more movement is possible. To me, these are all routine and boring statements with lots of empirical evidence.

      Also, to step back: how weird is 2016, really? 1968 was pretty weird: a divided Democratic party with a violent convention, and a war that was tearing the nation apart. Polls worked then too.

      Anyway, I see no evidence to suggest doing anything but the usual. You guys (“you”=people following 2016) think you’re special. But from a polling standpoint, you’re just flocking birds to be counted.

    • Commentor

      What caught my attention in this discussion is the question regarding variability. Assuming that variability/stability in this race continues to look similar to 2000-2012, which arguably represents the current polarized state of politics, would that show that Trump’s support is not so different from McCain’s, Romney’s, or Bush’s, and that the race could be driven by the same demographics and/or dynamics, whatever they are, that have driven recent races?

      In other words, can the data support the conclusion that Trump’s support demographically is the same GOP support that prior candidates have had and that there is no actual reason to think this election is any different than the past few?

    • Sam Wang

      For evidence that this year is in some ways not that different, see this post.

    • Matt McIrvin

      So far, the demographics seem similar to 2008-2012, only more so. The Republican Party is increasingly becoming the White Party, and it’s hard to believe that Trump won’t accelerate that process. The polls mostly differ in how much Latino support he’ll lose relative to Romney, and whether he can somehow make it up in increased white turnout and picking off or suppressing turnout of white Obama voters.

  • Ron

    There aren’t any Dems with their hair on fire. What I’ve seen are right-wingers and media people over-reporting the few polls that have Trump up a couple of points and under-reporting the 12 that have had Hillary in the lead….

    Other than that inartful opening to the article, it was another great read by you.

    • Matt McIrvin

      I’ve seen Sanders supporters touting them as evidence that Clinton is unelectable and the superdelegates should dump her.

  • Joel Garten

    This chart shows positive values. Does this imply that polling showed greater relative strength for Democratic candidates?

    • Sam Wang

      No, the standard deviation is approximately symmetric in both directions. There’s a tiny effect in the direction you say, but it’s only 1-2 percentage points and could be chance.

  • RA

    In these 16 contests, how many times has the projected leader in early February been different from the projected leader in late May, and what happened those times?

  • Newalgier

    Why not Lichtman as a fundamentals model? It can’t predict the margin, but it does predict the popular-vote winner.

    • Sam Wang

      It’s worded as pop political science, so I have avoided it. Though yeah, one could look into the math to see if it is any good.

    • Matt McIrvin

      Lichtman’s criteria are so subjective that I’ve seen people try their hand at applying them and get all sorts of different results.

  • Joseph

    For whatever reason, I cast my eye to the top of your site after reading this article and noticed that President Obama’s approval rating has climbed to +2%. I was wondering: do you happen to know how predictive that is of an advantage in an election year? How about for down-ticket offices?

  • bks

    Why do we have to use national polls at all? Can’t we use the state polls that are available now, compare them to state polls in, say, 2008 at the same point in the cycle, compare those to the November 2008 results, and then use N’s method of extrapolating to unpolled states? (By we I mean, of course, you. I have to go to the grocery store.)

    • Sam Wang

      Yes, we could use the national-poll drift values in conjunction with the state polls. Maybe I should use this sigma calculation for the Meta-Analysis this year?

      And yes, N’s method is fantastic for filling in missing data as you describe!

      Don’t forget to get eggs.

    • Some Body

      Don’t get eggs. They’re not vegan.

      But would N’s method work as well with polls, not election results, as input?

    • anonymous

      I think N’s Google Correlate method will work with polls, election results, the price of eggs, or anything else. It computes the pairwise correlation between a variable x across states and many (many!) Google search terms across those same states, and chooses the best-correlating search terms. The assumption seems to be that if the search terms correlate over a large number of states, there is an underlying relationship between those search terms and the variable x. Therefore, the relative frequency of those search terms in the unknown states is predictive of the relative variation of the variable x.