Princeton Election Consortium

Innovations in democracy since 2004


State-poll snapshot: Clinton 336, Trump 202 EV; Meta-Margin +4.2%

May 31st, 2016, 10:32am by Sam Wang

Thanks to Brad DeLong, who has collected some of my oldies from January 2016. Back then, I outlined how polling data and Republican party rules pointed clearly toward a Trump nomination.

Here at PEC we are ramping up slowly. In the top banner is a preview of the main product: a snapshot of state polls. It’s basically the same as previous years, a day-to-day snapshot of state polls for the general election. This year, I am also adding a November prediction.

Brief notes:

As longtime readers know, I use all available state polls to calculate an estimate of where the Presidential race stands on any given day. The methods are detailed here. Basically, I use the median and estimated uncertainty of each state’s recent polls to calculate a win probability. Then I calculate a compound probability distribution over all possible combinations of state outcomes (2.3 quadrillion of them), which gives the histogram you see above. That is a snapshot of current conditions. As you can see, Hillary Clinton would win an election held today.
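
As a rough illustration of the per-state step, here is a minimal Python sketch that turns a handful of recent poll margins into a win probability. It is not the site's actual code: the normal approximation, the MAD-based uncertainty, the `min_sem` floor, and the example margins are all assumptions made for illustration.

```python
import math
import statistics

def win_probability(margins, min_sem=1.0):
    """Estimate P(candidate leads) from recent poll margins, in points.

    Assumptions (not taken from the post): the uncertainty is a standard
    error derived from the median absolute deviation (MAD), the margin is
    treated as normally distributed, and `min_sem` is a hypothetical floor.
    """
    med = statistics.median(margins)
    mad = statistics.median(abs(m - med) for m in margins)
    # 1.4826 * MAD approximates a standard deviation for normal data.
    sd = 1.4826 * mad
    # The floor keeps a few near-identical polls from implying certainty.
    sem = max(sd / math.sqrt(len(margins)), min_sem)
    z = med / sem
    # Normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Invented Clinton-minus-Trump margins for one state.
example_margins = [3.0, 5.0, 1.0, 4.0, 6.0]
p_state = win_probability(example_margins)
```

A state whose polls are split evenly around zero comes out at 50%, and the probability saturates quickly once the median lead is several standard errors wide.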

The Meta-Margin is defined as how far the Clinton-Trump margins in state polls would have to change, across the board, to create a perfect knife-edge race in which the median outcome is an electoral tie 269 EV to 269 EV. Today the Meta-Margin is Clinton +4.24%. This is nearly identical to today’s HuffPollster national-poll estimate, Clinton +4.3%. So state and national polls are perfectly matched at the moment. Note that national polls, which sample more frequently, tend to capture swings in opinion before the state-poll aggregate does. If Hillary Clinton’s current downswing lasts long enough, it will soon become apparent in the state-poll snapshot.
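
The definition above amounts to a one-dimensional search: shift every state margin by the same amount until the median outcome sits on the 269/270 boundary. The Python sketch below does this by bisection on a toy map. The state list, the normal win-probability model, and the function names are hypothetical; the real calculation uses actual polls for every state.

```python
import math

def win_prob(margin, sem):
    """Normal-approximation win probability (an assumption of this sketch)."""
    return 0.5 * (1.0 + math.erf(margin / (sem * math.sqrt(2.0))))

def median_ev(states, shift=0.0):
    """Median electoral votes after adding `shift` points to every margin."""
    dist = [1.0]  # dist[k] = P(candidate holds exactly k EV so far)
    for ev, margin, sem in states:
        p = win_prob(margin + shift, sem)
        new = [0.0] * (len(dist) + ev)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)
            new[k + ev] += q * p
        dist = new
    cum = 0.0
    for k, q in enumerate(dist):
        cum += q
        if cum >= 0.5:
            return k
    return len(dist) - 1

def meta_margin(states, tol=1e-6):
    """Uniform swing against the candidate needed to reach the tie boundary."""
    lo, hi = -50.0, 50.0  # median < 270 at lo, median >= 270 at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if median_ev(states, mid) >= 270:
            hi = mid
        else:
            lo = mid
    return -hi

# Toy map: (electoral votes, margin in points, standard error). Invented data.
toy_states = [
    (200, 15.0, 3.0), (180, -15.0, 3.0),  # lumped safe states
    (29, 1.0, 3.0), (20, 4.0, 3.0), (18, 2.0, 3.0), (16, -1.0, 3.0),
    (15, 3.0, 3.0), (38, -6.0, 3.0), (22, 5.0, 3.0),
]
mm = meta_margin(toy_states)
```

Bisection works here because a uniform shift toward one candidate can only raise that candidate's win probability in every state, so the median EV is monotonic in the shift.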

Obviously, enough states are safely Democratic or Republican that the number of possible outcomes is far less than 2.3 quadrillion. Only about 14 states are in play, for 16,384 possibilities. I use polls for all 50 states and the District of Columbia where available. In races lacking polls, usually in places where the outcome is not much in doubt, I use the result from the 2012 Obama-Romney race as a starting assumption. There has been a lot of talk this year about how all bets are off. However, as I showed a few weeks ago, Donald Trump has not scrambled the electoral map in any meaningful way. The sole exception so far is Utah, where Trump is running over 30 percentage points weaker against Clinton than Barack Obama did against Mitt Romney. However, Utah is so strongly Republican that this is probably not going to change the November outcome.

On the off chance that sparsely-polled states start to look competitive, or if evidence emerges that the electoral map is actually scrambled, I am considering using the Google Correlate method to fill in missing data. Google Correlate was a remarkably useful tool in the primaries, and is an attractive alternative to using 2012 election results.

Estimating the probability of a November outcome requires an estimate of how far polls may move in the coming 160 days or so. I calculate a November win probability based on the simple assumption that polls are likely to move as much as they have in past races from 1952-2012. The idea is that state polls are accurate in the home stretch, but may move (all states together) by an amount that can be estimated using the movement in national opinion. Past movement since 1952 serves as a guide to the likely range of movement.
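
To make the idea concrete, here is a minimal Python sketch of how a current Meta-Margin and a set of historical swings could be turned into a November probability. It assumes zero-mean, normally distributed movement, which may not match the distribution actually used; the function name is hypothetical and the swing values are placeholders, not real election data.

```python
import math

def november_win_probability(meta_margin, past_swings):
    """P(the current lead survives to November) under zero-mean normal drift.

    past_swings: historical changes in the national margin between late
    spring and Election Day, in percentage points.
    """
    # Root-mean-square swing: the spread around "no change" rather than
    # around the average historical swing.
    sigma = math.sqrt(sum(s * s for s in past_swings) / len(past_swings))
    z = meta_margin / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Placeholder swings for illustration only; these are NOT real data.
placeholder_swings = [-8.0, -3.0, -1.0, 2.0, 4.0, 9.0]
p_november = november_win_probability(4.24, placeholder_swings)
```

A wider historical baseline (larger swings) pulls the probability toward 50%, which is the sense in which a longer baseline period is the more conservative choice.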

I mentioned before that February/March/April polls may be slightly more predictive than May/June polls. One possible reason, which Charlie Cook pointed out today, applies this year: the Democratic Party’s nomination process is not settled. He suggests that Clinton’s dip in polls is likely to be temporary, and that she should bounce back after the last primaries on June 7th. Logically, that might suggest using a longer time window of polls for making a November prediction. For now my inclination is to just use the current snapshot as a starting point, which has the disadvantage of giving a prediction that may move around too much…but has the advantage of being transparently based on today’s snapshot, which I think is more in keeping with the spirit of this website.

Note that 2000-2012 showed less variation than 1952-1996. By using a historical baseline period spanning more elections, I am making a conservative assumption that allows for the possibility of a large change, as occurred in 1964 (Johnson v. Goldwater) or 1980 (Carter v. Reagan). For a further discussion of what baseline period is appropriate to use, see this PEC thread.

Let me close with a broad statement. In the news you will see some rather hysterical statements about how all bets are off this year. That is true to an extent: on the Republican side, the national party’s positions and their rank-and-file voters’ preferences are far out of whack. In a deep sense, their decision process in 2016 became broken. But that does not mean that opinion is unmeasurable. Far from it. In the aggregate, pollsters still do a good job reaching voters. And voters are still people whose opinions move at a certain speed. To my thinking, polls may be the best remaining way to assess what is happening.

Tags: 2016 Election · President

62 Comments so far ↓

  • James

    Would a third-party challenger or challengers throw this all out the window? I’ve heard Gary Johnson is polling at 10%. Bill Kristol is promising a conservative challenger (Romney?). Plus, Hillary is unpopular and could face a challenger from the left.

    • Sam Wang

      These are details, and of no importance for the calculation. Think of 1968, 1980, 1992, and 1996, years when a third candidate got substantial numbers of votes. We are not so special as we imagine.

    • Matt McIrvin

      If such a candidate becomes clearly nationally significant, it will start to register in these polls.

  • Joseph

    Thank you, Professor Wang! So much Sturm und Drang, so little factual information. We really needed this!

  • Matt McIrvin

    And we’re off!

    Trump hasn’t scrambled the map, but I do think that the parameters of the map have been slowly changing. The Democrats’ strength is very gradually increasing in the coastal South and ebbing from the Rust Belt and Appalachia. But these are processes that were going on before Trump and may be too subtle to flip a lot of states this time.

  • MAT

    Yea! Been waiting for this with bated breath. Very curious to see how adding Google Correlate pans out – I’d urge you to do it just to see how it holds up.

  • anonymous

    This does point to the senate races being the nail-biters this election season. As for Brad Delong’s summary, it is nice, but does not seem to quite make up for the lack of credit given to Sam elsewhere in the media for his accurate Republican primary prediction. Even when there is polling data available, I think a disparity between specific state polls and either Google Correlate or nearby county polls from neighboring states might function as a warning sign for systematic polling errors in that particular state.

    • anonymous

      Also, isn’t the compound probability distribution for approximately 2,252 trillion (2^51) possible outcomes? Not that it matters, even with Utah included, the number of ‘swing’ states is probably less than 20, so only a little more than 1 million outcomes may need to be examined, the rest are going to be constant.

    • AySz88

      FYI, I always felt that looking at all the possibilities is unnecessary anyway – you can use a technique called Dynamic Programming. For the general idea of it, a good starting point is the Zero/One Knapsack Problem:

      Also, in the FAQ, Sam says he is using a more-clever solution involving a generating function:
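
The point made in this comment can be sketched in a few lines of Python. Multiplying one small polynomial per state, (1 - p) + p·x^EV, yields the full distribution of electoral votes without enumerating every combination; this is the same recurrence a knapsack-style dynamic program would use. The function names and the four-state example are hypothetical, and the brute-force version is included only to check the shortcut on a small case.

```python
from itertools import product

def ev_distribution(states):
    """Distribution of total EV; states is a list of (ev, win_probability)."""
    dist = [1.0]  # coefficients of the running polynomial product
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)   # state lost: total unchanged
            new[k + ev] += q * p      # state won: total rises by ev
        dist = new
    return dist

def ev_distribution_brute_force(states):
    """Same distribution by enumerating all 2^n win/loss combinations."""
    total = sum(ev for ev, _ in states)
    dist = [0.0] * (total + 1)
    for outcome in product([0, 1], repeat=len(states)):
        prob, votes = 1.0, 0
        for won, (ev, p) in zip(outcome, states):
            prob *= p if won else (1.0 - p)
            votes += ev * won
        dist[votes] += prob
    return dist

# Invented example: four states with made-up win probabilities.
toy = [(3, 0.9), (4, 0.5), (10, 0.2), (6, 0.7)]
fast = ev_distribution(toy)
slow = ev_distribution_brute_force(toy)
```

The polynomial version does work proportional to (number of states) × (total EV), so 51 contests take a few thousand multiplications rather than 2^51 enumerated cases.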

    • Roger Moore

      To be picky, there are actually more than 2^51 possibilities because Nebraska and Maine each split their electoral votes, with one going to the winner of each congressional district and two going to the overall state winner. That means there are 4 possible distinct outcomes in Maine and 6 in Nebraska*, so the total should be 2^49*4*6 = 1.35×10^16 rather than 2^51.

      *There’s no way for Maine to result in a 2/2 split; if one candidate wins both congressional districts, they will also win the state as a whole. This means there are only 4 possible outcomes rather than 5. In contrast, it’s possible for there to be a 3/2 split in Nebraska if one district is a blowout and the other two are close.
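
The counting in this comment can be checked directly. The Python sketch below enumerates district and statewide results for a state that awards one EV per congressional district plus two to the statewide winner, under the comment's assumption that sweeping every district implies winning statewide. The function name is hypothetical.

```python
from itertools import product

def distinct_splits(n_districts):
    """Distinct EV totals one candidate can win in a district-split state."""
    totals = set()
    for districts in product([0, 1], repeat=n_districts):
        won = sum(districts)
        for statewide in (0, 1):
            # Winning every district implies winning the state, and
            # losing every district implies losing it.
            if won == n_districts and statewide == 0:
                continue
            if won == 0 and statewide == 1:
                continue
            totals.add(won + 2 * statewide)
    return sorted(totals)

maine = distinct_splits(2)      # 2 districts, 4 EV
nebraska = distinct_splits(3)   # 3 districts, 5 EV
total_outcomes = 2**49 * len(maine) * len(nebraska)
```

Maine's possible totals skip 2 (the 2/2 split), while Nebraska can produce every total from 0 to 5, which gives the factors of 4 and 6 in the product.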

  • fladem

    Here is the flaw in this analysis, which stems from an extreme over-confidence in prediction.

    Let’s start with some history. The NCPP has a document with the last poll results from 1948 to 2000.

    Let’s ask a question: in how many of those races were the polls off by more than 4 points?
    1948 – error on margin 10 points
    1952 – error on margin 9
    1956 – 3
    1960 – 2
    1964 – 5
    1968 – 2.5
    1972 – 2.7
    1976 – 2.1
    1980 – 6.3
    1984 – 4.3
    1988 – 3.2
    1992 – 2.2
    1996 – 4.1

    You might say polling is better now. I challenge the assumption. Harris and Gallup went door to door, and their refusal rate was far lower than pollsters get now.

    There are more polls now – but the average polling missed pretty badly (and in the same direction) in 2012 nationally.

    So I would argue you are systematically underestimating the volatility of elections.

    More broadly, I think there is significant confusion about the extent to which state polling is independent from national polling. The probability of a 5-point shift in both State A and State B is not the probability of a 5-point shift in State A times the probability of a 5-point shift in State B.

    This in turn is based on a fundamental mistake about what causes late shifts in elections. The British Prime Minister Harold Macmillan once said about politics “Events dear boy, events”. Too much of the analysis assumes late shifts are a result of polling error, and not changes in actual opinion. In 1980 it was last minute talks with the Iranians over the hostages. Other examples can be given as well.

    Of course, such events are fundamentally unpredictable. And yet empirically I would argue that they happen frequently enough to cast significant doubt on the probabilities I see offered.

    • Sam Wang

      This comment contains errors and wrong assumptions. Please re-read the methods, as well as past posts.

      The estimate of May-November movement is based on actual volatility. The idea is that state polls are accurate in the home stretch, but may move by an amount that can be estimated using the movement in national opinion. These are simple assumptions, and include the idea that national opinion tends to move uniformly.

    • Matt McIrvin

      These are all things you could have equally well said in 2004, 2008 and 2012, all years in which simple state-poll aggregation called the presidential election with high accuracy in the EV.

      It’s possible that the shift to cell phone-only households has increased to the extent that there’s more systematic error now, but that hasn’t shown up in results yet. Sam’s methods have a good track record.

      Where they don’t have as good a track record is in midterm congressional elections, where turnout is probably harder to predict.

    • Matt McIrvin

      …Note that in 2012, many pundits attributed Obama’s win to exactly the kind of late-breaking event that is supposedly so important to consider (Hurricane Sandy). But the state polls still called the EV count right on the nose.

    • Josh

      “There are more polls now – but the average polling missed pretty badly (and in the same direction) in 2012 nationally.”

      No. National polling averages the week before the election had Obama +2; the final tally was Obama +3 and change.

      At the state level, Sam, Drew Linzer, and Nate Silver all got either 49 or all 50 states correct based on polling averages.

    • alurin

      The British Prime Minister Harold Macmillan once said about politics “Events dear boy, events”.

      Macmillan would have made a great pundit.

    • Phoenix Woman

      A friend of mine recently chatted with Rob Daves, the guy behind the original Minnesota Poll and a well respected pollster.

      Per said friend, Daves says that the cellphone issue (the one that Bernie people like to say renders the polls they don’t like meaningless) was solved years ago and that only crappy pollsters still have problems with it. Michigan was also accurately predicted by a few quality pollsters who took into account things like the long history of gaming of that state’s open primaries.

    • Matt McIrvin

      The thing I could see genuinely killing poll aggregation as a method of calling elections would be a concerted attempt to spam the aggregates with enough fraudulently slanted polls to shift the result one way or another. Certainly some of these already exist; Nate Silver tries to deal with them by grading and weighting for poll quality, and Sam deals with them by using median-based methods and assuming they are uncommon enough that that will minimize their effect. But a real storm of them could make this hard to do.

      Also, if the US went to a national popular vote, it would make this harder, because national PV poll aggregates tend to be a little less accurate than these EV-counting exercises (not hugely less accurate, though).

  • Brian


    You wrote, “Today the Meta-Margin is Clinton +4.24%. This is nearly identical to today’s HuffPollster national-poll estimate, Clinton +4.3%. So state and national polls are perfectly matched at the moment.” Last week you concluded that, historically, national polls are currently at their least-predictive state:

    Do you think that the currently tight agreement between state and national polls could say anything about this race and these candidates? Or do you feel this is a largely expected phenomenon?

    • Sam Wang

      Both are snapshots of the race today; one is filtered through mechanisms of the Electoral College. The two estimates are based on different datasets. It is good when they are in agreement, but not surprising.

    • Matt McIrvin

      Since you’re using the average of the last three polls at a time when the frequency of polls in some states is still rather low, doesn’t this EV count actually use data going back, in some cases, a few months?

      If so, there’s reason to believe that the current EV snapshot is too optimistic for Clinton, since her national numbers have declined dramatically over that period. It may not be such a huge difference, though, since the states without a lot of polls are mostly not swing states–with a few glaring exceptions.

      (Also, HuffPo’s aggregate is down to Clinton +1.9 last I checked.)

      There was a spate of state polls a few days ago that looked really grim for Clinton: Trump winning Oregon, tied in NH, within 4 points in New Jersey. But I’m suspicious of them because, taken at face value, they’d imply that Trump would have to be winning with a near-landslide margin nationally, and I see no indication of that. It didn’t stop people from freaking out, of course.

    • Matt McIrvin

      EV counts and Meta-Margin seem to be creeping downward now as the further drop in Clinton’s national support we saw in recent weeks filters into the state numbers. There are suggestions of a major rebound in the making; if that holds I’d expect it to show up here with some time lag.

  • we_are_toast

    A bit of a side track question.
    Have you noticed a correlation between the “movement” of National polls VS swing state polls, especially in summer and fall?
    It seems this might be a way to measure the effectiveness of the massive amounts of money spent by candidates. If the movement between swing and national polls is tightly correlated, then campaign spending (mostly TV ads) is either offsetting or completely ineffective.

  • NeilP

    It would also be nice to have the “Now” probability. I don’t think I can back that out of the information provided although a Trump win has an extremely low probability < 0.1% looking at the histogram of possible outcomes.

    • Sam Wang

      I won’t be providing that. It would spend nearly all of its time at either >99% or <1%. More fundamentally, it is not a helpful measure because it quantifies the probability of an event that does not happen, the mythical “election held today.” This would then be quoted with a false feeling of certainty on other websites and in the news. I do not want this coverage.

  • Joel

    Taking the data that you provided in the linked PDF, final polling error for the leading candidate has declined around 0.19% per election cycle between 1936 and 2000. This is using a coarse least squares method, but it passes the eyeball test.

    In other words, polling *has* gotten better. And this is easy to explain, since according to the same document, the number of national pollsters has increased from one (the first seven polls in your dataset) to twelve (2000). The three biggest errors in your data set come from single pollster elections (1936, 1948, 1952), which makes complete sense.

    The number of pollsters hasn’t declined since Bush-Gore, and it’s easy to expand on that (now quite dated) report, using just RCP data:

    2004: candidate error 0.9, number of pollsters 14
    2008: candidate error 0.3, number of pollsters 15

    Just for kicks, I’ll include 2012 when RCP seemed to be selectively excluding some pollsters in their final averages:

    2012: candidate error 3.2, number of pollsters 9

    In other words, a pretty convincing argument for the existence of sampling error. But this doesn’t really address what Sam is doing here, because the national polling isn’t used in the meta-margin forecasts, and his method has worked convincingly better than straight national polling + error predictions for the last three elections.

  • Marlene Snyder

    Sam–thanks so much! You help maintain my sanity in this national delirium of ‘swen’ (it isn’t news, it is the reverse of news…)


  • E L

    Thank you, Sam. Now when will you get Trump’s attention? Your heritage should be a red flag to him. I can hear him now. After the election, you can proudly put “Trump target” on your resume.

  • Kevin

    Why 2? There is a 3rd party! :)

  • Amitabh Lath

    Welcome back meta-margin! Good to see this running again. I am feeling nervous about a) November predictions, and b) Google Correlate.

    The Wlezien data is probably the best there is to extrapolate to November but damn it’s low statistics (lower if you remove the years with an incumbent running). Hard to tell if the poll movement distribution is gaussian. Your post doesn’t say, but I bet you are using some fat-tailed Student’s-t. More conservative than a gaussian.

    As for Google, I know it worked well in the primary. But general worry about machine learning: in October we’ll be trying to dissect why beanie babies correlate with Trump and lawn tractors with Clinton. Difficult to estimate an error envelope. If you do put in an estimator based on Google I hope you allow the reader to switch back and forth with the old-fashioned poll-based estimate.

    • anonymous

      As far as I can tell, there is no machine learning, it is a direct correlation between the search term and the candidate approval/votes. I personally think it would be great if someone (at google?) could dissect the relationship between a search term and a particular candidate’s appeal across states. I just realized that Sam’s independent nearby-county method may not apply here as a sanity check, as the state polls probably do not provide county level statistics.

    • Ed Wittens Cat

      There’s no machine learning or training that I can see in N’s fab method – it’s pure Data Science, isn’t it?

    • Amitabh Lath

      My understanding is that the google search terms are selected, and weights calculated in areas/states which have verifiable results (either polls or elections).

      Then based on the frequency of searches for these terms in unknown states, one predicts poll results.

      So the first part is like the training one does with a neural net, setting the weights of the hidden nodes using known data. The second part is using this training on an unknown sample.

    • anonymous

      My understanding is a little different: it seems to be a correlation coefficient calculation between the search term frequencies in the known states and the known polls/votes for each candidate in the known states. There is no re-weighting, the top search terms provided by google are just the ones with the highest correlation (and google probably has some super-efficient algorithm to get those correlation coefficients). Once one has the most correlating search terms, the frequency of those search terms in the unknown states can directly predict the polls/votes in the unknown states.

    • Sam Wang

      This is correct. Note that an optimal estimator is constructed by using each search term’s correlation coefficient as a weight to make a linear combination.
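
One way to read the estimator described in this exchange is sketched below in Python. Each search term's frequency is standardized across the known states, the standardized values are combined using the term's correlation coefficient as its weight, and a least-squares line maps the combined score back to a vote share. The standardization and the final linear mapping are assumptions of this sketch, as are the function names, term names, and numbers; Google Correlate itself is not called.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_predictor(term_freqs, shares):
    """Build a predictor from states where the answer is known.

    term_freqs: {search_term: [relative frequency in each known state]}
    shares:     [vote or poll share in each known state], same order
    """
    # Per term: mean and spread (to standardize) and correlation (the weight).
    params = {
        term: (statistics.fmean(f), statistics.pstdev(f), pearson(f, shares))
        for term, f in term_freqs.items()
    }

    def score(freq_by_term):
        # Correlation-weighted linear combination of standardized frequencies.
        return sum(r * (freq_by_term[t] - m) / s
                   for t, (m, s, r) in params.items())

    n = len(shares)
    scores = [score({t: f[i] for t, f in term_freqs.items()}) for i in range(n)]
    # Least-squares line mapping the combined score back to a share.
    ms, my = statistics.fmean(scores), statistics.fmean(shares)
    slope = (sum((s - ms) * (y - my) for s, y in zip(scores, shares))
             / sum((s - ms) ** 2 for s in scores))
    return lambda freq_by_term: my + slope * (score(freq_by_term) - ms)

# Invented frequencies and shares for four "known" states.
known_freqs = {"term_a": [1.0, 2.0, 3.0, 4.0], "term_b": [9.0, 7.0, 6.0, 2.0]}
known_shares = [40.0, 45.0, 50.0, 55.0]
predict = fit_predictor(known_freqs, known_shares)
estimate = predict({"term_a": 2.5, "term_b": 6.0})
```

A state whose search frequencies sit exactly at the known-state averages is assigned the average known share, and terms with weak correlation contribute little to the score.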

  • anonymous


    To address your tongue-in-cheek question seriously, the other parties are likely to mostly take votes from either the Republican or the Democratic nominee, but not both. This should bolster either Trump or Clinton, and should make the probability of a third party win in any state negligible. It might have been different only if Bloomberg was running, since he might have drawn support from both parties, and hence introduced some unpredictability.

    • emmy

      From what I understand Bloomberg did some polling (heh!) and found that he would take support primarily from the Democratic candidate, so he didn’t run.

    • Matt McIrvin

      Yes. Bloomberg seemed to believe he could actually win if Bernie Sanders was the Democratic nominee, but he knew he had no hope if it was Clinton.

    • Chip

      Bloomberg may well have seen an opportunity to run if Sanders were seen to be on track to nomination, deeming him extremely weak in the general election.

      But when writing in March about his decision not to run Bloomberg made no mention whatsoever of Sanders or Clinton. He said, “As the race stands now, with Republicans in charge of both Houses, there is a good chance that my candidacy could lead to the election of Donald Trump or Senator Ted Cruz. That is not a risk I can take in good conscience.” I found it telling that despite his centrism he believed Democrats would be more amenable to him than Republicans.

      I remember that in ’92 it was widely considered a bold move for an Arkansas governor to choose a Tennessee running mate, as it did not broaden the appeal geographically or politically. So I wonder if an Illinois/Arkansas/New York candidate might choose New Yorker Bloomberg as running mate…

    • Bela Lubkin


      I think you’re misreading the reasoning behind Bloomberg’s comment that his candidacy could lead to a Trump or Cruz presidency.

      It wasn’t that he would draw disproportionately from D voters, thus giving a traditional win to the R candidate. Rather, he might win a few states, leaving both D and R candidates’ final EV totals below the magic number of 270. You might have seen, for example, Clinton 260 / Trump 230 / Bloomberg 48, on election night.

      What happens then? The election moves to the House of Representatives, where each state delegation gets a single vote among those candidates who received any EVs at all (actually the 3 candidates with the most EVs — which is surely all of them, this year, unless there’s a true unfaithful elector).

      As there are 30+ R-dominated state delegations, it’s a fairly safe bet that the R candidate would be named president. Even if he were Donald Trump, and even if the actual EV totals were Clinton 268 / Bloomberg 268 / Trump 2.

      (All EV totals invented for example purposes out of thin air, without regard to actual distribution among current states…)

    • Matt McIrvin

      Yes, but in reality this sort of thing rarely happens: I guess the closest we came in the 20th century was Nixon-Kennedy-Byrd in 1960. Bloomberg would have to win a few states while also not acting as a spoiler overwhelmingly for one side or the other. Depending on the breaks I could see it turning into either a Clinton or a Trump landslide.

    • Mark F.

      The assumption is that the Libertarian Party candidate draws mainly from Trump’s voters, but is that really the case? Anti-war and civil liberties minded Democrats might be drawn to Johnson as well.

  • emmy

    To be in line with the gist of this post, we won’t know if Trump’s scrambled the map (or is simply on track to get a 2004 map and narrow win) until July or August. There is some suggestive polling out, but it’s only suggestive, and well, we need moar data. The sitting and waiting is hard, whether you would like Trump to win or you’d prefer him to lose.

    Clinton’s in a big downswing but her contests are going to be wrapped up in a week (the one remaining will be irrelevant to the horserace narrative), and the polling could swing into place for her. Or it might not. The “other” and “undecideds” will presumably start deciding very quickly either way when there’s only two real nominees to pick from because the process is wrapped up.

    • Matt McIrvin

      Unless Sanders keeps insisting that he’s going to try to flip the superdelegates at the convention, and enough of his supporters believe that he might, which would keep it going for another month and a half.

  • Billy

    I work in genomics, and imputation is a really interesting approach for dealing with missing data. I hadn’t seen the Google Correlate post until now, but it would be interesting to see what other methods can be used to fill in sparsely polled states.

    On the other hand, would it be possible to use a Google Correlate only model instead of polls? It would depend on how the Google userbase relates to actual votes.

    • Amitabh Lath

      According to my understanding, the Google method requires some areas where the answer is “known” and the search term weights can be determined. In the primary, this was done (by N.) in states that had already voted, and extrapolated to states that had not been polled extensively. It did reasonably well in Indiana.

    • 538 Refugee

      Wouldn’t the polled states count as “known” for this? Just because you don’t have the final results doesn’t preclude assuming some level of accuracy of the polling.

      The flip side is that any state seen as contested will be polled pretty heavily. This method could suggest some sleepers that maybe deserve to be polled more.

    • Sam Wang

      This is correct. I am interested in the idea of having an unbiased way to identify hidden upsets – or the absence of them.

    • G Washington

      Another way to deal with missing data is to use a principal component analysis, where you reconstruct the missing results. One could feed previous results (such as past presidential elections, but also other things such as opinion polls on various issues) to determine the eigenvectors.

      This is similar to a simple correlation matrix, but has some additional information and allows you to cut out “noise.”

    • Some Body

      Using polling as input is unavoidable for this method, but has downsides. There’s polling error that gets baked into the input, and if there’s systematic error in the polls for some reason, the Google Correlate method won’t be giving an independent estimate.

      Also, will it be the median or average that will be the input, or each poll separately? The individual poll results have clearly-defined field dates, but a lot more error of all kinds.

      Either way, I definitely hope you decide to put the method to use for the general election, regardless of any considerations about changes in the identity of swing states, if only to see how well it holds up for future use.

  • Mace

    So, what would ” if evidence emerges that the electoral map is actually scrambled” actually look like? How many polls over how much time do you think it would take to trigger an examination?

    • Sam Wang

      This is an excellent question. It requires a measure of election-on-election relatedness that does not care if all states shift by a fixed amount. The correlation coefficient might do the trick.
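
The property mentioned here is easy to demonstrate: the correlation coefficient between two sets of state margins is unchanged when every state moves by the same amount, but drops when the ordering of states changes. The Python sketch below uses invented margins for six hypothetical states; no threshold for "scrambled" is implied.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented state margins (points) for a previous election.
previous = [20.0, 10.0, 2.0, -3.0, -12.0, -25.0]

# Same map, but every state shifted 5 points toward one party.
uniform_shift = [m - 5.0 for m in previous]

# A map where states have moved in unrelated directions.
scrambled = [-5.0, 12.0, -20.0, 8.0, 3.0, -10.0]

r_shift = pearson(previous, uniform_shift)
r_scrambled = pearson(previous, scrambled)
```

Here `r_shift` stays at 1 (up to rounding) even though every state moved, while `r_scrambled` is much lower, so the statistic separates a uniform swing from a reordered map.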


  • Mark J

    I wonder about the enthusiasm effect. That is, does primary turnout have any predictive ability?

  • Allen_Insight

    Professor Wang,

    Thank you for all your hard work on these predictions. Really appreciate it!

    Now that we see Hillary is likely to win by a large margin, I believe it is time for an update on your “Gerrymandering Creates a Point of Weakness” analysis from 2013. Many Democratic strategists seem to believe the Republican House Majority is permanent – much like the law of gravity. But your analysis from 2013 seems to indicate that Republican House Majority is very vulnerable in a wave election because Republicans are spread much more thinly across their suburban / rural districts compared to Democrats who are packed very tightly into more liberal urban districts. A big enough Democratic wave would probably be a combination of High Democratic turnout and Low Republican Turnout. And I am guessing the loss of House Republicans would be far above the conventional wisdom of 30 seats at most. Assuming the conventional wisdom is wrong by a factor of two, Republicans could lose as many as 60 seats in the House. But again that is just my guess. And I am curious as to what your model would predict.

  • blaneyboy

    Sam: Love your site. Clung to it during 2012.

    I’m a bit confused. You say in the paragraph following the histogram: “Hillary Clinton would win an election held today.” But then in the comments you say you won’t provide information to promote the “mythical” election held today. What am I missing?

    Thanks for very detailed and comprehensive explanations of what we can glean from the polls.

  • Bill Herschel

    This quote from Feynman may cover everything in the approach used by Prof. Wang. It certainly would cover the Google search method. How remarkable that voting behavior is described by quantum mechanics.

    “I am not going to explain how the photons actually “decide” whether to bounce back or go through; that is not known. (Probably the question has no meaning.) I will only show you how to calculate the correct probability that light will be reflected from glass of a given thickness, because that’s the only thing physicists know how to do!”

    Feynman, Richard P.. QED: The Strange Theory of Light and Matter (Princeton Science Library) (p. 24). Princeton University Press.

  • Christopher Brandow

    where are you getting your data from right now? Pollster RSS is not up for state-level data yet. Unless I am mistaken :-)

    • Sam Wang

      manual from HuffPollster and RCP, for now. For some reason I had not anticipated that their feed would not yet be live.

  • Nesler

    You mentioned the use of the Google Correlate method to predict results for unpolled states. In this instance, would you use aggregated polling data for oft-polled states as the seed? (In place of the actual vote results from earlier primary states that was used to predict results in later states.)

  • Paul

    Looking at that Huffpo general election poll aggregator, I notice what appears at a glance to be a fairly strong inverse relationship between the Clinton & undecided lines. (You have to customize the graph to show undecided & other _and_ change the range so that they actually show up.)

    I verified on my own machine that adding Clinton + undecided gives a line that’s much stabler than either of those two lines on its own.

    One can easily invent explanations: for example, Sanders supporters might have been happy to choose Clinton when asked about a Trump/Clinton matchup in Feb, but pointedly said “undecided” in May when asked the same.

    This suggests an interesting avenue of investigation: is some of the fluctuation in each candidate’s numbers attributable to individuals shifting back and forth between undecided and just one of the candidates, but not actually changing their candidate? And could we detect that by trying to model the undecided curve as the sum of scaled variation in each candidate’s curve? Not sure if I’m making sense….
