Princeton Election Consortium

Innovations in democracy since 2004


How Our Predictions Work (continued)

September 18th, 2014, 7:20am by Sam Wang

Mr. Sullivan, this post is for you.

Even though Nate Silver has misinterpreted what PEC did in 2010 as representing how we operate today, I see this as an opportunity to explain how we make predictions in 2014. I will then come back to a point that many readers will care about more: the assumptions put into this kind of prediction can add hidden biases, whether intentional or not.

The Claim About PEC

First, to restate the claim: here at PEC, we are said to be overconfident in our probabilities. The example given was the 2010 Nevada Senate race between Senator Harry Reid (D) and Sharron Angle (R). Everybody, including FiveThirtyEight, was confident – and wrong. Angle led Reid in the last eight surveys before the election. And yes, we were part of that crowd. It was PEC’s only wrong Senate call in 2010 or 2012 (unlike FiveThirtyEight, which in addition missed two Senate races in 2012, Montana and North Dakota). [update: I made two wrong calls in 2010, Nevada and Colorado. These wrong calls were also made by FiveThirtyEight.]

The statistical error I made in 2010 – and have since fixed – is that I talked about snapshots vs. predictions interchangeably. With the polling margins where they were, it was basically certain that Angle led Reid in the polled demographic, i.e. the population of people who could be captured in surveys. But surveys are not elections. In retrospect, there were good reasons why the polls were off: a heavy cell-phone population, lots of people moving in and out of the state, and Hispanic voters who are hard to reach. Those reasons only became apparent afterward.

This kind of uncertainty can be captured in two steps:

  1. Acknowledge that there can be some small discrepancy between final polls and Election Day results. This is what scientists call a systematic error in the polling data.
  2. Use a probability distribution that is not bell-shaped, but has “long tails.” To capture the possibility of freak events, I now use t-distributions. They are much “tail-ier” than a bell-shaped curve, and capture that “hey, crazy things can happen once in a blue moon” vibe of real elections. I have come to love t-distributions.

In the 2010 case, Gaussian statistics gave an Angle win probability of >99%, which was OK as a snapshot of the polled demographic, but not as a prediction. However, using the two-step approach above, if we use a typical systematic error between Senate poll medians and election outcomes of 1.0%, and a t-distribution with 2 degrees of freedom, the probability would become 91%. This is more plausible.
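To make the two-step calculation concrete, here is a small sketch in Python. The poll margin, sampling error, and systematic error below are illustrative stand-ins, not the actual 2010 Nevada inputs; the t-distribution with 2 degrees of freedom has a closed-form CDF, so no statistics library is needed.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def t2_cdf(x):
    """CDF of Student's t with 2 degrees of freedom (closed form)."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))

# Illustrative numbers only:
margin = 2.0            # leader's poll margin, percentage points
sem = 0.7               # sampling error of the poll median
sigma_systematic = 1.0  # typical poll-vs-outcome systematic error

# Snapshot: Gaussian, sampling error only -- near-certainty
snapshot_prob = normal_cdf(margin / sem)

# Prediction: add systematic error in quadrature, use long tails
sigma_total = math.sqrt(sem**2 + sigma_systematic**2)
prediction_prob = t2_cdf(margin / sigma_total)

print(f"snapshot:   {snapshot_prob:.3f}")    # near 1.0
print(f"prediction: {prediction_prob:.3f}")  # noticeably lower
```

With these numbers, a >99% "snapshot" certainty softens to roughly 88%: the quadrature sum widens the error bar, and the t(2) tails concede that once-in-a-blue-moon misses happen.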


How PEC Turns Snapshots Into Probabilities

Now, let me explain to you how we apply this to making predictions in 2014. Here at PEC, we delineate three issues:

  1. How to take an accurate snapshot. First and foremost, we need a way to see where a single race (or national campaign) stands right now. Although there is no election to validate the snapshot’s correctness, it is possible to take a snapshot of the polled demographic. We take a new snapshot every day.
  2. How to estimate the degree of movement between a snapshot today and a snapshot on Election Eve. How much, and how quickly, does that snapshot vary over time? Let’s call the amount of that movement “sigma_movement.”
  3. How to estimate the final accuracy of the Election Eve snapshot. This is the final validation: in the home stretch, how far is the last snapshot from the actual election outcome? Let’s call this difference “sigma_systematic.”

Using the terminology above, the outcome of the election is, by definition,
OUTCOME = SNAPSHOT + sigma_movement + sigma_systematic.

And if we can understand the sigmas, then we can make a prediction.

Silver has reasonably called out my 2010 writings, in which I mistakenly assumed that sigma_systematic was close to zero, i.e. much less than one percentage point. My current approach is to estimate how sigma_movement varies and how large sigma_systematic is. Those estimates can then be used to make a November prediction.
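One way to see how the decomposition turns into a probability is a quick Monte Carlo sketch: draw the movement from a Gaussian and the systematic error from a long-tailed t-distribution with 2 degrees of freedom, then count how often the leader's margin survives. The parameter values below are illustrative assumptions, not PEC's actual inputs.

```python
import math
import random

def simulate_win_prob(snapshot_margin, sigma_movement, sigma_systematic,
                      n=200_000, seed=1):
    """Fraction of simulated outcomes where the leader's margin stays > 0.

    Movement is drawn from a Gaussian; the systematic error is drawn from
    a t-distribution with 2 degrees of freedom (long tails, to allow for
    freak events).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        movement = rng.gauss(0.0, sigma_movement)
        # t(2) variate: standard normal over sqrt(chi^2_2 / 2),
        # where chi^2_2 / 2 is an Exp(1) draw.
        e = max(rng.expovariate(1.0), 1e-12)  # guard against a zero draw
        t2 = rng.gauss(0.0, 1.0) / math.sqrt(e)
        systematic = sigma_systematic * t2
        if snapshot_margin + movement + systematic > 0.0:
            wins += 1
    return wins / n

# Illustrative: a 2-point lead, small observed movement, 1-point systematic
print(simulate_win_prob(2.0, 0.7, 1.0))
```

A comfortable-looking 2-point lead lands in the high 80s, not at 99%: most of the residual doubt comes from the heavy t(2) tails of the systematic term, not from the day-to-day movement.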

To turn this all back to practicalities: PEC’s current approach is to suppose that the combination of sigma_movement and sigma_systematic can be learned from polling ups and downs in 2014, and analysis of past Election Eve poll snapshots. FiveThirtyEight’s approach is to use fundamentals to generate expectations for where 2014 “ought” to be. Implicitly, their assumptions for this year make the sum of these two quantities tilt slightly toward Republicans. They are probably not being purposely partisan – they just made assumptions that are a bit more biased than usual to favor one party.

Now, do the assumptions in our prediction add a bias? I think not: our core assumption is “the future will be like the recent past.” Of course, there could be something else. Commenters in yesterday’s thread started drilling into our methods and code in a constructive manner. That is a discussion worth having.

Finally, here is a great interactive: to see the effects of adding fundamentals to a model, The Upshot at the New York Times provides a useful Make-Your-Own-Prediction online tool. Click “Polls Only” and see how their prediction changes. It is very instructive.

Tags: 2014 Election · Senate

42 Comments so far ↓

  • Gregory Primosch

    Prof Wang, does sigma_movement absorb the effect of black swan events, or are those handled elsewhere in the model?

    And related, when estimating sigma_movement, are you using a point estimator, or is some attempt made to model the range of values from past elections?

    For example, if sigma_movement over the past three elections is {3, 4, 35}, do you just take the average (14), the median (4), or do you do some kind of Monte Carlo simulation where sigma_movement is chosen randomly over many iterations?

    It seems that if one is just using a point estimator for the term that the average may be masking the true variability of the result, but that is just my layman’s intuition.

    • Sam Wang

      No, sigma_movement is purely empirical, taken from the snapshot data running from June 1 to today. It’s quite small this year, +/-0.7% if I recall correctly.

      The black-swan idea is represented in the sigma_systematic variable and the use of t-distributions, where the tail of the t-distribution represents unforeseen events that surpass the expectations of Gaussian statistics. Reid v. Angle is the quintessential example of such an event. Therefore your example would inform estimates of sigma_systematic. There I would probably go with the median (4). The reason is that while 35 is an obviously interesting data point, the phenomenon I want to capture is “sign errors,” i.e. wrong calls. I am much less focused on predicting specific win margins.
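To illustrate the stakes of that choice, here is a quick sketch. The {3, 4, 35} values are the hypothetical ones from the comment above, and the 5-point margin is an arbitrary example; the win probability uses the closed-form CDF of a t-distribution with 2 degrees of freedom.

```python
import math
import statistics

def t2_cdf(x):
    """CDF of Student's t with 2 degrees of freedom (closed form)."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))

past_errors = [3, 4, 35]  # hypothetical sigma_systematic estimates
margin = 5.0              # hypothetical 5-point lead

for name, sigma in [("mean", statistics.mean(past_errors)),
                    ("median", statistics.median(past_errors))]:
    print(f"{name} sigma = {sigma}: win prob = {t2_cdf(margin / sigma):.2f}")
```

The mean (14) is dragged up by the single outlier and turns a 5-point lead into a near coin flip (about 62%), while the median (4) keeps it a solid favorite (about 83%). That is the sense in which the outlier can mask, or here overstate, the typical variability.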

  • Amin

    I know the back-and-forth with Nate Silver is getting to be a bit too much, but I am just wondering if your statement that Nevada was “PEC’s only wrong Senate call in 2010 or 2012” is correct. Did PEC also miss Colorado?

    • Sam Wang

      That’s a good point. You mean this. With a margin of D+0.7%, I did say then that it would be a nailbiter. It’s another one where FiveThirtyEight also leaned the same way. Obviously, using the predictive approach I have now, the probability would be fairly close to the underside of 50% for Bennet, who won.

  • Craig Barber

    I think we’re all enjoying the debate this year!

    Given the many types of distributions available, don’t we get into an “overfitting” problem, in which predictors could end up choosing distributions that match wonderfully with past data but are just too specific to that data? Or is my question nonsense? :)

    BTW, how about a number indicating the % chance that Angus King and / or Greg Orman choose the majority? (Rather like the green bar in the daily snapshot but including King as well.) Possibly this is pointless because in a group of outcomes which all tend to be so close to 50 either way, it may be that two votes, one already elected, will have a huge chance of controlling the majority.

    • Sam Wang

      Overfitting is a big problem. I couldn’t tell you whether 538 degrees of freedom is too many. At PEC, there is basically no danger of overfitting, as far as I can tell. Our issue is the estimation of 1-2 key parameters – and whether there is directional drift that we’re somehow missing.

      King/Orman: that is getting into human factors. I agree it is critical, but it would contaminate the calculation with speculations. A better way to pose your request might be to ask what poll-based predictions could help us game that out. I think DailyKos Poll Explorer has some useful probabilities that speak to your thoughts.

  • Matt McIrvin

    Amusing that this whole fight began because PEC was showing Democratic Senate control as more likely than the other models did… but now the Senate polls are swinging in a more Republican direction, so in that sense it might all end up moot.

    • axt113

      Actually it seems like all the models are converging towards the race being a tossup.

      538 is now at about 54% for GOP, PEC is at 65% if election were held today, Upshot is at 58%, Huffpo is at 51%

      Electoral vote shows a 50-50 divide (assuming Orman caucuses with the Dems)

      Everyone is basically agreeing the race is going to be basically a coin flip

    • Joseph

      “… but now the Senate polls are swinging in a more Republican direction…”

      You mean as opposed to yesterday?

  • Avattoir

    Having been a long-time follower of Silver, I’m not happy with his behavior in this argument. But that aside, I see a positive side effect: it has shouldered aside all the other teams (a small number of relatively less noisy academics mostly dedicated to pure polling, and a larger number of noisy, swaggering commercial outfits), and with them their models, many of which are so obviously crappy or patently wrong that their participation would only strip the argument of all benefit.

    The Wang-Silver debate (which, for reasons that ought to be self-evident, I prefer to, say, the Silver-Wang duel) is between the best-known academic aggregator, a proponent of using nothing but polls, on the one hand, and on the other, the best-known commercial aggregator, who admits to depending at all on special sauce.

    But is it just me, or is it the case that, among the commercial aggregators, Silver relies most heavily on polls and cuts back the most on the sauce?

    • Sam Wang

      I think we don’t have an easy way of knowing the answer to your question. Best might be to compare cumulative (June to now) poll medians with the FiveThirtyEight estimate, over and over as the campaign goes on. I don’t see a time-series on their site. I must say that for transparency, I really prefer The Upshot.

      If you look at the polls-plus-fundamentals outfits (Upshot, FiveThirtyEight), FiveThirtyEight has been an outlier.

  • ArcticStones

    I think the reason for Nate Silver’s virulent attack against PEC (or, if you will, his attack against his misunderstood conception of them) is quite simple:

    Nate Silver has a brand name to defend.

    If Dr Sam Wang is more correct (once again!), then Mr Silver has a serious problem.

    • Joseph

      If that’s Mr. Silver’s rationale, then he would have been better served by just continuing to ignore Professor Wang. My guess is that Mr. Silver has taken things a little too personally.

  • Joseph

    Sam, I went over to The Upshot and played around a little. Here’s one eye-opener:

    If you select “polls only”, Kansas goes to 82% Democratic. But if you select “fundamentals only”, Kansas goes to 99% Republican!

    I’d love to hear your theory on why that happens….

    • Sam Wang

      Basically, The Upshot’s fundamentals-based model is almost certainly predicated on the idea of a D vs. R matchup. Under conventional conditions, that predicts Roberts +28% (hover your mouse over Kansas on their list, you’ll see). To state the obvious: the game changed two weeks ago when Chad Taylor dropped out. See my two New Yorker pieces on the subject.

  • Joseph

    On fundamentals:

    I’ve been trying to figure out why President Obama’s approval ratings have been going south lately. And I think it’s because the disapproval is growing on his left because of the President’s growing use of military might in the world arena.

    (Why is this relevant to a discussion on Senate races? Because the approval rating for the President is one of those “fundamentals” that impacts the poll aggregators like 538.)

    Here’s the question: When the President’s disapproval rating is coming from his left, how strongly is that going to impact the chances of his fellow Democrats running for the Senate? IMHO, the jury’s still out whether that’s a negative (because liberals won’t vote in “protest”), a positive (because the President wins some grudging respect from the right), or a non-issue.

    If it’s a positive or a non-issue, then any negative weighting because of dropping Presidential approval numbers is a mistake.

  • Mikey Z

    Thanks for this, Sam Wang. While I have been a big fan of Nate Silver since his days as Poblano, he has been consistently a little bearish on Dem chances compared to the results (including famously speculating as to whether or not Obama was toast early on in the 2012 cycle). I agree with you that the “fundamentals” are pretty subjective, and hardly quantifiable. Polls have their own subjectivity. For instance, removing The Upshot’s “house effects” gives the Dems a strong chance, as does counting recent polls. The GOP does better when using “non-traditional” (i.e. internet, etc.) polls…

    Your approach seems to make the most sense to me. As flawed as polls can be, they are less flawed than matrices and speculation like voter intensity can be. Remember how disastrously wrong the likely voter model was for Gallup in 2012, for instance?

    It does seem like a real nail-biter, but as always, I expect the Dems to outperform the pundits’ predictions, as they consistently have every national election since 1992…

  • Tony Roberts

    In light of a horrific poll for Gardner in Colorado today, I am shocked, though anything but disappointed, by your uptick for the Democrats to retain Senate control. Did your model adjustments account for the Q poll showing Udall trailing by ten; and do you know something about how the KS S.Ct. is about to rule on the ballot case?

  • Steven

    What happened today, for the 5:00 update, that caused the Meta-Margin to move +D by 1%? That seems like a large movement, perhaps the largest since the Taylor drop out. A significant pro-D poll(s) or just noise? I am mainly curious, because a swing by the same amount in the other direction would have had R’s with the positive meta-margin.

  • Froggy

    This post got me looking into the 2012 North Dakota Senate race, and why the predictions of 538 and PEC were so different.

    The basic numbers:
    RCP final (straight average of the last three polls): Berg +5.7
    538 prediction: Berg +5.6, Berg with 92% chance of winning
    PEC final numbers: Heitkamp +2.0 (or maybe 1.5?), Heitkamp with 75% chance of winning
    Actual result: Heitkamp +0.9

    First of all, the difference between 538 and PEC is not primarily because of consideration of non-polling data — 538 had the polling average as Berg +3.9, and the adjusted polling average as Berg +2.9. So how did two guys look at the same polls and come out with such different predictions? The answer is that they were not actually using the same polls.

    The polls on RCP and 538 are the same group of seven polls, five with Berg leading, one with Heitkamp narrowly leading, and one tie. The Pollster polls that PEC drew on included six of those polls (a Berg +7 poll from May apparently was missed), plus an additional eight polls, all of which showed Heitkamp leading. Six of these eight polls were sponsored by the Heitkamp campaign or other Democratic groups, and two were from Pharos Research Group. RCP and apparently 538 don’t count partisan-sponsored polls, and Nate Silver decided to exclude Pharos polls when they couldn’t provide him with enough information about methodology to give him comfort that they were a legitimate outfit. (Anyone else remember the Pharos controversy during that fall? I see that they still have the results posted on a website, but they don’t appear to have done any political polls since then.) It’s hard to draw conclusions from such a lightly-polled race, but in this instance it was four polls in mid to late October, two sponsored by one of the campaigns and two from a somewhat questionable pollster, that drove the PEC prediction for Heitkamp.

    But that’s not the weird part. It looks to me that under the current PEC methodology (two weeks’ worth of polls in October, and only the latest poll when a pollster has multiple polls) the PEC final median would have been Berg +2, which would have resulted in the wrong call.

    Which all goes to show what? That it’s better to include all the polls, and not exclude any? That it’s impossible to avoid arbitrary assumptions that will have a significant effect on the results sometimes? That it’s good to be lucky?

    • Sam Wang

      Could have been luck…but my recollection is that the all-polls median favored Heitkamp for most days during the season.

      As for excluding polls…with averaging you need better poll-curating-and-correcting hygiene. That requires so many judgements to be made. I never do that as you know.

  • A New Jersey Farmer

    With the Begich poll and the Kansas decision, the future will be different from the recent past. But that’s a good thing.

  • 538 Refugee

    Here is a ‘piece’ on the importance of sample size:

  • John Foelster

    Dr. Wang, have you ever tried projecting contemporary standards for fundamentals backwards in time to see if they are reliable?

    I have with statewide Cook PVI and the results are pretty hilarious.

    The most surreal, and easily verifiable, failure of the statewide Cook PVI metric is the ratings for 1982-1984.

    Basically, someone talking about “Fundamentals” of the same type in 1984 would have been swearing until the cows came home on election night that Mondale had the Solid South in the bag, and it was particularly impossible for Reagan to win D+16 Georgia.

  • David Kellogg

    Sam, one small question about Silver’s tweet that you actually got two wrong in 2010. He tweets that “Wang had Buck as a 92% favorite in Colorado.” Is that the snapshot/prediction difference, or was there an error in your memory?

  • Partha Neogy

    Prof. Wang,

    I’ve always been skeptical of and repelled by “fundamentals, likely voters adjustment, special sauce,” and other bags of tricks that pollsters use. I was a little disappointed by Nate Silver, whom I still like and respect, drifting in that direction. I don’t know if your pristine methodology will turn out to be better. I hope it does. Regardless of who does better in predicting the results of the coming elections, I think your efforts to rid the process of “black magic” are immensely valuable.

    • Craigo

      Likely voter adjustments are a necessary evil. In most states turnout won’t exceed 60% in a presidential year. In midterms and off years it’s even worse. A simple survey of registered voters the night before the election is likely to be far more error-prone than the median of likely-voter screens.

  • Partha Neogy

    The recent and self-admitted travails of Gallup (still regarded as the premier polling organization by the unaware) point to the dangers of being seduced by the past successes of fudge factors and secret sauce. Much better to steer clear of all such black magic, make the best of all available hard information, and have a rational assessment of the success, and lack thereof, of a model based on such information.

    • 538 Refugee

      Quinnipiac has had a decent track record in the past but I’ve noticed this year they have some heavily Republican lean if you go by other poll results. This brings up the whole ‘rating the pollsters on past performance’ issue. Do you rate a pollster on their final number or take the entire season into account? Sentiment can change as the election cycle goes on so would that be fair? Their early numbers could have been correct for that moment in time. Their final number could be a ‘fudge’ to bring them back in line with the crowd.

  • Art Brown

    1) For this reader, your t-distribution link is broken (but the Gaussian works fine). 2) Watching your snapshot jump from 65% to 93% in one day, a 70% prediction looks rather brave. Comment?

    • Art Brown

      1) Thanks for fixing the link. 2) For the record, I think I now understand why a snapshot fluctuation from 65% to 93% is perfectly consistent with the model behind your 70% prediction.

  • Craigo

    Whoa, huge movement in the Meta-Margin. 1.9% is as high as I’ve seen it.

  • Phil Drum

    Sam, I am wondering about the Orman caucus issue – I have heard he will caucus with other Independents (currently with the Dems) and that he will caucus with the majority – what if it is GOP 50 – Dems 49? In that case – which appears the most likely – he gives the majority to whom he chooses – anyone have a sense what he will do then?

    • Daddyoyo

      According to the Snapshot above, a 50 D (including King) to 49 R split is the most likely outcome, so by the rule of going with the majority, Orman would caucus with the Dems. I do think that it is more relevant to look at policy positions. Orman is to the left of every Republican Senator and to the left of the most conservative Democratic Senator, Joe Manchin.

  • jory

    as a longtime reader of both professor wang and andrew gelman, i found his (not deeply technical) take on the nate vs sam modeling approach worth a quick read:
