Princeton Election Consortium

A first draft of electoral history. Since 2004

Hidden errors, overconfident pollsters?

November 3rd, 2014, 10:05am by Sam Wang


I am thinking about how to get the most accurate last-minute snapshots of races, and how to turn that into a scorecard for you (and me) to use on Election Night. I’m also thinking about Brier scores as a means of evaluating the various prognosticators, including me.

In the meantime, here’s your morning reading: an excellent analysis by David Rothschild, Sharad Goel, and Houshmand Shirani-Mehr on the problems endemic to current polling practice. They analyze 2012 in great detail, and identify errors that go well beyond sampling error. Because these errors are unlikely to have been fully corrected this year, they think there’s a good chance that Democrats will outperform poll aggregates. In other words, all poll aggregators, including PEC, might carry a hidden bias. My own view is that based on historical data, errors have gone in either direction by several percentage points across the board. Go read their article!
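
For readers who have not met the Brier score mentioned above, here is a minimal sketch of the calculation: the mean squared difference between each forecast probability and what actually happened (1 if the event occurred, 0 if not). The function name `brier_score` and the two example forecasters are hypothetical, made up purely for illustration; they are not PEC's code or anyone's real forecasts.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    forecasts: probabilities (0.0 to 1.0) that each event happens
    outcomes:  1 if the event happened, 0 if it did not
    Lower is better; 0.0 is a perfect score.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data: three races, the favored side lost the third one.
outcomes = [1, 1, 0]
confident_forecaster = [0.95, 0.90, 0.99]  # very sure about every race
hedged_forecaster = [0.60, 0.55, 0.65]     # leaning the same way, less sure

print("confident:", round(brier_score(confident_forecaster, outcomes), 4))
print("hedged:   ", round(brier_score(hedged_forecaster, outcomes), 4))
```

In this made-up case the confident forecaster ends up with the worse (higher) score, because a single near-certain call that goes wrong costs almost a full point before averaging. With only a handful of close races, one or two misses can dominate the comparison.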

Tags: 2014 Election · Senate

40 Comments so far ↓

  • Amitabh Lath

    I agree that pollsters do not report uncertainties correctly. But they do weight the various sub-groups they contact to create a total sample that they believe looks like the voting population. In other words, they do (try to) correct for Coverage and Non-response errors.

    However, weighting introduces error, and since these corrections are basically subjective — a gut feeling for what group is going to vote or not, and at what level — this is where the biases creep in.

    • Sam Wang

      I think it is a little unfair to call them gut feelings. It is a problem in parameter estimation, which is a challenge.

      Pollsters are by nature a (technically) conservative lot…I think it’s hard to have the nerve to fully correct for these biases. I am of the opinion that not everyone has done so.

    • Amitabh Lath

      I didn’t mean anything pejorative by “gut feeling”. It’s a fact that estimating systematic uncertainties is more art than science.

    • Glenn Reider

      Modeling requires working backwards from numbers derived from the last election cycle (mid-term or general). Obviously the ratios of party affiliation won’t bend past a certain point, but how to model what bending there is? Would it be fair to call these models another type of “fundamentals?”

  • Alan Koczela

    Dr. Wang,

    In 2012, despite all the problems the article mentions, your model performed virtually flawlessly. It’s hard to imagine that pollsters have not attempted to improve their methodologies for this cycle. If they have, this would suggest that your model is even more accurate. Your model strongly suggests R will win the Senate tomorrow. It’s not certain and the estimates are subject to significant margins, but so were the model’s 2012 predictions. As for me, I prefer to leave with the gal I brought to the cotillion. The model predicts a R takeover of the Senate and I believe it. Now that the model shows a result many don’t want, talking about polling error and under-perform/over-perform smacks of magical thinking and desperation.

    • Sam Wang

      I am against magical thinking. However, inspect this table and tell me there won’t be inaccurate individual races this year. Anything within 2 percentage points is really up for grabs.

    • Amitabh Lath

      There are two things to look at: the value of the prediction, and the associated uncertainty. It is a mistake to look at just the former without considering the latter.

      In 2012 the prediction was very far away from 50/50, much beyond the (rather generous) uncertainty estimate. So some small movement well within the uncertainty bands would not change who won the election.

      In 2014, the prediction is basically right on 50/50. The small deviations of < 1 point are small compared to the attendant uncertainties. Small movements well within the uncertainty band could take one back and forth over the line.

    • bks

      While there may be error in the error estimates (and scientists are just as guilty of that as whichever straw man we’re attacking today), the approach of simply accepting the polls as data that needs no deconvolution has proved remarkably powerful. –bks

    • Alan Koczela

      Ok, let’s play the game. Looking at the direction of the bonus, ignoring the magnitude and only looking at the races people are concerned about:

      State  04  08  10  Note
      NC     R   D   -   ?
      AK     R   R   -   bad news Begich
      KY     D   D   -   bad news McConnell
      CO     R   -   D   ?
      GA     -   D   -   bad news Perdue
      LA     -   R   -   bad news Landrieu
      NH     -   R   -   bad news Shaheen
      WV     -   -   D   bad news Capito

      No one expects McConnell or Capito to lose. GA and LA look like a runoff, which is anybody’s guess. In CO and NC, no pattern on which side gets the bonus. (I would argue no pattern in NH, but strictly speaking Shaheen looks to be on the wrong side of it.) If there is a pattern, it seems to hurt Begich.

      Again, I would argue that I don’t see any pattern in the state data and any apparent pattern doesn’t seem to help the D candidate in the really close races.

    • Edward G. Talbot

      The model was off in 2012 – Wang & Ferguson had a Brier score of .00761, not 0.0.

      And that was for the presidential election; the score was surely higher for the Senate races. In any case, the difference this year is that with all the close races, the model being off by that small amount might have a bigger impact than in other years. If just 2 races are each off by 1% from what’s currently in the Power of Your Vote graphic, the Dems keep the Senate.

      I certainly believe the model when it tells me the probability of the Republicans capturing the Senate is higher than that of the Dems holding. But it wouldn’t even take a nationwide bias for the Dems to win, just a couple of deviations from the polls. Talking about polling error in a race this close is not only not magical thinking, it is in fact a smart thing to do. Put another way – if all the close races were exactly reversed in their polling averages, most of us would be having a pretty similar discussion about the nuts and bolts. The media would be spinning it differently, but that’s the media.

  • Canadian fan

    I had read this article earlier today, and it’s quite a cold shower for complacent pollsters, if ever such a thing still manages to exist. The authors have taken various polling methodologies, analyzed them one by one, and revealed an astonishing degree of vulnerability and potential for significant misses. The general thrust of the argument is that pollsters are frantically trying to play catch-up with an increasingly technologically mobile public that has left behind land-line phone usage as a primary source of contact, and with the growing prevalence of inter-state mobility among cell-phone users, to state only two factors. The result is that large sectors of the voting public tend to be skipped by polling surveys, sectors that tend to be disproportionately Democratic.

  • Donny

    The article kinda mentions a lot of the points I’ve been trying to make. It was a good read. Election night could be exciting as the results come in.

  • Dan Nexon

    One question is the degree that the LV models in 2012 were confounded by superior Democratic get-out-the-vote exercises or simply by poor assumptions. I’m nervous about using 2012 as a baseline precisely for the latter reason. While the DSCC and allies boast of a major effort in 2014, it isn’t clear that the effect will be as large. If it wasn’t down to party mobilization but rather LV models, do we know if the bias is likely equivalent or in the same direction in 2014? Not at all clear.

  • Carol Kerner

    Sam,

    Thanks for this great site. I have a simple question, sorry if I missed it previously. Your model now predicts 51 Rs, 49D&I. Which side are you putting Orman on? As he’s said he’ll caucus with Rs if they have majority, isn’t a 51 R prediction effectively a 52 R &I prediction?

  • Glenn Reider

    All the weighting in the world will fail if those who screen their calls and refuse to participate in polling are both a) D leaning and b) a growing share of the electorate every two years. Data going back to the 90’s or even early aughts is not helpful if this is the case.

    • Alex S.

      I think that the Obama coalition is especially vulnerable to these errors. From the start, it was the goal of the Obama team to expand the electorate, to get the less interested people to vote. I could imagine that these factors would be different in a, say, Rand Paul vs. Hillary Clinton race.

    • Glenn Reider

      I agree, and if so, then Obama’s approval rating might not be what is being reported by polls, either.

  • Davey

    Sam – I took the HuffPost article (great read) to indicate that they suspect a trend is emerging from the noise of poll bias, in which results favor Democrats over predictions. You and Mr. Silver have both cautioned on this hypothesis, showing the 20-year history of unpredictable bias swings. What sort of bias error in this election, and perhaps in the next few cycles, would raise a red flag that we have entered a period where we are seeing a trend in overestimating one side over the other? If Democrats overperform in this election, would that do it, or would we write the whole election off as an outlier? And is there a threshold (i.e., >D +2.5%) that you feel would be a clear sign to pollsters that they need to rethink methodology to adjust for the sort of factors discussed in the HuffPost piece?

    • Sam Wang

      You ask a good question. I think it could go either way. At the same time…they were fairly compelling.

      I’d say a D+2% bonus would be enough for me to start asking whether the bias is persistent. In fairness to pollsters, I am not sure what they should do. They are a technically conservative lot, and cautious about crazy corrections. They don’t necessarily want to be putting a finger on the scale too hard!

  • Reason

    Good afternoon Dr. Wang,

    Thought I would check in with you again as it is election time. I enjoy your site very much. As a Virginian, I do question the validity of polls now. In last year’s Governor’s election here, almost all polls had McAuliffe consistently ahead by an average of 7 points. He won by just over 2 points. Does the so-called “Republican wave” in most polls show a clear bias from one party as opposed to another? It was found out that the reason the race was so close here was that most polling was done in heavily Democratic NoVa. So as you stated earlier, it really is anybody’s race at this point.

    • Donny

      YES. This is what I believe is happening in KY’s Senate polls, only in reverse. They are polling more conservative parts of the state instead of the random pockets of STRONG Democratic support. On election day, I maintain McConnell’s margin of victory will be smaller than expected if he wins at all.

  • Scott Tetrick

    As to systematic errors, I would think that they are less likely. If I were in the business of doing polls, wouldn’t the best thing for my business be to have races so close that they need more polls? As a result, I’d try to get a situation like we have this year, with many races within the “margin of error.” That also gives me as a pollster cover should I be proven wrong (since the stupid media will use the uninformed tag line “statistical dead heat”).

    While we may hope as dedicated PEC readers that the median counters my cynical business effect, it could be magnified even more this year since the past 2012 numbers by many pollsters were so far off.

    • Davey

      Scott, your comment made me consider a slightly less corrupt possibility. The incentive might not be in skewing toward a tie, but skewing toward the pack (poll herding). Let’s take our Ernst +7 poll – it’s an outlier. But the pollster is reliable, and well-experienced in sampling Iowa. It could be a perfectly valid poll. But it would seem that it would be in the pollster’s interest to fudge toward the rest. If Braley wins tomorrow, that pollster is going to get some rude responses, lol. But if they’d put out something closer to where everyone else is, they’d have the protective cover of “everyone was off.”

      Of course, herding is un-scientific and would never be admitted to, but as 538’s poll of pollsters showed this season, most pollsters say they would never do this sort of stuff, but assume most others do.

      Humans are funny. Polls are the science of identifying reality over conventional wisdom. If we allow our results to slide toward conventional wisdom, we may as well have taken the Dems and Reps money and spent the past three months on a beach blogging “thanks for the Million$, the election is a tie.”

  • fred Neuman

    It seems that everyone following this blog is rooting for the Democrats (including me). Why is that?

    • Jim

      Because people here are self-selected, and likely pretty well-educated, interested in data, science and facts and thus progressive.

      And for a progressive, the Democrats are the only game in town, however flawed.

    • Sam Wang

      I got serious in 2008-12, big years for D’s.

      For the record, I did call the 2010 GOP takeover of the House. Readership was low that year.

      I often filter partisan comments on both sides. Some get through.

    • Sam Wang

      I would not put it that way. Republicans are perfectly able to be well-educated, and are interested in data and facts. The Republican Party is currently hostile to science in many domains. But I don’t think that is the point, exactly. There are plenty of evidence-based sites that cater to GOP audiences, including Red Racing Horses, RealClearPolitics, ElectionProjection.com, and these days, FiveThirtyEight to an extent. All of them are honest with data (though every now and then, I think RCP might have too many opinions about which data they let into their database).

      Instead I would point to simpler demographic points: natural scientists and social scientists tilt liberal. And as I wrote before, I started doing this in 2008-2012, good years for Democrats. Finally, my choice of topics often appeals to Democrats. To paraphrase George W. Bush (originally misattributed to Mitt Romney), “you call them liberals. I call you my base.”

    • Gregory Long

      I think, “you call them liberals. I call you my base,” is paraphrasing George W. Bush, not Mitt Romney.

    • Brian

      “cater to GOP audiences…FiveThirtyEight to an extent.”

      If you think 538 with its D cheerleading in every article is catering to Republicans, it just shows how thoroughly Democratic your own milieu and the objective data-oriented science fan culture is. Honest Republicans may find 538 the very most D site they can stand to read but it isn’t appealing to them.

    • Sam Wang

      I actually think that for business reasons, they have to cater to both sides. Successfully or not, dunno.

  • Sam Wang

    I wrote about this in 2008. http://election.princeton.edu/2008/11/12/the-exuberance-of-likelier-voters/ Two problems with his analysis: he should have focused on close races because of possible problems in LV screens in extreme situations. Second, recent technical errors might be different. See Rothschild/Goel piece.

  • RAJensen

    In 2012 the likely voter models were all wrong. RealClearPolitics, another poll aggregator, had Obama +0.7. Obama won +3.9. The final Gallup poll had Romney +1. Gallup was the only pollster who published both likely voter and registered voter results. Gallup had Romney +1 among likely voters but had Obama +3 among registered voters. Gallup’s registered voter data nailed it. Likely voter models had a +3 GOP bias, and since current polls have the GOP ahead in tossup states by +1 to +3 points, it’s a dead heat, and the GOP has to sweep all the tossups and not lose any GOP-held seats.

    http://www.gallup.com/poll/158519/romney-obama-gallup-final-election-survey.aspx

    http://www.realclearpolitics.com/epolls/2012/president/us/general_election_romney_vs_obama-1171.html

  • Sherean

    Dr. Wang –
    Does the composition of early voters track with who votes on election day? Here in Georgia, early voters are 32% African American – a couple of points higher than what pollsters have modeled. I’m curious if early voters are a different group of folks (more D-leaning perhaps) than those who show up on election day. Is there any data on this?

    Thanks for great data analysis.

  • 538 Refugee

    Will the final prediction article feature some fuzzy dice? Obama’s trended up just slightly. Things are a little better and I think people realize it can’t all be laid at his feet. Maybe they are coming to grips with the last time the Reps held both houses? Maybe they are preparing to vote Dem?

    I don’t know if it is my gut or my hope, or both, but I think the Dems hold on, proving that teeth do have skin. I do take pollster record into account and I think this gives me hope for a few of the close races. This is more of a vague gauge based on things I’ve read about various ones more than an actual analysis though. How about a quick little thread allowing us to go ‘on record’ with our predictions?

  • Don

    Dr. Wang,

    There seem to be a few metrics to consider when judging how well election prediction models perform. The Brier score rewards being correct and high confidence; conversely, however, it does not punish overconfidence.

    A more complete scorecard would look at how often predictions fell within margins of error. A system that states predictions with 95% confidence that is right 99% of the time should be dinged for underconfidence and a system that states that an outcome that occurs was 99.99% unlikely should be dinged as well. The best score should go to the model that is wrong about 1 out of 20 times for its 95% confidence predictions; the farther off either way the less precise the model actually is (even if it is accurate).

    Another score to follow is testing your early hypothesis that polling, even fairly early in the game, is more predictive than “fundamentals” … the measure there is how much a model’s early prediction performed against either actual results or eve of election predictions. In this cycle it seems that fundamentals have been more predictive than early polling as the fundamentals heavier models changed less over time.

    Thank you for your work on this site and your informed data-driven analysis.

  • Olav Grinde

    Dr Wang, I find it fascinating that prof. Michael McDonald’s take on the 2014 mid-term election – based on what is known about at least 17.4 million Americans who have actually already voted, rather than polls – seems to fully coincide with your own analysis:

    “My take on the early vote data … is that the Republican sweep screaming in the headlines is overblown. Senate control is up for grabs and Democrats have a decent chance to defy the polls. I expect that the election will be so close that we won’t know who won until all ballots are counted and the vote is certified several days following the election, not to mention highly probable run-off elections in Georgia and Louisiana.”

  • A New Jersey Farmer

    I’ve just been watching the data and reading the fascinating comments here for the past week.

    On the eve of the election, only one candidate in all of the contested (+-3) states, Gardner, is polling above 50% (though I don’t see how as his numbers have been in the 40s for some time), which, to me, is another piece of evidence that could fit in the Democrats-underperform-in-polls narrative. Clearly, there are not a great deal of undecideds, but there are enough that could still break left or for the Democratic incumbent to make this a 50-50 result.

    In the end, it’s been a wonderful ride for the past few months and I’m looking forward to both the political and the intellectual results. You can bet that there will be a lot of space devoted to who called the election correctly and I’ll put my confidence in Dr. Wang.

  • JayBoy2k

    Sam,
    Much better answer.
    I grew up in New England, got a degree in CompSci from RPI, excelled at IBM in pure technical role and never considered that I had a problem with data, science, facts.
    I am not Liberal, Progressive, or Democrat.
    I am attracted to sites that provide interesting analysis of Polls and Probabilities from either side of the political chasm. It helps me to expand my understanding of a broader slice of America and gets me out of an echo chamber.
    I also used to watch MSNBC every night and probably will watch their election coverage tomorrow.
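
Don's suggestion in the thread above, scoring a model by how often results land inside its stated margins of error, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `interval_coverage` and `calibration_gap`, the 95% level, and all of the margins below are hypothetical and do not come from any real forecast.

```python
def interval_coverage(predictions, actuals):
    """Fraction of actual results that fall inside the predicted interval.

    predictions: list of (center, half_width) pairs, e.g. a predicted
                 margin of D+2 with a +/-3 point band is (2.0, 3.0)
    actuals:     list of actual margins, in the same units and order
    """
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum(
        1
        for (center, half_width), actual in zip(predictions, actuals)
        if abs(actual - center) <= half_width
    )
    return hits / len(actuals)


def calibration_gap(predictions, actuals, stated_confidence=0.95):
    """Observed coverage minus stated confidence.

    Negative: intervals were too narrow (overconfident).
    Positive: intervals were too wide (underconfident).
    """
    return interval_coverage(predictions, actuals) - stated_confidence


# Hypothetical margins (positive = D lead), each with a +/-3 point band.
predicted = [(2.0, 3.0), (-1.0, 3.0), (5.0, 3.0), (0.5, 3.0)]
actual = [4.0, -5.5, 6.0, 1.0]

print("coverage:", interval_coverage(predicted, actual))
print("gap vs. 95%:", round(calibration_gap(predicted, actual), 2))
```

With only a few races the observed coverage is very noisy, so a gap like this would need many races, or several cycles, before it says much about a model's error bars.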
