Princeton Election Consortium

A first draft of electoral history. Since 2004

What constitutes a good poll?

September 6th, 2016, 1:55pm by Sam Wang


Update: Natalie Jackson from HuffPost’s pollster.com responds at the end of this post. Again I thank her for all that Pollster.com does.

Holidays are over. I see that journalists, including poll aggregators, are still focused on the Presidential horserace. As Zenger at Electoral-Vote.com has pointed out, sites such as FiveThirtyEight are under economic pressure to attract traffic. And there is nothing to attract eyeballs like a crazy Presidential race. Still, from a substantive standpoint, it might be more appropriate to spend effort on, I dunno…issues? See this excellent critique of media coverage by Jeff Jarvis, which includes a good hard whack at the media obsession with “balance” and polls – basically, tricks to let reporters escape engaging head-on with substantive issues. If journalists insist on horserace coverage, at least focus on downticket races in Senate, House, and even state legislatures – and maybe write about some issues along the way. These races will determine the power dynamic in 2017 under the new President, whoever she may be.

Just to remind everyone, variations in this year’s race are quite narrow, consistent with the last 20 years of partisan polarization. Polarization has made both the GOP and Democratic nominees unacceptable to nearly all supporters of the other party. In addition, Donald Trump is radioactive to about one-fifth of his own party. As a result, this year’s race is full of melodrama, but numerically stable. In 2016, the Princeton Election Consortium’s state poll-based aggregate has only varied between a median outcome of 310 and 350 EV for Hillary Clinton.

The Meta-Margin – defined as the uniform swing across all states that would be needed to bring the Electoral College to a tie, in other words the front-runner’s effective lead – is a very low-noise and stable measure, as opposed to single polls, which can be all over the place. You should generally ignore single polls, especially ones that surprise you. The Meta-Margin has varied between Clinton +2.5% and Clinton +6.5%, and is now at Clinton +4.0%, close to the season average of 3.8%. If it left the 2.5-6.5% range, that would be interesting. That has not occurred yet.
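
To make the definition concrete, here is a minimal sketch of the idea in Python. The state margins and electoral-vote counts are toy numbers, and this is not our actual pipeline, which works from the full polling distribution rather than point margins:

    # Minimal sketch of the Meta-Margin idea (toy data, not PEC's production code).
    # Apply a uniform swing to every state's margin and find the swing at which
    # the front-runner's electoral-vote majority disappears.

    def electoral_votes(margins, ev):
        # EV won by the front-runner, given per-state margins in percentage
        # points (positive = front-runner ahead).
        return sum(ev[s] for s, m in margins.items() if m > 0)

    def meta_margin(margins, ev, total=538, step=0.01):
        # Uniform swing (in points) needed to erase the EV majority.
        delta = 0.0
        while electoral_votes({s: m - delta for s, m in margins.items()}, ev) > total / 2:
            delta += step
        return delta

    # Toy example: three made-up "states" totaling 538 EV.
    margins = {"A": 4.0, "B": 1.5, "C": -2.0}
    ev = {"A": 200, "B": 100, "C": 238}
    print(meta_margin(margins, ev))  # ~1.5: flipping state B erases the majority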


Now, a brief note about the virtues of casting the widest net possible when it comes to polls.

As in past elections, we rely on the good people at Huffington Post’s poll-aggregation operation for our polling feed. In 2008-2014, their quality control consisted of making sure a survey met disclosure standards such as those set by AAPOR. They also provide readers with tools to include/exclude polls by checking/unchecking boxes. I find the combined approach to be democratizing – and one of the great virtues of HuffPollster. It is also well-suited to the analysis approach used here.

This year they are being more aggressive. They are excluding from their database polls that are done on landlines only, as well as some Interactive Voice Response (IVR) surveys (“robo-polls”). The rationale is that these categories of poll are less reliable than newer methods. PEC’s current approach for data scraping limits us to whatever they provide, so this effectively excludes these polls from our calculation.

I would have preferred for them to continue to include the maximum diversity of polls, and then set their default checkboxes to leave out landlines and IVR surveys from their own averaging procedure. That would take into account their professional expertise, but would still allow users to have a single data source. The resulting feed would make them the unambiguous preferred choice over other sources such as RealClearPolitics and FiveThirtyEight.

Natalie Jackson of Pollster.com responds regarding IVR surveys: “We are checking each IVR poll to determine whether it’s all landline or if it includes some cell and/or online panels. If they are all landline, we do not include it. If there’s some effort to account for cell users by supplementing with live callers and/or a web sample, we will include it.”

I should say at this point that I had formed the impression – apparently false – that they were expanding their “no-landline-surveys” policy to also exclude all IVR surveys. Her response clarifies that this is not the case. I withdraw my concern!

Tags: 2016 Election · President

54 Comments so far ↓

  • AAF

    In prior elections, the main poll-analysis sites focused on electoral vote predictions or snapshots.

    That’s where all the attention was, and rightly so. EV predictions are far more interesting and give so much more to talk about than the entirely abstract, and arguably meaningless, question of whether Hillary has a 72% chance or an 83% chance – a question for which, by its nature, there will never be a moment where the “right answer” is revealed.

    But this year, 538 trumpets the predicted percent chance of winning and makes the EV prediction a tiny afterthought, without even a graph tracking the changes in that prediction over time.

    Drew Linzer, whose claim to fame was that his Votamatic site showed a spectacularly steady EV prediction throughout 2012, now has a splashy big percent-chance graph on DailyKos. There is a chart that mentions the day’s EV prediction, but no graph of how that prediction has moved over time.

    NYTimes’ Upshot doesn’t even offer an EV prediction or snapshot at all – just a histogram way at the bottom, where you can eyeball what their prediction might be (this is, of course, in addition to their huge percent-chance headline and graph). And every day they compare the other sites’ stated percent chances, while never looking at anyone’s EV estimates.

    Any thoughts about why?

    • Commentor77

      It’s almost like those sites are competing with each other for clicks rather than focusing on the statistics….

      I’d like to see more pollsters ask the “who do you think will win” question such that we can get averages and see how closely it tracks the actual result over the course of the election.

  • Amitabh Lath

    I am not qualified to speak to poll quality but all things being equal I would tend to favor the “include them all” argument, as Sam does.

    But. Several years ago I corresponded with Charles Franklin of pollster.com, back when he was still at UW and Huffpost had not swallowed up the site.

    He went into some detail on likely voter filters and the biases they can introduce. He struck me as smart and sensible, not someone who would put up filters without due consideration.

  • Ted

    Hi Sam,
    I think this is unfair. It seems to me that the Huffington Post is simply trying to provide the most accurate aggregate available without confusing those who don’t necessarily understand the importance of mode. Many past studies have found that the polls that most accurately predict an election include live interviews/cellphone samples. Isn’t that what we all want? An informed and accurate measure of the election?

    • Sam Wang

      I disagree with your assessment that I am being unfair. HuffPollster provides a valuable service, and they do so without charge. Therefore they decide what they include. But I also get to have an opinion.

      I am okay with the fact that a respected professional is critical of landline surveys, and they have made their argument at length on their site. But I also think that all survey methods have problems. My own view is that it is useful to the broadest audience to have all the data available, and then have a mechanism to filter out various types of polls.

  • Michael Tiemann

    Sam, you said “Deferring to pollster judgment is very much in the spirit of what this site has done in the past. However, I admit that this approach is not systematic. I am torn about what to do, since a result midway between 2-way and 4-way matchups should get pretty close to the final result. The path of least resistance is to leave things as they are…”

    I wonder if now is the time to expand beyond statistics and consider systems-level effects. Social choice theory, as I read it, teaches that the actual mechanism for choice (such as successive pairwise contests) greatly affects the ultimate outcome. But it leaves open the question “what actually happens when four candidates appear on a ballot?” If individuals themselves make a series of pairwise decisions to cast their choice, then Arrow’s and Sen’s work might give a clue as to how the aggregation of these individual choices results in an actual social choice.

    Now, one thing you can do with statistics is to measure the variation between the predicted outcome of a series of two-way contests and the predicted outcome of the four-way contest. That, in and of itself, would be a contribution to the field. And it would not force you to choose between the two approaches, as it would be an effect you are measuring.
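
    To make that measurement concrete, a minimal sketch (the pollster names and margins below are made up): for each poll that asked both versions of the question, compute the shift in the Clinton-minus-Trump margin when the minor-party candidates are offered.

      # Sketch: how much does offering Johnson/Stein move the two-way margin?
      # Hypothetical numbers; a real analysis would pull from the poll feed.
      polls = [
          # (pollster, two_way_margin, four_way_margin) in percentage points
          ("Pollster A", 5.0, 3.0),
          ("Pollster B", 2.0, 2.5),
          ("Pollster C", 4.0, 2.0),
      ]
      shifts = [four - two for _, two, four in polls]
      print("mean 4-way minus 2-way shift: %+.1f points" % (sum(shifts) / len(shifts)))  # -1.2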

  • Allan

    “whoever she may be.”

    Nice.

  • AAF

    Do you have a way to spot-check how different the meta-margin and electoral vote count would be if all of those other polls were included on a given day, or is that prohibitively complicated?

    I think some sources have said that during this election cycle the landline-only polls are less favorable to Hillary. If that’s systematically true, then this isn’t just a case of missing a few random data sources, but actually has a long-term biasing effect. Do you know how big that effect is?

    • Josh

      I’m curious as to what these sources are? My recollection is that since landline-only data tends to come from older people with houses it actually skews right, not left. Is there a good reason this year would be different?

  • Eural Joiner

    Just a thank you, Sam, for all of your work on this site. I use it as a regular class-ending activity at the high school where I teach, and was happily surprised when a student who I thought hadn’t been paying attention for the past few weeks spoke up. He had been arguing with a friend over the CNN poll the other day showing a Trump lead, and used your site and data as his counterpoint. Now that’s some team teaching!

  • Joeff

    How do today’s numbers compare with 2008 and 2012?

  • Olav Grinde

    In my humble opinion, a “good poll” depends not only on sample size and sound methodology, but also on the intention of the pollster!

    Sam, without necessarily naming names, do you ever have reason to believe that some pollsters are (at least partially) in the business of creating a narrative, rather than trying to accurately measure public opinion?

  • Eric

    For those interested in why Pollster (and thus PEC) doesn’t include the Reuters/Ipsos 50 State Tracking polling, I e-mailed Pollster and got the following response:

    “We only include polls that provide full releases. Right now, Ipsos/Reuters does not provide their poll results at the state level in a detailed release form. It goes against our policy to include polls that aren’t fully transparent – that’s why we’re not able to include them right now. But, Ipsos has plans to start releasing state-level data in a full detailed release format in the coming weeks. Once they start doing that, we will add them to our charts.”

    Note that 538 includes these already (NYTimes does not). I can’t tell with DailyKos, but I wonder if some of the divergence between the sites’ state win probabilities is driven by this, rather than by the nitty-gritty of methodological choices.

  • Rick Howard

    Sure, no one poll means much by itself, nor does any one polling service. Yet I can’t help but be surprised by the Washington Post/SurveyMonkey results from surveys in all 50 states. Its results are atypical. It shows, for example, Clinton leading by only 2 in Wisconsin but UP by 1 in TEXAS.

    It covers surveys performed from August 9 through September 1, but is there any other reason it produces atypical results?

    • Sam Wang

      Generally it is correlated with other estimates, with some exceptions. TX, AZ, and OH are the notable outliers. It’s probably fine as data and an impressive effort, but it’s still one poll per state. See Brendan Nyhan’s graph.

    • Matt McIrvin

      These polls won’t do much to sway people who are convinced Clinton is rapidly cratering, since they have been in the field so long. I just got in a big argument with someone who is convinced the CNN/ORC poll showing Trump up 2 is the only legit picture of the race, and extrapolates from that to say Trump must be winning Pennsylvania.

    • Kevin O'Connell

      It’s interesting that you cite three evident outliers in the Post/SurveyMonkey results.

      It’s probably important to remind people of the 95% confidence standard: even solid polls acknowledge that, even if they did everything right, they’ll be wrong 5% of the time. Well, 5% of 50 states is 2.5, so yeah, we’d expect 2-3 states to look weird.
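
      For anyone who wants to check that arithmetic, a quick back-of-envelope calculation (treating the 50 state results as independent, which is itself an assumption):

        # Expected number of states outside their 95% interval, and the chance
        # of seeing at least 3, if each of 50 results independently misses 5%
        # of the time.
        from math import factorial

        def comb(n, k):
            return factorial(n) // (factorial(k) * factorial(n - k))

        n, p = 50, 0.05
        print(n * p)  # expected "weird" states: 2.5
        p_3_plus = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))
        print(round(p_3_plus, 2))  # ~0.46, so 3+ outliers is close to a coin flip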

  • anonymous

    Sure, makes sense, and I will send the email. But I wonder, did you try to suggest this offline and meet resistance from Huffpo before writing this post? If so, our emails may not yield much more success.

  • Priest

    What swung Vermont to even? The new WaPo poll has it Clinton 56-28.

    • tt

      Appears to be a data transcription error. Current version of 2016_StatePolls.csv (row 400 of the file) has Trump 45 Clinton 24 in a four-way race that includes Johnson and Stein. The actual result is Clinton 45 Trump 24.

      Given this is one of only three polls for VT in the file (one of the other two is the two-way race version of the same poll), presumably this has a dramatic distorting effect.
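
      A simple input check would flag this kind of entry. A sketch, with assumed column names (the real .csv layout may differ): within each state, report any poll whose margin sits far from the state’s median margin.

        # Flag entries whose Clinton-minus-Trump margin is far from the state
        # median. The column names ("state", "pollster", "clinton", "trump")
        # are assumptions about the file layout, not the actual schema.
        import csv
        from collections import defaultdict
        from statistics import median

        by_state = defaultdict(list)
        with open("2016_StatePolls.csv") as f:
            for row in csv.DictReader(f):
                margin = float(row["clinton"]) - float(row["trump"])
                by_state[row["state"]].append((row["pollster"], margin))

        for state, entries in by_state.items():
            med = median(m for _, m in entries)
            for pollster, m in entries:
                if abs(m - med) > 15:  # threshold in points; tune to taste
                    print(state, pollster, "%+.0f vs median %+.0f" % (m, med))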

    • Sam Wang

      Now that is A-plus sleuthing. Thank you, tt!

      tt was looking at this file. The transcription error is now gone.

    • Adam

      That has to be a glitch. I have no idea what’s going on that would have caused VT to come out at +0.5…

    • tt

      Welcome. Easy catch; when reasonably mature code produces a result that weird, suspicion falls on the inputs.

      Incidentally, this made me wonder: when you’ve got two versions of the same poll (in this case two-person vs. four-person race), does the model simply choose one based on ballot access (as would be consistent with your explanatory note concerning inclusion of Nader in 2004?)

      Thanks for what you’ve done here.

    • Sam Wang

      This is challenging for a few reasons. First, ballot access is variable from state to state, and also changes over time. Second, minor-party candidates typically fade in the home stretch. Third, with Trump’s ultrahigh negatives there is some question about whether Johnson/Stein/McMullin support will in fact fade.

      If a pollster asked the question with different lineups, we currently prefer two-candidate data (i.e., Clinton v. Trump). We use three-or-more-candidate data when that’s the only question the pollster asked. I think (for now) that this approach captures the probable endpoint fairly well. It is admittedly an informed guess.

    • tt

      This makes sense. Thanks for the explanation and reasoning.

      Just to nitpick, with apologies: in the case of the corrected error discussed above, you appear to be using the version of the poll with the greater number of candidates (I infer this b/c that is the version of the poll that was incorrectly transcribed in the .csv file; link to the poll is below). In this specific case, are both versions of the Post/SM poll in VT being considered at once? If so, is this unusual?

      Not trying to troll or give you a hard time; just interested in your thought process.

      https://www.washingtonpost.com/apps/g/page/politics/washington-post-surveymonkey-50-state-poll/2086/

    • Sam Wang

      Hmmm, time to pick over the Python scraping code. Let me get back to you…

    • Sam Wang

      Here is what we do. My developer and I have talked about forcing the data to the 2-way matchups, but we have not implemented it. Instead, at present we choose whatever the pollster thought should be reported first. Details, all from that script:

      We scrape all polls that are available from a file whose name follows the format http://elections.huffingtonpost.com/pollster/api/polls.xml?question=16-%s-Pres-GE%%20TrumpvClinton&page=%s . We remove duplicate entries using the function drop_overlapping_polls, where a “duplicate” is identified by first sorting the data by pollster, start date, and end date, and then dropping polls that are (a) from the same pollster and (b) have an end date that comes after the most recently encountered start date. Since Clinton/Trump and Clinton/Trump/plus results would have the same start and end dates, whichever comes first in the database is accepted.

      This means that we are currently at the mercy of whatever order the data were entered. This should match the pollster’s preference. As an example, in the Ohio feed this order varies: SurveyMonkey listed the 4-way result first, while PPP listed the 2-way result first.
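
      In simplified form, the logic looks something like this – a sketch reconstructed from the description above, not the script verbatim, and the newest-first sort order is my reading of the rule:

        from datetime import date

        def drop_overlapping_polls(polls):
            # Sort newest-first within each pollster. Python's sort is stable,
            # so two releases with identical dates keep their database order.
            polls = sorted(polls, key=lambda p: (p["pollster"], p["start"], p["end"]),
                           reverse=True)
            kept = []
            for p in polls:
                if (kept and p["pollster"] == kept[-1]["pollster"]
                        and p["end"] >= kept[-1]["start"]):
                    continue  # overlaps the most recently kept poll: drop it
                kept.append(p)
            return kept

        # The 2-way and 4-way releases of one poll share their field dates;
        # whichever is listed first survives.
        polls = [
            {"pollster": "PPP", "start": date(2016, 9, 1), "end": date(2016, 9, 3), "kind": "2-way"},
            {"pollster": "PPP", "start": date(2016, 9, 1), "end": date(2016, 9, 3), "kind": "4-way"},
        ]
        print([p["kind"] for p in drop_overlapping_polls(polls)])  # ['2-way']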

      Deferring to pollster judgment is very much in the spirit of what this site has done in the past. However, I admit that this approach is not systematic. I am torn about what to do, since a result midway between 2-way and 4-way matchups should get pretty close to the final result. The path of least resistance is to leave things as they are…

    • tt

      Fascinating. Agree that juice per squeeze suggests just leaving it is probably best. Going with pollsters’ preferences may indeed capture a sliver of extra information vis-a-vis choosing at random. Perhaps one day an enterprising graduate student will mine the historical data and tell us the answer.

      I had (mis)understood that all the data in the file 2016_StatePolls.csv were entering the computation, but from the dates and repetitions therein I see this must not be the case. I am woefully under-trained in Python.

      Thanks again for the explanations and your work here.

    • Sam Wang

      OK, full confession – I myself speak Python with a heavy accent. The code did the reverse of what I claimed – it took the last result reported in a single day’s release! I have fixed it now to do as I describe above. The most apparent effect is that it changes the current Arizona median from Trump +2% to Clinton +1%. Either way, that is a very narrow margin.

      Thank you for your vigilance. Also thanks to Erik Pescara, who has emailed me on this topic.

    • tt

      Thanks! This accords with what we were observing with Vermont (the problematic entry was the last one for that particular poll, and fixing it fixed the computation for that state). Once again I commend you and your colleagues for your commitment to quality and transparency.

  • Randy Haugen

    I thought landline polls were as accurate as, or more accurate than, online polls?

    • Jason Anastas

      It’s a weird standard. The idea is that pure landline polls under-sample some populations, because there are now so many Americans who use cellphones exclusively. But are online polls any more reliable? It’s still a relatively new field.

  • Michael

    Hi Sam. I know you don’t aggregate national polling numbers, but, at the moment at least, it appears that there’s a discrepancy between those national numbers and state polling numbers. Does it seem to you that this is the case, and if so, do you have any thoughts as to why it may be so? Thanks.

  • George

    When I look at HuffPost Pollster, it seems the default is to include those IVR and landline polls, so I’m confused by your assertion that they are screening them out. Maybe what looks like “default” to me really isn’t?

  • SpecialNewb

    You were a lagging indicator in 2014. You had the Dems’ chances better for longer than the other polling sites before you changed to match them. This resulted in a lot of “calm down, Sam says we’re good”, which is not your fault, but it also makes me discount the value of your nargins if you’re late to the party.

    • Phoenix Woman

      Got evidence?

    • TJ Baker

      PEC was not ‘late to the party’, but rather, it seems you simply don’t understand how PEC works.

      The %’s you see are not randomly produced numbers “presented” by Sam or anyone else; they are calculated from the data.

    • anonymous

      As TJ Baker says, the predictions come out of a consistent open-source model, but Sam also explained the reasons behind the 2014 prediction difficulties somewhere in the comments section of a previous post. I remember it being due both to incorporating trends from presidential-year elections into the prediction and to the polls themselves having a bias.

    • Amitabh Lath

      SpecialN, I remember that race well. Sam was not “lagging” in 2014 and did not make any substantial changes in his methods. The PEC house aggregate took a sharp turn towards R in the 3rd week of September, and stayed there. We discussed it in the comments section and there didn’t seem to be any proximate cause, but it was a real effect, almost a step function. I think that week is when a majority of LV made up their minds.

      There were a lot of people who based their predictions on “fundamentals” and predicted an R win in August or so. I would say they were getting ahead of the data. But even a broken clock is right twice a day.

    • Sam Wang

      As Amit says, the PEC aggregate moved toward Republicans in September 2014, just like everyone’s. There is no particular reason for any aggregator to fall much behind when it comes to a snapshot or a random drift-based model.

      It is only the Bayesian case that may be different, and even then, that’s only if the prior is strong. See my recent Senate post in which I describe an appropriate prior for 2014 and 2016. Note that my prior is fairly weak, and it doesn’t do that much at this point in the campaign season.

    • Commentor77

      Sam, isn’t the presidential model more robust than the Senate model? There are more polls, and we are looking at the combination of all possible outcomes across 50 states rather than 50 independent outcomes. Forgive me if my jargon is incorrect.

    • Sam Wang

      Both calculations include all possible outcomes – they use the same algorithm. However, it is true that Senate races could use more polls.
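
      To illustrate what “all possible outcomes” means: the exact distribution of electoral votes can be built by convolving state win probabilities one state at a time, instead of enumerating every combination. A toy sketch with made-up numbers:

        def ev_distribution(states):
            # states: list of (electoral_votes, win_probability) pairs.
            # Returns dist where dist[k] = P(candidate wins exactly k EV).
            dist = [1.0]  # certainty of 0 EV before any state is added
            for ev, p in states:
                new = [0.0] * (len(dist) + ev)
                for k, prob in enumerate(dist):
                    new[k] += prob * (1 - p)   # lose this state
                    new[k + ev] += prob * p    # win this state
                dist = new
            return dist

        # Three hypothetical states, 62 EV in total; a majority is 32+.
        dist = ev_distribution([(29, 0.6), (18, 0.8), (15, 0.3)])
        print(round(sum(dist[32:]), 3))  # 0.612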

    • anonymous

      Found it; it was in this post – http://election.princeton.edu/2016/05/05/looking-back-on-the-primaries-did-data-journalism-really-lose/

      To quote Sam: “I assumed symmetric random drift from August/September onward, which is mostly true…in Presidential years. Basically I think what is needed is a version of the Wlezien/Erikson data set for off-year elections. I hesitate to using such data to construct a prior…but it would be necessary for making a good prediction.”

    • anonymous

      Also, as Sam explained in his Senate forecast post (http://election.princeton.edu/2016/08/29/the-2016-senate-forecast/), the 2016 Senate prediction model is better than the 2014 version. There is nothing Sam can do about systematic bias in the polls though, the pollsters need to figure that one out.

    • James Wimberley

      I love your parapractical “nargin” . A margin that takes you into the nervous zone, presumably. 2% will do.

  • Richard Resnik

    Mr. Wang – like many Americans, I have become (unhealthily) obsessed with polling data. For those of us of a certain age (70), to whom anything involving math and numbers was always a great challenge, could you explain the wide discrepancy in likely outcome between PEC, NYT Upshot, and 538? Thanks, and I have enjoyed your website and related blogs. Richard Resnik

    • Amitabh Lath

      Richard:
      1) Given the uncertainties of these determinations (just the known unknowns), 90% and 70% aren’t really that different.

      2) The biggest differences are the inclusion of national polls (PEC uses only state polls) and the use of some sort of “fundamentals” or model-based prior (I believe DailyKos uses Abramowitz’s model).

    • Jason Anastas

      Not sure how much of an effect this has but 538 also adjusts polls based on demographics and other factors before submitting them to a model – so their “polls only” forecast isn’t just an impartial modeling of the polls, as here, but still relies on their “special sauce.”

    • Sam Wang

      I think they are adjusting polls based on what the entire community of polls does. This would probably keep the median the same, overall – but increase the accuracy of individual polls, which can help because individual states don’t have identical sets of pollsters. The cost of such a procedure is that it increases the overall uncertainty by adding corrections that are themselves uncertain.

  • David

    Makes sense to me. It’s a valuable service they have provided. That said, it may be economically wasteful for them to provide a service that doesn’t serve their own needs. But I’ll send the email anyway.