Princeton Election Consortium

A first draft of electoral history. Since 2004

Gallup’s man misses the point

November 17th, 2012, 1:04am by Sam Wang


(Update, Nov. 19: now with Gallup’s performance shown in graph form.)

Gallup editor-in-chief Frank Newport appears to be on a campaign against poll aggregation. In a recent essay (‘Polling, Likely Voters, and the Law of the Commons,’ Gallup.com) he writes:

It’s much easier, cheaper, and mostly less risky to focus on aggregating and analyzing others’ polls. Organizations that traditionally go to the expense and effort to conduct individual polls could, in theory, decide to put their efforts into aggregation and statistical analyses of other people’s polls in the next election cycle and cut out their own polling. If many organizations make this seemingly rational decision, we could quickly be in a situation in which there are fewer and fewer polls left to aggregate and put into statistical models. Many individual rational decisions could result in a loss for the collective interest of those interested in public opinion.

Oh, please. Considering Gallup’s performance in estimating the national race, this could be interpreted as a defensive move by this year’s equivalent of the Literary Digest poll (‘Landon by a landslide,’ George Mason University).

Newport misses the positive value that we bring to his activity. It is too bad, because what we do can ultimately increase the relevance of his organization. Here’s why.

First and foremost, poll aggregation is not like other forms of news aggregation. News aggregators like the Huffington Post basically recycle stories. Those of us who examine groups of polls add value for both reader and pollster:

  • For the reader, we cut through the noise. Individual polls contain two kinds of error arising from (a) inherent limitations of sampling, and (b) systematic errors made by individual pollsters. By using robust statistical tools, we reduce and cancel these errors to obtain a far superior result.
  • For the pollster, we offer a benchmark for future performance. Paul Starr pointed out to me recently that a likely reason for the improvement in political opinion polls since the 1930s has been that polls are easily compared with election results. Until recently, this comparison was limited by statistical sampling error. Now, aggregators can grade a pollster’s accuracy to within 1 percentage point.
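The error-canceling point in the first bullet can be illustrated with a toy calculation. The poll margins below are invented for illustration; the median is used as the robust estimator because it resists a single pollster’s house effect:

```python
import statistics

# Hypothetical national-margin polls (Obama minus Romney, percentage points).
# One poll with a strong house effect is included on purpose.
polls = [2.1, 1.5, 3.0, 0.8, 2.4, -4.0, 1.9]

mean = statistics.mean(polls)      # dragged toward the outlier
median = statistics.median(polls)  # largely ignores it

print(f"mean = {mean:+.2f}, median = {median:+.2f}")  # mean = +1.10, median = +1.90
```

With more polls in the stack, the pure sampling part of the error also shrinks roughly as 1/√N, which is how an aggregate can pin the national margin down to a fraction of a point.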

Newport does not fully acknowledge the second point. Regarding his own organization’s performance he writes:

The “gap” difference was….well within the statistical margin of error and underscore[s] the accuracy of random sampling today.

Actually, no. Thanks to aggregation, we can say with great specificity that Gallup’s national October numbers (Romney ahead by 2% to 6%) were systematically off by 4-8% from the true margin at the time, Obama +2.0% (“A final unskewing,” November 12th). No wonder he doesn’t like us. Underneath the bluster and threat, I believe that Newport’s real problem is Gallup’s own poor showing.

The red curve indicates Gallup’s data, plotted with 1-sigma error bars. The black curve is my best estimate of the true Obama-Romney margin, based on all available national surveys (“A final unskewing,” Nov. 12). The last data point is off by about 4.0% (2 sigma), and the three data points before that are off by more.
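The offset in that last data point can be reproduced from the numbers in the post. Note that the 1-sigma size here is back-solved from the statement that a 4.0-point miss equals 2 sigma, so treat it as an assumption rather than Gallup’s published error bar:

```python
# Gallup's final national reading vs. the aggregate, in sigma units.
gallup = -2.0    # Romney +2, written as Obama minus Romney
aggregate = 2.0  # aggregate estimate: Obama +2.0
sigma = 2.0      # implied 1-sigma error bar (4.0 points = 2 sigma)

offset = aggregate - gallup
print(f"offset = {offset:.1f} points = {offset / sigma:.1f} sigma")  # 4.0 points = 2.0 sigma
```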

In fairness, it was not only Gallup whose national numbers were off. National polls as a group were biased by an average of 2.4 ± 0.4% toward Mitt Romney. State polls were a superior source of information: our Popular Vote Meta-Margin did far better than the national Romney-vs.-Obama average in predicting the national vote.

Newport is correct that poll aggregation diminishes the news value of any single poll. That is the point of the activity. It’s why I started doing it in 2004: I was driven to distraction by breathless stories about single polls. This year, I nearly blew a blood vessel when I saw the entire front page of USA Today dedicated to a single Gallup poll that was an outlier. Let’s face it: news organizations love outliers. If aggregation kills that kind of story in the future, our entire nation wins.

Despite Newport’s complaints, I believe the future of his field is bright. Aggregators like PEC bring focus to the activity and add a new dimension. However, pollsters now have to be nimble. They can’t get stuck in a rut reporting only topline numbers. That low-hanging fruit will soon be gone.

But there are many ways they can improve their game. For example:

Focus on crosstabs. Much of the richness in polls comes in the details: knowing that young voters tilt Democratic, or that many Romney supporters would rather identify themselves as independents than as Republicans. Those details carry endless news interest.

Watch one another. This year PPP missed a big story by failing to report a sudden plunge in support for Todd Akin (R) in his Missouri Senate race to unseat Claire McCaskill (D), after his “legitimate rape” comment (“Akin sheds 8 points overnight to near-tie,” August 12). They could have caught that story if they had been willing to compare their own results with other pollsters. This pridefulness should stop.

Develop new products. The most interesting polls this season were products like the RAND longitudinal survey, in which the same respondents were surveyed repeatedly. Gallup itself had some fascinating results from their tracking poll, which showed an overnight jump in President Obama’s job approval rating after Michelle Obama’s speech (“Michelle Obama, the Great Persuader,” September 9th). Let a hundred flowers bloom!

Learn from the crowd – but don’t be afraid to go the other way. Pollsters can learn not only from each other’s biases, but also from where the polls are fielded. In the five weeks after October 1, Pollster.com showed 97 national polls. That is complete overkill. Some of that effort would have been better spent on downticket Senate or House races – or even non-swing states, which received so little love this year.

Go local! Many pollsters conduct national surveys as loss leaders. They do it for the media exposure. But there is plenty of publicity to be had in other races. If pollsters swooped in there, it would garner them publicity – and even help slow the decline in local journalism. That would be a win for pollsters – and add much-needed diversity to our media culture.

And here we come to an irony: the Gallup organization is rich in expertise, and is a leader in adding value in interesting ways. If they continue to do that – and stop complaining about the new kids on the block – they can maintain their relevance. I wish them every success.

Update: To emphasize a point above, for organizations like Gallup, I am under the impression that most of their income comes from other polling, not what aggregators analyze. Therefore I believe that the availability of public polling data is not under threat. However, their brand did take an undeniable hit this year.

Tags: 2012 Election

176 Comments so far ↓

  • Olav Grinde

    This sums things up rather nicely:

    https://sphotos-b.xx.fbcdn.net/hphotos-prn1/31634_353178101444371_1231681608_n.jpg

    Another alternative, of course, is to sell Texas to Mexico. And to use the profits for public works, to pay down the national debt, or whatever.

    After all, Russia sold us Alaska… No?

    ;)

  • wheelers cat

    wolfers.
    http://www.bloomberg.com/news/2012-11-19/crowds-are-this-election-s-real-winners.html
    so maybe Dr. Wang’s Intrade trolls were right after all?

  • wufwugy

    Not sure how many ballots are left to count, but it appears that RAND was about as accurate as accurate can be

  • Dharma

    Best explanation of Gallup’s egregious readings: The Fix Was In. I don’t buy the post facto explanation of voter mix. I think R&R saw how much influence Gallup had on aggregators and used muscle to try and swing opinion and the Media. It gave hope to the hopeless.

    From the “other brand” :

    “Thus, even though the Gallup national tracking poll is more influential than any other individual poll series in the FiveThirtyEight trend-line calculation, it still accounts for only about 12 percent of it.”

    Gallup research is heavily biased towards whoever pays for it. Not saying that they were bribed, just that their mgmt and user base is heavily Romneyesque.

  • wheelers cat

    happy black friday all!
    Ahh, the smell of consumer blood sacrifice in the morning– the Market Gods will be pleased.

  • Olav Grinde

    My wife and I are about to enjoy a beautiful red-wine-marinated rack of lamb, with a good French wine.

    I wish everyone a blessed Thanksgiving!

  • Mitch

    Bit late to the discussion, but here goes:

    I think I look at the issues in this discussion rather differently, and I’d be surprised if Prof. Wang doesn’t agree with me, given his day job.

    The problem I have always had with pollsters is their intellectual callowness. It is not merely the media that loves outliers; it is the pollsters themselves, as can be seen by reading the articles that accompany poll releases on the pollsters’ websites.

    It is not “herding” for pollsters to look at each other’s polls – it is a requirement of science. It is often said that repeatability is one of the central aspects of the scientific method. But pollsters will often announce their polls without even *referencing* the results of other pollsters polling precisely the same subjects, or discussing the differences.

    Were this to happen in a scientific experiment, the scientist would be called to task for not explaining why his/her result differs from that of another pollster, and why the new result should be believed in preference to the old one.

    Failing to do this is standard operating procedure for pollsters – Gallup’s results were consistently different (and statistically significantly so) from the other pollsters’ throughout the election cycle, and they didn’t feel obligated to discuss why this was happening or why they should be believed in preference to the others.

    Had the pollsters been doing this all along, there would have been no reason for Prof. Wang (or Nate Silver or …) to aggregate at all. This discussion is critical to the establishment of consensus in science, and the lack thereof in political polling is why none has existed in polling – until this year.

    • Sam Wang

      This is a good point. Failure to reference one another would be unconscionable in science – but is currently the standard in the polling industry. This relates to one of my recommendations.

      As far as clustering goes… they don’t share methods, so it seems not to be much of a problem. If there were a concerted lobbying effort by lots of pollsters, maybe. But even then, I imagine they’d be pretty suspicious of one another.

    • Olav Grinde

      There is another key difference between pollsters and scientists (at least those truly worthy of being called that)…

      Scientists are concerned with truth and feel compelled to subject both their methods and results to peer review.

      Pollsters, on the other hand, are concerned with brand building, which is key to their economic success. Furthermore, their methods are proprietary and closely guarded secrets. Hence, with few exceptions, the entire business is characterized by an opaqueness that precludes rather than enables any meaningful peer review.

      I believe that pollsters need to start searching for a happy medium. I’m sure it’s possible for them to find a narrative that addresses the points raised by Mitch, whereby they can argue why and how their polls are more believable and accurate than others — without necessarily revealing proprietary secrets.

      This is perhaps comparable to Nate Silver’s position as an aggregator. While Dr Sam Wang, who is a scientist, openly shares his methods — and even the software code — Mr Silver feels the need to protect the details of his model, since it is the core of his livelihood.

      Nevertheless, Nate Silver does share sufficient explanations about his methods to make for a very interesting narrative, and one that allows for at least some comparison with the methods of other aggregators and election predictors.

    • bks

      Mitch, if DNA microarray experiments agreed with one another as closely as Gallup matched Sam’s “true value,” genome-wide association studies would live in a happy world indeed. Olav, Nate and Sam are also establishing brands.

      –bks

    • mediaglyphic

      If the objective of the polling industry is to create brand awareness, then perhaps Gallup’s strategy was correct. It’s an old PR adage — any press is good press — and Gallup got a lot of press.

  • A New Jersey Farmer

    Happy Thanksgiving everyone. Lots to be thankful for this year politically. Things are looking up.

  • Pat

    Has anyone noticed that with the latest vote counts, the margin in Colorado (O+5.5%) is now larger than the margin in Pennsylvania (O+5.0%)?
    And this actually makes Pennsylvania (not Colorado) the tipping-point state!

    https://docs.google.com/spreadsheet/ccc?key=0At91c3wX1Wu5dFp2dUlkNWlJeGN5NFUxa0F3cXpoLXc&pli=1#gid=0

    • mediaglyphic

      Pat,
      what an amazing spreadsheet. are you updating this yourself? if so, is there a way to tell which states have the final tally? This sheet is even more useful than Michael McDonald’s.

    • Pat

      No, I must say I found it online. I’m not sure, but this may be the same person who handles the vote count on the “2012 US presidential election” Wikipedia page, which seems to have one of the most up-to-date counts.
      Actually, in the Wikipedia table http://en.wikipedia.org/wiki/2012_us_presidential_election#Votes_by_state, they indicate “yes” or “no” whether the results are final for each state.
      The only things that are unfortunately missing in this table are the vote percentages and the % margins.

    • mediaglyphic

      i want to see a sheet that tells us for all states with final count, is the vote higher or lower than 2008, both total vote and % of eligible VAP. i guess we will just have to wait for this.

    • mediaglyphic

      i counted it myself, it looks like 11 states have finalized voting, about 19,748,758 votes; the same states had a slightly higher 19,834,011 in 2008. FL, SC, LA, ND, DE have more votes than last time around.

    • Pat

      Actually from the table in the Wikipedia page I linked to above, 15 states have certified their results apparently.

    • mediaglyphic

      which ones did i miss?

      1 Delaware 412616
      2 Florida 8411861
      3 Georgia 3932158
      4 Hawaii 453568
      5 Louisiana 1960761
      6 North Dakota 317738
      7 Oklahoma 1462661
      8 South Carolina 1920969
      9 South Dakota 381975
      10 Vermont 325046
      11 Wyoming 254658

    • Olav Grinde

      Provisional votes — a lack of transparency

      @Pat, what I would like to see are state-by-state totals that also show how many provisional votes were cast, and how many of those were included in the count.

      I am particularly interested in the percentage margin between the candidates, with comparisons between before and after the counting of provisional votes.

      It seems that this might yield valuable information.

      1) If the counting of provisional votes results in a significant change of margin, then there might be reason to question the neutrality of the process by which voters are compelled to cast provisional votes!

      2) If a very large number of provisional votes are excluded from the final count, then this might be additional evidence of vote suppression or count skewing — or at the very least evidence the need for an overhaul of the voting process in that state.

      3) Case in point: Over 600,000 provisional votes were cast in Arizona. That seems utterly disproportionate to that state’s voting population. Something is wrong!

      I must admit that I am frustrated. Many times I have searched for state-by-state overview of provisional votes, without finding such information. I have also asked several times on this forum. To me it seems strange that this information is not readily available!

    • Pat

      @mediaglyphic
      In addition to the 11 states you cited, Wikipedia says that Arkansas, Idaho, Iowa and New Hampshire results are final.
      The numbers you provide differ slightly from those at Wikipedia, though. In several cases, their vote count is higher than yours, probably meaning that they are more up to date, and indeed links to the various secretary of state websites confirm these more complete counts (these are DE, FL, LA, ND, SC). In other cases, the counts you provide are actually higher than the Wiki page, even though they link to SoS sources and claim the values are final (OK, SD, VT, WY, GA, HI). For example, in South Dakota, the SoS website lists a total of 363815 votes (much less than your number), so I wonder if the difference comes from all the votes for “other” candidates.
      It’s indeed pretty odd and frustrating that even when results are supposedly final, no two sources can agree on a definitive count…

    • Pat

      @Olav
      I agree. But given that it is already hard enough to get any definite and reliable vote counts, even for states that certified their results, the detail of provisional ballot counts might be too much to ask :)
      Some secretary of state sites do provide the breakdown between absentee mail, early voting and election day votes (Oklahoma), but I haven’t seen anything about provisional ballots.

    • Olav Grinde

      @Pat, if I was part of any election night news team, I would insist on obtaining for each state:

      1) Votes for candidates A, B and C, as they’re tallied
      2) Number of provisional votes cast

      As far as I know, # 2 is known quite early in the game. That’s why I find it peculiar that numbers are so hard to come by.

      I am astonished that none of the news networks are concerned with this. Nor it seems are any of the aggregators! In previous elections there have been millions of provisional votes. It’s as though they don’t really count.

    • mediaglyphic

      @ pat,
      i think those states have been updated more recently. (the numbers in my list are the 2008 results, a function of copy and paste!!). In any case we will know about turnout soon enough.

    • 538 Refugee

      Nice read. I pointed out a while back that either you understood the culture or you didn’t, and it is hard to just ‘hire’ it.

  • wheelers cat

    well…its kind of topic for this post, but I think Taleb was really talking about relative evolutionary fitness of business models, which is good.
    But it will be [unfortunately] interpreted as advocating for letting the auto industry fail. And you are correct, it’s an issue of scale. The auto industry was enormous in scale. That is why letting it fail would have been a bad idea. Evolution of fitness of academic/scientific models happens too – look at the success of PEC and 538.

    I think I have to accept that Taleb is a business class elite, a red phenotype, and not an academic elite. He draws examples from his domain, not mine.
    I’m a counter-culture anti-capitalist and an academic. What makes America work pretty well is the same thing that worked for homo sap in the EEA– genetic and memetic diversity of societies and tribes.
    ;)
    But lets consider a failed business model on a more manageable scale– Hostess.
    Where Hostess really failed was in not starting a healthier sub-line of snack food. imho they could have added a Babycakes style line of gluten-free dairy-free baked goods. So when their consumer base shrank, they tried to pass their loss of market share on to labor. The way to do that successfully is to let the employees buy in until the company can retool, modernize.
    Hostess chose to sell out and blame the employees, so they chose to fail, really.

    I really, really, REALLY like Taleb. The way his brain works differently than mine is part of the attraction im sure.

    • wheelers cat

      pardon, off topic i meant.
      But I guess we are talking about the relative evolutionary fitness of LV models, arent we?

    • wheelers cat

      HA!
      epiphany….
      genetic and memetic diversity is anti-fragile.
      does that help?

    • mediaglyphic

      @WC,
      not sure if you are pointing this article out in a good or a bad way.

      A lot of the points that Taleb makes in this article are really weak. If a mistake helps a company, why is it a mistake? Restaurants don’t fail because of bad food; they fail due to things like portion control, location, and poor advertising and sales promotion.

      Rule 2 what??
      Rule 3 — what about scale economies?
      rule 4 — academic knowledge comes from trial and error doesn’t it? some projects require scale and others don’t

      Perhaps Taleb is a black swan, he got lucky with one book and now its a bunch of psychobabble. I actually like his first book better (“Fooled by randomness” — Bet on the highest expected value not the most likely outcome)

    • wheelers cat

      media
      more specifically to your question, rewarding a mistake is anti-fit.
      it just builds in structural weakness to the business model.
      now you COULD look at the bank bailout as anti-fit.
      and the banks are going to fail again, trust.
      but the auto-industry bailout was actually an employee buy-in with the government standing in for the employees.

    • Amitabh Lath

      Taleb is comparing markets to biology.

      I am not a fan of comparing natural systems to artificial ones. Concepts like wealth are manmade, and need not operate under the same constraints as biological systems.

      I recall reading a simple illustration. Take a football stadium full of randomly chosen people. If you tabulate a natural quantity like height, you will get a sensible mean and spread (std. deviation).

      If you were then to add a dozen of the world’s tallest people, these values would not change much (or at all).

      But if you carry out the same calculation with a manmade quantity like wealth, adding a dozen of the world’s richest people will completely change the mean and spread.

      Basically, that is a long-winded way of saying that height is a natural concept and wealth an artificial one. One should be careful in comparing biological systems with markets.
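The stadium thought experiment is easy to simulate. Everything below is invented for illustration — heights drawn from a normal distribution, wealth from a heavy-tailed lognormal — but the qualitative result does not depend on the exact parameters:

```python
import random

random.seed(0)
n = 50_000

# A stadium crowd: height is roughly normal, wealth is heavy-tailed.
heights = [random.gauss(175, 10) for _ in range(n)]          # cm
wealth = [random.lognormvariate(11, 1.5) for _ in range(n)]  # dollars, roughly

def mean(xs):
    return sum(xs) / len(xs)

# Let a dozen extreme individuals walk into the stadium.
with_tall = heights + [251] * 12   # very tall people
with_rich = wealth + [50e9] * 12   # multibillionaires

print(f"height mean shift: {abs(mean(with_tall) - mean(heights)) / mean(heights):.4%}")
print(f"wealth mean shift: {abs(mean(with_rich) - mean(wealth)) / mean(wealth):.0%}")
```

The height mean moves by hundredths of a percent, while the wealth mean jumps by orders of magnitude. The wealth median, on the other hand, barely moves at all.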

    • wheelers cat

      Amit, everything is evolutionary. The markets are biological and biology is the markets.
      Its all (EGT) evolutionary theory of games.
      Where I have a superstrong disagreement with Taleb is his Hayek worship. Dead White Guy Philosophy is so past its sell-by date.
      But admittedly DWG philosophy was a selective advantage when the euro-anglos ruled the world.

    • bks

      But not the median!

      –bks

    • pechmerle

      I also thought that a lot of Taleb’s points in the article were weak. The basic concept — learn to live in a volatile world — is significant and useful, especially when dealing with artificial worlds like markets. But he doesn’t deal very effectively with how much volatility we should tolerate or want in artificial worlds like markets. A market in French antiques — sure. A market in medical care — not so much.

  • Michael Slavitch

    “First they ignore you, then they laugh at you, then they fight you, then you win. “

  • wheelers cat

    https://twitter.com/SamWangPhD/status/269065092635697153

    “Better not sign that major label deal, Sam-you don’t want to lose the indie kids!”
    good advice.

  • MAT

    Even Doonesbury is piling on:

    http://doonesbury.slate.com/strip/archive/2012/11/20

    BTW, this has been one of the best comment threads yet. Kudos to everyone!

    • bks

      What’s wrong with 110%? Is Doonesbury being subtle or just regurgitating statistibabble?

      –bks

    • pechmerle

      bks, did you miss the joke? That nobody can “give more than 100%.” Another nod to the quants over the sloppy speaking pundits.

    • bks

      I don’t get the joke. Why can’t people give 110%? I gave 500% more attention to the polls in 2012 than I did in 2010. I drove my car 400% more when I was working than I do now that I’m retired.

      –bks

    • Suja P

      If 100% is the absolute best one can do, then 110% is simply not possible. If you are using the term comparatively, then it would be possible (I’ll give 110% more than I did last time), but not as an absolute.

    • bks

      Maybe Nate took performance-enhancing drugs thus enabling him to give 110%.

      –bks

  • Steve

    I should like to weigh in on the Gallup issues as a long-time election modeler. Gallup’s problems are not limited to 2012; they had inaccurate state polls well before then. For example, in 2004, consider Wisconsin. Kerry won by 0.9 points. But an average of the last 12 polls in that state had Bush by 0.6. Among those last 12 polls, Gallup had 2, both with Bush ahead — one by 4 and the other by 8.

    In this state, Gallup was clearly an outlier. Use of the median for this state in 2004 was better than the mean. This is just one example.

    I should also like to cite another problem pollster who is no longer doing state by state polls: Zogby. Example: in 2004, Zogby had 4 out of 12 of the last polls in New Mexico all with Kerry leads. That state went to Bush by 0.2 points. Again, the median tended to discount the aberrant Zogby results. Note that Rasmussen did 92 state polls.

    Then you move to 2008 where 2 close states — Indiana and Missouri — had the presence of Zogby polls, plus Rasmussen. In both cases those pollsters were way off. For Missouri, the use of the median very much helped. For Indiana, the median did not help. I am not sure what would have helped there, other than throwing out Zogby and Rasmussen polls.

    For 2012, Zogby had no state polls to my knowledge. We still had the two least accurate pollsters (Gallup and Rasmussen) present. But according to my calculations, the increase in the number of polls, combined with their overall quality, meant that the average of the last 12 polls in the 6 closest states did better than the median approach.

    My question: will Gallup and Rasmussen eventually go the way of Zogby?

    I guarantee you this: when I run my models in 2016, I will keep a very close eye on those 2 pollsters.
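Steve’s Wisconsin example — a median shrugging off a pair of Gallup-style outliers that drag the mean — can be sketched like this. The twelve margins below are invented for illustration, not the actual 2004 polls:

```python
import statistics

# Hypothetical last-12 polls, Bush minus Kerry in percentage points.
# Ten polls cluster near a small Kerry lead; two outliers show Bush +4 and +8.
polls = [-1, -2, 0, -1, -1, 0, -2, 1, -1, -1, 4, 8]

print(f"mean   = {statistics.mean(polls):+.2f}")    # +0.33, pulled to the Bush side
print(f"median = {statistics.median(polls):+.2f}")  # -1.00, stays with the cluster
```

Against a final result of Kerry +0.9, the median is the better call in this toy setup, which is the pattern Steve describes for 2004.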

    • Jay Bryant

      And so will I, thanks in part to your analysis. Thanks for that. The other part is just my own observations that these guys always seem to be off to the right.

      If your car pulls to one side, you adjust the alignment. Pollsters should do the same.

  • William Ockham

    I am glad to see folks here looking at the likely voter screen. Let me suggest that Gallup’s fundamental problem is that they do only national polling. If you read the linked article, you will notice that their RV polling was much closer to the outcome than the LV adjustment. The issue is simple. LV screens work really well in battleground states. The people who make it through the screen are enough like the people who will actually vote to make the screen effective (i.e., to be effective, an LV screen needs to be more predictive of the outcome than a simple RV poll). At the national level, this just isn’t true. Pew has pretty much the same problem, but often in the other direction. Btw, Pew is completely open about their likely voter screens.

    Also, here’s some more (indirect) evidence for my contention that there are no undecided voters, only people undecided about voting (from John Sides):

    For example, consider YouGov respondents who were interviewed first in December 2011 and then again the weekend before the Election Day—almost 11 months later. Of those who said in December that they would vote for Obama in an Obama-Romney race, 95% still preferred Obama on the election’s eve. Of those who preferred Romney in December, 94% did so again in November.

    • William Ockham

      Just to add to that last data point, 94% of the people in the Dec. 2011 poll had already made up their minds. I’m finding the YouGov data quite fascinating. Here’s what they say about that big swing to Romney after the first debate:

      [YouGov] We interviewed 33,000 people before the first debate and re-interviewed 25,000 of them afterwards. The change was less than 1 per cent. You can’t do better than large-scale high-response re-interviews to measure change. But we found one interesting detail: those who had previously said they preferred Obama were significantly less likely to respond to the post-debate survey.

      [WO] The whole “swing” that showed up in the polls came from Obama supporters’ non-response. It didn’t change anybody’s mind about who they would vote for, but it did make Obama supporters less likely to vote (or at least less likely to make it through a likely voter screen). My point in all of this is that the key decision every voter makes is whether or not to vote. Even people who think they are undecided really have a predictable preference. Give me an “undecided” voter and let me ask them the following questions:

      1. Are you over 40?
      2. Are you a male?
      3. Do you consider yourself “white”?
      4. Do you want to repeal “Obamacare”*?
      5. Do you think we should have less government?

      For each “yes” answer, multiply the question number by 6, then add up the results. That’s more or less your chance, in percent, of voting Republican in a presidential election, if you are “undecided”.

      *For future elections, just insert whatever government program Republicans have made their bête noire of the campaign. They always have one.
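Read literally — summing 6 times the question number over the “yes” answers, so that five yeses give 90 — William’s tongue-in-cheek rule can be coded up. The function name and the summing interpretation are my own; this is a playful heuristic, not a validated model:

```python
def gop_chance(answers):
    """Rough percent chance an 'undecided' voter votes Republican.

    `answers` holds five booleans, one per question in the order asked.
    Each 'yes' contributes 6 times its question number.
    """
    return sum(6 * (i + 1) for i, yes in enumerate(answers) if yes)

print(gop_chance([True] * 5))                         # all five yeses -> 90
print(gop_chance([False, False, False, True, True]))  # yes on Q4 and Q5 -> 54
```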

  • wheelers cat

    So. I thought the PEC crewe might be interested in what is going to happen to the OFA database.
    http://online.wsj.com/article/SB10001424127887323622904578129432544571720.html?mod=rss_Politics_And_Policy
    When Jim Messina asks should the OFA data be used to support Obama’s legislative agenda, what springs to the collective hive mind of the PEC commentariat?
    The way to support Obama’s legislation is to take back the house. So lets work to make 2014 a referendum on immigration. Backlash redistricting like Florida voters backlashed voter suppression.

  • Beaucon

    Sam, you have done superb work to tease the underlying truth out of election polling. This was a terrific service, but polling does not end when the elections are over. There was a recent poll by Peter Hart Research Associates, organized by the AFL-CIO. It showed that, by 64 to 17 percent, voters want to protect Social Security and Medicare benefits and address the deficit by increasing taxes on the wealthy rather than cutting entitlements. I am paraphrasing Bernie Sanders on this. The point is I would love to believe this poll if it reflects reality, or debunk it if it does not. In the coming months, we will be inundated with “damn statistics.” Do you have any advice for those who were drawn to your site looking for a reality filter? Are there a few tips for those of us who are not trained in statistical analysis that will give us some clue to how to evaluate the flood of polls that will undoubtedly accompany the pending fiscal cliff drama?

    • mediaglyphic

      Beaucon,

      In addition to who was polled and how the results are weighted, the exact wording of the question is important. I googled the AFL-CIO peter hart poll and it’s hard to find any detailed information. i did find some information about a peter hart AFL-CIO poll on medicare from 2003. This poll only asked people 55 years or older!!

      We need specifics. Reality lies in the details!!!

    • Olav Grinde

      I really have a problem when people refer to Social Security as an entitlement. It’s nothing of the sort! People pay into it all their lives — and when they retire, they receive their rightful payments.

      Should we start saying that dividends are entitlements for shareholders? Wouldn’t that be rather preposterous?

    • Michael

      Why is it preposterous? Shareholders are entitled to dividends, aren’t they? And if you pay into Social Security, then you’re entitled to the benefits you receive when you retire. The problem is that we’ve let Republicans turn “entitlement” into a dirty word.

    • Sam Wang

      The word “entitlement” derives from the fact that Medicare and Social Security benefits are mandated by law, otherwise known as a title. However, it is indeed annoying that the word is often interpreted in its colloquial sense.

    • Olav Grinde

      Comrade Michael: You are indeed correct, in a literal sense. However, that’s close to irrelevant because, as Sam points out, that is not how the word is used.

      * Just as using the word comrade may mean friend — but in this country you can hardly use it without pushing certain buttons.

    • wheelers cat

      Michael, we let them turn liberal into a dirty word.
      Now we are taking it back. Romney’s army of yardsigns had almost no red on them if you noticed.

  • jayackroyd

    Hah. I started to post a comment, without reading the commentary. As I should have known, the issue I was interested in was not merely asked and answered, but discussed in some detail.

    Twitter has killed off a lot of commentary communities. It’s wonderful seeing this one thriving.

  • E L

    Joe Scarborough ‏@JoeNBC
    You won’t want to miss @Morning_Joe tomorrow [11/20] when we welcome Nate Silver (@FiveThirtyEight)
    Retweeted by Nate Silver

  • Amitabh Lath

    Fellow commenters and Sam,
    Look at Sam’s plot comparing Gallup to the true margin. It should give them heartburn.

    Whatever Gallup did wrong, it manifests itself most heavily after debate #1. It’s the “python that swallowed a pig” plot. In this case the pig is an anomalously large chunk of Romney voters. You can see that Gallup was reverting back to the mean, but not quickly enough. If the election had been in December, they would have been fine, the pig digested.

    So let’s pretend you are Gallup. You see these numbers come in, and at first you say ok, Romney is gaining due to debate#1. But then your numbers keep going pro-Romney, while everyone else is stabilizing.

    Something is awry.

    Do you:

    1) keep going hoping everything will work out in the end?

    2) change methodology/weighting a few weeks before the election?

    3) stop reporting results until you’ve completed a level-1 diagnostic?

    The correct answer is #3, but it takes guts.

    • mediaglyphic

      @Amit,
      Until the last couple of cycles, no one was measuring pollster error in a systematic way (or at least not measuring it and broadcasting it loudly). If the main function of public polling for Gallup is to garner positive mindshare for the paid research business, then the feedback that aggregators are providing is a good incentive system.

      Economists say that incentives explain 80% of everything. Let’s see if it works this time. The challenge here is that cycles are long.

    • 538 Refugee

      My list of what I think the graph shows.

      1. Hurricane Sandy swung the election to Obama at the last minute on the strength of Chris Christie’s endorsement of the way the president handled the crisis.

      2. Gallup does a three-day average, so it was weighted with data that was collected prior to the storm and lagged because the storm interrupted the polling. This is tied to #1.

      3. Sandy delayed shipment of their special mix of tea leaves and entrails they use to divine their likely voter model.

      The correct answer is #3 but it takes guts.

    • Matt McIrvin

      As I’ve said before, I think one of the major effects Hurricane Sandy had on the media narrative was that it caused Gallup and Rasmussen to suspend their daily trackers for several days. Mirabile dictu, Obama was doing a lot better than before!

    • Amitabh Lath

      538refugee:
      Christie technically pays my salary so I should be circumspect. But if you are talking about “weighting” the polls…

      Anyway, he got to meet Springsteen, while we had no power for 12 days.

      Seriously though, I don’t think Gallup itself knows what’s wrong (or will share fully when they do figure it out). As mediaglyphic points out, they had better say something soon, since their paid business depends on being seen as blue chip.

    • Anton Mates

      @wheelers cat,

      Thanks for linking that article. Mark Mellman makes a really good point in there about the drawbacks of focusing on “likely voters.” Even if your likely voter screen is the awesomest one in the world, there’s simply no reason to assume that the distribution of likely voters is identical to the distribution of actual voters.

      If there’s a big chunk of the population falling just short of the voting likelihood threshold, then they’ll be invisible in Gallup’s results even though we can expect them to collectively contribute a significant fraction of voter turnout. That’s a severe pitfall even if you’re only trying to predict the national popular vote, it seems to me.

  • Kevin

    “Watch one another” really troubles me. The biggest danger to aggregation as a correct snapshot of the current situation is polls clustering around an agreed-on story, overly worried about being the next Rasmussen or Gallup. Outliers are fine. What 538 or PEC could not withstand is widespread copying. I’m not in favor of pollsters watching each other; I hope they resolutely avert their eyes and do their own thing. That’s much more likely to produce a random distribution of errors.
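
A toy Monte Carlo makes this concrete (purely illustrative; the margin, poll count, and noise levels below are invented assumptions, not any aggregator’s actual method). Independent pollster errors largely cancel in an aggregate median; a bias shared by every pollster never does:

```python
import random
import statistics

def simulate(n_polls=20, trials=2000, true_margin=3.9,
             sampling_sd=3.0, shared_bias=0.0, seed=42):
    """Average absolute error (in points) of the median of n_polls
    polls of the same race, over many simulated elections.
    Each poll's error = independent sampling noise + a bias shared
    by all pollsters (zero when pollsters work independently)."""
    rng = random.Random(seed)
    errs = []
    for _ in range(trials):
        polls = [true_margin + shared_bias + rng.gauss(0, sampling_sd)
                 for _ in range(n_polls)]
        errs.append(abs(statistics.median(polls) - true_margin))
    return sum(errs) / trials

independent = simulate(shared_bias=0.0)  # noise cancels in the median
herded = simulate(shared_bias=2.0)       # a common 2-point lean persists
```

With independent errors the median lands within a fraction of a point; with a shared 2-point lean the aggregate stays roughly 2 points off no matter how many polls are averaged, which is exactly why copying is more dangerous than outliers.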

    • 538 Refugee

      There are a couple of ways to look at this. If you are way off the median, you might want to double-check your methodology. If you believe your method is sound, then you stick to your guns. Such a check might also help you discover a mistake that should be corrected.

      I gotta believe that these guys review each others stats/questions/methods in general as they come out to see if there is something useful that they might incorporate. Maybe the wording of questions to ferret out likely voters?

      In the end, their work is their work and reputable firms put it out there for scrutiny anyhow so they are open to input whether they seek it or not.

    • wheelers cat

      I just don’t believe in that herding/flocking/schooling meme. Gallup employed antique methodology, and Rasmussen cheated.
      Just like the conservative camp that honestly and fervently believed Romney would win, Gallup privileged “common sense,” “gut feeling,” and legacy LV screens.
      But neither the markets nor the math is respectful of “common sense.”
      Post 2012, market forces are going to shape the field of pollsters.

      From Dr. Tufekci’s article.
      “For all the bragging on the winning side — and an explicit coveting of these methods on the losing side …”
      The successful pollsters from this cycle are the new gold standard. Telling half the country what they want to hear because that is the way they “feel” it should be, based on the past… that is not going to be a successful business model going forward.
      It was humiliating to be that wrong. Newport sounds sort of petulant to me, a “we are going to take our ball and go home” sort of thing.
      Well… the Market doesn’t care if you go home, Frank.

    • mediaglyphic

      @Kevin,
      There is definitely an incentive to herd in all human behaviour (apologies to Elias Canetti). An aggregator can actually help create incentives for pollsters to unherd, by assigning high ratings to pollsters who unherd (high tracking error) and turn out to be correct. There are rating systems that do this for earnings estimates.

    • wheelers cat

      mediaglyphic.
      prey (e.g., cows and sheep) herd; predators (e.g., humans and wolves) pack.
      I wish people would stop saying “herding behavior.”
      For one thing, herding implies a directed herding agent with a goal.
      If you must use a species-inappropriate adjective, say flocking or schooling behavior.

    • E L

      @wc “Post 2012, market forces are going to shape the field of pollsters.” The great American philosopher P. T. Barnum once said: “There’s a sucker born every minute.” Crazyland wants “scientific” proof they’re right and someone’s gonna feed it to them for a fat profit. Fox, Drudge, and Limbaugh aren’t going to fold just because they’re proven wrong. They’ll want polls to wave around. Rasmussen, Susquehanna, and Suffolk will be paid nicely for trash.

  • Amitabh Lath

    I sympathise with Gallup and their likely voter screen. They’ve been in the game since wall-mounted, hand-cranked phones were the new technology, and in their experience, things like LV fractions move slowly. A few tenths of a percent was a remarkable change. Things move, but adiabatically.

    But then you get this: a paradigm shift. 18-year-olds are voting, and convincing others to vote. I am not at Princeton like Sam and Dr. Tufekci, but at a state university a 20-minute drive north of them, and I was impressed by how connected our students are to political events. Much more than my cohort and I were at that age (we let Mondale go down to an ignominious defeat).

    Anyway, poor Gallup. Linear regression to predict LV increase isn’t going to work in this environment.

    • Michael

      You’re being way too easy on Gallup. They were publishing results that didn’t come close to passing the smell test. I was checking the cross-tabs on their trackers before Sandy hit, and they showed me that Gallup thought the electorate was going to be something like 82% white, which hasn’t been true for decades. The people responding to the surveys were doing everything they could to tell Gallup that its LV screen was screwed up, but Gallup just wouldn’t listen. This is general science 101: models are built to fit data, not the other way around.

    • Amitabh Lath

      Michael, yes, I know. I am a softie. I know they are wrong, but I hate kicking someone when they are down. I was just feeling pity.

      Having said that, the 82% white fraction is truly startling.

      Do you know whether they actually weight by race (in which case the 82% is an input) or weight by other factors (gender/age/income/education…), in which case race is what we’d call a spectator variable? If so, after being weighted, the sample just happened to turn out 82% white.

      But you are absolutely right. Either way, they are pros, and getting a sample that skewed in an important variable like race should have thrown up red flags.

      We weight data all the time. And hate doing it.
      Weights that differ from unity too much mean that your input data was highly biased, so you resist putting big weights on it. Maybe that’s what happened at Gallup. A couple of weeks before the election, it seems, they got a really large pro-Romney sample in their tracking poll, and they never corrected for it, and never recovered from it either.
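
A bare-bones sketch of what this kind of demographic weighting looks like (the shares and candidate splits below are invented for illustration, not Gallup’s actual numbers): each group is weighted by its target population share divided by its sample share, so an underrepresented group ends up with a weight well above unity.

```python
# Post-stratification sketch with made-up numbers.
sample_share = {"white": 0.82, "nonwhite": 0.18}  # what the poll got
target_share = {"white": 0.72, "nonwhite": 0.28}  # what analysts expected

# Weight = (population share) / (sample share), per demographic cell.
weights = {g: target_share[g] / sample_share[g] for g in sample_share}

# Hypothetical candidate splits, for illustration only.
dem_support = {"white": 0.40, "nonwhite": 0.80}

# Unweighted vs. weighted topline support.
raw = sum(sample_share[g] * dem_support[g] for g in sample_share)
weighted = sum(sample_share[g] * weights[g] * dem_support[g]
               for g in sample_share)
```

Here the nonwhite weight comes out around 1.56, far from unity, and the topline moves by four points (47.2% to 51.2%). That is the red flag described above: big weights mean the raw sample was badly skewed to begin with.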

    • mediaglyphic

      @Amit,
      I believe Gallup says that they weight by likelihood of voting, according to a set of LV questions that are asked. This is where I think the measurer may be affecting the measured, as voters are not answering honestly. Of course, weighting being far from unity is an issue for all pollsters; I’m not sure why it would be worse for Gallup, unless they are polling a weird subset of the population.

      We will never know unless we get the whole dataset.

    • mediaglyphic

      @Dr. Wang,
      What is the assumption behind the “true margin” line? We only really know this at the end of the series. Are we assuming that the same offset (inferred from the last point) applies to the whole series?

    • Matt McIrvin

      Gallup’s likely-voter trouble wasn’t a phenomenon of the endgame. It was visible back in the summer, but got less attention because at that point, Gallup wasn’t using the LV value for its daily tracker.

      I remember this because Talking Points Memo was getting worried over it. I think it was around the time of the Paul Ryan bump, right before the conventions, that Josh Marshall posted a worried squib about Obama’s enthusiasm problem in which he noted that Gallup’s polls showed Obama losing seven points of advantage when you went from RV to LV numbers.

      He interpreted it as Obama’s problem rather than Gallup’s.

      It got me mildly concerned, and I came back here and started wondering out loud in the comments if you could see this in state polling. And Sam posted something to the effect that, as far as he could see, Obama actually didn’t lose much from LV screens in the state polling at all. I went back and looked at some numbers on Pollster and RCP, and, sure enough, he seemed to be right. The national enthusiasm gap wasn’t showing up in state polling.

      And this is pretty much how it went all the way to the end, with the exception that people paid more attention because in the endgame, the reported top-line numbers all had LV screens.

      Now it turns out that Gallup was way, way off, and even the other national polls were docking Obama by about two points more than they should have. I think the origin of the latter effect is still an open question. Sam’s hypothesis is that the greater homogeneity of state populations reduces the opportunity for confounding effects in likely-voter screens, which might be the case.

    • wheelers cat

      Matt McIrvin
      yeah, I think we don’t have all the answers yet. Nate blamed the difference between state and national polling on relative exploitation of cell-phone demographics, but it could be population homogeneity of states or just the fact that national polls were fewer.
      But then why would the national error uniformly tend to favor Romney?
      That could happen if Obama voters and Romney voters actually comprise two different populations with asymmetrical political behavior. And then we get back to my crazytalk about a possible non-Gaussian underlying structure for carbon-based reality.
      So one hypothesis would be that the national pollsters were only sampling high-enthusiasm responders (people who would consent to be polled or were available to be polled). I think it is provable that organic conservatives have asymmetrical enthusiasm, that conservatives have more enthusiasm than liberals. And the more frequent nature of the battleground-state polling removed that bias.
      Another thing to consider is that frequent polling educates responders and engages their interest. I think we saw some of that in RAND.

    • Sam Wang

      The population homogeneity idea: is it so hard to believe? What if it’s just that a diverse US population has become increasingly hard to sample, and the components that are hard to sample (cell phone, Latino, Asian,…) tilt Democratic?

      These sampling problems become easier when one is just sampling one state. Imagine 0.1 person in the Ipsos sample standing up for all the Vietnamese-Americans of the Midwest, that kind of thing. Lots of categories, too small for a national pollster to be able to handle with ~1000 respondents.

      In regard to battleground states, maybe not polling but media saturation could affect responses.

      Asymmetry: to account for responsiveness to pollsters? In some sense we know that is true based on RV vs. LV results. Maybe LV screens overdo it in Presidential years, with Gallup being the oldest and most extreme.
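
The 0.1-person point is just arithmetic on expected counts. A quick sketch (the subgroup shares below are rough illustrations, not census figures):

```python
import math

n = 1000  # typical national poll sample size

# Illustrative shares of the national population.
share = {
    "Latino": 0.10,
    "Asian-American": 0.05,
    "Vietnamese-American, Midwest": 0.0001,
}

# Expected respondents per subgroup, and the binomial sampling spread
# relative to that expectation.
expected = {g: n * p for g, p in share.items()}
rel_spread = {g: math.sqrt(n * p * (1 - p)) / (n * p)
              for g, p in share.items()}
```

A 10% group yields about 100 respondents with roughly 10% relative noise; a 0.01% group yields an expected 0.1 respondents, effectively invisible. In a state poll, where such a group can be a much larger share of the population, the same n buys far more resolution.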

    • wheelers cat

      umm… OK, fine… but you are still just saying the equivalent of “we are violating Nyquist, so we get aliasing.”
      But that still doesn’t answer my question about why the aliasing is all unidirectional (towards Romney).

    • wheelers cat

      NJF, this is the wave of the future.
      http://www.nytimes.com/2012/11/17/opinion/beware-the-big-data-campaign.html

      “How did Mr. Obama win? The message and the candidate matter, of course; it’s easier to persuade voters if your policies are more popular and your candidate more appealing. But a modern winning campaign requires more. As Mr. Messina explained, his campaign made an “unparalleled” $100 million investment in technology, demanded “data on everything,” “measured everything” and ran 66,000 computer simulations every day. In contrast, Mitt Romney’s campaign’s data operations were lagging, buggy and nowhere as sophisticated. A senior Romney aide described the shock he experienced in seeing the Obama campaign turn out “voters they never even knew existed.”

      That’s where Google Insight and Twitter trends and Facebook connections come into play.
      It’s a whole new dimension of market research.
      I like to call it biomemetics, that is, the study of biomemes: how someone’s phenotype interacts with their social environment to shape their memome.

      So like Amitabh says, it becomes critical to test if someone is likely to vote– but also critical to evaluate what would make them *more* likely to vote.
      And what is the most effective means of persuasion? Social propaganda or knocking on doors?

      Now Dr. Wang has a foot in both camps: he can distill the real shape of the data with math (proven fact), and he can get the cognitive neuroscience as well.
      http://www.nytimes.com/2008/06/27/opinion/27aamodt.html

      I think there is going to be a lot of money thrown at biomemetics in the near future.

    • wheelers cat

      oh my goodness he’s from Princeton too like Susan Fiske from the OFA Dream Team.
      “Zeynep Tufekci is a fellow at the Center for Information Technology Policy at Princeton University.”

    • mediaglyphic

      @wc — he is a she (Zeynep, that is!!)

    • wheelers cat

      omg a thousand thousand pardons.
      I hate when people assume I’m a guy.

    • A New Jersey Farmer

      wheeler’s cat:

      I saw that article too. At least the Times is coming around to the idea that social media and social pressure have a great deal to do with who votes. The Republicans seem to be behind in technology, and they’ve been that way since Bill Clinton went on Arsenio Hall’s show and played the sax in 1992.

      Obama’s GOTV program is fourth wave.

    • bks

      I don’t think the big pharmaceutical companies, and biotech in general, are going to like this new wave of sophisticated statistical analysis in the newspapers.

      http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124

      –bks

  • Dean

    Gallup should definitely not be worried about poll aggregation. It should instead be analyzing what went wrong when it had Romney up by 5-7% not long before the election.

    Gallup has to now finally stop overestimating the white vote and stop underestimating the youth vote.

    IBD/TIPP was rewarded for its perseverance in trying to reach voters by phoning them repeatedly. The firm didn’t oversample older Republicans who were more likely to be at home and answering the phone.

  • Amitabh Lath

    mediaglyphic (please, call me Amit): I am not an expert on public opinion estimation, but I suppose that as communication technologies change, calling people on the telephone will give way to whatever is next. Probably text/IM.

    A captive set of responders like the RAND panel is interesting. One would have to think through the systematic uncertainty implications, but you do get rid of statistical uncertainties, especially when tracking changes in opinion.

    And I did note that article on Google Insight. At the time Wheeler’s Cat and I had a short discussion about it (remember that, wcat?). Fascinating stuff. But I suspect contacting people and asking their opinion will always be important. As wheeler’s cat has been pointing out, land-line telephones will not play as big a role.

    Wcat, I do think Rasmussen was more in the “aid and comfort to Republicans” business than in honest estimation. He paid a price for this. The plethora of sites that started showing “Rasmussen-free” maps cannot be good for his reputation. I should add that I don’t have any direct evidence for it, but the indirect evidence you point out is fairly convincing.

    • mediaglyphic

      Amit,
      I see hybrid methods at first, then a total move away from asking people.

      In the hybrid mode, I think Google Insight and other internet-based systems will help us gauge turnout and be the basis of the LV screen and transformation.

      Eventually (and I am going to avoid putting a timeline on this, because I have no idea how to), I see a lot of promise in using the snooping that the internet allows as a basis for forecasts. From my elementary physics I remember the Heisenberg uncertainty principle; I wonder if it applies here?

    • Amitabh Lath

      Mediaglyphic: Poor Herr Heisenberg gets trotted out a lot, but no, there isn’t any fundamental source of uncertainty here.

      Your question made me think of this: Imagine a future where everyone is on Facebook and shares maximally. In that world, Zuckerberg’s great granddaughter could predict elections (and most any other thing) with a very high degree of certainty.

      The luddites with land lines and broadcast TV would be a small, ignorable sub-sub-sample.

      I guess Google Insight etc. are the harbingers of that. The only way to not be part of the dataset is to not search.

    • wheelers cat

      The Assangians will never be on Facebook.
      But we do use twitter.

  • Amitabh Lath

    Aggregators use data generated by pollsters. The reason PEC, Linzer, Silver et al. got it right was that a majority of the state-level pollsters guessed correctly on their LV filters.

    If every pollster had assumed that turnout would be like 2010 rather than 2008 then aggregators would have been wrong in turn.

    So rather than lauding IBD for guessing correctly and faulting Gallup/Rasmussen for guessing wrong on the likely voter filter, I question why we are guessing at all.

    Isn’t there a more analytic, systematic way to figure out who is likely to vote than a bunch of questions about prior voting?

    In any case, shouldn’t this rather important part of adjusting the polling results be more transparent?

    • wheelers cat

      But then you would be able to see that Rasmussen was cheating, Amitabh (for example).
      He knew his LV model was bad in 2010, when he whiffed on CO and NV because of cell-phone demographics and Hispanics.
      A lot of analysts pegged the white share of the electorate at 70-72% for 2012. Gallup could have used that as a fact check. They were way off.

    • mediaglyphic

      Dr. Lath,
      While Dr. Wang’s work clearly shows that polls today have value, I wonder if polls aren’t a thing of the past. Have you read Seth Stephens-Davidowitz’s work using Google Insight to predict turnout?

      http://campaignstops.blogs.nytimes.com/2012/10/20/googles-crystal-ball/

      It’s also interesting that internet polling apparently did better than telephone polling this time around (I haven’t seen the data, but Richard Thaler of the University of Chicago wrote this in an NYT article).

      What do you think of using these internet-based methods of gauging turnout?

    • Froggy

      Can any of you good people out there educate me on LV screens? Some seemed to be one-question queries about whether a respondent intended to vote, perhaps with a range of possible intention strengths. Gallup, I know, used multiple questions about intention to vote, whether the person had voted in the past, and whether the person knew such things as his or her polling place. What more is out there?

      I guess I’m trying to get a better handle on what is meant for a pollster to have been “guessing wrong on the likely voter filter.”

    • wheelers cat

      mediaglyphic.
      I think that’s the 21st-century version of my nonparametrics professor’s “farmer method”: distilling information out of Google searches and Twitter trends.

    • mediaglyphic

      @Froggy,
      we will never know unless we see the totality of the data set: 1) who was called, 2) what they said, 3) the LV screen, 4) the transformation used.

      I think it’s better to look from the outside in. The totality of what Gallup and Rasmussen were doing was not representative of the final outcome. PPP, YouGov, and Ipsos were.

    • Amitabh Lath

      Froggy, yes, mediaglyphic is correct. We do not know what the likely voter filters were, and none of the pollsters are going to share.

      Let’s face it, both the “correct” and “incorrect” filters were guesses. Some got it right, and some (looking at you, Gallup and Rasmussen) got it wrong. But all are secretive about it.

      This would be an excellent topic for a guest lecture. Sam should get someone who has a good overview of these screens.

      I suspect that in the olden golden years, folks like Gallup applied likely voter screens that were pretty close to unity. It was a small tweak that didn’t change the answer much, so it wasn’t considered worth bothering about or understanding in depth. Now it’s become a big deal, changing hugely from 2008 to 2010 to 2012. Let’s see how they react.

    • Froggy

      mediaglyphic and Amitabh Lath, thanks for the responses. Actually, Gallup released a nice article in October showing in detail the LV screen they use: http://www.gallup.com/poll/111268/how-gallups-likely-voter-models-work.aspx .

      I would guess that this is the most complicated approach of any pollster, since it involves putting respondents through a series of seven questions. Of course, now we know that it led to completely inaccurate results, but you can’t say that they didn’t try, at least in terms of putting some effort into the process. (They would have been well advised to hire wheelers cat as a consultant to check the numbers they were getting.)
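
A cutoff-style screen of the sort that article describes can be sketched in a few lines (the item scores, turnout guess, and respondents here are invented; this is a toy, not Gallup’s actual model): each respondent gets a 0-7 score from seven items, and the top scorers are kept until an assumed turnout fraction is filled.

```python
def lv_screen(respondents, turnout=0.60):
    """Toy cutoff-model likely-voter screen.
    respondents: list of dicts, each with seven 0/1 item scores.
    Keeps the highest scorers until the assumed turnout is filled."""
    scored = sorted(respondents,
                    key=lambda r: sum(r["items"]),
                    reverse=True)
    keep = int(round(turnout * len(scored)))
    return scored[:keep]

# Ten hypothetical respondents with scores 7, 6, 5, 5, 3, 2, 1, 0, 4, 6.
sample = [{"id": i, "items": [1] * k + [0] * (7 - k)}
          for i, k in enumerate([7, 6, 5, 5, 3, 2, 1, 0, 4, 6])]
likely = lv_screen(sample, turnout=0.60)
```

The pitfall Anton Mates raised above is visible here: a respondent scoring just below the cutoff is dropped entirely, even though many such people will in fact vote.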

    • Matt McIrvin

      I think Gallup’s likely-voter screen gets respect simply because it was first: I believe nobody tried to do likely-voter screening before they did. I can imagine that much of what’s going on, now that it’s an obvious outlier, is just organizational inertia and Not-Invented-Here.

  • Jay Bryant

    It’s clear to me that both the individual polls and the aggregators are useful. Without the polls, the aggregators have nothing to aggregate, while the aggregators reduce margins of error and give anyone paying attention a better picture of the election. Were I running a polling firm, I would carry on as usual, trying to be as accurate as possible, and watch the aggregators as a check on my firm’s work (and just from curiosity, frankly). Trying to denounce the aggregators is counter-productive.

    My first real concern is that the polling firms will start running their own aggregators, too. That would create a great deal of noise. My other real concern is that someone will buy up the existing aggregators, ruining their value by removing their independence.

    • wheelers cat

      The NYT bought Silver, and HuffPo bought Blumenthal. They have basically retained their mathematical integrity, although I would say Nate has paid for pandering to the Right at the beginning of this election cycle.
      Like Dr. Wang says, poll aggregation isn’t rocket science. His code is on the sidebar. It’s open source.

      The inability of the GOP elites and Team Romney to accurately predict the outcome of the election was humiliating for them. Right now they are scapegoating the election results and blaming a whole collection of factors, but the truth is they could have easily used PEC’s model and been dead on. See the Atlantic article I linked above.

      Gallup’s man is scapegoating here as well. Gallup and Rasmussen were 24th and 25th in this election season’s rankings. Gallup seems to be saying they are going to reject innovation, but online polling was the most accurate this cycle.
      So what does polling look like in 2014, the next election cycle? Will there be flash polling on Twitter? Will there be more RANDs? Will there be triple-frame sampling: landline, cell, AND internet?
      OFA was boiling over with innovation and creativity this cycle, and the GOP is going to be frantic to copy what it can. A lot of it is proprietary, and a lot of it simply won’t work for the GOP base the way it worked for the liberal base.
      And there is the rise of social propaganda. Team Obama bought Twitter trends like “bindersfullofwomen” and “47percent” to essentially make Romney look unpresidential. Could the GOP elites use social propaganda to help convince the base to, say, accept immigration reform?
      A lot of myths got struck down this cycle. Pollsters will change to accommodate that, or they will finish at the bottom of the pack. Humans are much more prone to exhibit social pack behavior than social herd behavior.
      I’m not very concerned about pollsters herding/flocking/schooling.

    • 538 Refugee

      “My other real concern is that someone will buy up the existing aggregators, ruining their value by removing their independence.” I started at Pollster which is now with Huffington. I migrated to 538 which is now behind the Times ‘paywall’. I got tired of having to repeatedly toss my cookies to continue on that site. Coincidence? “You don’t have to live like a refugee.” ;)

      I’ve been an ‘open sores’ guy for quite a while. I didn’t know that Sam believed and followed that model with his work until his article but you know what they say about birds and feathers……

  • Ms. Jay Sheckley

    Wow, he sounds scared. In a sane world, Gallup would have hired you to make the recommendations above and included them in their essay and plans. Can they _see_ this? And seriously, couldn’t Romney’s people see PEC? Good God, it _is_ free. If he has reason to fear, it would be because there’s support now only for truth and for promoted lies, and a traditional [i.e., increasingly devalued] source of poor info will simply be sold to the GOP/Fox for fundraising, which some think was the main purpose of Romney’s candidacy. Or maybe Gallup is safe, getting the headline with wacky predictions, then goosing the numbers later for the résumé. After all, the attention _you_ got was because news broadcasters could headline your sane aggregate as an outlier forecast! As it grew into the “98% probability,” your work finally made for a fine teaser, then saved their dignity when they were able to show you’re obviously a brilliant scholar. It’s time for Gallup to stop lying about who they are and have their predictions delivered by a guy wearing a boot for a hat.
    Ooh, my old dad called. He may have just watched my PEC cake movie highlighted here. [ http://www.youtube.com/watch?v=TpLi4__okCs ] My physicist son actually shared it! You guys [Wang, Ferguson, Wheelers Cat et al.] have made my year and more! :D xxxx Truth is beauty, beauty truth. That is all you know, all Mitt needed to know. Gotta go call Dad.
    PS: Gallup is right that poll aggregation can’t work without polls. But that’s poor proof of their own value.

    • Amitabh Lath

      Your son is a physicist? What kind? Theory or experiment? What subfield?

      I am in experimental particle physics, which is why the lack of systematic uncertainties bugs me.

  • Partha Neogy

    “The “gap” difference was….well within the statistical margin of error and underscore[s] the accuracy of random sampling today.”

    This is deeply disappointing. What Gallup and other pollsters are doing isn’t merely presenting raw data (subject to statistical uncertainties), but applying a likely voter screen (subject to modeling error) as well. The systematic error that you point out is likely the result of a faulty likely voter model. Rather than bristling at poll aggregators, Gallup should be grateful that the aggregators’ work provides them with the opportunity to improve their likely voter model.

  • Robert Waldmann

    In spite of Newport’s nonsense, I think Gallup understands your point in this post. They don’t just poll on voting intentions. One dramatic feature of the 2012 data is that Gallup polled only nationwide. Just imagine how hard it would have been to predict anything if state-level data had been distorted by the Gallup likely voter filter.

    I think they have responded rationally to the increase in voting intention polls. Their issues polling remains useful.

    Of course their big problem is that they just won’t abandon the likely voter filter, which worked fine for decades but has performed terribly in the past two elections. Something has changed: not Gallup’s method, but the electorate they are polling. Oddly, in 2010 they published results for a looser filter (which were closer to the outcome), then went back to burying their head fully in the sand this year.

    Now on incentives to poll, Newport has a very particular point of view. You will eventually convince journalists to pay less attention to Gallup, but you will also convince them to pay more attention to the less famous pollsters you include in your aggregates. A story like “the latest Gallup poll caused Wang’s calculated probability of an Obama victory to fall to 98% from 99%” is a step down for Gallup. Replace “Gallup” with “Gravis Marketing” and the opposite is true.

    I personally am more concerned about your inadvertently encouraging bad pollsters to keep polling than your discouraging them.

    • Ms. Jay Sheckley

      Robert Waldmann wrote: “A story like ‘the latest Gallup poll caused Wang’s calculated probability of an Obama victory to fall to 98% from 99%’ is a step down for Gallup.”

      That made me grin. I’m still grinning. But if Gallup was such an outlier, mayhap it’d be discounted. Fact is, hyping individual polls has just been shown to be poor news indeed. Gravis’ star is rising, and better yet so is Sam’s [and sure, Nate’s].

      It’s hard to disagree with Robert Waldmann’s conclusion: “I personally am more concerned about your inadvertently encouraging bad pollsters to keep polling than your discouraging them.”

      I don’t believe in heaping scorn on a pollster for being mistaken. But as purveyors of truth, is defending themselves against aggregators what they should be working on? Indeed, PEC defends them, as only one poll. Gallup should be trying to be more useful, certainly to PEC, to be safer themselves. It’s a fractal view of what we once knew that has oddly become the larger political argument: E pluribus unum.

      http://www.youtube.com/watch?v=uxeEDHZeDDI

    • Amitabh Lath

      Robert, if Gallup is aware of problems in the likely voter filter, so much so that the answers change by more than the statistical margin of error, then it is upon them to indicate some sort of uncertainty due to the filter.

  • Olav Grinde

    Dr Wang, on a side note entirely, do you have any thoughts on the recently published analysis of photos of Dr Albert Einstein’s brain?

    (“The cerebral cortex of Albert Einstein: a description and preliminary analysis of unpublished photographs” by Dean Falk, Frederick E. Lepore and Adrianne Noe, Brain: A Journal of Neurology)

    http://www.oxfordjournals.org/our_journals/brainj/press_releases/prpaper.pdf

    • Sam Wang

      Not much to say. It looks interesting and about as well done as possible, considering the weakness of the starting material: external photographs. I question whether we will learn much about Einstein’s abilities this way.

    • Matt McIrvin

      They find that his brain had some extraordinary features, which were plausibly related to cognitive function, but they’re different from the extraordinary features reported in previous papers.

      What this makes me wonder, speaking as a non-specialist with more knowledge of Einstein’s work than of brains, is whether you’d find features of comparable extraordinariness upon sufficiently diligent examination of any random person’s brain.

      Einstein was an extraordinarily gifted theoretical physicist, but at least some of his pivotal position in physics comes from having chosen to study the right things at the right time, which was probably at least partly luck of the draw. (His work on general relativity also suggests an extraordinary tenacity, which didn’t always serve him that well.) I wouldn’t personally expect his brain to be that different from the brain of any reasonably intelligent person.

  • A New Jersey Farmer

    Like most other products these days, polls have become brands with their own unique characteristics. Gallup is tradition, Rasmussen is the Republicans, PPP is the new hipster, the news organizations piggyback on their news brands. These pollsters will continue to spin their brands in every election cycle as the true reflector of the electorate. That’s why aggregators are necessary, not a luxury.

    • Ms. Jay Sheckley

      A New Jersey Farmer wrote: “These pollsters will continue to spin their brands … as the true reflector of the electorate. That’s why aggregators are necessary, not a luxury.”

      I just wanted to repeat that.

  • Maxie jones

    I believe the head of polling Gordon Gallup has announced he is moving to another segment of the organization. He made it sound as if it was his decision.

  • wheelers cat

    This Patrick Ruffini article fills me with equal parts fascination and horror.
    http://www.theatlantic.com/politics/archive/2012/11/the-gop-talent-gap/265333/
    It’s not that the right lacks bright techies; it’s just that the full complement of within-group individuals are working on supercomputers and nanosecond trading algorithms. What intellectuals the right can claim are business-class, not academic or social.

  • Paul

    “News organizations love outliers.”

    Damn, he just summarized an entire industry in four words.

    • Steve W

      As soon as a polling firm starts aggregating (or vice versa), it will be hard to trust either output. There will be at least a subconscious tendency, if not an overt process, to reconcile the two outputs before publication.

  • Amitabh Lath

    Wcat: Did Gallup do live calling? In that case landline bias would be mitigated, no?

    In any case, you could have zero bias in the calls you make, and then introduce a large bias by guessing wrongly on your LV screen.

    How are LV screens created anyway? Is it just someone’s guess? How are they justified?

    • wheelers cat

      well… Ed Freeland and Dr. Wang made the point to me that even robopolls (which can only call landlines) can be mitigated with dual-frame sampling and stratification adjustment.
      I was resistant, but they convinced me. The FCC keeps a record of which phones are cell; they don’t get mixed up.
      If you have good stratification adjustment, then landline/cell shouldn’t matter.
      But how do you tell if non-respondents are likely voters, since you can’t ask? Stratification adjustment.
      The pollsters all have their secret sauce for LV screens, and they don’t share.
      Like Dr. Wang pointed out, the most interesting graph in RAND was the shift in intent to vote. Captive population.

      I think… voter suppression tactics resulted in a backlash this time. Unlikely voters voted because they got engaged over voter suppression.
      Maybe… redistricting could be backlashed in 2014.
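      To make the stratification-adjustment idea above concrete, here is a minimal sketch of post-stratification weighting. All strata, shares, and respondents are invented for illustration; real adjustments use many more demographic cells and proprietary targets.

```python
# Minimal sketch of post-stratification weighting: reweight respondents
# so the sample's mix of strata (here, landline vs. cell households)
# matches known population shares. All numbers are invented.

def poststratify(respondents, population_shares):
    """Attach weight = (population share) / (sample share) per stratum."""
    counts = {}
    for r in respondents:
        counts[r["stratum"]] = counts.get(r["stratum"], 0) + 1
    n = len(respondents)
    return [
        dict(r, weight=population_shares[r["stratum"]] * n / counts[r["stratum"]])
        for r in respondents
    ]

# Suppose cell-only households are half the population but only 1/4 of
# the raw sample -- the classic landline bias of a robopoll.
sample = [
    {"stratum": "landline", "obama": 0},
    {"stratum": "landline", "obama": 0},
    {"stratum": "landline", "obama": 1},
    {"stratum": "cell", "obama": 1},
]
weighted = poststratify(sample, {"landline": 0.5, "cell": 0.5})
obama_share = (sum(r["weight"] * r["obama"] for r in weighted)
               / sum(r["weight"] for r in weighted))
# the weighting pulls the estimate toward the underrepresented cell stratum
```

      The same reweighting trick is how non-response is handled: you cannot ask non-respondents anything, but you can upweight the respondents who resemble them demographically.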

    • Some Body

      They do live calling, including cellphones, and have very large sample sizes. They also have a very old technique for their likely voter screen (which includes asking respondents a series of questions, although they might keep the exact procedure confidential; I didn’t see a full specification anywhere, but to be fair, haven’t looked for it too hard either).

    • Sam Wang

      Gallup asks a series of questions, calculates the score, and takes the top X% as likely voters. X can vary. http://www.gallup.com/poll/111268/how-gallups-likely-voter-models-work.aspx
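      Schematically, a cutoff screen of that kind works something like the sketch below. The question scores, respondents, and turnout fraction are invented; Gallup’s actual battery and cutoff procedure are only partly public.

```python
# Hypothetical sketch of a cutoff-style likely voter screen: score each
# respondent on a battery of engagement questions, then keep the top X%,
# where X is the expected turnout. Scores and X here are invented.

def likely_voters(respondents, expected_turnout):
    """Keep the top expected_turnout fraction of respondents by score."""
    ranked = sorted(respondents, key=lambda r: r["score"], reverse=True)
    cutoff = int(round(len(ranked) * expected_turnout))
    return ranked[:cutoff]

sample = [
    {"id": 1, "score": 7},  # votes regularly, follows the race closely
    {"id": 2, "score": 2},  # rarely votes, low interest
    {"id": 3, "score": 5},
    {"id": 4, "score": 6},
]
lv = likely_voters(sample, expected_turnout=0.5)  # keeps ids 1 and 4
```

      Note the hard edge: a respondent just below the cutoff contributes nothing, which is where the systematic error discussed in this thread creeps in.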

  • Amitabh Lath

    I find it surprising that pollsters do not report any systematic uncertainty due to the likely voter model. Obviously in the case of Gallup, Rasmussen, and a few others, the systematic error was much bigger than the statistical error.

    One way would be to publish the raw data and weight arrays in addition to the top-line results, and let people fiddle with the numbers as they wish.

    But I can understand firms not wanting to give away their raw numbers.

    Another way would be to put out “loose”, “medium”, and “tight” likely voter screens. Then people can estimate the systematic by how the result changes.

    Something has to be done. Current practice, which assumes the LV screens are perfect, is obviously not correct. LV screening error >> stat error.
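    As a rough sketch of the loose/medium/tight idea: compute the margin under three cutoffs and read the spread as a crude systematic uncertainty. The ten respondents, scores, and vote codes below are invented.

```python
# Compute the candidate margin under three likely-voter cutoffs and
# treat the spread across screens as a crude systematic uncertainty.

def margin_under_screen(respondents, turnout):
    """Obama-minus-Romney margin among the top `turnout` fraction by score."""
    ranked = sorted(respondents, key=lambda r: r["score"], reverse=True)
    kept = ranked[: int(round(len(ranked) * turnout))]
    obama = sum(1 for r in kept if r["vote"] == "O")
    return (2 * obama - len(kept)) / len(kept)

# invented sample: engagement score 9 (most engaged) down to 0
votes = ["O", "R", "O", "R", "O", "O", "O", "R", "O", "O"]
sample = [{"score": 9 - i, "vote": v} for i, v in enumerate(votes)]

# tight, medium, and loose screens
margins = {t: margin_under_screen(sample, t) for t in (0.4, 0.6, 0.8)}
systematic = max(margins.values()) - min(margins.values())
# the spread across screens is the systematic, to be quoted alongside
# the usual sampling margin of error
```

    When the less-engaged respondents lean one way, as they did in 2012, the margin moves with the cutoff, and the spread is exactly the LV-model uncertainty that never gets reported.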

    • wheelers cat

      Dr. Wang and Ed Freeland convinced me that dual-frame sampling and stratification adjustment could be rigorous in removing landline bias… if properly used.
      If systematic error >> statistical error, can we say that dual-frame sampling and SA were NOT properly used?

    • Some Body

      Amit — but how do you measure systematic error *before* the election took place?

    • Amitabh Lath

      SB: yes, one can figure out the systematic AND statistical uncertainties in your expectations (polls) before looking at the data (the Nov 6 election).

      For the likely voter screen, instead of calling each poll respondent likely/not likely, you could assign a % likely ± uncertainty. So an older, married homeowner who has voted since Nixon could be 99% ± 1%, while an 18-year-old college student who has never voted could be 20% ± 20%.

      Right now the LV screen for these two would be 100% for the former and 0% for the latter, with no uncertainties.
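      A sketch of what that probabilistic screen might look like, using the two invented respondents above. The error propagation here is deliberately crude (independent probabilities, first order), just to show where an LV-model uncertainty would come from.

```python
import math

# Weight each respondent by an estimated turnout probability instead of
# a hard likely/unlikely cut, and propagate the stated uncertainty on
# that probability into the reported margin. All numbers are invented.

def weighted_margin(respondents):
    """Turnout-weighted margin with a rough LV-model uncertainty."""
    total = sum(r["p_vote"] for r in respondents)
    margin = sum(r["p_vote"] * r["lean"] for r in respondents) / total
    # each uncertainty dp shifts the numerator by roughly dp * lean;
    # add those shifts in quadrature, treating respondents as independent
    err = math.sqrt(sum((r["dp"] * r["lean"]) ** 2 for r in respondents)) / total
    return margin, err

respondents = [
    # older married homeowner, voted since Nixon, leans Romney (lean = -1)
    {"p_vote": 0.99, "dp": 0.01, "lean": -1},
    # 18-year-old first-time college student, leans Obama (lean = +1)
    {"p_vote": 0.20, "dp": 0.20, "lean": +1},
]
m, e = weighted_margin(respondents)
# a hard 0/1 screen would call this sample 100% Romney with no stated
# uncertainty; here the student contributes, and dominates the error
```
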

  • wheelers cat

    Gallup’s tragic flaw was relying on legacy methodology. Just like the GOP, Gallup is past its sell-by date in technical methodology.

    From Ruffini:
    “The most pressing and alarming deficit Republican campaigns face is in human capital, not technology. From recruiting Facebook co-founder Chris Hughes in 2008, to Threadless CTO Harper Reed in 2012, Democrats have imported the geek culture of Silicon Valley’s top engineers into their campaigns. This has paid significant dividends for two election cycles running.”

    Gallup needs geeks and cutting-edge tech, just like the GOP.

  • E L

    @Sam and Andrew: Thank you for continuing the discussion after the election. Apparently, Gallup consistently overestimates the white vote. Gallup has been around for a long while. Perhaps their internal demographic skews older and whiter than the voting population, and also older and whiter than newer polling organizations’, and they are therefore having trouble correcting their model.

  • Bill Dawers

    Excellent post. I would agree with the comment above about the need to own aggregation — it’s a logical step for a major pollster like Gallup to conduct its own polling but then to develop a separate product that aggregates data.

    I think it’s worth adding that Gallup’s problems are probably related to their likely voter model, which implicitly and explicitly measures voter enthusiasm and voting history. About half of registered but “unlikely” voters probably did vote, and some percentage of voters who passed the hurdles to be deemed “likely” didn’t vote. Gallup was showing huge advantages for Obama among those registered but “unlikely” voters.

    I wrote a little about this with some links on my blog: http://www.billdawers.com/2012/11/13/gallup-defends-its-erratic-presidential-polling-results-paul-ryan-says-results-were-a-shock-on-election-night/

    Thanks for all your fine work.
    Bill

  • Turgid Jacobian

    Sam,
    There is still something there regarding his commons argument. Not a *lot* there, there. But something.
    Tom

  • Some Body

    Playing the role of the sceptic as usual — isn’t there a risk that aggregators (and advice such as your “Watch one another”) will increase the incentive for pollsters to herd together?

  • mediaglyphic

    He is in Step 1 of a 12-step process.

    I think he is correct to feel threatened by poll aggregators: just as individual earnings estimates matter less now than the consensus earnings estimate, individual polls matter less than the aggregate.

    If he is smart, he will throw the considerable resources of Gallup behind aggregation and produce an aggregation product. Otherwise the value of his product will decline.

    • Sam Wang

      To me that seems to mix up the primary source with the secondary activity. He could, but that is a bit like novelists getting into the reviewing business. There is a lot of strength at Gallup in primary data-gathering; I would think a better path is to build on that.

    • wheelers cat

      Gallup’s problem is legacy polling methodology, which spoofed them on the make-up of the electorate. He thought the white vote share was around 79%; it was 72%. He now has an incentive to modernize. Online polls were the most accurate, I believe.
      Nate said internal polls conducted by the campaigns gave a 6-point advantage to their candidate; in 2008 Team Obama apparently shared internal polls with him.
      I bet this year Team Obama was spot on. They are very secretive, but I bet they used poll aggregation along with all the other “system-magic” that delivered the election.
      Now, Rasmussen cheated. But he is already blaming legacy polling methodology and his lack of cell polling.
      After 2012, there is a market-force incentive to deliver accurate data. Because of Sandy, Rasmussen couldn’t perform his usual trick of moving gradually back to accuracy; he got caught with his pants down.

    • wheelers cat

      “I bet they used poll aggregation along with all the other “system-magic” that delivered the election.”

      Or maybe they just used PEC as a resource…or aggregated the aggregators.
      ;)

    • mediaglyphic

      Dr. Wang,
      Are you saying that there would be conflicts between owning the primary and the secondary activity? I agree there would definitely be conflicts, and Gallup would need to manage them.

      But if aggregation has more value than any one poll, and you have the best brand in aggregation, one had better own aggregation. I am not sure what Gallup’s business model is (who pays them, and for what). Gallup’s decision would likely be led by their current business model. If aggregation is taking the value out of polling, the current business model might be doomed.

    • Sam Wang

      I believe that pollsters make most of their money in nonpolitical work, and the political races build their brand. So I think their business model is not really in danger.

    • bks

      There is a significant danger of schooling (Linzer called it “herding”) of the polling organizations as a result of this election. None of them want to be in the position that Rasmussen and Gallup are in now (I’m assuming that Suffolk and Susquehanna are just paid marketing/propaganda companies). The strength of the aggregators comes from the independent conscientious survey research companies. If the pollsters start “real-time” integration of the aggregate, there will be nasty surprises ahead.

      –bks

    • wheelers cat

      Dr. Wang, they mostly make revenue from *marketing* research.
      Obama and Romney were essentially just competing brands.

    • Olav Grinde

      Wheeler’s Feline: “Obama and Romney were essentially just competing brands.”

      Yes, indeed. And now Mr Romney has a very unsightly R&R brand where the sun rarely shines. And we can safely say it is a devalued brand!

    • mediaglyphic

      @bks and some body,
      Grouping is definitely an issue, but poll aggregators can also identify opinion leaders among pollsters, and those pollsters’ polls will have more value.

  • Strabo

    This is at least the third election in which Gallup was far from the real result just days before the vote. While it would be high time for them to question their methods, it won’t happen, as it didn’t after 2008 or 2010. Nor will the media change: they will forget this dreadful performance next time around and breathlessly report on the newest Gallup numbers showing [candidate] down by [N points].

  • pechmerle

    Mr. Newport starts his piece right out with an implicit admission that his organization’s overall effort is misguided:

    “As our tradition has been in presidential election years, Gallup’s focus this year was on producing an estimate of the national popular vote. We don’t “predict” the election, nor do we make estimates of the Electoral College.”

    Indeed — and therefore you don’t tell us what we (and most people) most want to know. Apparently, this is one of those “we’ve always done it this way” arguments that is so compelling.

  • 538 Refugee

    In other words, let me justify my paycheck even though we pretty much blew it?