Princeton Election Consortium

A first draft of electoral history. Since 2004

The Dog That Didn’t Bark (AR-Sen)

July 10th, 2014, 11:39pm by Sam Wang


Today, the NYT’s Nate Cohn speculates about the problem of low-quality polls in Senate races. It’s an interesting piece with lots for poll junkies to chew on. However, I am compelled to offer several gentle corrections. My bottom line: polls are better than he implies, especially when they are aggregated properly. And Senator Pryor (D-AR) is probably a little underwater at the moment. Oh, and Democrats aren’t as hosed as you might think.

First, how have Senate polls done in the last two cycles?

Cohn says that “polls have missed the result in three close Senate races in the last two cycles.” That statement is incorrect. The following plot comes from my Election Eve snapshots of poll-based medians for close races in 2010 and 2012, plotted against the actual election results.

Anything in the gray zones is a correct prediction, i.e. the polls and the results favored the same party.

As you can see, polls did well. Here is one metric: we can estimate how many races we should have gotten correct by adding up the nominal win probabilities for the 15 leading candidates. This sum is 13.0. And 13 out of 15 poll-leaders ended up winning. So by that measure, the polls were dead-on.
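That calibration check is simple arithmetic: the expected number of correct calls is just the sum of the leaders' win probabilities. Here is a minimal sketch; the probabilities below are invented placeholders, not the actual 15 values that sum to 13.0.

```python
# Calibration check: if poll-based win probabilities are honest, the
# expected number of correct calls equals their sum.
# These 15 probabilities are illustrative placeholders only.
p_leader_wins = [0.99, 0.97, 0.95, 0.93, 0.92, 0.90, 0.88, 0.85,
                 0.83, 0.80, 0.78, 0.75, 0.72, 0.70, 0.68]

expected_correct = sum(p_leader_wins)  # expectation of # of leaders who win
observed_correct = 13                  # 13 of 15 poll-leaders actually won

print(f"expected {expected_correct:.1f} correct, observed {observed_correct}")
```

When the observed count lands within a seat or two of the expectation, the probabilities were well calibrated; a large gap would indicate systematic over- or under-confidence.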

Looking at it another way, only one race was radically off: Nevada 2010, between Senator Harry Reid and Sharron Angle. Nevada has two factors that made polling difficult: a lot of Hispanic voters and an unusually mobile population. There’s the Colorado race, but that 2-point error is not exactly something to jump up and down about. Overall, I would characterize the result as 13 out of 14 correct, with one push.

At the same time, there is an interesting anomaly: in 14 out of 15 cases, Democrats outperformed opinion polls. I do not know why this is…though in 2008 I pointed out that pollsters tend to underreport margins – in both directions. Anyway, I will not be using this particular fact in my 2014 analysis (which is coming soon). As most readers know, I advocate the use of clean, uncorrected polling data, which has worked so well since 2004. However, I will give you simple tools so you can explore what the effects of a polling bias would be, same as I provided for past Presidential races. But in all the topline estimates I report, I will personally keep off the Unskewing Sauce.

This brings me to Arkansas, which Cohn offers as an example of the problem.

Cohn points out that all the polls showing a Cotton lead are Republican outfits. Counting each outfit once, the median is Cotton leading by 4.0±2.7% (n=7). However, there’s a problem: the Republicans (plus Rasmussen) were the only ones releasing survey data since the beginning of May. All the other data points are at least 9 weeks old.
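For the curious, here is how an estimate of that form can be computed. This is a sketch with hypothetical margins, not the actual Arkansas numbers; the ± term is one common robust estimate of the standard error of the median, based on the median absolute deviation rescaled to a Gaussian-equivalent sigma.

```python
import math
import statistics

# Hypothetical Cotton-minus-Pryor margins, one per polling outfit.
# Placeholder values only -- not the real July 2014 survey data.
margins = [7.0, 5.0, 4.0, 4.0, 2.0, -1.0, -2.0]

med = statistics.median(margins)

# Median absolute deviation, rescaled by 1/0.6745 to match a Gaussian
# standard deviation, then divided by sqrt(n) for the standard error.
mad = statistics.median([abs(m - med) for m in margins])
sem = mad / 0.6745 / math.sqrt(len(margins))

print(f"Cotton +{med:.1f} ± {sem:.1f} (n={len(margins)})")
```

The median-plus-MAD approach is robust: a single wild outlier barely moves the estimate, which is exactly the property you want when mixing pollsters of uneven quality.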

It is completely plausible that Senator Pryor is behind. In April, his approval/disapproval numbers were 38/46, an abysmal position for an incumbent. Political scientists take the view that a major purpose of campaigns is to drive voters to where they “ought” to be in terms of fundamental factors. The Pryor and Cotton campaigns have certainly worked hard to do that – it’s been lively and nasty. Take a look at this, and this, and this.

Finally, I have a rather basic question. If Pryor is not behind, then where are the neutral and Democratic pollsters? I find it curious that they have not released any data. I think if they had anything to offer, it would have come out by now. As Sherlock Holmes would put it, that’s the dog that didn’t bark. Anyway, this will all get resolved by a few more data points, which I am sure will come soon. Bottom line, Pryor had better get the lead out.

This brings me back to one of my favorite points. Poll junkies love to sniff over individual polls, offer corrections, and so on. I think it is a waste of time. Historically, poll medians do a great job. At some level, Cohn seems to agree, since he ends up concluding that aggregation is still OK. But hey, it’s July, and he’s got a column to write.

All this aside…if you’re not from Arkansas, what you really want to know is who’s going to take over the Senate. Today’s snapshot is a Democratic retention probability of 55%, median outcome 50 D, 50 R. Boy, that’s close.
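For readers who want to see how per-race numbers turn into a chamber-wide probability, here is a minimal sketch of the underlying idea: convolve independent per-seat win probabilities into an exact distribution over seat counts, then sum the tail where Democrats keep control. All inputs below are invented placeholders, not the actual race estimates behind the snapshot.

```python
# Per-seat Democratic win probabilities for contested races
# (illustrative placeholders only).
contested = [0.80, 0.70, 0.60, 0.55, 0.50, 0.45, 0.40, 0.30, 0.20]
safe_d = 45  # seats assumed safe for Democrats (also illustrative)

# dist[k] = P(Democrats win exactly k of the contested seats),
# built up one independent Bernoulli race at a time.
dist = [1.0]
for p in contested:
    new = [0.0] * (len(dist) + 1)
    for k, q in enumerate(dist):
        new[k] += q * (1 - p)   # lose this seat
        new[k + 1] += q * p     # win this seat
    dist = new

# Democrats retain control with 50 seats or more (VP tiebreaker).
retain = sum(q for k, q in enumerate(dist) if safe_d + k >= 50)
print(f"P(Democratic control) = {retain:.2f}")
```

An exact convolution like this is cheap for a handful of contested seats and avoids Monte Carlo noise entirely; the full seat histogram in `dist` is also what a snapshot's median outcome is read off of.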

Graphs are coming soon – we’re still setting up the rollout. Stay tuned!

Tags: 2012 Election · 2014 Election

7 Comments so far

  • Amitabh Lath

    Just read the exuberance post from 2008. Brought back such memories, like taking Sarah Palin seriously as a candidate. Good times.

    I concur that likely voter weights are a problem. After all, if you underweight college students by 20%, who cares? Places like Berkeley, Boulder, Madison, Ann Arbor, Princeton, New Brunswick… it’s hard to get these wrong. And there is no bonus for calling the margin.

    (Of course, this assumes that post-graduation they will all revert to average voting probabilities and get land lines).

    You also floated the idea that supporters of winners are more likely to turn out. That may be the case generally, but didn’t work in the VA7 Republican primary.

    And in MS a group of people turned out who would not have passed even the softest likely voter screen for a Republican primary election/runoff.

    All in all, I think this is a bigger problem than just getting the margin wrong in blowout races. That may have been a leading indicator.

    Your plots force the question: Is there something going on with likelihood of voting? I hate to use the term paradigm shift, but also don’t want to ignore something so obvious.

  • Bill

    I’ve noticed over the past few posts of yours that the probability of Democrats retaining control of the Senate has changed from .67 to now .55. Have you tested to see if this change is a statistically significant trend, or is it most likely random variation about a mean or median showing a better than .50 probability of Democrats retaining a Senate majority?

  • Amitabh Lath

    What a great plot! Whatever the mechanism for the bias, it appears to increase with Democratic majority. What selection criteria went into this plot (i.e., what counts as a “close” race)?

    According to this plot, the effect is statistically significant and persistent over years. At the very least, one could formulate a systematic uncertainty by hypothesizing a mechanism for the bias and recomputing the prediction with the affected polls left out.

    For instance, if it is a cellphone-only household problem as Dr. Science says, then let’s look at pollsters that do cellphones vs ones that don’t.

    You can’t make such a beautiful plot and then just ignore it!

    PS: The Mark Haddon book is about an autistic boy. Great book, but you probably wanted to quote the Sherlock Holmes short story “Silver Blaze,” where the quote comes from.

    • Sam Wang

      Thanks, Amit. If you click the Holmes quote, it goes to Silver Blaze. Of course!

      I once wrote about “exuberance of likelier voters.” Google that, or I’ll link here in a bit. [Here it is.] It is a plausible explanation for what we see here. Basically, pollsters optimize their methods to get the sign of the margin correct…but nobody penalizes them for underestimating the margin. This might be enough to explain the effect.

  • David Kellogg

    Great post! I wonder if you have seen Josh Marshall’s take on this, which concludes more generally that “this is more an issue with Republican pollsters than Democratic ones” — extrapolating from Arkansas perhaps too much, but still, there you go. http://talkingpointsmemo.com/edblog/is-that-really-what-it-shows

  • Doctor Science

    in 14 out of 15 cases, Democrats outperformed opinion polls. I do not know why this is.

    Both Hispanic voters and cell-phone-only voters are expected to strongly favor Democrats. Isn’t that enough?

    Do you see a similar effect for earlier years (2008, 2006)?

    • Sam Wang

      Not a bad guess…but as far as I am aware, it does not appear in Presidential polling. At least, not nearly so much. So the effect is somewhat fragile. I wouldn’t put it in the bank.

      In 2008 polls were basically 100% correct: see here and here. I should update the graph with that. Adding 2006 would require a bit of digging.