Who are these likely voters, anyway?

September 18, 2012 by Sam Wang

After Labor Day, most pollsters start to apply “likely voter screens,” in which they attempt to identify respondents who are not just registered to vote, but who will actually schlep to the polls (or vote by mail) in the election. Many of you have asked what is in these screens, and whether to be suspicious of the methods.

My general position is that because of the Wisdom-of-Crowds-of-Pollsters principle, I don’t think it’s worth delving into. That wisdom works very well in on-year elections such as 2004 and 2008, as you can see by exploring the left sidebar. It works a tiny bit less well in off-year elections such as 2010, when media saturation is lower and voter motivation is harder to measure. My view is that it is fully effective to take the likely-voter numbers at face value and calculate medians.
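As a minimal sketch of "take the likely-voter numbers at face value and calculate medians," the snippet below takes the median of a handful of poll margins. The margins are made-up illustrative values, not real poll results, and the helper name `median_margin` is hypothetical.

```python
from statistics import median


def median_margin(margins):
    """Return the median of a list of poll margins (percentage points)."""
    if not margins:
        raise ValueError("need at least one poll margin")
    return median(margins)


# Hypothetical likely-voter margins (candidate A minus candidate B), in points.
lv_margins = [3.0, 1.0, 5.0, 2.0, 4.0]
print(median_margin(lv_margins))

# One outlying poll moves the median only slightly.
print(median_margin(lv_margins + [-8.0]))
```

The median is used here because a single unusual poll shifts it far less than it would shift a mean.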

However, I recognize that you want more. Rather than re-invent the wheel, I turn to some classic essays by former polling professional Mark Blumenthal, now of Pollster.com. In 2004 as the Mystery Pollster, he wrote a series of seven essays that cover most of what there is to say. In 2008 he wrote another good essay.

(The advent of wireless phones does not really change anything except for the price of doing a poll. If you have any comment along those lines, a better place is yesterday’s essay by Ed Freeland.)

I do not recommend reading anyone’s commentary from this year on the subject of likely voters. Most writers are trying to argue away Obama’s lead. This is a case of motivated reasoning. I did this with John Kerry’s numbers in 2004, to my regret. What I wrote then is useful as an object lesson in what not to do. It is better to rise above the heat of battle.


Ralph Reinhold says:

As I said the other day, the one poll that called me was remarkable only because it reported ‘likely voters,’ yet I could not recall any questions that had anything to do with whether I was a likely voter. Also, since I was taking a sociology class that had been discussing polling at the time, we discussed being polled in class. Then, we discussed the ‘likely voter.’
I would hope that Mark Blumenthal covers how they determine it as well as how useful it is.
I believe that the Walker recall in Wisconsin and the Alabama Lottery vote are two examples where the totals reflected the results, but the likely voters were skewed in the other direction.
In the former, it was a toss-up among likely voters, but it was a fairly good rout in the election. It was slightly outside the margin of error, if I remember right.
In the latter, the poll showed a safe margin for passage, but it was a rout in the other direction.

Ralph Reinhold says:

Modeling ‘hang ups’ is an iffy proposition. I have heard quite a bit of anecdotal evidence over the years of people saying that they had hung up because ‘it is a secret ballot and it is none of their business’ and little of ‘the guy called and I’m not voting, so it was a waste of my time’.
If the model is that they represent the people who answer, then choosing ‘likely voters’ by a statistical profile is fine. But in either of the other cases, the results can be skewed. That may have been what happened in my examples.

Matt McIrvin says:

At the end of the summer, right around the time that national polls started to apply likely-voter screens (but using data from before the conventions), some of the polls showed Obama suffering from a gigantic enthusiasm gap, large even by typical Democratic standards. People jumped on that as Romney’s big opening.
I think that’s the proximate cause of much of the talk about this. If you look at the recent polls, though, the gap is gone.

James Moore says:

Out of curiosity, why does wireless affect the cost of a poll? On the actual making-the-call side of things, it’s not a factor as far as I know.
OTOH, the cost of actually making the calls for a poll should be really, really low. For literally pennies an hour, Amazon will rent you machines that can do hundreds of simultaneous calls. The software is free but will require some expertise. The calls themselves these days average less than a penny a minute in the US.
Thinking about it, I’m surprised there aren’t a lot more automated polls happening. If you need 1000 phone minutes, that’s about $10. Polls probably need some multiple of that, so figure $50. Call it $5 for AWS machines and bandwidth (that’s probably way high).
Is it expensive to get the lists of phone numbers? At these rates, seems like you might be OK with just random dialing.

WA to MD says:

Ed Freeland’s article from yesterday addressed this. According to Freeland, federal law prohibits the use of robo-callers on wireless lines; because of this, wireless lines have to be called by actual people, which ups the pollster’s costs.

Outloud FLL says:

Hi James,
Apparently regulations prohibit auto-dialers from calling cell phones. I didn’t know this till yesterday when I read Sam’s blog post on cell phone sampling.
Thanks, Sam!

James Moore says:

Needing a person to kick off the poll doesn’t really add that much money, but it definitely adds software. I’d get it done with Mechanical Turk, so figure $3/hour for humans to ask a scripted “are you willing to answer an automated poll” question when the users answer. Let’s say that adds another 5000 minutes, so 5000/60*3 = $250 – and that’s assuming 100% of your calls need a human to start the call. Still sounds dirt cheap.
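The running estimate in this thread can be written out as a few lines of arithmetic. This is only a sketch of the commenter's own figures ($50 for phone minutes, $5 for servers, 5000 human minutes at $3/hour); the function name `staffing_cost` is hypothetical, and none of the inputs are measured costs.

```python
def staffing_cost(human_minutes, hourly_wage):
    """Cost in dollars of paying for human_minutes at hourly_wage dollars per hour."""
    return human_minutes * hourly_wage / 60


# Figures taken from the comments above; they are guesses, not measured costs.
phone_cost = 50.0    # "figure $50" for phone minutes
server_cost = 5.0    # "$5 for AWS machines and bandwidth"
human_cost = staffing_cost(5000, 3.0)  # 5000 minutes at $3/hour

total_cost = phone_cost + server_cost + human_cost
print(f"humans: ${human_cost:.2f}, total: ${total_cost:.2f}")
```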

Sam Wang says:

As I wrote the other day, a typical cost to get 700 respondents using a live-operator survey is $15,000. I do not think that is cheap.

James Moore says:

I should stress that I’m not talking about actual live-person-asking-the-questions stuff here; I’m just trying to figure out costs for something that I’d call operator-assisted; have a person on the line long enough to satisfy the no-robocall requirement.
I know zero about the actual polling side of things, but I do know about the software/telephony side of things. If someone handed me a pile of phone numbers, a flowchart of the questions/answers and audio files of any audio prompts, and a schedule (we need to call these numbers only between 7-9pm in their time zone, etc), my back of the envelope cost would be:
Phone minutes: 1 cent/minute (billing increments are usually 6 seconds)
Person – minutes: 10c/minute (figuring you’re paying about $3/hour for Mechanical Turk staffing, but there’s inefficiency there, since you’ll need to make sure humans are available at the right points in the calls)
And that gets you to a few hundred dollars for just the phone call portion of a 700 person poll.
I suspect I’m ignoring important things though. My guess is that the vast majority of the budget is for setup, not for the nuts and bolts of doing the survey. It’s the professional expertise of figuring out who to call and what to ask them.
But I do suspect that there’s an old, really expensive way of doing the actual surveys, and a new startup-style hotness that’s much less expensive. So much less expensive that I’d think it’s easily within the budget for a student group project, where the “professional expertise” bit has zero dollar cost.
(Although now I’m curious about the business of polling – maybe there’s a startup opportunity here. The telephony world is still a very strange place, with lots of places where old-fashioned companies still make money with archaic technology and cost structures.)
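The per-minute rates in the comment above (1 cent per phone minute billed in 6-second increments, 10 cents per person-minute) can be turned into a rough per-call calculator. This is a sketch only: the call length, the operator time per call, and the function names are assumptions made for illustration, and it counts completed calls only, ignoring unanswered attempts and setup costs.

```python
import math

PHONE_RATE_PER_MIN = 0.01   # dollars per phone minute, from the comment
HUMAN_RATE_PER_MIN = 0.10   # dollars per person-minute, from the comment
BILLING_INCREMENT_SEC = 6   # billing increment, from the comment


def billed_minutes(call_seconds):
    """Round a call's duration up to the next 6-second billing increment."""
    increments = math.ceil(call_seconds / BILLING_INCREMENT_SEC)
    return increments * BILLING_INCREMENT_SEC / 60


def call_cost(call_seconds, human_seconds):
    """Dollar cost of one call: billed phone time plus operator time."""
    phone = billed_minutes(call_seconds) * PHONE_RATE_PER_MIN
    human = (human_seconds / 60) * HUMAN_RATE_PER_MIN
    return phone + human


def poll_cost(completed_calls, call_seconds, human_seconds):
    """Dollar cost of the completed calls only (no unanswered attempts)."""
    return completed_calls * call_cost(call_seconds, human_seconds)


# Assumed for illustration: 700 completed calls, 5 minutes each,
# with an operator on the line for the first 30 seconds.
print(f"${poll_cost(700, 300, 30):.2f}")
```

Unanswered and refused calls would multiply the number of dial attempts, which is one reason a real total would come out higher than this completed-calls-only figure.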

Sam Champion says:

Are you simply saying this?
Take the median of any poll that gives both RV and LV numbers. Then take an average of all the medians from all polls and that one gives the best sense of where any race is?

Peter D says:

SC: He’s saying use the LV number in your analysis when available.

Howard says:

If I were a pollster deliberately trying to sway the election in favor of my candidate, here’s what I would do: in September and October I would bias my results in favor of my candidate (for example by oversampling my party), in order to create a bandwagon effect. Then on the last poll I would try to get the most accurate results possible, in order to bolster my reputation (for the next election) as the pollster who “came the closest”.
We’ve seen reports of which pollsters “came the closest” with their final 2008 poll to the 2008 results. But I am not aware of any studies that show which pollster came closest to the actual outcome with his (her) Sept 15 poll or Oct 1 poll. Is this kind of retrospective easy to do with your available data?
