The predictive value of GOP Presidential polls

January 5, 2016 by Sam Wang

The New Year is not a bad time for a fresh start. So please let me acknowledge that back in July, I was too pessimistic about Donald Trump’s chances. Like Harry Enten, I was led astray by his high unfavorables. Six months into the Season of Trump, I think it is time to examine his chances with a more neutral stance.

Two Nates (Silver and Cohn) have come out with essays arguing that we still can’t extract much predictive value from opinion polls. For the detailed kind of analysis they like, this may be true. However, a slightly different approach has suggestive implications about who is likely to be the eventual Republican nominee. (Spoiler: rhymes with Grump.)

First, let’s examine two current attitudes about polls. One is endemic to journalists, the other to data pundits.

1) Focusing on the leader in polls. Journalists and commentators have been losing their minds over the fact that Donald Trump’s lead has lasted since July. (For an antidote, see a sharp and entertaining takedown at Lawyers, Guns, and Money.) In 2015, one way of coping was to say things like “at this point in 2012, Gingrich led nationally.” Certainly this was good for a cheap laugh. However, focusing only on the leader discards all the information that can be learned by examining lower-ranked candidates. But how to do that? This leads to the second problem.

2) Trying to predict vote share. Analysts often focus on a technical question: what will each candidate’s vote share be? That approach uses tools that are common to econometric analysis, involving the prediction of quantitative parameters. For example, Cohn writes about how far off polls will be, on average, from the exact final outcome in New Hampshire.

But let us take a step back. Do you care about whether Trump wins by five points in New Hampshire, or by ten points – or loses by five points? Maybe what you really want to know is: From polling data, do we have information on who the eventual nominee will be?

Since actual election results will deviate from current polls by many points, parametric approaches (i.e. calculating means, medians, standard deviations, regressions, and so on) may be of limited use. Let me take a look at the data to ask a simpler question: what does current polling rank predict about the nominee?

Although Donald Trump’s support might be higher or lower than the numbers indicate, nobody seriously questions the observation that he is in first place nationally. But what does that predict for the nomination?

At this point in recent Presidential election cycles, here is how the eventual nominee ranked in national and early-state polls:

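| Year | Nominee | National | #1-#2 lead | Iowa | N.H. |
|---|---|---|---|---|---|
| 2012 (R) | Romney | #1 | 8% | #2 | #1 |
| 2008 (R) | McCain | #1 | 1% | #4 | #2 |
| 2008 (D) | Obama | #2 | 19% | #2 | #2 |
| 2004 (D) | Kerry | #4 | 7% | #3 | (#1) |
| 2000 (D) | Gore | #1 | 20% | (#1) | (#1) |
| 2000 (R) | G.W. Bush | #1 | 45% | (#1) | (#1) |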

For national polls, I show late-December/early-January polls. The “#1-#2 lead” column shows the median difference between the #1 and #2 national candidates. Because the Iowa and New Hampshire elections are four weeks later in 2016 than in past years, in those cases I used data from the first week of December. Finally, where polling data was missing, the nominee’s final election outcome is given in parentheses.

Note that in 2004, John Kerry eventually won the Iowa caucus. However, as late as one week before the caucus, he was polling in third place, which is why he is indicated that way in the table above.

In nearly all cases, the eventual nominee has gotten enough attention and support to finish in the top two. Second place is not a bad spot to be in: in this data set, the eventual nominee was at #4 once, #3 once, #2 four times, and at #1 six times.

The amount of data is scanty, but one pattern stands out. Although the Democratic and Republican races in 2000 were nominally open, each had a clear national leader: Al Gore by 20%, and George W. Bush by 45%. Therefore their #1 rankings were highly predictive. In the other races, the national leader was ahead by only 1% to 8%, and the candidate at #2 was slightly more likely to prevail in the end.

Now, look at the 2016 campaign. Here are current standings for Republican candidates who are likely to be invited to the January 14th debate:

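| Candidate | National | Iowa | N.H. |
|---|---|---|---|
| Trump | #1 | #2 | #1 |
| Cruz | #2 | #1 | #3* |
| Rubio | #3 | #3 | #2 |
| Carson | #4 | #4 | #7 |
| Bush | #5 | #6 | #6 |
| Christie | #6 | #7 | #3* |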

*New Hampshire polls currently show Cruz and Christie within one percentage point of one another.

The only candidate with all #1 and #2 rankings is Donald Trump. Therefore, if 2016 were to follow the pattern of past elections, he would be the most likely nominee. After Trump comes Cruz, followed by Rubio as a long shot. Nobody else fits the pattern.
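The rank-based filter described above can be written out as a short Python sketch. The rankings are copied from the 2016 table, in the order national, Iowa, New Hampshire; the helper names `top_two_count` and `fits_pattern` are made up for this illustration, and counting top-two rankings is only one plausible way to order the candidates who do not fit the pattern fully.

```python
# Poll rankings copied from the 2016 table above: (national, Iowa, N.H.).
# Cruz and Christie are both listed at #3 in New Hampshire (a near-tie).
RANKS = {
    "Trump": (1, 2, 1),
    "Cruz": (2, 1, 3),
    "Rubio": (3, 3, 2),
    "Carson": (4, 4, 7),
    "Bush": (5, 6, 6),
    "Christie": (6, 7, 3),
}


def top_two_count(ranks):
    """Number of polls (out of three) in which a candidate ranks #1 or #2."""
    return sum(1 for r in ranks if r <= 2)


def fits_pattern(ranks):
    """True when every ranking is #1 or #2 (hypothetical helper)."""
    return top_two_count(ranks) == len(ranks)


# Candidates whose rankings are all #1 or #2.
all_top_two = [name for name, r in RANKS.items() if fits_pattern(r)]

# Everyone, ordered by how many top-two rankings they hold.
# sorted() is stable, so ties keep the table's original order.
by_count = sorted(RANKS, key=lambda name: top_two_count(RANKS[name]), reverse=True)

print(all_top_two)
print([(name, top_two_count(RANKS[name])) for name in by_count])
```

Run as written, only Trump passes the all-top-two filter, and ordering by the number of top-two rankings puts Cruz second and Rubio third, which matches the reading given above.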

How commanding is Trump’s advantage? Here is his position relative to past nominees:

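| Year | Nominee | National | #1-#2 lead | Iowa | N.H. |
|---|---|---|---|---|---|
| 2000 (R) | G.W. Bush | #1 | 45% | #1* | #2** |
| 2016 (D) | H. Clinton? | #1 | 22% | #1 | #2 |
| 2000 (D) | Gore | #1 | 20% | #1* | #1* |
| 2016 (R) | Trump? | #1 | 20% | #2 | #1 |
| 2008 (D) | Obama | #2 | 19% | #2 | #2 |
| 2012 (R) | Romney | #1 | 8% | #2 | #1 |
| 2004 (D) | Kerry | #4 | 7% | #3 | #1** |
| 2008 (R) | McCain | #1 | 1% | #2 | #2 |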

**These values indicate final outcomes.

For comparison I include Hillary Clinton, this year’s overwhelming favorite for the Democratic nomination. This emphasizes the fact that based on polling data, Donald Trump is in as strong a position to get his party’s nomination as Hillary Clinton in 2016, George W. Bush in 2000, or Al Gore in 2000. The one case in which a lead of this size was reversed was the 2008 Democratic nomination, which was very closely fought.

Obviously, polls are not the entire story of the campaign. Unlike past nominees, Trump does not have the national party behind him. In that respect, he is emblematic of the overall weirdness of this year’s GOP primaries.

Other factors are said to influence the nomination process: candidate experience, campaign finance, and party endorsements. These are described in the New York Times feature Who’s Winning the Presidential Campaign? (Here is one entertaining recent discussion over at FiveThirtyEight.) In my view, these factors are likely to matter under normal conditions – until a political party undergoes a major upheaval. That happens about every 40-50 years (see this excellent XKCD explainer graphic). Trump-as-nominee could fairly be seen as such an upheaval. This is one reason to pay attention not just to data pundits, but also to grizzled old historians.


Am I saying that Donald Trump is inevitable? Not quite. However, I do have something to say about another candidate:

Unless Marco Rubio gets the lead out, he is on the edge of serious trouble.

The Republican Party’s state-by-state delegate selection rules penalize candidates who fall below a threshold of support that is often 15% or 20%. In a future post I will examine how this Procrustean rule affects each candidate’s likely delegate total. By simulating the state-by-state rules, I will show that a candidate with Rubio’s current level of support (12-13% nationally, in Iowa, and in New Hampshire) is at risk of having virtually no support by Super Tuesday, a major turning point of the campaign. Stay tuned for a full explanation with graphs.
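To give a rough sense of how such a threshold can bite, here is a minimal Python sketch. It is not the state-by-state simulation described above. It assumes a single statewide pool of delegates, split proportionally among candidates at or above the threshold, with largest-remainder rounding; real state rules differ in many details. The function name `allocate_with_threshold`, the candidate labels, the 30-delegate pool, and the vote shares are all hypothetical, with candidate C set at 13% to resemble the level of support discussed here.

```python
def allocate_with_threshold(shares_pct, delegates, threshold_pct):
    """Split `delegates` proportionally among candidates at or above a threshold.

    shares_pct maps candidate -> vote share as an integer percentage.
    Assumption: largest-remainder rounding among qualifying candidates.
    Integer arithmetic is used throughout to avoid float rounding surprises.
    """
    qualified = {c: s for c, s in shares_pct.items() if s >= threshold_pct}
    result = {c: 0 for c in shares_pct}
    total = sum(qualified.values())
    if total == 0:
        return result

    # Whole-delegate part of each quota, plus the remainder left over.
    remainders = {}
    for c, s in qualified.items():
        result[c] = delegates * s // total
        remainders[c] = delegates * s % total

    # Hand out any unassigned delegates by largest remainder.
    leftover = delegates - sum(result.values())
    for c in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        result[c] += 1
    return result


# Hypothetical vote shares; C sits at 13%, below a 15% threshold.
shares = {"A": 35, "B": 20, "C": 13, "D": 10}
print(allocate_with_threshold(shares, delegates=30, threshold_pct=15))
print(allocate_with_threshold(shares, delegates=30, threshold_pct=0))
```

Under these assumptions, candidate C receives 5 of 30 delegates with no threshold and none with a 15% threshold, while the two qualifying candidates absorb the difference.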

I thank my readers for commenting on an earlier version of this post, and for correcting an error regarding the 2008 Democratic nomination race.


bks says:

I don’t disagree with any of that, but I’ve followed the process closely since the 1992 election and Trump’s ability to garner front-page headlines, week after week, is sui generis. And I don’t recall a GOP nomination process where there was this much vicious infighting.
Interesting idea about Donald denialism here:

Sam Wang says:

Two points to support your contention: (1) His trajectory looks more like a nominee’s than anyone else’s. (2) The delegate selection process is about to impose a cruel fate upon anyone who is stuck below 15-20%.
It is not too hard to do simulations to estimate what the effects are on Rubio, who is basically being squashed down by Trump. If he doesn’t pull it out of the fire in January-February, he might go to the convention with very few delegates.

Amitabh Lath says:

Need error bars. We should look at the difference between #1 and #2 spots. Or the difference/sigma, which can tell you how (un)comfortable the #1 spot is.
In December 2007 Giuliani, Huckabee, Romney were all in the low 20’s and McCain was about 5pts behind. In December 2011 Romney and Gingrich were both in the mid 20’s. In contrast current polling shows a larger gap between Trump and the others.
A 5-10% gap can be systematic effects, or be overcome adiabatically. A 15-20% gap might be more difficult.
And yes, higher order effects like unfavorables, undecideds, etc.
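One way to put a number on the difference/sigma idea is sketched below in Python. It assumes a single simple random sample in which each respondent names one candidate, so the two shares are negatively correlated and the variance of the lead is (p1 + p2 - (p1 - p2)^2) / n. It covers sampling error only, not the systematic effects mentioned above. The function name `lead_over_sigma`, the sample size of 400, and the example shares are hypothetical.

```python
import math


def lead_over_sigma(p1, p2, n):
    """Lead of #1 over #2 divided by the sampling standard error of that lead.

    p1, p2: shares of the top two candidates as fractions (0.35 for 35%).
    n: number of respondents (hypothetical here).
    Assumes one simple random sample where each respondent picks a single
    candidate, giving Var(p1 - p2) = (p1 + p2 - (p1 - p2)**2) / n.
    """
    lead = p1 - p2
    sigma = math.sqrt((p1 + p2 - lead ** 2) / n)
    return lead / sigma


# Hypothetical polls of 400 respondents each.
print(lead_over_sigma(0.25, 0.24, 400))  # a 1-point lead
print(lead_over_sigma(0.35, 0.15, 400))  # a 20-point lead
```

With these made-up inputs, the 1-point lead comes out at roughly 0.3 sigma and the 20-point lead at roughly 5.9 sigma, which is the sense in which a large gap makes the #1 spot more comfortable.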

MAT says:

Blue states like California could, for once, play an outsized role on the GOP side. 159 delegates are awarded three at a time, winner-take-all, in each of the 53 congressional districts. So it’s basically 53 different elections. Hugely expensive media market.

Sam Wang says:

Maybe. Also see David Wasserman, who thinks the primaries carry some advantage for Republican moderates. I think the lower thresholds for support have a larger effect than this.

MAT says:

Interesting article, but it presupposes that the moderate candidates are still in the race. If Rubio or Bush don’t win Florida on March 15th with its winner-take-all 99 delegates, it’s hard to find a rationale for staying in the race, particularly as it’s possible they will have received very few other delegates up to that point. Right now Trump is polling at 30% to Rubio’s 15% & Bush’s 13% in FL. With both Rubio & Bush splitting the vote in Florida, they are in major trouble. I believe the Establishment will push one of them, probably Bush, to drop out after a poor showing in NH for this very reason.

Amitabh Lath says:

Wasserman’s analysis supposes that Republican primary voters in blue states are more moderate than those in red states. But Trump is winning in places like MA and NJ by margins just as large as in any red state.

Ed says:

A recent poll of California GOPers had Trump #1, Cruz #2 by wide margins. I don’t think the West Coast is going to be the Holy Grail for moderate types either.

Petey says:

Good post, Sam.
“The Republican Party’s state-by-state delegate selection rules penalize candidates who fall below a threshold of support that is often 15% or 20%.”
But it’s much, much more than just state thresholds.
Most of the so-called GOP ‘proportional’ states also allocate half or more of their delegates by CD in such a way that makes a mockery out of ‘proportional’.
For example, in ‘proportional’ Minnesota with an incredibly minimal 10% threshold, a candidate who wins 35-25 could easily get well north of 55% of delegates. The second place finisher would pick up decent scraps, and third place could get somewhat hammered, even though above threshold.
And Minnesota is one of the most genuinely proportional of the ‘proportional’ states by far, both in thresholds and CD allocation.
In a far more typical case of ‘proportional’ Texas, a 35-25 win could easily give the winner north of 70% of delegates.

Sam Wang says:

Thank you. (Generally I only accept new posts from people who leave a real email address. Please make a note of that.)
Your point describes only one side of the scenario, and demonstrates why it will be helpful to run an actual simulation. District-level delegate selection gives candidates another chance because levels of support vary from district to district. It can also make things harder, because it typically requires ~33% of the vote to get one delegate out of three. This is why I am converting the rules to code, an unrewarding task.
The Minnesota rules lead to benefits and costs for lower-tier candidates. Given the rounding rule there, if the top three candidates in a Minnesota district get 49%, 17%, and 17% of the vote, the delegate split would be 1-1-1. But a split of 51%, 16%, 16% would lead to a 3-0-0 split. Also, a candidate could easily perform at <10% statewide (getting no delegates) and still get above 16.7% (possibly getting a delegate) in one or more districts.
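The two district examples above can be reproduced with a small Python sketch. The rounding details here are assumptions chosen to match those two outcomes, not a statement of the actual Minnesota rule: each candidate's quota of the three delegates is rounded to the nearest whole number, candidates are served in descending order of vote share, and any delegate still unassigned goes to the top finisher. The function name `district_split` is hypothetical.

```python
import math


def district_split(shares_pct, delegates=3):
    """Allocate one district's delegates by rounding each candidate's quota.

    shares_pct: list of vote percentages, in any order.
    Assumptions (not confirmed by the text): quotas round to the nearest
    whole delegate, candidates are served in descending order of vote share,
    and any delegate left unassigned goes to the top finisher.
    """
    order = sorted(range(len(shares_pct)), key=lambda i: shares_pct[i], reverse=True)
    result = [0] * len(shares_pct)
    remaining = delegates
    for i in order:
        quota = shares_pct[i] * delegates / 100
        # Round half up, but never hand out more than what is left.
        won = min(math.floor(quota + 0.5), remaining)
        result[i] = won
        remaining -= won
    if order:
        result[order[0]] += remaining  # assumed rule for leftover delegates
    return result


print(district_split([49, 17, 17]))
print(district_split([51, 16, 16]))
```

Under these assumptions the first call gives a 1-1-1 split and the second a 3-0-0 split, since 17% of three delegates rounds up to one while 16% rounds down to zero.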

Stuart Levine says:

Here’s what I don’t fully understand: I think that Bush’s campaign is all but over. However, I think that Trump, Cruz, Rubio, and even Christie still have a shot at the GOP nomination. (Actually, I believe that Christie may be dead since I still suspect that he faces a strong likelihood of being indicted in NJ. It simply depends on whether one of the two individuals currently under indictment “flips” and rolls over on him.)
All that said, I agree that Rubio is at risk of falling off the edge. Why then does he rank highest in the PredictWise pool at 34% versus 26% and 24% for Trump and Cruz, respectively? What’s the disconnect? After all, right now, I would rank the correct order as being Cruz, Trump, and, only then, Rubio.

Sam Wang says:

PredictWise bettors are probably taking into account Rubio’s endorsements, i.e. acceptability to GOP officeholders. Until I drilled into the data, I saw more merit in that view. Rubio could still pull it out of the fire, but we have to ask – how?

Aexia says:

In my experience, betting markets are just aggregates of the *current* conventional wisdom about the race, not anything with actual predictive value.

Josh says:

If I had to guess, I’d say it’s a combination of two things:
1) Rubio enjoys (relatively speaking) a substantial amount of support from the establishment in the form of big-money donors and endorsements. Granted, in a year/cycle like this, those may not mean nearly as much. But other than Jeb, who has the most money in the coffers but not much else going for him, Rubio is clearly the strongest overall establishment candidate at this point. Which means…
2) When the GOP powers that be start pushing people out of the race to combat a potential Trump or even Cruz nomination, Rubio will probably be the last person they push out. The race will look very different when Bush, Christie etc aren’t siphoning off votes from Rubio; he still may not be in the lead but he’ll almost certainly be drawing more than 13% of the vote, which will be enough to start earning him delegates.

Matt McIrvin says:

It’s the “party decides” theory of party nominations. At the moment, Rubio is the most viable candidate with a lot of insider support.
I’m inclined to think that things are different this year, but we’ll see.

bks says:

The 538 crowd is sure Trump will fail:
but I can’t help wondering what their analysis would look like if Bush was polling like Trump.

Josh says:

As I’ve written elsewhere recently–they are emphatically NOT sure he will fail. In the last few months they have priced his odds as low as 5-10% and, now, as high as 20-25%. Again, in a field with a dozen candidates–at least 2-3 of whom could mount a serious challenge–giving Trump a 25% chance to win seems pretty fair.

Petey says:

“PredictWise bettors are probably taking into account Rubio’s endorsements, i.e. acceptability to GOP officeholders. Until I drilled into the data, I saw more merit in that view.”
FWIW, betting markets have an absolutely awful track record during the invisible primary. Easy alpha to be had.
“Rubio could still pull it out of the fire, but we have to ask – how?”
1) Somehow catches lightning in a bottle. Not impossible. But certainly not probable.
2) Trump & Cruz improbably run an almost tied race throughout, leaving them both short of 50%, and bringing on the mythical 2nd ballot. At that point, they can’t make a deal, the ‘establishment’ comes in with rewards, incentives, and bribes, and pushes Rubio over. Pretty unlikely for all kinds of reasons, but folks seem to love the idea…

Erik Ebert says:

“For comparison I include Hillary Clinton, this year’s overwhelming favorite for the Democratic nomination.”
Can we see the same analysis on the Democratic side, please? By assuming Hillary is going to be the Democratic nominee, aren’t you falling into exactly the same trap your analysis is trying to address?

Sam Wang says:

It’s the same analysis, except that Hillary has the additional advantage of being acceptable to the Democratic Party.

Erik Ebert says:

Sorry, I didn’t mean to sound troll-y. It just seems to me that the media is just assuming Hillary will be the nominee in the same way they are assuming Trump _won’t_ be the nominee.
If you’ve done the analysis on the Democratic side, I’d love to see your work. I’m genuinely curious.

Kevin says:

A lot of the analysis about “establishment” candidates trips up because it treats the “establishment” as a thing, and not a metaphor whose meaning shifts depending on context. Sometimes it’s a loose way of talking about influential players in Washington–i.e., members of Congress who reportedly hate Ted Cruz–and sometimes it’s just used as a proxy for the pundit’s intuition about who seems acceptable (i.e., not Cruz, Paul, Huckabee, or Santorum, despite having traditional credentials, but possibly Christie, Kasich, or Rubio, despite a lack of evidence that any of the above are particularly well-liked or have a large number of friends among actual people in Washington).
It seems to me that the overwhelming incentives of the actual people who make up the “establishment” are to back winners, not losers. Backing a winner is a route to greater influence in the game; backing a loser can have the opposite effect. That means the backing of the “establishment” is a lagging indicator; when it gets behind a candidate early, like Bush/Gore, it is because it thinks it sees a safe bet, not because it is anxious to lend support to an uncertain horse and take a brave stand against someone who might become the nominee. In the end, the nominee will have a lot of establishment support, even if it is Trump or Cruz.
For these reasons, I am skeptical of the outsider vs. establishment narrative, which seems to be little more than an overcomplicated way of saying that there is a lot of public distaste for Trump and Cruz. But if not them, who? And here’s the rub: Rubio is not all he’s cracked up to be by some left-leaning political commentators.

Amit Lath says:

I think the reason the establishment (however defined) is having trouble fighting Trump is that Trump’s biggest argument is a promise to demolish the establishment.
If the establishment could denounce big money, tell SuperPACs to pack it up, promise a robust defense of Social Security, it might have a chance against Trump. But then it would hardly be the establishment anymore, would it?

Jim Matthews says:

Is the #1-#2 lead figure for Obama 2008 (2%) correct? According to RCP, Clinton led Obama by about 20% in early December 2007 (a month before the Iowa caucus).

Sam Wang says:

d’Oh! Thanks – good catch.

Petey says:

“Generally I only accept new posts from people who leave a real email address. Please make a note of that.”
Noted. And thank you, Sam. Appreciated.
“This is why I am converting the rules to code, an unrewarding task.”
Hard work, yes. But as best as I can tell, work well worth doing. (Nate Cohn threatens he has a series on this in January. But I assume you can do a better job.)
My analysis is lazy, sloppy, and not rigorous. But I think you’d agree that the extreme lack of proportionality in GOP delegate selection is the story no one is really telling yet.
Even if my analysis is sloppy in the details, again I think you’d agree my general overview still holds. The GOP ‘proportional’ states aren’t really proportional in the aggregate, and even the ‘proportional’ states are only 26% of the delegates. Winners get over-rewarded. In short, if someone can put together a winning streak, even if the margins aren’t overwhelming, the race could be quite quick.
(If anyone is interested in the history, rather than the math, Elaine C. Kamarck’s book is an excellent tracing of why the Dems are proportional, and the GOP isn’t.)
“PredictWise bettors are probably taking into account Rubio’s endorsements, i.e. acceptability to GOP officeholders. Until I drilled into the data, I saw more merit in that view.”
FWIW, betting markets have an absolutely awful track record during the invisible primary. Easy alpha to be had almost every cycle in the last months ahead of the first votes.

Amitabh Lath says:

Betting sites are useless in cases where something like Trump or the SNP is causing historic realignments. Recall that one of the British betting sites was actually paying off on a Labour victory before the election, they were that sure of their prediction engines.

Petey says:

“Betting sites are useless in cases where something like Trump or the SNP is causing historic realignments. Recall that one of the British betting sites was actually paying off on a Labour victory before the election”
Well for the UK 2015 election, that was simply due to universal polling failure, no?
But during the “invisible primary”, the betting markets are just normally atrocious, whether there is polling failure or not. Happens almost every 4 years in the US. And the Corbyn “invisible primary” until it got close to the vote was a roughly analogous example.
I’ve played the betting markets for a while, and by far, the easiest winnings are during the “invisible primary”. Just lots of spin, magical thinking, and bad punditry that dominate the broader discourse over reality. Easy alpha.
As far as “historic realignments” go, betting markets are much better at catching ‘wave’ general elections. Pretty good in 2006 and 2010 Congressional elections, for example. But ‘wave’ may not be quite the same as what you’re talking about…

Don S says:

Looking back at past primaries, it appears that those polling below third(ish) generally lose share to those in the top three on voting day. Given a broad division between the “mostly establishment” GOP group and the “not business as usual” group (mostly Trump and Cruz), with only Rubio hitting the top three as a “mostly establishment” candidate, one suspects Rubio leaves Iowa benefiting from that more than Trump or Cruz do. And likely then even more in NH following. What do you think it looks like going forward as an effective three-way race among Trump, Cruz, and Rubio? How do you think that would be affected if Trump performs significantly below polling expectations in both Iowa and NH, failing to outright win either of them, while Rubio and Cruz both exceed polling expectations?
Thanks in advance for your thoughts.

B Temple says:

Excellent quality, genuinely informative / insightful discussion. Thanks Sam and all.

Amitabh Lath says:

‘Tis the season to reassess Trump. I see Ross Douthat of the NYTimes has mentioned Sam (this particular post) in his column today.
Frankly I never understood any of the “Trump is sure to fail” analyses. They involved all sorts of new variables like “unfavorables” and “ceilings” which I never quite got (and I suspect would not be hauled out if Jeb Bush had similar numbers).
Furthermore, these Trump fail discussions have the whiff of desperate students: they think they know what the answer should be and throw in explanations and equations to make it come out the way it’s supposed to.

Petey says:

“I see Ross Douthat of the NYTimes has mentioned Sam (this particular post) in his column today.”
A nice shoutout. One point of Douthat’s reasoning I strongly quibble with:
“There is no credible scenario in which a consistent 30 percent of the vote will deliver the delegates required to be the Republican nominee”
He may well be correct about 30%. But the reason I keep harping on the lack of proportionality in GOP delegate allocation is that I think Trump could easily grab 1st ballot with 35%, give or take, during March in a 3+ way race where Trump is racking up wins and an overwhelming delegate lead.
(I’m not saying I think Trump is the nominee. I think Cruz is quite plausible, and there are other candidates with obscure bank-shot scenarios. But the “Trump Ceiling” line of reasoning has always struck me as poorly thought out due to delegate allocation.)

Sam Wang says:

Did he really? Wow, unexpected.
In 2009, I hosted him here at Princeton for a panel on the future of conservatism, along with David Frum, Virginia Postrel, and Daniel Larison. Douthat caught the tone of the event and was amused by its premature triumphalism. I went around the table to ask when people thought the GOP would retake Congress. The median answer was 2014. This is a case where the wisdom of the crowd was too pessimistic. Someone said 2010, can’t recall who at the moment.

DJC says:

I can’t argue with the analysis, but isn’t it predicated on the assumption that this is a “normal” primary? It may well be that we are in uncharted waters here, based on the high negatives of the current front runner and the unprecedented overall weakness of the field. Under these conditions, isn’t it reasonable to assume that things will play out differently compared to previous primaries? In particular, rather than coalescing around a single candidate (whether it’s Trump or someone else), we may well end up with 3 candidates with substantial support but none having an outright majority.

Austin says:

Fantastic writing, I look forward to reading the future posts you speak of.
-Austin Spezia-Shwiff

Sharon Machlis says:

Two important additional factors IMO:
1) There are many candidates in the GOP field who will stay in the race longer than they might have in past years, thanks in part to Citizens United and the ability of one or two billionaires to bankroll a personal favorite. Who will benefit when all the 2nd tiers are gone is unclear, but Trump’s unfavorables suggest he may not be a big beneficiary.
2) I’m not sure we’ve ever had an open primary race where one candidate has gotten such a disproportionate share of media attention. It’s still unclear to me how much of Trump’s support at this juncture is name recognition vs what might happen when the field narrows and there’s less of an attention imbalance between Trump and the rest of his challengers.

538 Refugee says:

I just saw that Clinton and Sanders are ‘statistically’ tied in New Hampshire and Iowa. I hope you have a copy of the original draft close by. 😉
