The predictive value of GOP Presidential polls

January 5, 2016 by Sam Wang

The New Year is not a bad time for a fresh start. So please let me acknowledge that back in July, I was too pessimistic about Donald Trump’s chances. Like Harry Enten, I was led astray by Trump’s high unfavorables. Six months into the Season of Trump, I think it is time to examine his chances with a more neutral stance.

Two Nates (Silver and Cohn) have come out with essays arguing that we still can’t extract much predictive value from opinion polls. For the detailed kind of analysis they like, this may be true. However, a slightly different approach has suggestive implications about who is likely to be the eventual Republican nominee. (Spoiler: rhymes with Grump.)

First, let’s examine two current attitudes about polls. One is endemic to journalists, the other to data pundits.

1) Focusing on the leader in polls. Journalists and commentators have been losing their minds over the fact that Donald Trump’s lead has lasted since July. (For an antidote, see a sharp and entertaining takedown at Lawyers, Guns, and Money.) In 2015, one way of coping was to say things like “at this point in 2012, Gingrich led nationally.” Certainly this was good for a cheap laugh. However, focusing only on the leader discards all the information that can be learned by examining lower-ranked candidates. But how to do that? This leads to the second problem.

2) Trying to predict vote share. Analysts often focus on a technical question: what will each candidate’s vote share be? That approach uses tools that are common to econometric analysis, involving the prediction of quantitative parameters. For example, Cohn writes about how far off polls will be, on average, from the exact final outcome in New Hampshire.

But let us take a step back. Do you care about whether Trump wins by five points in New Hampshire, or by ten points – or loses by five points? Maybe what you really want to know is: From polling data, do we have information on who the eventual nominee will be?

Since actual election results will deviate from current polls by many points, parametric approaches (i.e. calculating means, medians, standard deviations, regressions, and so on) may be of limited use. Let me take a look at the data to ask a simpler question: what does current polling rank predict about the nominee?

Although Donald Trump’s support might be higher or lower than the numbers indicate, nobody seriously questions the observation that he is in first place nationally. But what does that predict for the nomination?

Here is a table of how the eventual nominee ranked in national and early-state polls at this point in the past four Presidential election cycles:

Year	Nominee	National	#1-#2 lead	Iowa	N.H.
2012 (R)	Romney	#1	8%	#2	#1
2008 (R)	McCain	#1	1%	#4	#2
2008 (D)	Obama	#2	19%	#2	#2
2004 (D)	Kerry	#4	7%	#3	(#1)
2000 (D)	Gore	#1	20%	(#1)	(#1)
2000 (R)	G.W. Bush	#1	45%	(#1)	(#1)

For national polls, I show late-December/early-January polls. The “#1-#2 lead” column shows the median difference between the #1 and #2 national candidates. Because the Iowa and New Hampshire elections are four weeks later in 2016 than in past years, for those two states I used past-year data from the first week of December. Finally, where polling data was missing, the nominee’s final election outcome is given in parentheses.

Note that in 2004, John Kerry eventually won the Iowa caucus. However, as late as one week before the caucus, he was polling in third place, which is why he is indicated that way in the table above.

In nearly all cases, the eventual nominee has gotten enough attention and support to finish in the top two. Second place is not a bad spot to be in: in this data set, the eventual nominee was at #4 once, #3 once, #2 four times, and at #1 six times.
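The rank tally can be reproduced from the first table with a few lines of Python. This is only a sketch, not the procedure used for this post: the dictionary name `past_nominees` is made up for illustration, and it assumes that parenthesized entries (final outcomes rather than polls) are left out. Totals depend on which entries are counted, so they need not match the figures quoted above exactly.

```python
from collections import Counter

# Polled ranks transcribed from the first table: (national, Iowa, N.H.).
# None marks a parenthesized entry, where the table reports the final
# election outcome instead of a poll. Excluding those is an assumption.
past_nominees = {
    "2012 (R) Romney": (1, 2, 1),
    "2008 (R) McCain": (1, 4, 2),
    "2008 (D) Obama": (2, 2, 2),
    "2004 (D) Kerry": (4, 3, None),
    "2000 (D) Gore": (1, None, None),
    "2000 (R) G.W. Bush": (1, None, None),
}

# Flatten to a single list of polled ranks.
polled = [r for ranks in past_nominees.values() for r in ranks if r is not None]

# How often did the eventual nominee sit at each rank?
rank_counts = Counter(polled)

# Share of polled entries in which the nominee was in the top two.
top_two_share = sum(1 for r in polled if r <= 2) / len(polled)

for rank in sorted(rank_counts):
    print(f"#{rank}: {rank_counts[rank]}")
print(f"top-two share: {top_two_share:.2f}")
```

Keeping the missing entries as `None` rather than dropping the rows makes it easy to rerun the tally with final outcomes filled in.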

The amount of data is scanty, but it should also be noted that although the Democratic and Republican races in 2000 were nominally open, each had a clear national leader: Al Gore by 20%, and George W. Bush by 45%. Therefore their #1 rankings were highly predictive. In the other races, the national leader was ahead by only 1% to 8%, and the candidate at #2 was slightly more likely to prevail in the end.

Now, look at the 2016 campaign. Here are current standings for Republican candidates who are likely to be invited to the January 14th debate:

Candidate	National	Iowa	N.H.
Trump	#1	#2	#1
Cruz	#2	#1	#3*
Rubio	#3	#3	#2
Carson	#4	#4	#7
Bush	#5	#6	#6
Christie	#6	#7	#3*

*New Hampshire polls currently show Cruz and Christie within one percentage point of one another.

The only candidate with all #1 and #2 rankings is Donald Trump. Therefore, if 2016 were to follow the pattern of past elections, he would be the most likely nominee. After Trump comes Cruz, followed by Rubio as a long shot. Nobody else fits the pattern.
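The top-two screen described above is simple enough to write down. The sketch below uses the 2016 standings table as transcribed here; the names `standings` and `top_two_count` are made up for illustration, and the tied #3 entries for Cruz and Christie in New Hampshire are treated as plain #3.

```python
# Current rankings transcribed from the table above: (national, Iowa, N.H.).
standings = {
    "Trump": (1, 2, 1),
    "Cruz": (2, 1, 3),
    "Rubio": (3, 3, 2),
    "Carson": (4, 4, 7),
    "Bush": (5, 6, 6),
    "Christie": (6, 7, 3),
}


def top_two_count(ranks):
    """Number of polls (national, Iowa, N.H.) in which a candidate is #1 or #2."""
    return sum(1 for r in ranks if r <= 2)


# Candidates who are in the top two everywhere.
all_top_two = [
    name for name, ranks in standings.items()
    if top_two_count(ranks) == len(ranks)
]

# Candidates with at least one top-two placement, most placements first.
some_top_two = sorted(
    (name for name, ranks in standings.items() if top_two_count(ranks) > 0),
    key=lambda name: -top_two_count(standings[name]),
)

print(all_top_two)
print(some_top_two)
```

Ordering by the number of top-two placements is one possible way to express "fits the pattern" as a ranking; it is a convenience for this sketch, not a forecast model.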

How commanding is Trump’s advantage? Here is his position relative to past nominees:

Year	Nominee	National	#1-#2 lead	Iowa	N.H.
2000 (R)	G.W. Bush	#1	45%	#1*	#2**
2016 (D)	H. Clinton?	#1	22%	#1	#2
2000 (D)	Gore	#1	20%	#1*	#1*
2016 (R)	Trump?	#1	20%	#2	#1
2008 (D)	Obama	#2	19%	#2	#2
2012 (R)	Romney	#1	8%	#2	#1
2004 (D)	Kerry	#4	7%	#3	#1**
2008 (R)	McCain	#1	1%	#2	#2

**These values indicate final outcomes.

For comparison I include Hillary Clinton, this year’s overwhelming favorite for the Democratic nomination. This emphasizes the fact that based on polling data, Donald Trump is in as strong a position to get his party’s nomination as Hillary Clinton in 2016, George W. Bush in 2000, or Al Gore in 2000. The one case in which a lead of this size was reversed was the 2008 Democratic nomination, which was very closely fought.

Obviously, polls are not the entire story of the campaign. Unlike past nominees, Trump does not have the national party behind him. In that respect, he is emblematic of the overall weirdness of this year’s GOP primaries.

Other factors are said to influence the nomination process: candidate experience, campaign finance, and party endorsements. These are described in the New York Times feature Who’s Winning the Presidential Campaign? (Here is one entertaining recent discussion over at FiveThirtyEight.) In my view, these factors are likely to matter under normal conditions – until a political party undergoes a major upheaval. That happens about every 40-50 years (see this excellent XKCD explainer graphic). Trump-as-nominee could fairly be seen as such an upheaval. This is one reason to pay attention not just to data pundits, but also to grizzled old historians.


Am I saying that Donald Trump is inevitable? Not quite. However, I do have something to say about another candidate:

Unless Marco Rubio gets the lead out, he is on the edge of serious trouble.

The Republican Party’s state-by-state delegate selection rules penalize candidates who fall below a threshold of support that is often 15% or 20%. In a future post I will examine how this Procrustean rule affects each candidate’s likely delegate total. By simulating the state-by-state rules, I will show that a candidate with Rubio’s current level of support (12-13% nationally, in Iowa, and in New Hampshire) is at risk of having virtually no support by Super Tuesday, a major turning point of the campaign. Stay tuned for a full explanation with graphs.
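To see why a threshold hurts a candidate polling at 12-13%, here is a toy version of such a rule. It is a sketch under strong assumptions, not the simulation promised above: the function name `delegate_shares` is made up, the vote shares are hypothetical rather than real polling data, and actual state rules differ in their details.

```python
def delegate_shares(vote_shares, threshold=0.15):
    """Split delegates proportionally among candidates at or above `threshold`.

    Candidates below the threshold receive nothing, and their votes are
    effectively redistributed to those who qualify. This is a simplified
    stand-in for a proportional rule with a cutoff.
    """
    qualifying = {c: v for c, v in vote_shares.items() if v >= threshold}
    total = sum(qualifying.values())
    if total == 0:
        # Nobody clears the bar; this toy rule then awards no delegates.
        return {c: 0.0 for c in vote_shares}
    return {c: qualifying.get(c, 0.0) / total for c in vote_shares}


# Hypothetical vote shares for a single state, NOT real polling data.
example = {"A": 0.35, "B": 0.25, "C": 0.13, "D": 0.27}

result = delegate_shares(example, threshold=0.15)
for candidate, share in result.items():
    print(f"{candidate}: {share:.1%} of delegates")
```

In this toy example, candidate C wins 13% of the vote but no delegates, while each candidate above the cutoff receives a larger share of delegates than of votes.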

I thank my readers for commenting on an earlier version of this post, and for correcting an error regarding the 2008 Democratic nomination race.
