Princeton Election Consortium

Innovations in democracy since 2004

Did data journalism lose – or just data pundits?

May 5th, 2016, 10:51pm by Sam Wang

I see in today’s New York Times a column critiquing journalists on their coverage of the Republican primaries. Overall it’s a good piece, but one statement pops out: “data journalists have screwed up this year.” This comment misses an important point. The people who have come under criticism are actually a hybrid of journalist and pundit – which might be the problem.

Data-driven nerds carry the potential to give readers an unvarnished look at politics, free of hype. I think their perceived lack of success over the last six months stems from the fact that they have mixed up two roles a bit: synthesizing what they report (journalism) and stating what they conceptually think should occur (punditry). Let me explain.

In the best-case scenario, data can assist journalism tremendously. But this requires keeping one’s biases out of the analysis. I am no stranger to this problem. In August and September 2014, I failed to fairly evaluate evidence that a Republican wave was coming. This year, I was able to describe Trump’s rise because I paid close attention to my commenters, who objected when I inadvertently cherrypicked the evidence. So first: thank you, commenters!

It was possible to recognize Donald Trump as a serious contender sometime between July 2015, when he might have been just another flash in the pan, in first place but not necessarily the front-runner; and January 2016, when he had lasted long enough that Republican nomination rules tilted the playing field toward him. By early February, his nomination was near-inevitable.

FiveThirtyEight committed to a wrong path with its “Trump’s Six Stages of Doom” theory last July. In the months since, it has been uncomfortable to watch this statement get walked back so gradually and grudgingly. It appeared to be a case of motivated reasoning, a cognitive process in which evidence, however persuasive, is more likely to be rejected if it is disagreeable. And there is no doubt that they find Trump to be disagreeable. But that is not why readers turn to them.

In addition to motivated reasoning, it is conceptually wrong to treat live questions in political science as if they are settled. (Ezra Klein, I am looking straight at you.) For example, take political scientist Hans Noel’s hypothesis that “The Party Decides.” That idea has taken quite a beating this year. I do not think it was wrong per se – just taken by pundits as a fact rather than a concept to be tested.

Now that we know better what is happening this year, I propose to replace Noel’s catchphrase with the following syllogism framework. Call it “Successful Parties Decide”:

  1. Functioning political parties are in the business of deciding: evaluating candidates, helping pick who will rise to the top.
  2. This year, the national Republican Party didn’t decide.
  3. Therefore, the national Republican Party is not a functioning political party.

There’s your story – the Republican Party is broken. It probably broke slowly, from 1994 to 2014. Data geeks, write about how that happened!

As for Noel’s hypothesis, it is not a bad thing for it to be revised. Any hypothesis or model should be considered provisional until new evidence renders it false. Pundits would be well advised to understand that political science appears to be full of such provisional knowledge. In contrast, polls may get beaten up by demanding consumers, but they are concrete, direct measurements. They are about to become more reliable as we enter the general election campaign. When they disagree with a piece of conventional wisdom, it would be a good idea to pay attention.


There is one more problem, which I think is endemic to all media professionals. Their job is to keep you interested – and keep you coming back. As an amateur, I do not have this problem. Nobody at my institution evaluates me well for getting clicks or page views. If anything, it is the opposite. So if there is no drama, I am okay with my dear readers not coming back.

For example, back in late January I nearly posted a brief essay saying it was all settled on Hillary and Trump, and I was going to mothball the Princeton Election Consortium until summer. Obviously, I didn’t do that.

In contrast, websites like FiveThirtyEight and the New York Times are under pressure to create interest and suspense, even when the outcome is not in doubt. The result is wrong statements like how much Indiana mattered and how Cruz had a chance there. Both statements were false. But we expect our data pundits to be better than cable and television news media, which are polluted with such statements. It is a disappointment when they play the clickbait game.

However, there is a way out. Data-oriented websites have lots of people with high analytical skills. To make their mark in journalism, they have to tell interesting stories with data, but the stories have to be both probable and compelling. Some, like Neil Irwin and Josh Katz at The Upshot, do a great job at this. They and others will have plenty of chances to make good in the months ahead.

Tags: 2016 Election

68 Comments so far ↓

  • Causes and FX

    Another problem, to which both academics and media professionals are susceptible, is the belief that analysis of past events shapes the outcome of future events.

    E.g., if it appears with 60% certainty that Trump is likely to win the nomination, and site X posts that Trump is likely going to win the nomination, this only increases the chances that Trump is going to win the nomination as readers internalize this information (say, to 62%). It contributes to “momentum”/“inevitability” and shapes future narratives.

    By contrast, despite the 60% certainty of a Trump win, suppose that site X posts that Trump is not likely to win, which serves to dampen momentum/remove inevitability/change the narrative (thereby decreasing his chances to, say, 58%). I suspect many of the pundit/data analyst hybrids felt a moral imperative to report data in such a way that they absolutely did not contribute to Trump’s success, and in fact railed against it wherever they could.

    You can’t take the politics out of political analysis!

  • Borderpeaks

    First, thank you Dr. Wang for your excellent website over many years now. My short comment would be my disappointment with the press is that they never point out that Donald Trump has swept his way to the Republican Party nomination because their nominating rules are so undemocratic. This fiasco of a 30-40 percent insurgent stealing the party on the first ballot could never happen to the Democratic Party.

    • Sam Wang

      Thank you!

      Another way to look at your point: Both parties have ways to resolve the 40-percent-insurgent problem. The GOP did it with rules that allow the insurgent to go all the way. We see the consequences now.

      As much as some people like to slam on the Democrats’ superdelegates, they would become invaluable if there were a real three-way race. What if Clinton were stuck below 50% of voter support? A process to adjudicate that would have been necessary. Not necessarily taking a side in that race – just pointing out that those superdelegates represent a relatively mild way for Democratic officials to have some say over their own party’s nominee.

    • emmy

      The Republican Party redesigned its rules for this cycle so that a non-popular establishment candidate could win with 30%. They were hoist on their own petard with Trump’s insurgent campaign turning out to dominate instead of an establishment pick.

      And such a fiasco already hit the Democrats hard enough historically (1968) that it’s one of the reasons the superdelegate system exists.

    • Andrew

      “This fiasco of a 30-40 percent insurgent stealing the party on the first ballot could never happen to the Democratic Party.”

      Why is it a fiasco? He was only a “37% insurgent” because he was generally running in a 3 to 5 man race since South Carolina, with numbers rising as opponents dropped out. There is no reason to think that in a 2 man race instead of a 3 man race he wouldn’t have consistently gotten over 50% as he strengthened normally as time went on.

      Here are Trump’s percentages by round:
      IA/NH/SC/NV – 32.6% (against 9+)
      Super Tuesday – 34.4% (against 5)
      KS/KY/ID/LA/MI/MS etc. – 37.2% (against 5)
      Super Tuesday II – 40.6% (against 5)
      AZ/UT/WI – 36.7% (against 3)
      NY + Super Tuesday III – 57.7% (against 3)

      The whole drawn out narrative of the race occurred because first the race was backloaded in comparison to years before 2012 and also because of the delay between March 15, by which point he was inevitable but the media and his opponents kept living in denial claiming he wasn’t, and April 19/April 26, when he delivered the knock-out by sweeping his home region.

      The power of the media narrative and claims by GOPers in denial in those 5 weeks combined with the intervening drama of Wisconsin created this myth you are offering. If it was any other year and person, the race would have been called on March 15.

    • Andrew

      “My short comment would be my disappointment with the press is that they never point out that Donald Trump has swept his way to the Republican Party nomination because their nominating rules are so undemocratic.”

      I’m curious what system you think the GOP should use? Many of us Republicans find the Democratic Superdelegate system extremely undemocratic, so I don’t think that would fly in our party. You can see that by the reaction of GOP voters to how ND, CO, and WY behaved this year where they failed to hold a popular vote. I think many of us also don’t care for the Democrats’ purely proportional allocation since it doesn’t allow the winner to actually easily win.

      One possible system that respects how Republicans currently do things would be proportional for the base allocation of state wide delegates, and a 50% winner take all threshold for congressional district delegates, and otherwise a 2-1 split for the leading two if all under 50% or a 1-1-1 split for under 33% for the leading three vote getters in each.

      There are many ways to do it, but what I think many Republicans want is a system that respects the state winner being able to win by gradually consolidating support.
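The district-level scheme floated in the comment above can be sketched in a few lines of Python. This is only one reading of the proposal, not an actual party rule: it assumes three delegates per congressional district, treats "under 33%" as the leader's share falling below 0.33, and uses a largest-remainder method for the proportional statewide pool. The function names, the candidate labels, and the vote counts are all hypothetical.

```python
from typing import Dict

DISTRICT_DELEGATES = 3  # assumption: three delegates per congressional district


def allocate_district(votes: Dict[str, int]) -> Dict[str, int]:
    """Split one district's delegates under one reading of the proposed rule."""
    total = sum(votes.values())
    ranked = sorted(votes, key=votes.get, reverse=True)
    leader_share = votes[ranked[0]] / total
    # A majority winner (or a lone candidate) takes every district delegate.
    if leader_share >= 0.50 or len(ranked) == 1:
        return {ranked[0]: DISTRICT_DELEGATES}
    # A very fractured field: one delegate each for the top three.
    if leader_share < 0.33 and len(ranked) >= 3:
        return {name: 1 for name in ranked[:3]}
    # Otherwise a 2-1 split between the top two finishers.
    return {ranked[0]: 2, ranked[1]: 1}


def allocate_statewide(votes: Dict[str, int], delegates: int) -> Dict[str, int]:
    """Proportional split of the statewide pool (largest-remainder, an assumption)."""
    total = sum(votes.values())
    seats = {name: delegates * v // total for name, v in votes.items()}
    remainders = {name: delegates * v % total for name, v in votes.items()}
    leftover = delegates - sum(seats.values())
    # Hand any unassigned delegates to the largest fractional remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        seats[name] += 1
    return seats


# Hypothetical vote counts, for illustration only.
print(allocate_district({"A": 520, "B": 300, "C": 180}))
print(allocate_district({"A": 400, "B": 350, "C": 250}))
print(allocate_district({"A": 300, "B": 280, "C": 220, "D": 200}))
print(allocate_statewide({"A": 460, "B": 340, "C": 200}, 10))
```

Under this reading, a plurality leader still collects a two-thirds bonus in each district without a majority, which is the "gradually consolidating support" property the comment asks for.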

    • Sam Wang

      The GOP rules are well suited to gradually consolidating support, but they also give substantial advantages to whoever finishes first, even without a majority. This is the basis for the “Trump can win with only 30-40%” narrative, to which I have contributed.

      I agree that all the talk about Trump’s “ceiling” may have been overstated, especially as his negatives (among Republicans, anyway) decreased.

      I do not think Democratic Party rules are particularly undemocratic. Basically they are proportional representation, while reserving some say for party officials in an explicit way. Well-functioning political parties need ways of enforcing their will, or else they are not really parties.

      It is an imperfect experiment, but we can think about how one party’s voting patterns would translate into delegates using the other party’s rules. Clinton would have more than a 2:1 delegate advantage. Trump would be leading a divided field with about 40% of delegates, and superdelegates would have to settle the matter in Cleveland, probably between him, Cruz, Kasich, and Rubio.

    • Andrew

      Prof. Wang:

      “I do not think Democratic Party rules are particularly undemocratic. Basically they are proportional representation,”

      The key difference between how Republicans view this and Democrats view this is how they treat the difference between winning with a plurality and winning with a majority.

      Republicans favor allowing a plurality winner to win outright, while Democrats favor requiring someone to break out with majority support and, if no one does, putting a thumb on the scale with the Superdelegates.

      I know many Democrats like how their system works, but I am just trying to communicate how Republican voters view the Superdelegate system – we view it as very undemocratic – probably because we have long mistrusted our own party elites and would never want to give them this sort of power.

      Interestingly also, these two approaches have tended to produce different races – Democratic primaries seem to come down to two-person contests, while Republican contests frequently feature three or more candidates deeper into the election.

      “reserving some say for party officials in an explicit way. Well-functioning political parties need ways of enforcing their will, or else they are not really parties.”

      This is a great topic for further discussion. On the one hand, this is a very valid point – parties without discipline are not parties. On the other hand, a party needs to listen to its voters or else it will find itself a party without electoral support, which is what happened to Republicans this year (and in 1992 and 2008) and what happened to Democrats in 1968 and 1980.

  • 538 Refugee

    The reason I dismissed Trump initially was I didn’t believe he was a serious candidate. It wasn’t until quite a while later that I read about the steps he had taken to lay the groundwork. But in fairness to me, as of today many in the Republican Party still don’t see him as a serious candidate.

    Given Trump’s unfavorable rating the piece on his ceiling may not have been far off. What he did was to bring in people that initially wouldn’t have been counted as GOP primary voters and leveraged them to raise his ceiling where it counted in a fractured system that didn’t require a positive rating.

    Given the ‘complex’ nature of the GOP primary it was inevitable that people would have to stop and rethink what was happening from time to time. Modeling was tedious at best. Credit to Sam with his model that appears at the top of the page here. Most were predicting a contested convention right up until Cruz dropped.

    • fd2

      “What he did was to bring in people that initially wouldn’t have been counted as GOP primary voters and leveraged them to raise his ceiling where it counted in a fractured system that didn’t require a positive rating.”

      Is there any non-anecdotal evidence of this? Most analysis of Trump voters I’ve seen indicates his base are white, male, and with a median household income around $70k, which would appear to be a demographic pretty highly represented in Republican primaries, historically.

    • 538 Refugee

      Blue collar workers in the rust belt would fall well below the $70K median income where I live. Factory workers around here are on the high end if they make $18 an hour. These are the folks I’ve heard are feeling left behind by the establishment and supporting Trump.

    • Sam Wang

      Click on the Neil Irwin/Josh Katz link. Trump support correlates on a county-by-county basis with residence in mobile homes, lack of high school education, and old-economy jobs such as manufacturing. That matches your statement, but of course leaves lots of room for Trump supporters of various types.

    • fd2

      “These are the folks I’ve heard are feeling left behind by the establishment and supporting Trump.”

      You may have heard that, but statistically, the median household income of the Trump voter is $72k, which is why I asked if there was any non-anecdotal evidence otherwise.

    • Sam Wang

      That comes from an analysis of Republican primary voters, which are not representative of all voters. So actually, your citation is also biased. I note that this is another example of a misleading headline at FiveThirtyEight. Data punditry, yuck.

      At least he shows the data, which allowed me to see the problem. When you can, look to original source data before trusting conclusions.

      Perhaps more appropriate would be a within-group comparison: Trump voters have slightly lower median income than Cruz supporters and a lot lower than Kasich supporters. That is consistent with the economic-anxiety point.

    • fd2

      “That comes from an analysis of Republican primary voters, which are not representative of all voters.”

      I wasn’t claiming that it was representative of all voters. I was responding to a specific claim –

      “What he did was to bring in people that initially wouldn’t have been counted as GOP primary voters and leveraged them to raise his ceiling where it counted in a fractured system that didn’t require a positive rating.”

      In response, I was noting that Trump primary voters were largely white, male, with a high median household income – which is to my understanding not “people that initially wouldn’t have been counted as GOP primary voters”.

      Had I been making a general claim about Trump support, yes, you are correct that this would be a biased sample, but my claim was specific to his support in the Republican primaries.

    • Andrew

      “You may have heard that, but statistically, the median household income of the Trump voter is $72k, which is why I asked if there was any non-anecdotal evidence otherwise.”

      Perhaps this is how it works, sorry, it’s anecdotal. I live in PA and work in PA, NY, and NJ. My in-laws live in western PA and work in PA and employ numerous people in PA. All the Republicans in my family and all the Republicans at my place of work support and voted for Trump. I’m an engineer so I and my colleagues all make a comfortable living over $100K and many over $200K income.

      My in-laws are business owners who all make in excess of $200K. They all voted Trump (I would also note they were Perot voters in 1992 – I was not). So we are all helping drag that average way up.

      My in-laws’ employees are blue-collar warehouse/trucking types. Many have not voted since Perot in 1992 and before that Reagan in 1980 and 1984. The employees are all making less than $65K in those jobs. Many, I won’t say all, but many of these employees came out to vote this year for Trump. So they are dragging the average back down.

      So there you go. A number of us are typical Republican primary voters, and there is also a number of disaffected “new” not so new voters.

  • Robert Paul Wolff

    Unfortunately, your syllogism is invalid. A = functioning political parties. B = parties in the business of deciding. C = The Republican Party. So the syllogism becomes: All A are B. C is not B. Therefore C is not A. To make it valid, you would need to change the major premise to: All B are A, or Parties that decide are functioning political parties. Then the syllogism is valid: All B are A, C is not B, therefore C is not A.

    • Sam Wang

      I concede this point. However, the proposed wording is not as much fun.

    • Olav Grinde

      Speaking of which, perhaps some investigative journalism is called for:

      Did Ted Cruz drop out of the race after being promised a Supreme Court nomination by Donald Trump? Not to be discounted! ;)

    • Henry Wilton

      This critique isn’t correct. Let’s imagine simpler instances: A = “rabbits”, B = “eat carrots”, C = “Daffy Duck”. So Sam’s syllogism becomes

      “All rabbits eat carrots, Daffy Duck does not eat carrots, therefore Daffy Duck is not a rabbit”

      which is evidently correct. So Sam was quite right to conclude that, since all functioning political parties decide, and the GOP didn’t decide, the GOP isn’t a functioning political party.

      On the other hand, the proposed correction is, in fact, incorrect. This time, let’s take A to be “black birds”, let’s take B to be “crows”, and let’s take C to be “Daffy Duck” again. Then the correction becomes:

      “All crows are black birds, Daffy Duck is not a crow, therefore Daffy Duck is not a black bird.”

      This reasoning is clearly incorrect (just search for images of Daffy if you don’t believe me).

      Formally, this critique amounts to the usual confusion between the contrapositive (“A=>B” is equivalent to “(not B)=> (not A)”) and the inverse (“A=>B” is not equivalent to “(not A)=>(not B)”).

    • Vicente Piedrahita

      I’m with Henry Wilton here. If p then q; not q, hence not p. If a party is functioning, it decides. The Republican party didn’t decide, hence the Republican party is not functioning. Syllogism seems right.

    • Peter

      Formally, the argument is that All A are B, Some C are not B (reading C is not B this way, i.e. not All C are B), therefore Some C are not A. This is a valid argument form, AOO-2, or BAROCO.

      So, as some of you pointed out Sam is right (again).
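The disagreement in this thread can also be checked mechanically. The sketch below searches a tiny universe for a counterexample to each form of the argument, meaning a choice of sets A and B where both premises hold but the conclusion fails. The three-element universe and the function names are assumptions made for illustration. A counterexample proves a form invalid; finding none in a small universe is supporting evidence rather than a formal proof.

```python
from itertools import product

UNIVERSE = range(3)  # three individuals; element 0 plays the role of C
C = 0


def subsets(universe):
    """Yield every subset of the universe as a frozenset."""
    items = list(universe)
    for mask in product([False, True], repeat=len(items)):
        yield frozenset(x for x, keep in zip(items, mask) if keep)


def find_counterexample(major_premise):
    """Return (A, B) where the premises hold but 'C is not A' fails, else None."""
    for a, b in product(list(subsets(UNIVERSE)), repeat=2):
        premises_hold = major_premise(a, b) and C not in b
        conclusion_holds = C not in a
        if premises_hold and not conclusion_holds:
            return a, b
    return None


# Original wording: All A are B; C is not B; therefore C is not A.
original = find_counterexample(lambda a, b: a <= b)
# Proposed correction: All B are A; C is not B; therefore C is not A.
proposed = find_counterexample(lambda a, b: b <= a)

print("original form counterexample:", original)
print("proposed form counterexample:", proposed)
```

The search finds no counterexample for the original wording and does find one for the proposed correction, which matches the rabbit and crow examples given above.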

  • Amitabh Lath

    Trump led wire to wire. It was the weirdness of his mannerisms that caused the underestimation.

    First he was going to be like Cain or the other not-Romneys, popular for a few weeks. But he was already well known, so two weeks of scrutiny did not sink him.

    Then he had massive unfavorables, he had a “ceiling”, all these other variables that seem to have been invented just for Trump. Unfavorables especially were supposed to be cast in stone.

    Trump was supposed to fall by October, then by December, then after Iowa, then SC…

    If there is a mitigating factor, it’s that N was large, and there wasn’t one large giant planet and bunch of small rocks. There were multiple bodies of similar importance (although Bush tried to be the Jupiter of this system) which makes calculation of trajectories intractable.

    • Olav Grinde

      In my opinion, two of the greatest disappointments of this primary cycle have been:

      1) The press has, by and large, failed to do its job as a fact checker. In past decades, candidates uttering blatant untruths would have generated headlines and articles demolishing their claims. Sadly, fact-checking seems to be a thing of the past – at best relegated to tertiary articles on the level of obscure trivia.

      2) The press has covered the primary fight as a sporting event, with an exaggerated focus on sensational emotionalism.

      Donald Trump has played that brilliantly! Like no other presidential candidate in living memory, he has stoked the emotional fires – irrespective of political correctness, decency, facts, or even his own previous positions on the same issue.

      …and the press has been there to give him headlines every time. He has, essentially, been sucking up all the oxygen, making it virtually impossible for other candidates to broadcast their message. Studies of the number of articles on the various Republican candidates underscore this.

      Or to put it another way: Given their own focus on ratings and readership attention, TV channels, newspapers, and web-based news media have been useful idiots, repeatedly allowing Trump’s sensationalistic bumper-sticker messages to displace real issue-focused journalism and intelligent analyses.

    • Amitabh Lath

      Also “The Party Decides”. Silver et al went nuts over the thesis of this book the way precocious undergrads will over the latest Grand Unification Theory of Everything (ToE).

    • Ed Wittens Cat

      the 3body problem!

    • Amitabh Lath

      Olav, I don’t know what exactly the media’s “job” is, but the internet is available to anyone interested in finding out facts about, say, global warming or trade with China or immigration from Latin America. The assertion that “the media caused Trump’s win” needs validation.

      Did he win because the media showcased him or did the media showcase him because he was winning? Until proven otherwise my hypothesis is that Trump voters are rational actors making their choice from a constrained set of options.

    • Matt McIrvin

      Nate Silver was never my favorite of the political data geeks (that would be our host), but his transformation into a traditional pundit really disappoints me. He was blatantly ignoring what the data were telling him about the Republican primary race for the better part of a year, and coming up with more and more implausible reasons why the numbers were lying. I can understand that happening to network TV guys, but I’m not sure how that happened to him.

  • bks

    When was the last time the political journalists got something right? Vietnam, Iraq, Ukraine, Arab Spring, 2008 financial collapse, the effect of Quantitative Easing (Inflation? What’s that?), Obamacare …

    You could probably achieve Hari Seldon heights merely by taking a position antithetical to the editorial boards of the Times and WaPo.

  • David Cutler


    In many ways I viewed a lot of what happened this year through your much older lens of “modeler vs. data aggregator.” In many ways, you weren’t much of a modeler this year at all. You just kinda let the data estimate individual chance of voting particular ways, and then added those chances to a model that included almost nothing other than the rules for voting. It was a model, sure (and you integrated over uncertainty), but it was almost the simplest possible model that included the necessary elements (voting rules, and voting likelihood). Importantly there was no strong prior on your model at all. It was, to a first approximation, a frequentist approach (or at least a Bayesian with uninformative prior).

    Nate Silver (and pundits in their own way) had a much more sophisticated model. Their model included all sorts of evidence about “endorsements” and previous party behavior, and stuff like that. Effectively Nate (and pundits) had a huge prior on their model, and that prior was “Trump has an infinitesimal chance.” The data kept saying that Trump / Clinton were incredibly likely to win, but the PRIOR said Clinton wins, Trump loses, and they never let the data outweigh their prior.
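The contrast drawn in this comment between an uninformative prior and a heavy one can be made concrete with a toy Beta-Binomial update. This is a sketch only and does not represent any site's actual model: the prior strengths and the contest results below are made-up numbers chosen for illustration, and `posterior_mean` is a hypothetical helper.

```python
def posterior_mean(prior_wins, prior_losses, wins, losses):
    """Mean of the Beta posterior for a win probability after observed contests."""
    return (prior_wins + wins) / (prior_wins + prior_losses + wins + losses)


# Made-up observations: a candidate wins 8 of 10 contests.
wins, losses = 8, 2

# Flat Beta(1, 1) prior: the data dominate almost immediately.
flat = posterior_mean(1, 1, wins, losses)

# Heavy skeptical Beta(1, 99) prior: equivalent to 100 imagined earlier
# contests in which the candidate almost always lost.
skeptic = posterior_mean(1, 99, wins, losses)

print(f"flat prior:      {flat:.2f}")
print(f"skeptical prior: {skeptic:.2f}")
```

With the same ten observations, the flat prior lands at 0.75 while the skeptical prior stays near 0.08. A prior that strong needs far more data to move, which is the failure mode the comment describes.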


    • Sam Wang

      That is a good point. Plus you have smoked me out as the frequentist that I am. Or aspire to be, anyway.

      It was a lot of work to model (or “not-model”) the delegate rules, but I am glad I did it.

    • Amitabh Lath

      Trust data above your gut.

    • Bill Herschel

      “And like the temperature on a given day, voter preferences usually change slowly enough that past history can give us an idea of where things are likely to head within an election season. We can get a deeper sense of the interaction between polls and final results by looking at how the last 16 presidential elections unfolded.”

      How does all this become frequentist? I am asking seriously, because I don’t know.

  • Rob in CT

    “the Republican Party is broken. It probably broke slowly, from 1994 to 2014. Data geeks, write about how that happened!”

    I won’t claim to be a data geek, but the short short answer is:

    Newt Gingrich.

    • bks

      Exactly. The last three candidates in the race with Mitt Romney were Gingrich, Santorum and Ron Paul and that’s after whackjobs like Cain and Bachmann had been early frontrunners.

    • Brian

      Newt was a powerful renewing force within the Republican party in 1994 and held a reforming spirit together for a few years. With Clinton, he delivered a boom and balanced the budget. Then he left once it was clear his star had faded in 1998.

      The disaster that broke the Republican Party was Dubya. Look at the video of Trump telling the truth in South Carolina’s debate about 9-11 and Iraq. A party has to be able to repudiate a total failure at the essential tasks of government if it ever wants to govern again.

    • Rob in CT

      Brian: I was specifically referring to the type of “renewal” Gingrich brought. It was Newt that set out quite deliberately and specifically (there’s a memo) to demonize and delegitimize Democrats. Now I’ll grant “the government is the problem” dates back to Reagan and so you could go there, but Gingrich took it up a notch:

      The language Gingrich suggested in his memo was used by GOP politicians, and it was probably already common by 1995 on talk radio (maybe that’s where Newt got the idea).

      So you had both elected officials and thought leaders using more and more inflammatory terminology about Democrats and government in general. Why would GOP rank and file be open to compromise? Compromise with corrupt failures, destructive bringers of moral decay, etc?

      This has consequences. One does not compromise with demons. One does not get into government believing that government is fundamentally awful and then govern well. The seeds of the Dubya years were sown earlier (arguably much earlier, but Gingrich deserves special mention).

  • Brian Tucker-HIll

    Excellent analysis. I’ll just emphasize a couple points I have been thinking about:

    (1) It really is important to treat assertions about political processes as hypotheses to be tested, not as axioms. It should be obvious there is no guarantee that political processes will remain the same over time, and so we should always be trying to use data to understand what the process looks like now, rather than assuming it must look the same as it looked in the past. Analysis of the past may help guide that inquiry, but it shouldn’t rule it;

    (2) And as applied to Trump, there was so much being tested, and therefore so much to learn! Sam has wittily summarized what this says about the Republican Party, but I think with the proper attitude we could have been learning so much all along–about the nature of the constituencies in the Republican Party, about their primary process, and so on.

    This isn’t a new point, but as many have suggested, there was an awful lot we could learn here, which meant those who assumed they already knew all the relevant answers not only got it wrong, but missed an opportunity to do some very interesting work.

  • Amitabh Lath

    Game theory question: if you are an ambivalent Republican party official right now how do you decide to endorse Trump or not? If he wins in November, then the earlier and louder your pledge of fealty the better. But otherwise? What are the coefficients for that decision tree?

    • Josh

      It seems like there’s no incentive not to hedge right now a la Paul Ryan. As things heat up–probably around the convention or thereafter–party officials will have to get off the fence.

    • Joseph

      Mr. Ryan is attempting to assert some control over a prototypical loose cannon. He’s threatening Mr. Trump with pulling his endorsement unless Mr. Trump follows the party line.

      Will it work? Depends. Is it a bluff? Trump is wily and can probably sniff out a bluff. Yes, he wants to have a “trophy Presidency”, but he’s already gotten a trophy Presidential nomination, so that may satisfy him. In which case it won’t matter if Mr. Ryan is bluffing or not.

      Frankly, if I were in Mr. Ryan’s shoes right now, I wouldn’t be bluffing. I’d be prepared to pull the plug on any hope of a Republican Presidency for eight years rather than put Mr. Trump in office. A loose cannon is one thing. A loose hydrogen bomb is something else altogether.

    • Amitabh Lath

      Ryan lies on the spectrum from Christie on one end, Ayotte and Haley somewhere in the middle, and Flake et al at the other. Ryan’s first derivative is non-zero, moving towards acceptance.

      Assuming they all made rational decisions, what is the long-term strategy? For Christie and others on the Trump bus it’s obvious: early endorsers win in a Trump administration (high weight given to R win in Nov). But if you believe Trump is less than a slam dunk, what is the process? How do you model the post-Trump Republican party and maximize your effectiveness?

    • 538 Refugee

      There could be some theatrics involved here. Don’t be surprised if they aren’t all arm-in-arm, with tears in their eyes, singing “Kumbaya” at the convention. They need a convention boost like no other in the history of the republic at this point.

    • Josh

      Why do you assume Paul Ryan is moving toward acceptance of Trump? This seems like a pretty big assumption. What’s your evidence to support it?

      Your reading of this seems backward to me. Trump has the support of a majority of GOP primary voters and is, in many ways, explicitly distancing himself from the party’s establishment. Paul Ryan is the embodiment of that establishment. If anybody needs the acceptance of the other, it’s Ryan needing Trump’s approval, not the other way around.

      If I’m Paul Ryan, I’m doing the math and realizing that I’m kind of damned if I do, damned if I don’t at the moment. If Ryan supports Trump and Trump bombs, Ryan’s ambitions go out the window for a while. If he comes out against Trump, he’s going against the will of a majority of GOP voters–not a good look for the highest-ranking GOP pol in the country.

    • Josh

      ETA: Trump has not yet won a majority of GOP primary votes, but by the end of the primaries, this will probably be the case.

    • Joseph

      @ Josh:
      “ETA: Trump has not yet won a majority of GOP primary votes, but by the end of the primaries, this will probably be the case.”
      Of course, but that’s to be expected when everyone else has dropped out and people will either (a) vote for him, (b) make a protest vote, or (c) just not show up.

      But you raise an interesting question; will it be possible to determine how many Republicans will fall into the “c” category? My guess is – a lot. And that could affect a lot of down-ticket issues and races.

      This alone could be a watershed moment for liberals and progressives, especially with Senator Sanders still drawing in the left wing.

    • Brian


      Don’t forget that Ryan has a primary of his own. He’s already guaranteed a fully funded challenger with his hedging.

      In the presidential, 83% of Republicans in WI-1 voted for candidates that oppose Ryan’s long standing stridently harsh positions on trade and mass immigration. A large majority oppose his plans to slash Social Security. About 85% of Republican voters in polls want to unify behind nominee Trump, so Ryan has gone against them, too.

      Ryan is in serious danger of losing and will probably soon fall in line with Trump in hopes of keeping his seat in the House at all. If Ryan outright opposes Trump, Ryan will be a private citizen before the inauguration.

  • Sam Minter

    You say “By early February, his nomination was near-inevitable.” I think even based on your own analysis this is an exaggeration.

    By early February it was clear that he would almost certainly have the plurality of delegates, that is very true, but as late as your April 18th update you were giving him a 64% chance of getting to 1237, which means a 36% chance of him not getting there and a potential convention.

    That dropped to 25% on your April 22nd update, then 6% on April 27th. And then of course down from there. But it was only at that point… after the “Northeast Primaries” that your own odds of a contested convention became negligible. (Although I believe the analysis of Indiana and California was just as important in that shift as the primary results in the Northeast if I remember correctly.)

    Before that anybody saying we were definitely heading to a contested convention was of course off base, but it would also have been an exaggeration to say that Trump was heading to a certain outright win, since a 25% or 36% shot at the contested scenario was still a real possibility… even though it was less likely than the direct win.

    The way I put it during the weeks before the Northeast Primaries was that we were still on the edge between an outright Trump win and the contested scenarios. It seems like anything where the Trump 1237 odds were between maybe 25% and 75% still reasonably could be interpreted as “could go either way”.

    Trump was a favorite in your analysis the whole time of course, it just seems “near-inevitable” is overstating it a bit. Unless you also were just giving him high odds of coming out of a contested convention the nominee anyway.

    (For the record, at one point pre-New York my gut feel estimate, not based on models, went as high as a 60% chance of the contested convention scenario, which was probably buying into the hype a bit, which is unfortunate, but I never got to the “this is what is going to happen for sure” zone, it was always in the “could go either way” zone.)

    • Sam Wang

      No, it is not an exaggeration. You are mixing up different kinds of calculation.

      In the first, I examined patterns of early-state data to learn characteristics of eventual nominees. This was the basis of my February post.

      In the second, I did an explicit calculation of the expected number of delegates. I agree that this was less certain.

      Using the delegate calculation to estimate a nomination probability would get into questions like whether GOP insiders will end up going along with a plurality winner. I cannot think of how to do that without hauling in a ton of assumptions about human interaction. I don’t think I am the right person to do that.

  • anonymous

    So, what would you do differently if you could go back to 2014? If I recall correctly, you used a purely poll-based approach without introducing additional priors then as well (citing the experience in 2004 as an argument against priors). It was just that the polls had a systematic bias in 2014. Perhaps the ingenious nearby-county method or Google Correlate methods might have helped, but could you have done any better with the tools that you had at hand then?

    • Sam Wang

      Good question. I think that polling errors were only a minor problem in the end, creating a few percentage points of error in the last few weeks of the campaign. However, it is an error, and pollsters would kill to know how to fix it.

      The bigger problem was my own danged fault: I assumed symmetric random drift from August/September onward, which is mostly true…in Presidential years. Basically I think what is needed is a version of the Wlezien/Erikson data set for off-year elections. I hesitate to use such data to construct a prior…but it would be necessary for making a good prediction. Drew Linzer did well in 2014, and one could look into how he did it. Linzer says he got lucky, but I suspect he is just being modest.

    • Matt McIrvin

      Interestingly, Linzer is now insisting that it’s way too early to call Clinton a strong favorite over Trump, and citing fundamentals models favoring Republicans for this year.

    • Sam Wang

      Rather than rely on Twitter comments, everyone take a look at this article by Lauderdale and Linzer. They get specific: a fundamentals-based model (i.e. any not involving polls) should have an uncertainty of +/-7% in national party vote share (95% CI). Which is pretty impressive if true. Note that this does not apply to all models, only theirs. One would have to then evaluate which ones would be likely to meet this criterion.

      Wlezien sent me his data, and I am currently working on it. I estimate that poll-based national vote share uncertainty is currently about +/-12% (95% CI). My web post is incorrect on this particular point, but I would rather have another full essay before getting into it. The poll-based CI only becomes equal to or less than the fundamentals-based CI starting at 50 days before the election.

      Considering that both fundamentals- and polls-based probabilities are so uncertain, it might be better to calculate in terms of two-party vote margins instead. Less chance for reader misinterpretation. Also, I think it would be best to report results of polls and models separately, and then report their combined estimate separately.
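      To make the two uncertainties above concrete, here is a minimal sketch of one standard way to combine two estimates: inverse-variance weighting. This is an illustration, not the method used at PEC or by Lauderdale and Linzer. It assumes the two estimates are independent and roughly normal, and the central values (49 and 53) are hypothetical placeholders; only the +/-7% and +/-12% half-widths come from the comment above.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def combine_estimates(mean_a, half_width_a, mean_b, half_width_b):
    """Inverse-variance weighted combination of two independent estimates.

    half_width_* are 95% CI half-widths, in percentage points.
    Returns (combined mean, combined 95% half-width, weight given to A).
    """
    var_a = (half_width_a / Z_95) ** 2
    var_b = (half_width_b / Z_95) ** 2
    precision = 1 / var_a + 1 / var_b
    weight_a = (1 / var_a) / precision
    mean = weight_a * mean_a + (1 - weight_a) * mean_b
    half_width = Z_95 * math.sqrt(1 / precision)
    return mean, half_width, weight_a


# Hypothetical central values; half-widths are the ones quoted above.
fundamentals = (49.0, 7.0)   # fundamentals-based vote share, +/-7
polls = (53.0, 12.0)         # poll-based vote share, +/-12

mean, half_width, weight_fundamentals = combine_estimates(*fundamentals, *polls)
print(f"combined: {mean:.1f} +/- {half_width:.1f} "
      f"(weight on fundamentals: {weight_fundamentals:.2f})")
```

      Under these assumptions the narrower estimate gets about three quarters of the weight, and the combined interval is only slightly tighter than the fundamentals interval alone.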

    • Matt McIrvin

      This year is such an extreme case. We have a situation in which I think most political scientists would agree the fundamentals models favor Republicans, if perhaps not as strongly as in 2012; but most analysts’ strong prior about the specific Republican candidate would also be that he can’t win. Then again, that was their strong prior about the primary race too, which he won handily… but this time the early polls are also against him. And then there are the demographics- and map-based analyses, which suggest an uphill battle for Trump as well.

      Unfortunately there are already enough variables that it’s still hard to take it as a test case about personalities vs. fundamentals.

  • Amitabh Lath

    John Cassidy at the New Yorker contrasts Sam favorably to Nate Silver. He specifically mentions the PEC’s early January statement about Trump’s high probability to seize the nomination. Silver does not come off well.

    As I recall the one uncertainty was if Trump’s support was fictitious. Were his polls sampling hot air? After Iowa that got laid to rest and the rest followed predictably.

  • Lorem

    Sam, on a related note, can we talk about the Cruz-outperforming-polls bonus that you briefly used? With the benefit of hindsight, do you believe that it was real? (“Real” in the sense that if we made a new world and reran some time segment of the primaries again from scratch, applying it would lead to better predictions than not applying it.)

    If so, do you think you should have included it earlier or alternatively ignored it despite it being real? And do we have any good theories about what made it appear and then go away?

    If not, then was that all just variance?

    • Sam Wang

      The Cruz bonus bothered me: it was ad hoc, and to tell the truth I didn’t like adding it to the calculation.

      Here is my current guess. Cruz’s overperformance was especially large in March, a period when his national polling numbers were rising steeply, at a rate of about 0.6% per day. In retrospect I wonder if the state polls were just stale. At that rate of national change, a 10-day-old poll would be off by 6 percentage points. This explanation has the advantage of accounting for why the bonus disappeared in late April.

      One way to address this would be to adjust older state-level polls according to movement in national opinion. That might have given the same result without being as arbitrary as a “Cruz bonus.”
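      The adjustment described above can be sketched in a few lines. This is only an illustration under simple assumptions: national movement is treated as linear over the life of the poll and is added one-for-one to the state number. The `StatePoll` type, the function name, and the 30% starting share are hypothetical; the 0.6% per day slope and the 10-day age are the figures from this comment.

```python
from dataclasses import dataclass


@dataclass
class StatePoll:
    candidate_share: float  # candidate's share in the state poll, in percent
    age_days: float         # days since the poll was taken


def adjust_for_national_drift(poll: StatePoll, national_slope_per_day: float) -> float:
    """Shift a stale state poll by the national movement since it was taken.

    Assumes (hypothetically) that state opinion moved in step with the
    national trend, at a constant rate in percentage points per day.
    """
    return poll.candidate_share + national_slope_per_day * poll.age_days


# A 10-day-old poll during a rise of 0.6 points per day nationally.
stale = StatePoll(candidate_share=30.0, age_days=10)
adjusted = adjust_for_national_drift(stale, national_slope_per_day=0.6)
print(f"raw {stale.candidate_share:.1f}%, adjusted {adjusted:.1f}%")
```

      With these inputs the shift is the 6 percentage points mentioned above. Once the national trend flattens, the slope goes to zero and the correction vanishes on its own.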

    • Lorem

      Oh, that does sound quite plausible, and has the nice feature that no non-poll assumptions are required.

  • Some Body

    Joining the discussion belatedly…
    Let me paraphrase Kant: conceptual frameworks without data are empty; data without a conceptual framework is blind. You need some framework to evaluate and interpret data, and there’s no way around that.

    In that sense, I don’t see anything really wrong with what 538 did. They had a theory (itself the result of overcorrection from 2012, when Silver took Herman Cain’s candidacy very seriously based on polling alone). They tried to read and evaluate the data through this theory. Eventually they realized they couldn’t, and the theory has to be revised. The turning point was around January (while the “stages of doom” story could have held on for much longer—even up to last week, when the final obstacle Silver listed proved immaterial). Others were quicker to adapt, or had a better theory to begin with, but they did eventually drop the theory in the face of evidence. That’s much better than many established scientists manage in their fields…

  • Olav Grinde

    Amit, I most certainly hope you are not making the point that the job of the press might exclude fact-checking! ;)

    For at least in my mind, that has always been one of the vital tasks of the press in a free, open and democratic society. I firmly believe that the educated reader should not be compelled to do Internet searches merely because our so-called news media has ceased to strive for accurate, fact-based reporting!

    Just to clarify, I am not implying that the press caused Trump to win. I am saying that the press by and large failed to do its job, and that Donald Trump expertly exploited the priorities of today’s news media – priorities that I believe are woefully inadequate and reprehensible.

    Meanwhile, here is a very interesting analysis of the press coverage we can come to expect in the months preceding the general election. It’s by David Roberts of Vox (not to be confused with Fox):

  • AySz88

    I was a bit puzzled by the intro to this post agreeing with the NYT critique – I suppose I have a different interpretation of it. I thought it was encouraging readers to discount data and precedent entirely when it conflicts with “typical” journalism – for example, “nothing exceeds the value of shoe-leather reporting”, and the suggestion to ignore polls today and go “talk to some voters”.

    That attitude seems even *more* vulnerable to cherry-picking anecdotal evidence to build one’s preferred narrative. (I am vaguely reminded of “Bernie Math” or “unskewing”.)

    So I see the article as in diametric opposition to looking at the data in and of itself, as Sam did.

  • Ed Wittens Cat

    If the press is supposed to be a regulatory body for information then Trump captured them as surely as cartels capture economic regulatory bodies.
    capitalism, si!

  • J. R. Mole

    Never mind … someone did publish a rigorous examination of 538’s Trump coverage:

    Frankly, I find your analysis of the data very interesting and worthwhile. I find the persona of guardian of intellectual honesty … transparently human, but no less grating for being so.

    Please, lay off the poorly-informed critiques of 538. They give the impression of trying to latch on to a more highly-trafficked site — just the sort of “clickbait” you decry — and produce comment sections that, especially after moderation, lead one to wonder whether “EC” actually stands for “Echo Chamber”.

    You can and do do better.

    • Total

      Er…Nate Silver (in the link you posted) seems to agree with Dr. Wang’s “poorly-informed critique.”

    • J. R. Mole


      Sorry I wasn’t clearer.

      They both use the term “data punditry”, so yes, there is some form of agreement there. However, I don’t think one could fairly say that Nate is agreeing with Sam’s picture of walking back “gradually and grudgingly”, much less engaging in “clickbait”. As with the famous Six Stages of Doom piece, the reality is much more nuanced than the caricature.

      In reality, Nate was openly struggling over the course of the primaries with how much weight to give to which conflicting signals, while trying to be clear about the uncertainty involved and the potential for various pitfalls. Go back and read the actual 538 coverage. I’ll wait.

      Writing this off as “clickbait” is ill-informed, to pick a phrase. “Sour grapes” also comes to mind. Sam is not at his best when trying to play the role of the only honest data scientist in town.

      Paying attention only to polls is not an inherently more or less rigorous approach than paying attention to polls, endorsements, demographics and such or paying attention to polls, Google correlates and such, or any of a number of other approaches. The question is what works best, and this question remains decidedly open.

      Nor is it immoral to give a “gut feel” percentage if you’re open about it and what you’re basing it on. Nate argues that it would have been better, and more in keeping with the mission of 538, to have used a statistical model with a uniform prior as a sanity check. Fair point. I’m not sure Sam would agree with the Bayesian approach here, but perhaps he can chime in.

      To borrow a phrase, there is much more that unites sites like 538 and PEC than divides them. Both are making an honest attempt to make falsifiable predictions about a complex domain and to measure the results. Both are trying to put solid data behind their work. To see Sam not only playing into the notion that PEC is somehow in a different league (besides in traffic), but actively promoting it, is discouraging.

      By the way, I believe Sam’s doctorate is in neuroscience, not political science, in which case it seems more appropriate to drop the honorific in this context — though a significant number of commenters here seem to disagree.

    • Sam Wang

      Clickbait refers to everyone who is injecting false uncertainty.

      I personally feel that my professional qualifications are relevant in this work.

      If we’re going to get into priors, 6-8, and whatever, it would be more honest to examine my January analysis head-on.

      There’s a new post on this topic. See main page.
