What head-to-head election polls tell us about November

May 1st, 2016, 3:08pm by Sam Wang

General-election matchup polls (e.g. Clinton v. Trump) started to become informative in February. In May, they tell us quite a lot – and give a way to estimate the probability of a Hillary Clinton victory. [Read more →]

Indiana may not matter any more

April 28th, 2016, 9:00am by Sam Wang

Media types want you to get your knickers in a twist about Indiana. However, the data suggests that it doesn’t matter any more. Rationally speaking, it is probably time to stop writing so much about the Republican race for delegates. Also, may we have a moratorium on “brokered-convention” articles please?

Today I write about the PEC delegate snapshot. It is based on data posted here. All polls are current, including Trump +6% in Indiana (n=3 polls). Based on Tuesday’s voting, in which Cruz underperformed polls by a median of 4 percentage points, I will no longer assign a Cruz bonus. Note that Trump overperformed polls by a median of 8 percentage points.

As of today, for recently-unpolled states (NE,WV,OR,WA,MT,NM,SD) I will start using Google Correlate-based estimates. Of those states, Trump is favored in West Virginia (34 delegates) and is near-tied in Oregon and Washington (proportional representation). The rest are Cruz states.

Put through the PEC delegate simulator, the median delegate count is 1333 (interquartile range 1304-1339). The probability of getting to 1237 delegates is 98%:

What if we assume that Trump will lose Indiana? In that case the median drops to 1284 delegates (interquartile range 1278-1287). The probability of getting to 1237 is now 97%:

The one-percentage-point change in probability is inconsequential. The main effect of forcing a Cruz win in Indiana is to reduce uncertainty in the delegate count, which you can see in the narrowing of the histogram.
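For readers who want to try this kind of calculation, here is a minimal Monte Carlo sketch of a delegate simulation. It is not the PEC simulator itself: the state names, delegate hauls, win probabilities, and the `BANKED` and `PROPORTIONAL` totals are all hypothetical placeholders, and only the 1237 threshold comes from the post. It reports a median, an interquartile range, and the probability of reaching a majority, and it lets you force the outcome of one state to see how the spread changes.

```python
import random
import statistics

MAJORITY = 1237  # delegates needed for a majority (from the post)

# HYPOTHETICAL inputs, for illustration only:
# (state name, delegates at stake, probability that the front-runner wins)
WINNER_TAKE_ALL = [
    ("State A", 57, 0.70),
    ("State B", 34, 0.90),
    ("State C", 36, 0.15),
    ("State D", 172, 0.90),
]
BANKED = 1000       # hypothetical delegates already won
PROPORTIONAL = 60   # hypothetical fixed haul from proportional states


def simulate_once(rng, contests, forced=None):
    """Draw one outcome; `forced` maps a state name to True (win) or False (loss)."""
    forced = forced or {}
    total = BANKED + PROPORTIONAL
    for name, delegates, p_win in contests:
        won = forced[name] if name in forced else rng.random() < p_win
        if won:
            total += delegates
    return total


def summarize(contests, forced=None, n_sims=20000, seed=0):
    """Run many simulations and summarize the delegate distribution."""
    rng = random.Random(seed)
    totals = [simulate_once(rng, contests, forced) for _ in range(n_sims)]
    q1, median, q3 = statistics.quantiles(totals, n=4)
    return {
        "median": median,
        "iqr": (q1, q3),
        "p_majority": sum(t >= MAJORITY for t in totals) / n_sims,
        "spread": statistics.pstdev(totals),
    }


if __name__ == "__main__":
    print("baseline:", summarize(WINNER_TAKE_ALL))
    print("State A forced to a loss:",
          summarize(WINNER_TAKE_ALL, forced={"State A": False}))
```

Forcing a loss in one state removes that state's coin flip from the simulation, so the median falls and the distribution narrows, which is the same qualitative effect described above for Indiana.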

Close states (Oregon, Washington, and New Mexico) happen to use proportional rules, so they contribute very little uncertainty. Winner-take-all or nearly-winner-take-all (i.e. district-level rule) states are either strong Cruz (Nebraska, Montana, and South Dakota) or strong Trump (West Virginia, California, and New Jersey).

Most of the remaining uncertainty comes from district-level races in California. With California polls showing Trump +18% (Google Correlate says Trump +31%), it will take a highly coordinated effort by Cruz and Kasich to pick up many of its 53 districts. They would use geographic information like this Sextant Strategies survey to guide their efforts. At the moment, the likeliest outcome is for Trump to get at least 160 out of 172 delegates in the Golden State.
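As a back-of-the-envelope check on that figure, the sketch below assumes California's Republican rules award 3 delegates to the winner of each congressional district plus 13 to the statewide winner. That split is not spelled out in the post, but 53 × 3 + 13 does match the 172 total. The function names are made up for illustration.

```python
DISTRICTS = 53      # congressional districts (from the post)
PER_DISTRICT = 3    # assumption: delegates awarded per district won
STATEWIDE = 13      # assumption: delegates awarded to the statewide winner


def trump_delegates(districts_lost, wins_statewide=True):
    """Delegates won if Trump loses `districts_lost` districts."""
    district_delegates = (DISTRICTS - districts_lost) * PER_DISTRICT
    return district_delegates + (STATEWIDE if wins_statewide else 0)


def max_districts_lost(target):
    """Largest number of lost districts that still reaches `target` delegates."""
    lost = 0
    while lost < DISTRICTS and trump_delegates(lost + 1) >= target:
        lost += 1
    return lost


if __name__ == "__main__":
    print(trump_delegates(0))        # a clean sweep
    print(max_districts_lost(160))   # districts Cruz and Kasich could take
```

Under these assumptions, "at least 160 out of 172" means Cruz and Kasich together take no more than four districts.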

Trump on a glide path (since mid-March)

April 27th, 2016, 9:19am by Sam Wang

The race has been stable for weeks, varying only by factors that are local to each state. Last night’s voting confirmed that – there was nothing new revealed. In terms of voter sentiment, the GOP race has been essentially unchanged since March.

How do we know this? Two reasons. The first is that national polls have been stable for four weeks, since March 22. The second is the remarkable success of a predictive method based on Google Correlate, which relies solely on past voting and web search patterns – and does not use polls or demographics at all. Here is how PEC and N.’s Google Correlate method did: [Read more →]

East Coast primaries – open thread

April 26th, 2016, 8:30pm by Sam Wang

Based on polls and border counties (see summary), I expect Donald Trump to get over 85% of all the delegates to be voted upon today. I estimate that he will gain about 150 delegates (43 of these are district-level Pennsylvania delegates, whose rate of faithfulness I estimate to be 0.8). Trump’s total number of delegates might exceed 1000 tonight. [live results at HuffPost]

More calculations from N. after the jump.

Google-Wide Association Studies

April 26th, 2016, 1:00pm by Sam Wang

I will comment on the East Coast primaries at the end of the post. First I will write about something more interesting: Google Correlate!


In human genetics there is a form of analysis called a genome-wide association study (“GWAS”). In this kind of analysis, the researcher looks for bits of DNA that show up more often in people with some trait or disease. Motivations for doing this kind of study include (a) finding genetic variations that contribute to a condition, so they can be studied; and (b) providing a way of estimating the chance that a condition will occur. However, GWAS is full of challenges. One of my research interests is autism. Autism is strongly driven by combinations of genes, yet GWAS has only succeeded in identifying a small fraction of the risk. Many of these bits of DNA have all kinds of other effects (this is a project in my lab…and hey, I’m recruiting!).

The Google Correlate method for political prediction is analogous to GWAS…but better! In this analogy, Google search terms are the “genes.” Thousands (maybe millions) of Google search terms are statistically associated with the rate at which a state votes for Donald Trump, Ted Cruz, John Kasich, Hillary Clinton, or Bernie Sanders. Some of these terms make intuitive sense; others are mind-bending. [Read more →]

The Circular Firing Squad Re-Forms

April 25th, 2016, 5:00pm by Sam Wang

One day after the announcement of cooperation between Team Cruz and Team Kasich, John Kasich has already gone off script. I question whether this alliance will hold.

For the goal of stopping Trump, avoiding division is important, not just for Indiana’s 57 delegates next Tuesday, but also for California, where Cruz and Kasich are dividing the non-Trump support.

Even if their efforts stick, Cruz and Kasich may be too late. [Read more →]

Who got the better of the Cruz-Kasich deal?

April 24th, 2016, 11:47pm by Sam Wang

The main news this week is probably Tuesday’s primaries, when Trump may come close to sweeping Connecticut, Delaware, Maryland, Pennsylvania, and Rhode Island. But what to make of today’s bargain between Team Kasich and Team Cruz?

I gotta say, this looks like a suboptimal deal for Kasich. [Read more →]

Pennsylvania’s Delegate Rule: Tempest in a Teapot?

April 24th, 2016, 12:00pm by Sam Wang

I am chuffed about the initial match between Indiana polls (Trump +7%) and a “demographics-less” prediction based on border counties (also Trump +7%). A second demographics-less method, based on Google Correlate, also performs very well. In the coming weeks, we will see how these two approaches do.

I have provided predictions separate from polls. I find it confusing to mix up a secret-saucey prediction with hard polling numbers. Unlike other sites’ evaluations [NYT] [538], the approaches presented here are transparent. It is easy to do the predictive calculations yourself, and I hope you will try them out!


As I pointed out in March, Kasich’s win in Ohio was good for Trump because it kept the anti-Trump opposition divided. In Pennsylvania, Donald Trump is polling at a median of 42% (n=4 polls, April 7-18), with a divided opposition (Ted Cruz at 26%, John Kasich at 23.5%). There is little doubt about who will win the popular vote there on Tuesday.

In the overall PEC calculation of expected GOP delegates, Trump should receive all 71 of Pennsylvania’s delegates based on voting. I’ve written about this calculation and assumption before. Today, to expand upon those thoughts… [Read more →]

Two independent ways of predicting GOP primaries lead to highly similar forecasts

April 22nd, 2016, 3:15pm by Sam Wang

Today’s update: Trump median 1285 delegates (IQR: 1239-1320). Probability of a pledged majority is 75%.

Despite the usual complaints, primary polls do reasonably well when aggregated. To understand a state, it is far worse to have no polls at all. As the joke goes, “That restaurant’s food is terrible. And such small portions!”

Unfortunately, we have no public polls for the Republican primary in Indiana (update – just in, we have Trump +6%, very close to both of today’s estimates, Trump +7% and Trump +5%). Indiana is pivotal to whether Donald Trump can get to a majority of pledged delegates. You’d think data pundits would rush to fill this void. But that has not been the case.

For any data pundit, the absence of polling has been a serious problem if the question is anywhere close to a tie. At the New York Times, The Upshot has made a demographics-based effort, but I believe that calculation missed Wisconsin (and lacks details). The Great Argental Satan seems to favor Cruz for Indiana in a fairly vague way. He and his staff make extremely weak use of demographics-based analysis, perhaps appropriately so; as far as I am aware, their approach is not strong enough to repair inaccurate polling (for instance, the Michigan Democratic primary). For better performance, there is a need for a method that uses state-level information that is more specific than general demographic composition.

Which brings us to today’s topic. I will show you two independent methods for estimating Trump/Cruz/Kasich support without demographics or polls. This is a long post.

Bottom line: the two methods agree in all important respects. Trump is favored in the remaining Eastern states, including Indiana. Cruz is favored in the remaining states west of the Mississippi (Washington, Oregon, Nebraska, and South Dakota). The only point of disagreement is New Mexico. Both methods indicate that Trump is on a path to more than 1237 pledged delegates. [Read more →]

SCOTUS upholds Arizona Redistricting Commission!

April 20th, 2016, 10:24am by Sam Wang

Decision’s out (PDF)! It’s unanimous, written by Breyer, favoring the Commission, an outcome for which I had advocated. Unfortunately, nothing about tests for symmetry, which I proposed in the NYT and in an upcoming Stanford Law Review article.

More thoughts…

