Google-Wide Association Studies

April 26th, 2016, 1:00pm by Sam Wang

I will comment on the East Coast primaries at the end of the post. First I will write about something more interesting: Google Correlate!


In human genetics there is a form of analysis called a genome-wide association study (“GWAS”). In this kind of analysis, the researcher looks for bits of DNA that show up more often in people with some trait or disease. Motivations for doing this kind of study include (a) finding genetic variations that contribute to a condition, so they can be studied; and (b) providing a way of estimating the chance that a condition will occur. However, GWAS is full of challenges. One of my research interests is autism. Autism is strongly driven by combinations of genes, yet GWAS has only succeeded in identifying a small fraction of the risk. Many of these bits of DNA have all kinds of other effects (this is a project in my lab…and hey, I’m recruiting!).

The Google Correlate method for political prediction is analogous to GWAS…but better! In this analogy, Google search terms are the “genes.” Thousands (maybe millions) of Google search terms are statistically associated with the frequency at which a state votes for Donald Trump, Ted Cruz, John Kasich, Hillary Clinton, or Bernie Sanders. Some of these terms make intuitive sense; others are mind-bending. [Read more →]

The Circular Firing Squad Re-Forms

April 25th, 2016, 5:00pm by Sam Wang

One day after the announcement of cooperation between Team Cruz and Team Kasich, John Kasich has already gone off script. I question whether this alliance will hold.

For the goal of stopping Trump, avoiding division is important, not just for Indiana’s 57 delegates next Tuesday, but also for California, where Cruz and Kasich are dividing the non-Trump support.

Even if their efforts stick, Cruz and Kasich may be too late. [Read more →]

Who got the better of the Cruz-Kasich deal?

April 24th, 2016, 11:47pm by Sam Wang

The main news this week is probably Tuesday’s primaries, when Trump may come close to sweeping Connecticut, Delaware, Maryland, Pennsylvania, and Rhode Island. But what to make of today’s bargain between Team Kasich and Team Cruz?

I gotta say, this looks like a suboptimal deal for Kasich. [Read more →]

Pennsylvania’s Delegate Rule: Tempest in a Teapot?

April 24th, 2016, 12:00pm by Sam Wang

I am chuffed about the initial match between Indiana polls (Trump +7%) and a “demographics-less” prediction based on border counties (also Trump +7%). A second demographics-less method, based on Google Correlate, also performs very well. In the coming weeks, we will see how these two approaches do.

I have provided predictions separate from polls. I find it confusing to mix up a secret-saucey prediction with hard polling numbers. Unlike other sites’ evaluations [NYT] [538], the approaches presented here are transparent. It is easy to do the predictive calculations yourself, and I hope you will try them out!


As I pointed out in March, Kasich’s win in Ohio was good for Trump because it kept the anti-Trump opposition divided. In Pennsylvania, Donald Trump is polling at a median of 42% (n=4 polls, April 7-18), with a divided opposition (Ted Cruz at 26%, John Kasich at 23.5%). There is little doubt about who will win the popular vote there on Tuesday.

In the overall PEC calculation of expected GOP delegates, Trump should receive all 71 of Pennsylvania’s delegates based on voting. I’ve written about this calculation and assumption before. Today, to expand upon those thoughts… [Read more →]

Two independent ways of predicting GOP primaries lead to highly similar forecasts

April 22nd, 2016, 3:15pm by Sam Wang

Today’s update: Trump median 1285 delegates (IQR: 1239-1320). Probability of a pledged majority is 75%.

Despite the usual complaints, primary polls do reasonably well when aggregated. To understand a state, it is far worse to have no polls at all. As the joke goes, “That restaurant’s food is terrible. And such small portions!”

Unfortunately, we have no public polls for the Republican primary in Indiana (update – just in, we have Trump +6%, very close to both of today’s estimates, Trump +7% and Trump +5%). Indiana is pivotal to whether Donald Trump can get to a majority of pledged delegates. You’d think data pundits would rush to fill this void. But that has not been the case.

For any data pundit, the absence of polling has been a serious problem if the question is anywhere close to a tie. At the New York Times, The Upshot has made a demographics-based effort, but I believe that calculation missed Wisconsin (and lacks details). The Great Argental Satan seems to favor Cruz for Indiana in a fairly vague way. He and his staff make extremely weak use of demographics-based analysis, perhaps appropriately so; as far as I am aware, their approach is not strong enough to repair inaccurate polling (for instance, the Michigan Democratic primary). For better performance, there is a need for a method that uses state-level information that is more specific than general demographic composition.

Which brings us to today’s topic. I will show you two independent methods for estimating Trump/Cruz/Kasich support without demographics or polls. This is a long post.

Bottom line: the two methods agree in all important respects. Trump is favored in the remaining Eastern states, including Indiana. Cruz is favored in the remaining states west of the Mississippi (Washington, Oregon, Nebraska, and South Dakota). The only point of disagreement is New Mexico. Both methods indicate that Trump is on a path to more than 1237 pledged delegates. [Read more →]

SCOTUS upholds Arizona Redistricting Commission!

April 20th, 2016, 10:24am by Sam Wang

Decision’s out (PDF)! It’s unanimous, written by Breyer, favoring the Commission, an outcome for which I had advocated. Unfortunately, nothing about tests for symmetry, which I proposed in the NYT and in an upcoming Stanford Law Review article.

More thoughts…

New York thread

April 19th, 2016, 10:16pm by Sam Wang

Trump heading for at least 90 out of 95 delegates. If his 60% vote share holds up, I’ll guess 92. Talk among yourselves…

Wednesday 7:30am update: for now, it looks like 90 delegates. Trump got below 50% in 3 districts, therefore losing three delegates. Kasich finished first in one district, leaving 1 delegate for Trump there. Overall, the average district vote share was 60.6% for Trump, with a standard deviation of 10.0% – a bit more than my assumption of 9%.
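To see why the spread of district results matters, here is a minimal sketch. It assumes district vote shares are roughly normal, which is an illustrative simplification and not necessarily the model behind the delegate estimate. The district count `N_DISTRICTS` and the helper name `expected_districts_below` are also assumptions made for this example.

```python
from statistics import NormalDist

# Figures quoted in the update: mean and SD of Trump's district vote share.
MEAN_SHARE, SD_SHARE = 60.6, 10.0
ASSUMED_SD = 9.0    # the SD assumed before the vote, per the update
N_DISTRICTS = 27    # assumption: New York's congressional district count in 2016


def expected_districts_below(cutoff, mean, sd, n):
    """Expected number of districts under `cutoff`, ASSUMING normal shares."""
    p_below = NormalDist(mean, sd).cdf(cutoff)
    return n * p_below


for sd in (ASSUMED_SD, SD_SHARE):
    n_below = expected_districts_below(50.0, MEAN_SHARE, sd, N_DISTRICTS)
    print(f"SD {sd:.1f}%: about {n_below:.1f} districts below 50%")
```

Under this simplified model, a wider spread pushes more districts below the 50% line, which is the direction in which a larger-than-assumed standard deviation costs delegates.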

GOP update, pre-New York

April 18th, 2016, 3:59pm by Sam Wang

Tomorrow New York votes. This is a critical race in the Republican primary campaign. Above is a final snapshot, based on polls and voting patterns to date. This calculation gives a median Trump outcome of 1265 pledged delegates (interquartile range or IQR, 1210 to 1305 delegates). The probability of getting 1237 or above is 64%. If polls are accurate, Donald Trump appears to be headed to getting 86 or more of New York’s 95 delegates.
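A rough consistency check on these numbers can be sketched as follows. It assumes the distribution of delegate outcomes is normal, which is an assumption for illustration only; the actual distribution comes from the full calculation and may well be skewed. The function name `normal_majority_probability` is made up for this example.

```python
from statistics import NormalDist

# Figures quoted in the post.
MEDIAN = 1265          # median Trump delegate count
Q1, Q3 = 1210, 1305    # interquartile range
THRESHOLD = 1237       # pledged-delegate majority


def normal_majority_probability(median, q1, q3, threshold):
    """P(delegates >= threshold) under an ASSUMED normal distribution."""
    # For a normal distribution, IQR = 2 * z_0.75 * sigma (about 1.349 * sigma).
    z75 = NormalDist().inv_cdf(0.75)
    sigma = (q3 - q1) / (2 * z75)
    return 1.0 - NormalDist(mu=median, sigma=sigma).cdf(threshold)


p = normal_majority_probability(MEDIAN, Q1, Q3, THRESHOLD)
print(f"approximate probability of a majority: {p:.0%}")
```

The normal approximation lands within a point or two of the quoted 64%, which suggests the median, IQR, and probability are mutually consistent.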

The overall picture represents very little change from last week. Below are some technical notes, as well as state-by-state snapshots. I have updated my methods (details documented here). In the biggest new item, I show how to infer likely voting in states for which there are no polls, without use of any demographic assumptions. Using this method, I handicap Indiana as Trump +7%. [Read more →]

“Momentum” and the Wannabe Physicists at Meet The Press

April 14th, 2016, 3:40pm by Sam Wang

In political reportage, the word “momentum” is nearly worthless. It gets used whenever a candidate wins a race or gets a favorable poll. As far as I can tell, a working definition of “momentum” is “I am excited about a noisy data point and will now give someone a lot of press coverage.” Remember John Dickerson and Ro-mentum? I guess discussing “momentum” levels the media playing field for trailing candidates, which is democratizing and maybe not all bad. Still, I cringe when the word is used.

Single primaries only tell what is special about that state. For example, on March 15th Governor John Kasich won his home state of Ohio. He went up in the polls until March 22nd, when he lost Arizona and Utah; at that point his polling numbers peaked and started to come down.

Putting aside the incredible spectacle of the national Republican Party’s implosion, their nomination race has been remarkably uneventful in terms of voter sentiment. [Read more →]

Current Polls Favor A Trump Delegate Majority

April 9th, 2016, 4:59pm by Sam Wang

This week in the Republican nomination race, Ted Cruz’s win in Wisconsin triggered buzz about how front-runner Donald Trump might be in trouble. Doubtless today’s win in Colorado will intensify the chatter, and will involve words like “momentum.” It is best to ignore all of that coverage – at least until some national polling data shows a sustained change. Why? Because states differ from one another, mostly in demographics but also in rules and various local factors. It is almost impossible to learn something new from a single race. To know where the race stands as a whole, it is necessary to consider all states at once.

In several ways, Wisconsin was typical. With a pre-election poll median of 36.0 ± 1.5% (median ± estimated SEM), Trump’s vote share of 35% was on the mark, continuing his close match between polls and outcomes. Cruz’s finish was also typical, but for a different reason: he was, and is, outperforming his polls. Cruz’s pre-election polls were 39.0 ± 1.2%, and he ended up with 48% of the vote. In previous states, Cruz has overperformed by a median factor of 1.2. Either Cruz’s supporters are exceptionally committed, or he is the beneficiary of anti-Trump votes liberated from their previous first choices, or undecided voters break hard for him, or some combination of the three. In Wisconsin he may also have benefited from the fact that trailing candidates like Kasich often underperform their polls when it is time to vote.

Where is the national race now? The current 6-national-poll median (March 29-April 6) is Trump 39.5 ± 1.2%, Cruz 31.0 ± 2.1%, Kasich 19.0 ± 1.1%. If we were to apply a 1.2-fold bonus to Cruz’s numbers to allow for his overperformance, the corrected numbers would be Trump 39.5%, Cruz 37.2% – extremely close. Either way, Cruz has risen quite a bit in the last month, and national opinion is now closely divided.
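The overperformance arithmetic can be reproduced in a few lines. This is only a sketch of the scaling described above, using the figures quoted in the post; the function name `corrected_share` is made up for illustration.

```python
# Figures quoted in the post.
OVERPERFORMANCE = 1.2                 # Cruz's median ratio of vote share to polls
trump_poll, cruz_poll = 39.5, 31.0    # national poll medians, in percent


def corrected_share(poll_share, factor):
    """Scale a polled share by an assumed overperformance factor."""
    return poll_share * factor


cruz_corrected = corrected_share(cruz_poll, OVERPERFORMANCE)
gap = trump_poll - cruz_corrected

# Wisconsin alone: 48% of the vote against a 39.0% pre-election poll median.
wisconsin_ratio = 48.0 / 39.0

print(f"Cruz corrected: {cruz_corrected:.1f}%")
print(f"Trump lead after correction: {gap:.1f} points")
print(f"Wisconsin ratio: {wisconsin_ratio:.2f}")
```

The Wisconsin ratio comes out slightly above 1.2, in line with the median factor from earlier states.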

I have updated the polls-only snapshot of the remaining Republican primaries through June 7th, when voting ends. As I pointed out months ago in The New Republic and The American Prospect, Republican rules are complex and tilt the playing field toward the front-runner, even if he/she doesn’t get a majority of the popular vote. Therefore it is essential to emulate the state-by-state delegate rules with close attention to quantitative accuracy.

Even after getting the rules right, this is a challenging calculation for three reasons: (a) many states lack polls; (b) Cruz overperforms his polls; and (c) delegates may not follow the rules. Today I describe one way of dealing with all of these issues.

For those who just want the bottom line: Since my last update, the poll-based snapshot has moved – in Trump’s favor. If current polls accurately measure voter behavior, then Donald Trump would get a median of 1,356 delegates – almost 120 more than the 1,237 he needs for a first-ballot victory at the national convention in Cleveland. For his probability of reaching a majority to drop to 50%, his national lead would have to drop by 8.0% – this is Trump’s Meta-Margin, a measure I have previously developed for general-election Presidential races. However, if Cruz’s overperformance continues, Trump’s lead would narrow considerably, to a count of 1,280 delegates and a Meta-Margin of 2.0%. After allowing for Cruz’s potential overperformance, the probability of a Trump majority is 70% – probable but uncertain. Under such closely divided conditions, the outcome won’t be known until the last primaries, on June 7th.
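The Meta-Margin can be thought of as a root-finding problem: find the shift in the national lead at which the majority probability crosses 50%. The sketch below shows that idea with bisection. The function `toy_majority_probability` is a made-up stand-in for a full state-by-state delegate simulation, and its slope and spread are placeholders, not values from this analysis.

```python
from statistics import NormalDist

THRESHOLD = 1237  # delegates needed for a first-ballot majority


def toy_majority_probability(shift_pct):
    """Toy stand-in for a full delegate simulation (NOT the post's model).

    Assumes the median delegate count falls linearly as the national lead
    shrinks, with a fixed spread. Both numbers below are placeholders.
    """
    median = 1356 - 15.0 * shift_pct   # hypothetical: 15 delegates per point
    sigma = 60.0                       # hypothetical spread
    return 1.0 - NormalDist(median, sigma).cdf(THRESHOLD)


def meta_margin(prob_fn, lo=0.0, hi=30.0, tol=1e-6):
    """Find the lead shift at which prob_fn crosses 50%, by bisection.

    Requires prob_fn to be decreasing in the shift, with prob_fn(lo) > 0.5
    and prob_fn(hi) < 0.5.
    """
    if not (prob_fn(lo) > 0.5 > prob_fn(hi)):
        raise ValueError("50% crossing is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_fn(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


mm = meta_margin(toy_majority_probability)
print(f"toy Meta-Margin: {mm:.2f} points")
```

The number this prints depends entirely on the placeholder slope, so it should not be read as reproducing the 8.0% figure. The point is only the procedure: any simulation that maps a lead shift to a majority probability can be passed in as `prob_fn`.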

And now I will explain at length.

