As we have done since 2004, we are taking a polls-only approach to give a daily snapshot of the race – as well as a November prediction. This approach has an effective precision of a few tenths of a percentage point of public opinion, and performs very well as both a tracker and a forecast. Currently, the probability of a Hillary Clinton victory in November is 85 percent, based on polls alone.

Today, I give a brief tour of the computational approach.

The Meta-Analysis starts with a Python script that downloads recent state polls from the Huffington Post’s Pollster operation. Thanks to Natalie Jackson, the HuffPollster team, and dozens of pollsters for this stream of information, which forms the foundation of the calculation.

Where polls are not available, we use the election result from 2012. As I have written, this year’s Clinton-versus-Trump state polls are strongly correlated with 2012′s Obama-versus-Romney polls. Because no realignment is evident, past results are a good predictor of the likely outcome this year. At the moment, no more than fourteen states are genuinely in play.

State polls are converted to a win probability by calculating (a) the poll median and (b) the confidence with which this median is known, as measured by the estimated standard error of the mean (SEM). Because the estimated SEM is calculated from polling data, it includes pollster-to-pollster “house effects” as part of the variation. These numbers are passed to a MATLAB program for the rest of the calculation.

These two numbers are converted to a win probability using the t-distribution, a method that allows for the possibility of outlier events. Those probabilities are diagrammed above in the cartogram, whose areas are proportional to each state’s electoral votes (EV). Note that the cartogram is principally for display purposes, and its EV totals reflect only one combination of all possible outcomes, which number in the quadrillions.
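The median-plus-SEM-to-probability step can be sketched in a few lines. The sketch below is illustrative, not the site's actual MATLAB code; the choice of 3 degrees of freedom is an assumption, made because the t-distribution with few degrees of freedom has the heavy tails that allow for outlier events, and because 3 degrees of freedom admits a simple closed-form CDF.

```python
import math

def t3_cdf(x):
    # Closed-form CDF of Student's t-distribution with 3 degrees of freedom.
    # Heavier-tailed than a Gaussian, so outlier polls are not ruled out.
    return 0.5 + (math.atan(x / math.sqrt(3.0)) +
                  math.sqrt(3.0) * x / (x * x + 3.0)) / math.pi

def win_probability(median_margin, sem):
    # Probability that the leading candidate's true margin is positive,
    # given the state poll median and its estimated SEM.
    return t3_cdf(median_margin / sem)
```

For example, a candidate ahead by 3 points with an SEM of 1 point gets a win probability of about 97%, rather than the ~99.9% a normal distribution would assign.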

To get a full snapshot of every possible outcome, these probabilities are compounded to generate a probability distribution of all possible outcomes:
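The compounding step works by convolution: each state's win probability is folded into a running distribution over electoral votes, so the quadrillions of combinations never have to be enumerated individually. A minimal sketch of the idea (function and variable names are mine, not the site's code):

```python
def ev_distribution(states):
    # states: list of (win_probability, electoral_votes) for one candidate.
    # Returns dist, where dist[k] = P(candidate wins exactly k electoral votes).
    dist = [1.0]
    for p, ev in states:
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)   # candidate loses this state
            new[k + ev] += prob * p      # candidate wins this state's EV
        dist = new
    return dist
```

Summing `dist[270:]` then gives the probability of an Electoral College win. Although there are 2^51 state-by-state combinations, the convolution touches each state only once, so the exact distribution is computed in a fraction of a second.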

The central dark blue bars represent the 95% confidence interval. The tails are plotted in green.

At nearly all times, this snapshot shows a near-certain win for one candidate or the other. Because this probability is usually greater than 99%, we do not report it.

Instead, in the banner at the top of this website, we report the probability of a final election win. That quantity includes the possibility of movement between the time the polls were taken and Election Day, in November.

**The Meta-Margin and November predictions**

A key parameter in PEC’s calculation is the Meta-Margin. This is the amount by which the two-candidate margin in state polls would have to change in order to create a perfect electoral tie. For example, today the Meta-Margin is Clinton +3.86%. This means that Donald Trump’s election odds would be perfectly even if he picked up a net of about 3.9 percentage points among currently undecided voters – or if about 1.9 percentage points of Clinton supporters switched sides.

Because the Meta-Margin is in units of percentage, it is a nice quantity to work with. If we have expectations about how much the Meta-Margin may move, we can predict what should happen in November.

To estimate how much future movement may occur, we make two calculations of probability, both of which appear in the banner at the top of this website:

- “random drift”: This calculation assumes random drift in either direction by an amount that matches past patterns in polling from 1952 to 2012;
- “Bayesian”: In this approach, the up-and-down variation in the Clinton-Trump polling margin so far in 2016 is used to establish a “prior,” i.e. the expected range of future movement. Variation to date is used to estimate the midpoint of the likely range of future values. The range itself is then set conservatively at approximately 7 percentage points, in order not to constrain the November possibilities too much. This assumed range is more than twice as broad as the observed variation in 2004-2012, elections that were very stable compared with those since 1952. Commentators say anything can happen, so bring it on.
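Both calculations can be sketched compactly. The drift model asks how likely the Meta-Margin is to remain on the leader's side of zero after a random walk of a given size; the Bayesian model combines the current snapshot with a Gaussian prior by precision weighting. Function names, the normal approximation, and the exact weighting scheme below are my illustrative assumptions, not PEC's actual code.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def drift_win_prob(meta_margin, drift_sigma):
    # Random-drift model: probability that the Meta-Margin still favors
    # the current leader after Gaussian drift of size drift_sigma.
    return norm_cdf(meta_margin / drift_sigma)

def combine_with_prior(mm_now, sigma_now, mm_prior, sigma_prior=7.0):
    # Precision-weighted combination of the current snapshot with a
    # Gaussian prior (sigma of about 7 points, as described above).
    w_now, w_prior = 1.0 / sigma_now ** 2, 1.0 / sigma_prior ** 2
    mm = (w_now * mm_now + w_prior * mm_prior) / (w_now + w_prior)
    sigma = (w_now + w_prior) ** -0.5
    return mm, sigma
```

Note that with a deliberately broad prior (sigma of 7 points), the combined estimate stays close to the current snapshot; the prior mostly serves to keep November possibilities open.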

At the moment, we are using national polls to set the prior. Once we have enough Meta-Margin history, we will use that.

The black curves indicate the median of the electoral vote estimator (top graph) and the Popular Vote Meta-Margin (bottom). For the EV estimator, the gray shaded region indicates the calculated 95% confidence interval. This confidence interval includes sampling error, variation in biases among pollsters, and changes in opinion during the period when the polls were taken. Because pollster biases tend to cancel one another on average, the true 95% confidence interval is smaller, typically less than +/-10 EV.

The graph’s calculations are explained **here**. Briefly, the red and yellow zones show a prediction range that combines random drift from current polls with a Bayesian prior. This Bayesian prior is calculated from the assumption that the average Clinton-Trump margin in national polls since January gives the center of the likely range of election outcomes. The prior has a Gaussian range with a sigma of +/-7%, consistent with 1952-2012 but larger than the amount of movement in the 2004-2012 election cycles. In other words, the prior is set to allow anything reasonable to happen.

The red zone is a “strike zone” showing the 68% confidence interval of probable outcomes. The yellow zone is a “watch zone” that shows a combination of the 95% random-movement confidence interval and the 95% gray-zone confidence interval. The November outcome is nearly certain to be within this range.

Continuing statistical malpractice at @thehill. Clinton-over-Trump median is holding fairly steady around 5-7%. https://t.co/VoIKMiSkDZ

— Sam Wang (@SamWangPhD) June 26, 2016

On most news days this month, there has been some pointless story about a single poll. Journalists’ instinct to report on the exceptional event is totally inappropriate for following polls, where the median result is the one most likely to be true. After 12 years of poll aggregation, wouldn’t their profession have adopted better practices by now? Anyway, Clinton has been up by 5 to 7 percentage points all month. There is nothing else to say about that. Also, we are starting to get state polls, which will fill in the picture considerably.

Meantime, this is of at least equal significance for November:

The website, gerrymander.princeton.edu, implements three tests for partisan gerrymandering as described in an article I published last week in the *Stanford Law Review*. These proposed standards recently won a prize in Common Cause’s 2016 contest to define a partisan gerrymandering standard. **The website is in beta-test, and I welcome your comments.** If you detect a problem, email the output PDF if possible.

My three standards have two key features: (1) they implement the principle of partisan asymmetry, as others have also recently done; and (2) they do so without any consideration of maps.

The second point is quite important. Most people who get exercised about the offense of gerrymandering gravitate toward examination of a district’s convoluted boundaries. Although this is perfectly reasonable, existing precedents and consequences of the Voting Rights Act have conspired to make consideration of boundaries a tough sell with courts – at least for statewide partisan gerrymandering. Let me explain.

The word “gerrymandering” encompasses two kinds of offense.

First, an **individual district** can be drawn to give an overwhelming advantage to one party or candidate. To paraphrase legal scholar George Berman, this consists of a legislator choosing his/her voters, and not the other way around. The Supreme Court has said that districts should be “compact.” However, it has also allowed that compactness can be geographic…or community-based. This means that a strange shape is permissible. In addition, Section 2 of the Voting Rights Act (as well as the late Section 5, which was killed by the Shelby County v. Holder decision) mandated the creation of districts that empowered specific communities of interest such as blacks or Hispanics. Since the 1970s, district boundaries have gotten more complicated, probably because of this mandate. So an argument that a district’s boundaries are convoluted can be countered with the defense that it was necessary and/or legally permissible.

The second kind of gerrymander is partisan: the construction of **an entire statewide scheme**, composed of single-district gerrymanders, that gives a net overall advantage to one party. Here, the consideration of a single district’s boundaries is again ambiguous. Why? Because any individual district can be part of a statewide scheme that is neutral overall, or helps one party. For example, if all districts in a state were drawn to be 60-40 for either party, the overall effect would not give either side an advantage. But a party could pack most of its opponents into a tiny number of 80-20 districts, and get more 60-40 districts for itself. A given 60-40 district could potentially look the same in both schemes. So an argument based on the properties of single districts may not be logically watertight.
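The arithmetic of the packing example above can be made concrete. The district numbers below are hypothetical, chosen only to match the 60-40 and 80-20 scenarios in the text; in both schemes the statewide vote is split exactly 50-50, yet the seat counts differ sharply.

```python
def seats(dem_shares):
    # Number of districts the first party wins outright.
    return sum(share > 0.5 for share in dem_shares)

# Ten hypothetical districts; statewide vote is 50-50 in BOTH schemes.
neutral = [0.6] * 5 + [0.4] * 5      # every district is 60-40; seats split 5-5
packed  = [0.2] * 2 + [0.575] * 8    # opponents packed into two 80-20 districts
```

Here `seats(neutral)` is 5 but `seats(packed)` is 8: the packing party converts the same statewide vote total into a 3-seat gain, which is why single-district shape arguments cannot settle the statewide question.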

I therefore suggest that any standard for diagnosing partisan gerrymandering should consider all districts in a state as a whole. Apprehending lots of data at once in a single measure is the *raison d’être* for the field of statistics.

The online app calculates three statistical quantities:

- **The presence of lopsided wins for one party but not the other.** This calculation uses a two-sample *t*-test, perhaps the most widely used statistical test in the world.
- **The construction of consistent wins for one party.** This can be measured two ways: by the mean-median difference, to detect overall skewness, or by a chi-squared test of whether one party’s wins are too consistent to have occurred by chance.
- **The number of seats that a party has gained in excess of what would arise naturally, given national population-clustering patterns.** This is done by calculating 1,000,000 hypothetical “fantasy delegations” to see what would arise if redistricters were not seeking a systematic statewide advantage.
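The first two quantities are simple enough to sketch directly. The function names, sign conventions, and the Welch-style t statistic below are my illustrative choices, not necessarily the exact formulas used by gerrymander.princeton.edu.

```python
import statistics

def mean_median_difference(dem_shares):
    # Positive value: the median Democratic district lags the mean,
    # a signature of Democratic voters being packed into a few seats.
    return statistics.mean(dem_shares) - statistics.median(dem_shares)

def lopsided_wins_t(dem_win_shares, rep_win_shares):
    # Two-sample (Welch) t statistic comparing each party's average
    # winning vote share: lopsided wins for one side only inflate |t|.
    # rep_win_shares is the Republican share in Republican-won seats.
    m1, m2 = statistics.mean(dem_win_shares), statistics.mean(rep_win_shares)
    v1, v2 = statistics.variance(dem_win_shares), statistics.variance(rep_win_shares)
    n1, n2 = len(dem_win_shares), len(rep_win_shares)
    return (m1 - m2) / ((v1 / n1 + v2 / n2) ** 0.5)
```

For instance, a delegation whose Democratic winners all take ~80% while Republican winners all take ~55% produces a large positive t statistic, flagging the lopsided-wins pattern.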

The first two tests can be done pretty easily; one objective of gerrymander.princeton.edu is to make them fairly painless to calculate. In addition, the website lets you run the third test at the press of a button. As it turns out, a map-less approach to calculating fantasy delegations can be done extremely fast: one million delegations can be simulated in well under a minute.
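The fantasy-delegation idea can be sketched as resampling: draw a state's worth of districts at random from the nationwide pool of district results, count the seats, and repeat. The sketch below is my paraphrase of that approach (sampling without replacement is my assumption), not the site's code.

```python
import random

def fantasy_delegations(national_dem_shares, n_districts,
                        n_sims=100_000, seed=0):
    # Draw n_districts results at random from the nationwide pool of
    # district outcomes; count one party's seats in each fantasy delegation.
    rng = random.Random(seed)
    seat_counts = []
    for _ in range(n_sims):
        sample = rng.sample(national_dem_shares, n_districts)
        seat_counts.append(sum(share > 0.5 for share in sample))
    return seat_counts
```

Comparing a state's actual seat count against this distribution then shows whether the delegation is an outlier relative to what national population-clustering patterns alone would produce.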

If you want to consider specific district boundaries, there has been lots of effort in that direction. One example is a random-map-drawing approach taken by Jowei Chen and John Rodden. Another is the first-place winner in this year’s Common Cause contest, by Wendy Tam Cho and Yan Y. Liu. These approaches go into great geographic detail, and are complementary to my proposal. However, they have the disadvantage that a judge would have difficulty applying them without the help of an expert witness. My hope is to place a tool into the hands of a judge that he or she could apply directly – even by jotting it in the margin of a brief.

I hope one or more of these standards will find a receptive audience at the Supreme Court in the near future. No matter whose standard is adopted, it will be a substantial improvement over the current limbo. You can read more about that limbo in my article.

Absolutely brilliant poll on Brexit by @YouGov pic.twitter.com/EPevG1MOAW

— Tancredi Palmeri (@tancredipalmeri) June 23, 2016

The UK voters who dominated the vote to Leave are also the ones who have to live with the outcome for the least amount of time.

And then there is this fascinating essay in Dissent magazine, which describes two Englands: elite England centered almost entirely in London, and excluded England composed of everyone else. Excluded England includes working classes, poor areas, former industrial districts – regions and classes that have not partaken in the reinvigoration that has been promised as part of membership in the European Union. All in all, they sound rather a lot like the pro-Trump wing of the Republican Party.

Finally, Paul Krugman ponders the aftermath of Brexit in a fairly non-panicked manner. He suggests that the problems in the European Union were there all along, and this vote changes nothing. He does suggest that the vote is pretty bad for Britain in the long run. They wanted to revive Britain; what they may get is a revived England (and probably Wales), severed from Scotland and Northern Ireland.

HuffPollster reports that about 9% of respondents remain undecided, enough to swing the outcome either way. Where multiple polls were available from one pollster, the direction of change was YouGov 0.5% toward Remain, NBC/SurveyMonkey 4% toward Remain, and Survation 3% toward Leave. This is ambiguous.

Big referenda like this can contain hidden strains of opinion. For example, in 2014 the Scotland independence referendum failed by 6% more than indicated by polls. That was a situation of some voters being little-c conservative, in the sense of avoiding drastic change. Naively, I think such a dynamic would favor the Remain side…but we will see. Could still go the other way.

The Scotland failure led to a strengthening of the SNP, with echoes felt in the UK today. Even if Remain wins, what will be the consequences of today’s vote?

**Update, 8:54pm: ***“Leave” is doing better than expected in many constituencies.* Follow the results at the Guardian’s liveblog and tracker.

When Swansea reported, Leave became the favorite on Betfair for the first time tonight.

— Phil Kerpen (@kerpen) June 24, 2016

*Friday morning: I had a bit of trouble with updates last night, so couldn’t post this. An excellent projection was done by Chris Hanretty of the University of East Anglia. His interpolation made it clear by about 9:00pm Eastern (2:00am UK) last night that Leave would win.*

Starting this summer, I am on the lookout for a new partner in running PEC, as Mark Tengi moves on from Princeton. Ideally, the person is part of the Princeton community (student or otherwise) and conversant, or willing to become conversant, in Python, the WordPress backend, and Linux. Write me!

(Math note: To make this plot I used multidimensional scaling, in which I plotted three of the elections to get the graph started, then added more elections one at a time, re-optimizing the graph each time. My collaborators and I have used this approach before to analyze how brain architecture has changed over the course of evolution. The scripts and data are here. Okay, back to the politics.)
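For readers who want to try this at home: the incremental re-optimization described in the note can be approximated in one shot with classical (Torgerson) multidimensional scaling, which embeds items so that pairwise distances between their coordinates approximate a given dissimilarity matrix. This is a generic sketch of that technique, not the scripts linked above.

```python
import numpy as np

def classical_mds(dist_matrix, n_dims=2):
    # Classical (Torgerson) MDS: place n points in n_dims dimensions so
    # that their pairwise distances approximate the input dissimilarities.
    d2 = np.asarray(dist_matrix, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ d2 @ j                        # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_dims]      # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

Here the dissimilarity between two elections could be, say, the fraction of states that flipped parties between them; the embedding then lays out election years so that similar maps sit close together, as in the plot.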

Changes in partisan alignment do not coincide with transitions in power. 1976 to 1980 (Carter to Reagan), 1988 to 1992 (Bush I to Clinton I), and 2004 to 2008 (Bush II to Obama) were all momentous transitions, but in no case did the political map rearrange itself. On the other hand, 1960 to 1964 (Johnson’s re-election) showed a massive upheaval in which Democrats lost their previously strong grip on the South. As I wrote the other day, this change took three more elections to play out, culminating in 1972, the year of Nixon’s re-election and his famous use of the Southern strategy. I would therefore call 1960 to 1972 the culmination of a first great modern realignment of the parties.

This map shows a perhaps equally great second shift, starting in 1976 and indicated by the red path. Each step was small, but over time it added up. Comparison of the 1976 and 2012 electoral maps (see below) shows that during this time, two things happened: (a) the South became securely Republican (think of Gore losing Tennessee in 2000), and (b) Democrats captured Western states starting with Washington, Oregon, and California, and eventually including Nevada, New Mexico, and Colorado. These changes may be partly racial, partly economic/cultural.

Note that the distinction between “Phase 1” and “Phase 2” is somewhat artificial. Really, the major event separating the two phases was the Watergate scandal and Nixon’s consequent resignation, which caused massive damage to Republicans and led to the election of Jimmy Carter, a previously obscure Southern governor. If we leave out 1976 and 1980 from the graph above, the remaining elections look like this:

In this representation, the shift of 1984-2012 corresponds to (b) above, the loss of Western states by Republicans. Washington/Oregon/California/Nevada currently add up to 80 electoral votes, marking a substantial shift toward Democrats. Racial diversity is an obvious driving factor here, with some additional role for new-economy jobs growth.

Perhaps a broader lesson from this diagram is that much of the last 50 years has involved continual change in the configuration of the electoral map. The exception is 2000 to 2012, a static period during which polarization has been massive (multiple Republican takeovers of the House of Representatives, impeachment of President Clinton, multiple government shutdowns, partisan passage of Obamacare, decreased productivity of Congress). Against this backdrop, any change may seem like a lot.

As I wrote last week, despite the upheaval on the Republican side there might not be much change to the map. The change from 2012 to 2016 currently looks, at most, like one-tenth the size of a real realignment.

The graph makes it look like 2016′s partisan alignment might be moving back a little bit toward 1964. I would not read too much into that. I think it is probably noise. Most states are unpolled, and this could just as well be caused by not having enough data. From a numbers standpoint, so far it looks like Donald Trump is basically Bush/McCain/Romney minus Utah, which is amazing considering that all three of those people have rejected Trump in various ways.
