As we have done since 2004, we are taking a polls-only approach to give a daily snapshot of the race – as well as a November prediction. This approach has an effective precision of a few tenths of a percentage point of public opinion, and performs very well as both a tracker and a forecast. Currently, the probability of a Hillary Clinton victory in November is 85 percent, based on polls alone.
Today, I give a brief tour of the computational approach.
The Meta-Analysis starts with a Python script that downloads recent state polls from the Huffington Post’s Pollster operation. Thanks to Natalie Jackson, the HuffPollster team, and dozens of pollsters for this stream of information, which forms the foundation of the calculation.
Where polls are not available, we use the election result from 2012. As I have written, this year’s Clinton-versus-Trump state polls are strongly correlated with 2012’s Obama-versus-Romney polls. Because no realignment is evident, past results are a good predictor of the likely outcome this year. At the moment, no more than fourteen states are genuinely in play.
State polls are converted to a win probability by calculating (a) the poll median and (b) the confidence with which this median is known, as measured using the estimated standard error of the mean (SEM). Because the estimated SEM is calculated from the polling data themselves, it includes pollster-to-pollster “house effects” as part of the variation. These numbers are passed to a MATLAB program for the rest of the calculation.
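As a rough illustration of this first step, here is a minimal Python sketch that summarizes one state's polls by their median and an estimated SEM. The post does not say exactly how the SEM is estimated, so the sample standard deviation divided by the square root of the number of polls is an assumption here, and a more outlier-resistant estimator could be swapped in. The function name `poll_summary` and the margins are hypothetical.

```python
import math
import statistics

def poll_summary(margins):
    """Return (median, estimated SEM) for poll margins in percentage points.

    Assumption: SEM is estimated as sample standard deviation / sqrt(n).
    The post does not specify the estimator actually used.
    """
    n = len(margins)
    if n < 2:
        raise ValueError("need at least two polls to estimate a standard error")
    median = statistics.median(margins)
    sem = statistics.stdev(margins) / math.sqrt(n)
    return median, sem

# Hypothetical Clinton-minus-Trump margins from five polls of one state
margins = [4.0, 6.0, 2.0, 5.0, 3.0]
median, sem = poll_summary(margins)
print(median, sem)
```

Because the spread is measured across polls from different pollsters, any house effects show up in the SEM automatically.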
These two numbers are converted to a win probability using the t-distribution, a method that allows for the possibility of outlier events. Those probabilities are diagrammed above in the cartogram, whose areas are proportional to each state’s electoral votes (EV). Note that the cartogram is principally for display purposes, and its EV totals reflect only one combination of all possible outcomes, which number in the quadrillions.
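The conversion from median and SEM to a win probability could be sketched as follows. The post names the t-distribution but not its degrees of freedom, so the choice of 3 degrees of freedom is an assumption, made partly because that case has a simple closed-form CDF that needs no external library. The function names are hypothetical.

```python
import math

def t_cdf_3df(t):
    """CDF of Student's t-distribution with 3 degrees of freedom.

    Closed form for this case: 1/2 + (1/pi) * (x / (1 + x^2) + atan(x)),
    where x = t / sqrt(3). The choice of 3 degrees of freedom is an assumption.
    """
    x = t / math.sqrt(3.0)
    return 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi

def win_probability(median, sem):
    """Probability that the true margin is positive, given median and SEM."""
    if sem <= 0:
        raise ValueError("SEM must be positive")
    return t_cdf_3df(median / sem)

# Hypothetical state: median margin of 4 points, estimated SEM of 0.7 points
p = win_probability(4.0, 0.7)
print(p)
```

The heavy tails of the t-distribution keep the probability away from 0 or 1 longer than a normal distribution would, which is how the method leaves room for outlier events.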
To get a full snapshot of every possible outcome, these probabilities are compounded to generate a probability distribution of all possible outcomes:
The central dark blue bars represent the 95% confidence interval. The tails are plotted in green.
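One way to compound state win probabilities into a distribution over electoral-vote totals, without enumerating every combination, is to fold in one state at a time. The sketch below assumes that states are statistically independent, which the post does not state explicitly. The function name `ev_distribution` and the three-state example are hypothetical.

```python
def ev_distribution(states):
    """states: list of (electoral_votes, p_win) pairs.

    Returns a list dist where dist[k] is the probability that the
    candidate wins exactly k electoral votes.
    Assumption: state outcomes are independent.
    """
    dist = [1.0]  # before any state is counted: 0 EV with certainty
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)  # candidate loses this state
            new[k + ev] += mass * p     # candidate wins this state
        dist = new
    return dist

# Hypothetical three-state country with 13 EV in total (7 needed to win)
states = [(3, 0.9), (4, 0.5), (6, 0.2)]
dist = ev_distribution(states)
p_majority = sum(dist[7:])
print(p_majority)
```

The work grows with the number of states times the number of electoral votes, so a full 51-contest calculation is cheap even though the number of distinct outcome combinations is in the quadrillions.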
At nearly all times, this snapshot shows a near-certain win for one candidate or the other. Because this probability is usually greater than 99%, we do not report it.
Instead, in the banner at the top of this website, we report the probability of a final election win. That quantity includes the possibility of movement between the time the polls were taken and Election Day, in November.
The Meta-Margin and November predictions
A key parameter in PEC’s calculation is the Meta-Margin. This is the amount by which the two-candidate margin in state polls would have to change in order to create a perfect electoral tie. For example, today the Meta-Margin is Clinton +3.86%. This means that Donald Trump’s election odds would be perfectly even if he picked up a net of about 3.9 percentage points among currently-undecided voters – or if about 1.9 percentage points of Clinton supporters switched sides.
Because the Meta-Margin is in units of percentage points, it is a nice quantity to work with. If we have expectations about how much the Meta-Margin may move, we can predict what should happen in November.
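To make the definition concrete, here is a minimal sketch that searches for the uniform shift in all state margins that produces an even race. It rests on several assumptions not taken from the post: a "tie" is taken to mean a 50% chance of winning an electoral-vote majority, state win probabilities come from a t-distribution with 3 degrees of freedom, states are independent, and the search uses simple bisection. All function names and the example states are hypothetical.

```python
import math

def t_cdf_3df(t):
    """Closed-form CDF of Student's t with 3 degrees of freedom (assumed)."""
    x = t / math.sqrt(3.0)
    return 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi

def majority_probability(states, shift):
    """states: list of (ev, median_margin, sem); shift is added to every margin."""
    dist = [1.0]
    for ev, margin, sem in states:
        p = t_cdf_3df((margin + shift) / sem)
        new = [0.0] * (len(dist) + ev)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)
            new[k + ev] += mass * p
        dist = new
    total = sum(ev for ev, _, _ in states)
    return sum(dist[total // 2 + 1:])

def meta_margin(states, lo=-50.0, hi=50.0, tol=1e-6):
    """Bisect for the shift giving a 50% majority probability.

    The Meta-Margin is the negative of that shift: how far the leader's
    margins would have to fall for the race to be even.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if majority_probability(states, mid) < 0.5:
            lo = mid  # candidate still behind at this shift: needs more
        else:
            hi = mid
    return -(lo + hi) / 2.0

# Hypothetical three-state example: (EV, median margin, estimated SEM)
states = [(3, 5.0, 1.0), (4, 2.0, 1.0), (6, -1.0, 1.0)]
mm = meta_margin(states)
print(mm, mm / 2.0)  # margin to erase, and the share of switchers that erases it
```

The second printed number reflects the arithmetic in the example above: a voter who switches sides moves the two-candidate margin by two points, so half the Meta-Margin in switchers is enough to close it.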
To estimate how much future movement may occur, we make two calculations of probability, both of which appear in the banner at the top of this website:
- “random drift”: This calculation assumes random drift in either direction by an amount that matches past patterns in polling from 1952 to 2012;
- “Bayesian”: In this approach, the up-and-down variation in the Clinton-Trump polling margin so far in 2016 is used to establish a “prior”, i.e. the expected range of future movement. Variation to date is used to estimate the midpoint of the likely range of future values. The width of that range is then set conservatively, at approximately 7 percentage points, in order not to constrain the November possibilities too much. This assumed range is more than twice as broad as the observed variation in 2004-2012, cycles that were very stable compared with elections since 1952. Commentators say anything can happen, so bring it on.
At the moment, we are using national polls to set the prior. Once we have enough Meta-Margin history, we will use that.
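The two calculations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the site's actual code: drift between now and Election Day is modeled as a zero-mean Gaussian, the prior is a Gaussian with a sigma of 7 points (the figure given above), and the two are combined by the standard precision-weighted product of Gaussians. The drift sigma and prior center used in the example are hypothetical values, as are the function names.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def drift_win_probability(meta_margin, drift_sigma):
    """Chance the margin is still positive after zero-mean Gaussian drift.

    Assumption: drift is Gaussian; drift_sigma would come from past cycles.
    """
    return normal_cdf(meta_margin / drift_sigma)

def bayesian_win_probability(meta_margin, drift_sigma, prior_center, prior_sigma=7.0):
    """Combine the drift estimate with a Gaussian prior on the final margin.

    The product of two Gaussians is a Gaussian whose mean is the
    precision-weighted average of the two centers.
    """
    w_now = 1.0 / drift_sigma ** 2
    w_prior = 1.0 / prior_sigma ** 2
    post_mean = (meta_margin * w_now + prior_center * w_prior) / (w_now + w_prior)
    post_sigma = math.sqrt(1.0 / (w_now + w_prior))
    return normal_cdf(post_mean / post_sigma)

# Meta-Margin from the text; drift sigma and prior center are hypothetical
p_drift = drift_win_probability(3.86, drift_sigma=6.0)
p_bayes = bayesian_win_probability(3.86, drift_sigma=6.0, prior_center=4.0)
print(p_drift, p_bayes)
```

A wide prior such as 7 points carries little weight compared with current polls, so it nudges the estimate rather than dominating it, which matches the stated goal of not constraining the November possibilities too much.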
The black curves indicate the median of the electoral vote estimator (top graph) and the Popular Vote Meta-Margin (bottom). For the EV estimator, the gray shaded region indicates the calculated 95% confidence interval. This confidence interval includes sampling error, variation in biases among pollsters, and changes in opinion during the period when the polls were taken. Because pollster biases tend to cancel one another on average, the true 95% confidence interval is smaller, typically less than +/-10 EV.
The graph’s calculations are explained here. Briefly, the red and yellow zones show a prediction range that combines random drift from current polls with a Bayesian prior. This Bayesian prior is calculated from the assumption that the average Clinton-Trump margin in national polls since January gives the center of the likely range of election outcomes. The prior has a Gaussian range with a sigma of +/-7%, consistent with 1952-2012 but larger than the amount of movement in the 2004-2012 election cycles. In other words, the prior is set to allow anything reasonable to happen.
The red zone is a “strike zone” showing the 68% confidence interval of probable outcomes. The yellow zone is a “watch zone” that shows a combination of the 95% random-movement confidence interval and the 95% gray-zone confidence interval. The November outcome is nearly certain to be within this range.