Princeton Election Consortium

A first draft of electoral history. Since 2004

About PEC and the Meta-Analysis (FAQ)

(Updated on 29 March 2016 by Sam Wang – new material at end)

The right-hand sidebar features a meta-analysis directed at the question of who would win the Electoral College in an election held today. Meta-analysis provides more objectivity and precision than looking at one or a few polls, and in the case of election prediction gives a highly accurate current snapshot. In 2004, the median decided-voter calculation captured the exact final outcome. In 2008, the final-week decided-voter calculation was within 1 electoral vote.

Calculations are based on recent available state polls, which are used to estimate the probability of a Democratic/Republican win, state by state. These are then used to calculate the probability distribution of electoral votes corresponding to all 2.3 quadrillion possible combinations. For a popular article about this calculation, read this article and the follow-up.

Is this Meta-Analysis a prediction of what will happen on Election Day?

Not in its basic form; it is a snapshot of conditions today. Between now and Election Day, think of the Meta-Analysis as a precise reading of where the race stands at any given time. In late October the Meta-Analysis should come quite close to the actual outcome.

Starting in 2012, this site also provides a prediction (see essay 1 and essay 2) based on the current year’s polls and the amount of variation observed in similar past races. This is a true prediction for November. It has the specific advantage of not relying on poorly justified assumptions such as econometric conditions. It relies only on polls, which are the only direct measure of opinion. The approach taken in both popular and political science models introduces more noise than signal, as discussed in this essay.

What’s different about this analysis in 2012 compared with 2008?

The main difference is the addition of a prediction for Election Day as described above.

What was different about this analysis in 2008 compared with 2004?

In 2008, three major changes were made.

First, the Meta-Analysis relies entirely on the well-established principle that the median of multiple state polls is an excellent predictor of actual voter behavior. On Election Eve 2004, a calculation based on this principle made a correct prediction of the electoral vote outcome. Additional assumptions were unnecessary and unwarranted. In 2008 the calculation is kept simple – and therefore reliable.

Second, the calculation is automated to allow tracking of trends over time. This allows the Meta-Analysis to be used to identify changes in voter sentiment as seen through the lens of actual electoral mechanisms.

Third, instead of focusing on battleground states, we are tracking all 50 states and the District of Columbia.

In the Meta-Analysis, how can you possibly go through 2.3 quadrillion possibilities? Wouldn’t that take forever?

The Meta-Analysis doesn’t actually calculate the probability of every combination of states one at a time. At a rate of going through a million combinations per second, that process would take over 71 years. Yet repeated simulation is exactly what other sites do – though they only do thousands of simulations, not quadrillions. Such a laborious approach means that they can only approximate the expectation based on a set of win probabilities.

Instead, the Meta-Analysis uses an overlooked method to calculate the probability of getting an exact number of electoral votes, covering all ways of reaching that number given the individual state win probabilities. This is a much easier problem – it can be solved in less than a second. Here is a simple example.

Imagine that there are just two states. State 1 has EV1 electoral votes and your candidate has a probability P1 of winning that state; in state 2, EV2 electoral votes and a probability P2. Assume that EV1 and EV2 are not equal. Then the possible outcomes have the following probabilities:

  • EV1+EV2 electoral votes (i.e. winning both): P1 * P2
  • EV1 electoral votes: P1 * (1 - P2)
  • EV2 electoral votes: (1 - P1) * P2
  • No electoral votes: (1 - P1) * (1 - P2)

In general, the probability distribution for all possible outcomes is given by the coefficients of the polynomial

((1 - P1) + P1 * x^EV1) * ((1 - P2) + P2 * x^EV2) * … * ((1 - P51) + P51 * x^EV51)

where 1…51 represent the 50 states and the District of Columbia. This polynomial can be calculated in a fraction of a second.
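The polynomial product can be expanded one factor at a time, which is why it is fast. The following is a minimal Python sketch of that expansion, not the site's own code; the function name `ev_distribution` and the example probabilities are made up for illustration.

```python
def ev_distribution(win_probs, electoral_votes):
    """Return dist, where dist[k] is the probability of winning exactly k EV.

    Each state contributes one factor ((1 - P) + P * x^EV); multiplying the
    factors together is a repeated convolution of coefficient lists.
    """
    dist = [1.0]  # empty product: 0 EV with probability 1
    for p, ev in zip(win_probs, electoral_votes):
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)   # state lost: total stays at k
            new[k + ev] += prob * p      # state won: total rises by ev
        dist = new
    return dist


# Two-state example from the text, with illustrative (hypothetical) numbers.
P1, P2, EV1, EV2 = 0.7, 0.4, 3, 5
dist = ev_distribution([P1, P2], [EV1, EV2])
for k, prob in enumerate(dist):
    if prob > 0:
        print(f"{k} EV: {prob:.2f}")
```

With 51 factors and at most 538 electoral votes, the loop does on the order of 51 × 538 multiply-adds, so every combination is covered without being listed.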

Why don’t other projection sites use your approach?

Three reasons.

First, the Meta-Analysis is unlike, say, fantasy baseball, where a lot of enjoyment comes from thinking about individual scenarios. We take no interest in specific scenarios; we want the median outcome that takes into account all possibilities. This gives the most precise possible answer, but it lends itself poorly to color commentary.

Second, the treatment is somewhat mathematical. Hobbyists at other sites may not have the expertise to take the polynomial shortcut, which is made possible by the fact that the Electoral College follows a relatively simple system in which EV are added up. Certain aspects of the Meta-Analysis are original and may someday be published.

Third, a properly done calculation reduces noise. This, in turn, reduces variation – and opportunities for commentary. Most media organizations want more commentary, not less.

What polls do you use? When do you exclude a poll?

We use all available state polls, with a preference for likely-voter polls over registered-voter polls when both are released. We do not exclude any poll.

For the current snapshot, the rule for a given state is to use the last 3 polls, or 1 week’s worth of polls, whichever is greater. A poll’s date is defined as the middle date on which it took place. In some cases 4 polls are used if the oldest have the same date. At present, the same pollster can be used more than once for a given state. From these inputs, a median and estimated standard error of the median are used to calculate a win probability using the t-distribution.
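The rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the site's code: the standard error is estimated from the median absolute deviation with a small floor, the degrees of freedom are taken as n - 1, and the t-distribution CDF is computed by numerical integration to keep the example free of external libraries. All function names and poll values are hypothetical.

```python
import math
from datetime import date
from statistics import median


def select_margins(polls, today, min_polls=3, window_days=7):
    """polls: list of (mid_date, margin). Returns the margins used for the snapshot."""
    ordered = sorted(polls, key=lambda p: p[0], reverse=True)  # newest first
    recent = [p for p in ordered if (today - p[0]).days <= window_days]
    count = max(len(recent), min_polls)  # last 3 polls or one week, whichever is greater
    chosen = ordered[:count]
    # If older polls share the date of the oldest chosen poll, include them too.
    while len(chosen) < len(ordered) and ordered[len(chosen)][0] == chosen[-1][0]:
        chosen.append(ordered[len(chosen)])
    return [margin for _, margin in chosen]


def t_cdf(x, df, steps=2000):
    """Student's t CDF via Simpson's rule on the density (steps must be even)."""
    if x == 0:
        return 0.5
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + math.copysign(total * h / 3, x)


def win_probability(margins, sem_floor=0.5):
    """Probability that the true margin is positive. sem_floor is an assumed guard."""
    n = len(margins)
    if n < 2:
        raise ValueError("need at least two polls for a t-distribution")
    med = median(margins)
    mad = median([abs(m - med) for m in margins])
    sem = max(1.4826 * mad / math.sqrt(n), sem_floor)  # assumed SEM estimator
    return t_cdf(med / sem, df=n - 1)                  # assumed degrees of freedom


# Hypothetical polls for one state: (middle date of the poll, margin in points).
polls = [
    (date(2012, 10, 1), 4.0),
    (date(2012, 10, 20), 2.0),
    (date(2012, 10, 28), 3.0),
    (date(2012, 10, 30), 1.0),
    (date(2012, 11, 1), 5.0),
]
today = date(2012, 11, 2)
used = select_margins(polls, today)
p = win_probability(used)
print(used, round(p, 3))
```

Because the median and the median absolute deviation are both insensitive to a single extreme poll, one outlier among the selected polls moves the win probability far less than it would with a mean-based estimate.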

What do you think of my favorite/despised pollster?

Appropriate aggregation methods remove the need to dissect individual polls. For example, median-based statistics correct for outliers. Also, human beings engage in motivated reasoning, and look more critically and closely at polls with which they disagree. Avoiding this bias leads to more accurate results. For these reasons, commenting on individual pollsters is usually not productive.

Indeed, good aggregation has the potential to free up mental (and media) space for information about more important topics than individual polls.

When do updates occur?

Every day at midnight, 8:00am, noon, 5:00pm, and 8:00pm.

Your estimate fluctuates less than other sites. Why is that?

It’s the power of meta-analysis. Even though individual polls may vary, aggregating multiple polls per state reduces uncertainty. Calculating the entire distribution of outcomes does an even better job. As a result, the Electoral Vote estimator on this site typically doesn’t move much. This is in fact the point of the analysis – to get past the vagaries of day-to-day poll reports. Rigorous meta-analysis is sometimes less exciting to watch than a site that varies every day, but in our view it’s the best way to present polling data.

Your calculation could be used to give a win probability. Why don’t you show this?

The uncertainty at any given moment is small enough that the results of an election held today would not be in much doubt. At any given moment, the current-poll win probability is typically greater than 95% for one candidate or the other. Because this quantity is the wrong one to focus on, it is not given.

The greater uncertainty comes from changes that may happen over time between now and Election Day. This can be used to derive a true November-win probability. Some challenges in estimation are discussed here.

From day to day, a very useful quantity is the Popular Meta-Margin, defined as how much swing would have to take place to generate a near-exact electoral tie. The Popular Meta-Margin is equivalent to the two-candidate difference found in most single polls. It has many uses because it tells you where the race stands in units of voters. Errors in polling such as cell phone user undersampling and third party candidates are in these units, and therefore can be compared directly to the Meta-Margin.

For those who still insist upon getting a probability from the Meta-Analysis, it can be computed by pasting current histogram data into a spreadsheet and summing rows 270 through 538.
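The same sum can be done outside a spreadsheet. The sketch below assumes the histogram is a list of 538 probabilities in which spreadsheet row k holds the probability of exactly k electoral votes; that layout is an assumption, and the toy numbers are invented.

```python
def win_probability(rows, threshold=270):
    """Sum the histogram from `threshold` EV upward.

    rows[0] is taken to be spreadsheet row 1 (assumed: probability of exactly
    1 EV), so row k lives at index k - 1.
    """
    return sum(rows[threshold - 1:])


# Toy histogram with all probability on three outcomes (hypothetical values).
rows = [0.0] * 538
rows[250 - 1] = 0.15   # 250 EV: a loss
rows[269 - 1] = 0.05   # 269 EV: an exact tie, not counted as a win
rows[300 - 1] = 0.80   # 300 EV: a win
print(win_probability(rows))
```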

Why should I believe the Meta-Analysis? In 2004, didn’t it predict a narrow Kerry victory?

Actually, the method was fine, but its inventor, Prof. Sam Wang, made an error. In the closing weeks of the campaign, he assumed that undecided voters would vote against the incumbent, a tendency that had been noticed in previous pre-election polls. Compensating for the “incumbent rule” had the effect of putting a thumb on the scales, lightly – but unmistakably – biasing the outcome.

Leaving out this assumption, the prediction in 2004 was exactly correct: Bush 286 EV, Kerry 252 EV. In retrospect, it’s clear that the incumbent rule is subjective and cannot be relied upon. You can read about the confirmation of the prediction in the Wall Street Journal (pre-election story here). A second confirmation came in 2006, when, using a related but simpler method, Sam estimated the odds of a Democratic takeover of the US Senate at 50-50, a higher chance than predicted by either pundits or electronic markets. Indeed, that event did end up occurring. Finally, in 2008, the Presidential and Congressional calculations did extremely well.

Overall, the analysis is kept as simple as possible as a means of avoiding unintended bias. Both data and the code for doing the calculations are freely available. That way, anyone can check the results. Everything was open in 2004 as well; readers provided lots of useful feedback, such as this exchange.

State polls are done less often than national polls. Does that introduce a delay into your analysis?

Yes. As of early August this delay is about two weeks in key states. The delay will diminish dramatically as the campaign season progresses. A correction based on national polls is possible, but adds considerable uncertainty to the estimate.

What is the Popular Meta-Margin?

The Popular Meta-Margin works like the more familiar margin between two candidates. It is defined as the amount of opinion swing that is needed to bring the Median Electoral Vote Estimator to a tie. It helps you think about how far ahead one candidate really is. For example, if you think support for your candidate is understated by 1%, this can overcome an unfavorable Meta-Margin of less than 1%. If you think that between now and Election Day, 1% of voters will switch from the other candidate to yours, this is a swing of 2% and can compensate for a Meta-Margin of 2%. The Meta-Margin has a useful precision of approximately a tenth of a percentage point.
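One way to picture this definition is as a search: shift every state margin by the same amount and find the shift at which the median electoral vote count stops exceeding half the total. The Python sketch below does this by bisection. It is an illustration only: the normal curve used for state win probabilities, the per-state standard errors, the search range of ±50 points, and the three-state example are all assumptions, not the site's implementation.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def median_ev(margins, sems, evs):
    """Median of the EV distribution for the candidate whose margins are given."""
    dist = [1.0]
    for m, s, ev in zip(margins, sems, evs):
        p = normal_cdf(m / s)  # assumed stand-in for the state win probability
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + ev] += prob * p
        dist = new
    running = 0.0
    for k, prob in enumerate(dist):
        running += prob
        if running >= 0.5:
            return k
    return len(dist) - 1


def meta_margin(margins, sems, evs, tol=1e-4):
    """Uniform swing (in points) that brings the median EV down to a tie.

    Positive means the candidate leads; negative means the candidate trails.
    """
    half = sum(evs) / 2.0
    lo, hi = -50.0, 50.0  # assumed bounds: leads at lo, trails at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        shifted = [m - mid for m in margins]
        if median_ev(shifted, sems, evs) > half:
            lo = mid   # still ahead after removing `mid` points
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical three-state country: margins in points, then standard errors, then EV.
margins = [6.0, 2.0, -3.0]
sems = [1.0, 1.0, 1.0]
evs = [10, 12, 8]
mm = meta_margin(margins, sems, evs)
print(round(mm, 2))
```

In this toy case the result is set almost entirely by the pivotal middle state: once its 2-point lead is erased, the candidate drops below half of the 30 electoral votes.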

What if I think that polls are biased against my candidate? Do you provide a tool for me to see how a bias changes things?

One tool is the Popular Meta-Margin (see above). Another tool is the map in the right-hand column, which comes in flavors that show single-state probabilities with a 2% swing toward either candidate.

What are jerseyvotes? And can you explain the “Power Of Your Vote” table?

Jerseyvotes, invented at this site in 2004, are a way to measure the power of individual votes to sway the election. Conceptually, jerseyvotes are distantly related to the Banzhaf Power Index, but normalized to the power of one individual. As originally envisioned, if you have ten times as much influence over the national win probability as a voter in New Jersey, your vote is worth 10 jerseyvotes. Sadly for the hosts of this site, one jerseyvote is not worth very much.

However, like many deflated currencies, the value of a vote in New Jersey can fluctuate wildly. Therefore, since 2012, the jerseyvote has been pegged in value to the state with the most influential voters; each of their votes is defined as being worth 100 jerseyvotes. A vote in New Jersey is worth some number of jerseyvotes, but usually not 1.0.

The Voter Influence table in the right-hand sidebar displays information about the ten states currently with the highest jerseyvotes, plus New Jersey for comparison. The jerseyvote statistic for each state is listed in the “Power” column, and they are normalized so that the most powerful state has power equal to 100. (Originally, this power statistic was normalized so that NJ voters had power equal to 1, hence the term “Jerseyvotes”.) The current polling margin, as determined by the meta-analysis, is also displayed for each state. For example, if the meta-analysis indicates that NJ is currently polling 50% for Obama and 44% for McCain, then NJ’s “Margin” column would read “Obama +6%.”
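The two normalizations described here differ only in the reference point. The sketch below shows just that rescaling step; the raw per-voter influence values are invented, and how they are computed is outside the scope of this example.

```python
def normalize_power(raw, reference=None):
    """Rescale raw per-voter influence values.

    reference=None  -> most influential state is set to 100 (the post-2012 peg)
    reference="NJ"  -> the named state is set to 1 (the original definition)
    """
    if reference is None:
        scale = 100.0 / max(raw.values())
    else:
        scale = 1.0 / raw[reference]
    return {state: value * scale for state, value in raw.items()}


# Hypothetical raw influence per voter (arbitrary units).
raw = {"NH": 4e-7, "OH": 2e-7, "NJ": 1e-9}
print(normalize_power(raw))                  # NH pegged at 100
print(normalize_power(raw, reference="NJ"))  # NJ pegged at 1
```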

In your future prediction / current snapshot, the probability is very different from the InTrade price. Why is that?

It is wrong to interpret InTrade prices as true probabilities. Those prices reflect what a number of bidders think to be the win probability. InTrade bidders tend to be underconfident in evaluating polling data. On Election Eve, even a 5 +/- 1 point lead for a candidate is often insufficient to drive a share price above $0.90. However, it is true that the candidate with an InTrade price above $0.50 is usually the leading candidate. The issue is analyzed further here.

I wrote a comment but it does not appear. What happened?

The site is moderated to shape the discussion. Our audience includes a wide range of numerically-oriented professionals and academics. Many are also partisans of various stripes.

Here are just a few examples of comments that can get deleted or delayed:

  • falsehoods
  • abuse
  • perseveration on one data point such as a single poll or pollster
  • opinions given without evidence, especially quantitative evidence
  • long comments
  • questions that are already answered somewhere in the left sidebar
  • diversions from the point at hand that are not interesting
  • comments on unrelated content from other aggregators

Of course, all of these statements can find a home at unmoderated sites, of which there are many.

Several other sites emphasize the possibility that states tend to vary together, so that if one poll is off, then others will be off in the same direction. Why don’t you include that in your model?

Assumptions should only be added to a model if they make a difference in the outcome.

For a snapshot, adding covariance makes very little difference. For instance, let us make the assumption that all polls move together between the last day of polling and Election Day by a random small amount (up to 1%, say), and the random amount is unbiased. In this situation the median EV estimator does not change by a measurable amount. The uncertainty in the final outcome, as measured by 95% confidence interval, gets a little wider. But that’s it. Thus, since covariance has no substantive effect, it is left out.

For a prediction, the answer is more nuanced. In this situation, the change between polling day and Election Day could be considerable. Now, the way that the change is modeled affects the shape of the distribution. However, the median is still the same. In this case a simple and effective way to vary long-term change is to covary all states together by a random amount. This is at the core of the prediction, a feature that was introduced starting in 2012.
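A common shift of this kind can be sketched by averaging the electoral vote distribution over a range of possible shifts, each weighted by how likely it is. The Python example below is a rough illustration, not the site's prediction code: the normal shape of the shift, its size (`drift_sd`), the grid used to approximate it, the normal curve for state win probabilities, and the five-state example are all assumptions.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ev_distribution(win_probs, evs):
    dist = [1.0]
    for p, ev in zip(win_probs, evs):
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + ev] += prob * p
        dist = new
    return dist


def covaried_distribution(margins, sems, evs, drift_sd, grid=41, span=4.0):
    """Average the EV distribution over one shared shift d ~ Normal(0, drift_sd)."""
    shifts = [drift_sd * span * (2 * i / (grid - 1) - 1) for i in range(grid)]
    weights = [math.exp(-0.5 * (d / drift_sd) ** 2) for d in shifts]
    norm = sum(weights)
    mixed = [0.0] * (sum(evs) + 1)
    for d, w in zip(shifts, weights):
        # Every state moves by the same amount d.
        probs = [normal_cdf((m + d) / s) for m, s in zip(margins, sems)]
        for k, p in enumerate(ev_distribution(probs, evs)):
            mixed[k] += (w / norm) * p
    return mixed


def interval(dist, lo=0.025, hi=0.975):
    """Smallest EV values at which the cumulative probability reaches lo and hi."""
    running, low, high = 0.0, None, None
    for k, p in enumerate(dist):
        running += p
        if low is None and running >= lo:
            low = k
        if high is None and running >= hi:
            high = k
    return low, high


# Hypothetical five-state example.
margins = [5.0, 2.0, 1.0, -1.0, -4.0]
sems = [1.5] * 5
evs = [10, 8, 6, 7, 9]
snapshot = ev_distribution([normal_cdf(m / s) for m, s in zip(margins, sems)], evs)
prediction = covaried_distribution(margins, sems, evs, drift_sd=3.0)
print("snapshot 95% interval:  ", interval(snapshot))
print("prediction 95% interval:", interval(prediction))
```

Comparing the two intervals shows how a shared shift widens the range of plausible outcomes even though each state's polling is unchanged.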

This site has discussed the subject of non-independence here and, most recently, here.

What do you do with third-party candidates? Can such a candidate shift the outcome?

We take whatever the pollster gives us as the margin between the two leading candidates. Third-party candidates tend to fade in the finish in a system with two dominant parties. Some pollsters give third-party results and some don’t. This kind of detail might help, especially for analyzing local/state races where third-party local candidates run strong, such as in Maine.



Why was my comment moderated?

In order to shape the conversation, this website is moderated. PEC focuses on data analysis. Therefore comments regarding data are most likely to get through. Incorrect statements about data, statements of opinion, and advocacy for specific candidates are discouraged. If your focus is on opinions, you might prefer sites like Democratic Underground and Ace Of Spades.