Princeton Election Consortium

A first draft of electoral history. Since 2004

Analysis issues (part 1)

July 9th, 2012, 9:43am by Sam Wang

Most of you won’t notice the details, but here are some issues on my mind. Traffic’s low, so I’ll let it all hang out.

(1) Correction of biases. For the Presidential race, I give two estimates: one based on national polls (currently Obama +3.0 +/- 0.2%) and one based on state polls (currently 318 EV). The two are calculated somewhat differently – and illustrate a current problem that needs to be solved.

The national estimate is done by hand (perhaps we could automate it). The uncertainty is an estimated SEM, derived from the median absolute deviation (MAD) of the last 30 days’ polls. The long time window is acceptable since the race is not moving much (as you can see in the graph at right). A 1.0% swing in popular margin is equivalent to about 27 EV. In other words, these polls correspond to an EV confidence band of +/- 2 sigma = +/- 11 EV.
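As a sketch of that estimate: the SEM can be derived from the MAD of recent polls, rescaled to a normal-equivalent sigma. The 1.4826 scaling factor and the poll numbers below are assumptions of this sketch, not the site’s exact code.

```python
import statistics

def robust_sem(margins):
    """Estimate the SEM of the median poll margin.

    Uses the median absolute deviation (MAD), rescaled to an
    equivalent standard deviation (the 1.4826 factor assumes
    roughly normal noise), divided by sqrt(n).
    """
    n = len(margins)
    med = statistics.median(margins)
    mad = statistics.median(abs(m - med) for m in margins)
    sigma = 1.4826 * mad          # MAD -> sigma under normality
    return sigma / n ** 0.5

# Hypothetical 30-day window of national popular margins (Obama +x%)
polls = [3.0, 2.5, 3.5, 4.0, 2.0, 3.0, 3.5, 2.5, 3.0, 4.5]
sem = robust_sem(polls)

# Rough conversion from the text: 1.0% of popular margin ~ 27 EV,
# so the +/- 2-sigma confidence band expressed in EV is:
ev_band = 2 * sem * 27
```

The MAD is used instead of the standard deviation so that a single outlier poll cannot inflate the uncertainty estimate.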

The state estimate is based on state polls, which are more numerous. The calculation is automated. If you’re reading this, you know that we use those to calculate a mean margin and estimated SEM. These values are the basis for everything on the top bar and right column: the EV estimator, the electoral map, the Meta-Margin, and the Power of Your Vote. For the EV estimator, mean margins are converted to Z-scores and then to probabilities using the t-distribution. Then we compound the probabilities to get a distribution over all possible outcomes (2^51 = 2.3 quadrillion). From this come the EV estimator and a 95% confidence band. The current confidence band is 70 EV wide, about +/- 35 EV…
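The compounding step can be sketched as repeated convolution over states, which covers all 2^N outcomes without enumerating them. In this sketch a normal CDF stands in for the t-distribution, and the three-state inputs are hypothetical.

```python
import math

def win_prob(margin, sem):
    """Convert a state's mean margin and SEM to a win probability.
    A normal CDF is used here as a simplifying stand-in for the
    t-distribution described in the text."""
    z = margin / sem
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ev_distribution(states):
    """Compound per-state win probabilities into the exact
    probability distribution over total electoral votes, by
    repeated convolution (equivalent to polynomial multiplication).
    states: list of (electoral_votes, win_probability)."""
    dist = [1.0]                            # P(0 EV) = 1 before any state
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for total, prob in enumerate(dist):
            new[total] += prob * (1 - p)    # state lost
            new[total + ev] += prob * p     # state won
        dist = new
    return dist

# Hypothetical three-state example: (electoral votes, win probability)
states = [(29, win_prob(2.0, 1.0)), (18, win_prob(-1.0, 1.5)), (10, 0.5)]
dist = ev_distribution(states)
mean_ev = sum(ev * p for ev, p in enumerate(dist))
```

Each convolution pass doubles the number of outcome combinations covered, so 51 passes account for all 2^51 combinations in a fraction of a second.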

And this is what needs fixing. 35 EV is way too large! For example, the estimator hardly moves, which is a major tell that the true uncertainty is much smaller – probably comparable to the national race.

As I’ve written, the correct approach is to calculate the offset introduced by each pollster’s methods and subtract it from their individual results before calculating the mean margin and estimated SEM. For poll aggregation, such offsets are far more consequential than the famous “margin of error,” which is prominent in single polls. Once you start aggregating, the combined MoE, which comes from sampling error, becomes quite small.
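A minimal sketch of such an offset correction, assuming each pollster’s house effect is estimated as its median deviation from the overall median (the pollster names and numbers below are hypothetical; a real correction would pool across many races and more data):

```python
import statistics
from collections import defaultdict

def remove_house_effects(polls):
    """polls: list of (pollster, margin).
    Estimate each pollster's offset as the median difference between
    its results and the overall median, then subtract it out."""
    overall = statistics.median(m for _, m in polls)
    deviations = defaultdict(list)
    for name, m in polls:
        deviations[name].append(m - overall)
    offset = {name: statistics.median(d) for name, d in deviations.items()}
    return [(name, m - offset[name]) for name, m in polls]

# Hypothetical data: pollster "A" runs a few points to one side of "B"
raw = [("A", 3.0), ("A", 2.5), ("B", -0.5), ("B", 0.5), ("B", 0.0)]
corrected = remove_house_effects(raw)
```

After the correction, the two pollsters’ medians agree, so their remaining scatter reflects sampling error rather than methodology.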

Such a de-noised estimator would be boring to watch. But it would tell us when some truly game-changing event happens. An example in 2008 was the addition of Sarah Palin to the ticket.

Anyway, that’s what is on my mind in regard to this analysis.

As a cautionary example, see FiveThirtyEight, where there is an item called the Now-cast. I believe this is meant to be a snapshot of current conditions. Given what I have written above, at any given moment, if their analysis is done correctly, the leading candidate’s win probability should be very high, close to 100%. Yet they give a probability around 80%. This indicates an underlying error in their analysis. It is only for future projections that the probability should drop. Somewhere in the guts of that calculation, there’s a failure to keep noise out of the signal.

Think of it as the difference between knowing what temperature it is right now, and knowing what the temperature will be like tomorrow. With a good instrument, you should know the first quantity with high accuracy. That’s what we are trying to construct – a good instrument.

(2) The Senate and House (aggregated). For the Senate, a national estimator is possible – once polling picks up in states like Indiana and North Dakota. I suspect the Democrats are currently barely hanging on. But only more data will tell.

For the House, based on June national Congressional preference polls, the estimated national vote is Democrats +2.5 +/- 1.1%, for a very likely (>95% probability) takeover if the election were held today.

Many other aggregators do not deliver the value that they could. For example, I find the curve fitting at TalkingPointsMemo to be unhelpful in reducing uncertainty – again, they combine signal with noise. Another site is an excellent data source (and has interesting commentary), but their graphs are potentially misleading since they weight each poll equally. This favors organizations that poll often, including biased ones such as Rasmussen. I tend to do analysis by hand, and I find older sources to be simpler to use and look at: (the original!) and RealClearPolitics.

(3) Individual Senate and House races. We’re not going to address this because of the lower density of polls. This is where introducing non-polling variables such as campaign finance and partisan voting index (PVI) becomes useful. There are sites out there that do this well.

If you support our efforts, visit our ActBlue page.



