Princeton Election Consortium

A first draft of electoral history. Since 2004

The mailbag

In my evaluation of how the predictions turned out, Reader AA offered the following:

It is meaningless to compare your personal prediction to the output of other websites’ models. I believe 538 was reporting the mean of all simulations? What was the mean of your EV distribution? Alternatively, you could compare the mode and median of your two models.

This is a brave sally, but it is not quite right. My reply is rather technical, but perhaps a few readers will be interested.

First, let us consider the idea that one should compare “means vs. means.” Depending on the nature of the calculation, the mean of a probability distribution may not be an appropriate parameter to report. EV outcomes are discrete, and when the distribution is spiky, the mean may not be close to any likely outcome.
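To make the point concrete, here is a minimal sketch of a spiky discrete distribution whose mean lands on a value that has essentially no probability of occurring. The outcomes and probabilities are made up for illustration, not the actual PEC distribution:

```python
import numpy as np

# Hypothetical spiky EV distribution: nearly all of the probability
# mass sits on three discrete outcomes (values are illustrative only).
outcomes = np.array([353, 364, 367])
probs = np.array([0.40, 0.35, 0.25])

# The mean falls between the spikes, on an outcome that never occurs.
mean_ev = float(np.dot(outcomes, probs))
print(mean_ev)  # 360.35 -- not one of the three likely outcomes
```

The mean of 360.35 EV is a perfectly good expectation value, but it is not a prediction of any outcome that could actually happen.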

Models that assign intermediate win probabilities to individual states yield a distribution that is smoothed enough that the mean and median are indistinguishable. Such smoothing can come from using fewer polls (and therefore more uncertainty) or from blurring the polling data. In contrast, a pure poll-based distribution leaves fewer states uncertain, and its sharpness allows the mean and median to differ. Because I regard the median as the predictive quantity, I made a “median vs. median” comparison.

But if you really want it:

Mean vs. mean: The PEC probability distribution’s last-day mean was 352.8 EV. By this metric, we were closer to the final outcome (365 EV) than FiveThirtyEight’s 348.5 EV.

Median vs. median: PEC, 352 EV (single-day snapshot) and 364 EV (final prediction); FiveThirtyEight, 348.5 EV.

Mode vs. mode: This is the game of guessing every state. At 353 EV, we tied.

So that’s two out of three, with the third one a tie.
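For readers who want to see how the three statistics are extracted from a discrete distribution, here is a sketch on a toy EV histogram. The outcomes and probabilities are illustrative placeholders, not the real PEC distribution:

```python
import numpy as np

# Toy discrete EV distribution (illustrative values only).
outcomes = np.array([340, 353, 364, 367])
probs = np.array([0.15, 0.40, 0.30, 0.15])

# Mean: probability-weighted average over all outcomes.
mean_ev = float(np.dot(outcomes, probs))

# Median: smallest outcome at which the cumulative probability reaches 0.5.
median_ev = int(outcomes[np.searchsorted(np.cumsum(probs), 0.5)])

# Mode: the single most probable outcome.
mode_ev = int(outcomes[np.argmax(probs)])

print(mean_ev, median_ev, mode_ev)
```

On a spiky distribution like this one, the median and mode coincide on a real outcome while the mean drifts off between the spikes, which is why the choice of summary statistic matters for the comparison above.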

Aaron appears to think that the specific prediction I made lacked a rationale. This is not true. For a final prediction, I could reasonably have chosen any large spike that fell within the 68% CI. As I mentioned to another commenter, for most of October the median sat on one of three values: 353, 364, and 367 EV, all quite close to the final outcome. I was highly likely to choose one of these values. In the end, I followed two lines of reasoning.

1) The EV estimator was fluctuating more than I would like. I wanted to resolve this by integrating over a longer period; the problem was how to choose that period. To do so, I examined the SD of the medians from day X to November 4th, where X varied from mid-September to late October. This SD was fairly constant, but increased for values of X earlier than approximately October 4th. Therefore it was reasonable to use all data from that period, during which the median of the medians was 364 EV.

2) The other argument is the one I gave at the time. The question was whether polls were accurate indicators of actual Election Day behavior. Since several states, including Missouri and Indiana, were basically tied, the median was not stable. A cell-phone correction of 1% was sufficient to bring the daily snapshot back toward where it had been for the past month. I should also point out that this correction is considerably smaller than the 2.8% claimed by Silver. Also, the final prediction met the aforementioned criterion of still being inside the 68% confidence interval of the daily snapshot.
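The start-day analysis in point 1 can be sketched as follows. The daily medians here are a synthetic series standing in for the real PEC snapshot history: a stable recent stretch preceded by a noisier early period. The SD stays small as long as the averaging window covers only the stable stretch, and jumps once the window reaches back into the noisy data:

```python
import numpy as np

def sd_of_medians_from(daily_medians, days_back):
    """SD of the daily EV medians over the last `days_back` days."""
    window = np.asarray(daily_medians[-days_back:], dtype=float)
    return float(np.std(window))

# Synthetic illustration, not the real PEC series: a variable early
# stretch followed by a stable month hopping among a few values.
early = [330, 375, 340, 370, 335, 372, 338, 368, 342, 366]
recent = [364, 353, 367, 364, 353, 364, 367, 364, 353, 364,
          367, 364, 364, 353, 367, 364, 364, 367, 353, 364,
          364, 367, 364, 353, 364, 367, 364, 364, 353, 364]
series = early + recent

sd_recent = sd_of_medians_from(series, 30)  # stable window only
sd_all = sd_of_medians_from(series, 40)     # includes the noisy stretch
print(sd_recent, sd_all)  # the SD jumps once early data are included

# Pool the stable window into one number: the median of the medians.
median_of_medians = float(np.median(series[-30:]))
print(median_of_medians)
```

In this sketch the cutoff is found by watching where the SD starts to grow as the window is extended backward, which mirrors the criterion used to settle on early October.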

That’s the entire analysis. The corresponding assumptions in the FiveThirtyEight model include: 1) assignment of weights to pollsters, 2) correction based on national polls, 3) demographically based assignment of undecided voters, 4) an inverse Bradley effect based on primary-season voting patterns, 5) an overall drift in sentiment between poll day and Election Day, and 6) a variable half-life of polls, all topped off with 7) Monte Carlo simulations that only approximated the exact distribution. The summed effect of all these assumptions was a difference of 3.5 EV from our final snapshot of last-week polls. Also, a cumulative effect of the assumptions is that simulation is no longer necessary: with many uncertain states, the same answer can be obtained from a simple probability-weighted average of EV.
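That last claim is easy to check numerically. Using hypothetical state win probabilities and EV counts (made up for illustration, not either model's actual inputs), the mean of many Monte Carlo simulations converges to the exact probability-weighted sum, which can be computed in one line:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical battleground states: EV counts and win probabilities.
ev = np.array([27, 21, 15, 11, 11, 10])
p = np.array([0.90, 0.70, 0.55, 0.50, 0.35, 0.20])

# Exact mean EV: a simple probability-weighted average, no simulation.
exact_mean = float(np.dot(p, ev))

# 100,000 Monte Carlo draws approximate the same number, more slowly.
draws = rng.random((100_000, len(p))) < p   # True where the state is won
simulated_mean = float((draws * ev).sum(axis=1).mean())

print(exact_mean, simulated_mean)  # the two agree to within sampling noise
```

Simulation still adds information about the shape of the distribution (its spikes and tails), but for the mean alone the weighted sum gives the exact answer directly.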

It’s a shame to see an otherwise very engaging web resource marred so. Yet this intricacy was very much part of the attraction. As Observer writes:

Your methods don’t generate a lot of debate, ergo far lower traffic and far less discussion….A simple clear picture wouldn’t have helped newspaper sales, and it apparently doesn’t drive traffic here either. Sad, but that’s how it is.

FiveThirtyEight does have other merits that make the site quite appealing. Silver and Quinn are fun and brash. They did original reporting, unusual for bloggers. Conventional wisdom is settling in that the Model (I think they call it that) did well. Ironically, it’s the weakest aspect of the site. Luckily for them it was an easy year for analysis.