Princeton Election Consortium

Innovations in democracy since 2004

Outcome: Biden 306 EV (D+1.2% from toss-up), Senate 50 D (D+1.0%)
Nov 3 polls: Biden 342 EV (D+5.3%), Senate 50-55 D (D+3.9%), House control D+4.6%
Moneyball states: President AZ NE-2 NV, Senate MT ME AK, Legislatures KS TX NC

Lessons from 2016 and application to 2020

November 24th, 2019, 2:23pm by Sam Wang

For his piece on polling in the New York Times, Giovanni Russonello contacted me with questions about what went wrong in my 2016 analysis. Our starting point: my Election Eve estimate, in which Hillary Clinton’s Meta-Margin of +1.1% translated to a 93% win probability.

The simple answer is that I underestimated the minimum uncertainty in state polls, setting it at less than 1.0 percentage point. This was a holdover from the calculation script, which I set up in May 2016 and neglected to revisit. By the time October rolled around, it seemed inappropriate to change it suddenly.

My email to him is reproduced, with slight edits, after the jump.
Dear Gio,

To be completely honest, I am far less interested in poll aggregation this year. I think there is an unhealthy fascination with the horserace in the press. This attention would be better spent on policies and substance. Ironically, I started doing aggregation in 2004 to reduce attention to polls.

I am far more interested in optimal resource allocation to help citizens be effective locally, in states and districts. I think citizen action is much more interesting than poll-based punditry. My first big change is to shift focus to states and districts.

Now, here is the direct answer to your question:

My method is pretty simple and has very few moving parts. I basically estimate the distribution of all possible electoral college outcomes, using state polls and rules of compound probability. My fundamental change is to build in a hard floor to the amount of overall uncertainty in those polls. It involves changing one line of code.
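The compound-probability idea can be sketched in a few lines. This is an illustrative reconstruction, not the actual PEC script: the state list, electoral-vote counts, and win probabilities below are made-up inputs. The key point is that the exact distribution over electoral-vote totals is the convolution of independent per-state outcomes.

```python
# Sketch of compounding state win probabilities into a distribution of
# electoral-vote totals (illustrative inputs, not the PEC model).
import numpy as np

def ev_distribution(states):
    """states: list of (electoral_votes, dem_win_probability) tuples.

    Returns an array dist where dist[k] is the probability that the
    Democratic candidate wins exactly k electoral votes.
    """
    max_ev = sum(ev for ev, _ in states)
    dist = np.zeros(max_ev + 1)
    dist[0] = 1.0  # start: zero EV with probability 1
    for ev, p in states:
        new = dist * (1 - p)                      # state lost: total unchanged
        new[ev:] += dist[:max_ev + 1 - ev] * p    # state won: shift total by ev
        dist = new
    return dist

# Hypothetical three-state example (10, 16, and 29 EV)
dist = ev_distribution([(10, 0.6), (16, 0.5), (29, 0.4)])
```

With the full 51-contest list, the same loop yields the complete distribution of all possible electoral-college outcomes in a fraction of a second.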

Opinion polls have two kinds of uncertainty. The first arises from the fact that even if your sample is representative of the voting population, you didn’t talk to everyone and so your estimate of what happens on Election Day will be a little off. That is called “sampling error.” When many polls are put together, it’s surprisingly small, a fraction of a percentage point.
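A quick back-of-the-envelope calculation shows why pooling shrinks sampling error. The poll sizes here are hypothetical; the formula is the standard binomial approximation for a two-candidate margin.

```python
# Why aggregation shrinks sampling error: the standard error of a
# two-candidate margin from one poll of n respondents is roughly
# 2 * sqrt(p*(1-p)/n); pooling k independent polls divides it by sqrt(k).
import math

def margin_se(n, p=0.5):
    """Approximate standard error of the (D - R) margin, in points."""
    return 2 * math.sqrt(p * (1 - p) / n) * 100

one_poll = margin_se(800)        # a single 800-person poll: about 3.5 points
pooled = margin_se(800 * 10)     # ten such polls pooled: about 1.1 points
```

Pool a few dozen polls and the sampling error alone drops well below a percentage point, which is why it is not the dominant source of uncertainty.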

The second kind of uncertainty is called “systematic error” by physical scientists. It refers to the fact that the entire set of measurements might be off by some amount. It’s like a scale that reads some nonzero number even when there’s nothing on it. In 2016, that systematic error was about two percentage points, and it was greatest in Republican and swing states such as Wisconsin.

There’s one line in my script that specifies this systematic error. In 2016, I set it to less than 1 percentage point, which was a mistake (frankly, in spring 2016 this did not seem like the critical assumption). In 2020, I’ll set it to two percentage points. That will substantially increase the overall uncertainty, which will set expectations appropriately in case the election is close. For example, in 2016 this would have put Hillary Clinton’s win probability at 68%, which seems about right.
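The effect of that one-line change can be sketched with a normal approximation (the actual PEC script uses a longer-tailed t-distribution, so its numbers differ somewhat; the sampling-error value of 0.5 points below is an illustrative assumption).

```python
# Sketch of how the systematic-error floor changes a win probability.
# Normal approximation; the PEC model's longer-tailed t-distribution
# would give slightly different numbers. Inputs are in percentage points.
from statistics import NormalDist

def win_prob(margin, sampling_se, systematic_se):
    # Independent error sources add in quadrature.
    total_se = (sampling_se**2 + systematic_se**2) ** 0.5
    # Probability that the true margin is above zero.
    return NormalDist(0, total_se).cdf(margin)

p_2016 = win_prob(1.1, 0.5, 0.6)  # sub-1-point floor, as in 2016: ~92%
p_2020 = win_prob(1.1, 0.5, 2.0)  # 2-point floor planned for 2020: ~70%
```

The same +1.1-point Meta-Margin that looked like a near-lock under the old assumption becomes roughly a two-in-three proposition once the systematic error is set honestly.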

The rest is up to the pollsters, who are continually working hard to improve their methods. We all depend on their efforts.

Warm regards,

Sam Wang


Giovanni Russonello wrote:

Hi Dr. Wang –

I hope this finds you well. I’m working on a story on polling in 2016 — particularly those famous miscues! — and how pollsters and forecasters are adjusting as we look ahead to 2020.


Tags: 2016 Election · 2020 Election
