Princeton Election Consortium

A first draft of electoral history. Since 2004

Weekend Nerdery (Basic level, part 1): Make Your Own Senate Prediction!

September 20th, 2014, 11:15am by Sam Wang


Some of you may think that analyzing polls is some kind of wizardry. It’s not.

This is the first of a few Basic posts. I’ll have Advanced posts too for extreme PEC aficionados. As always, I reserve the opportunity to make minor corrections.

How To Make A Median-Based Prediction

Let’s make a simple Senate prediction. Here is a procedure that I myself follow when I am away from my computer. You will need the following items:

  1. A web browser.
  2. A pencil or pen.
  3. A sheet of paper. Ideally the good stuff, college ruled.
  4. The ability to do inequalities, and to count.

Things you will not need:

  1. Addition, subtraction, or multiplication.
  2. The ability to write computer code.

Ready? Let’s begin.

Step 1. Point your browser to a good data source. I prefer HuffPollster. Or you might like RealClearPolitics, or another site. Find a race you care about.

Step 2. Decide on your goal – snapshot or prediction? You have to decide whether you want a current snapshot of conditions today, or a November prediction. There is also the question of whether you’ll take all polls, or whether you want to exclude some polls. Based on experience, I advise against excluding polls. PEC always takes all polls (with a preference for likely-voter polls) and gets accurate results. However, other sites often leave out partisan polls.

If you want a current snapshot, write down the last 3 polls or all polls from the last 3 weeks, whichever gives you more. (At PEC, the time interval gets shorter as the election approaches.) Alternatively, if you want a November prediction, write down all polls since August 1.

Step 3. Write down a list of poll margins. Record the top-two-candidate margin from each poll on your list. If a polling organization has more than one poll on the list, cross out the older values. You should end up with a short column of margins, one per polling organization.

Step 4. Take the median. Sort the polls in order of margin. Pick the middle value, which is the median. If you have an even number of polls, take the midpoint between the middle two polls. As an alternative, add or remove an older poll.
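Steps 3 and 4 can be sketched in a few lines of Python. The margins below are made-up numbers for illustration, not real polls:

```python
# Compute the median margin from a list of poll margins
# (Democratic minus Republican, in percentage points).
def poll_median(margins):
    ordered = sorted(margins)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: midpoint

# Hypothetical margins from five pollsters:
print(poll_median([2, -1, 4, 3, 5]))  # → 3
```

Python’s standard library also provides `statistics.median`, which does the same thing; the point here is just how little machinery the median needs.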

Step 5. Estimate the probability. Our next challenge is to figure out the probability that the outcome will end up on the other side of the margin by Election Day. As a running example, take the North Carolina race: the question is whether the margin will flip to Thom Tillis, on the R side. We can do that by calculating a “one-tailed probability.”

Let us assume that our polls will be off on Election Day (i.e., sigma_movement+sigma_systematic) by an average of 3.0%*. In general, if the margin is more than this value, the candidate’s November win probability is at least 79%. In this example, Kay Hagan is at +3%, so her November win probability is 79%. If the margin is more than 6 percentage points, the win probability is at least 91%.

If you want a more exact answer, get a calculator and divide the margin by 3.0% to get a t-score. Then look up the one-tailed probability using an online t-distribution calculator.

For this calculation, I recommend 2 degrees of freedom. This is a conservative choice that allows for game-changing events and other surprises.
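As a sketch of Step 5 in code: for 2 degrees of freedom, the t-distribution CDF has a closed form, so the whole calculation needs nothing beyond a square root. The 3.0% sigma is the assumption from the text above:

```python
import math

def win_probability(margin, sigma=3.0):
    """One-tailed win probability for the leading candidate,
    using a t-distribution with 2 degrees of freedom, whose CDF
    has the closed form 1/2 + t / (2 * sqrt(2 + t^2))."""
    t = margin / sigma          # t-score: margin divided by assumed error
    return 0.5 + t / (2 * math.sqrt(2 + t * t))

print(round(win_probability(3.0), 2))  # → 0.79 (Hagan at +3%)
print(round(win_probability(6.0), 2))  # → 0.91 (a 6-point lead)
```

This reproduces the 79% and 91% figures quoted above. For other degrees of freedom you would need a numerical CDF such as `scipy.stats.t.cdf`.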

Now you have your win probability.

Analyzing The Whole Senate

If you want to continue and get your own crude seat count, here is how to do that. It is not what we do at PEC – we actually calculate the entire distribution of possible outcomes. But with just a pen and paper, the following is possible.

Step 6. Repeat for all the states. Repeat Steps 3 through 5 for all the close states. To see which states are in play, see RCP.

Step 7. Count up the very-likely seats. Count up the seats on each side with probabilities of greater than 80%. These are the very-likely seats.

Step 8. What about the rest of the seats? At this point, you should have four or so seats left with probabilities in the 20-80% range: Arkansas, Iowa, Louisiana, and so on.

In these close races, you could add up the probabilities to get an expected number of seats. Or you could just call these close races coin tosses, and use that to estimate the likely range of outcomes. Whatever you do, these races will probably decide the total seat count. If you were to donate money or time, they would be the best targets. This is why there’s so much focus on Iowa lately.
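Steps 7 and 8 amount to a sum. With hypothetical win probabilities for the Democratic candidate in four close races (these numbers are invented for illustration):

```python
# Hypothetical Democratic win probabilities in the remaining close races
close_races = {"AR": 0.35, "IA": 0.50, "LA": 0.40, "AK": 0.45}

# Expected number of Democratic seats: sum of the probabilities
expected_seats = sum(close_races.values())

# Cruder alternative: treat each close race as a coin toss
coin_toss_seats = len(close_races) / 2

print(round(expected_seats, 2))   # → 1.7
print(coin_toss_seats)            # → 2.0
```

Add either number to the very-likely seat count from Step 7 to get a crude total.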

For Future Nerdery

There are several things we haven’t done. First, we haven’t calculated an exact distribution of outcomes, like the blue/red/green histogram in the right sidebar. Second, we haven’t delved into an absolutely critical part of the PEC calculation, the Senate Meta-Margin. The Meta-Margin is used to estimate the Election Day Senate seat count. It largely addresses the “covariance problem,” an issue that makes true polling nerds salivate.

To learn about these other issues, read the detailed explanations here and here. The second link is for the 2012 Presidential race, but the ideas are the same. I will come back to this another day.


Notes

*This assumes that poll medians typically deviate from the outcome by sigma=3 points on Election Day. Comparing August-September poll medians with actual election results, the median error in 11 close Senate races in 2010 was 3.3 percentage points.

This assumption does not take into account the variation among polls that are very dispersed, for instance in Alaska, where different pollsters come up with rather divergent results. To account for poll variability, calculate the standard error of the mean (SEM, defined as standard deviation divided by the square root of the number of polls). Then use the formula sigma=sqrt(SEM*SEM+3*3) to figure out the final likely error. However, doing this breaks my no-arithmetic promise.
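The footnote’s formula can be sketched like this; the margins are invented for illustration, and 3.0 is the Election-Day sigma assumed in the post:

```python
import math

def combined_sigma(margins, base_error=3.0):
    # Standard error of the mean (SEM) across pollsters:
    # sample standard deviation divided by sqrt(number of polls)
    n = len(margins)
    mean = sum(margins) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in margins) / (n - 1))
    sem = sd / math.sqrt(n)
    # Combine with the assumed Election-Day error in quadrature:
    # sigma = sqrt(SEM^2 + 3^2)
    return math.sqrt(sem ** 2 + base_error ** 2)

# Hypothetical, widely dispersed margins (an Alaska-like situation):
print(round(combined_sigma([2.0, -4.0, 6.0]), 2))  # → 4.18
```

So in a state with very dispersed polling, the effective sigma can be substantially larger than the 3.0-point baseline, widening the win-probability estimate accordingly.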

[Update, Monday 2:00pm: Finally, note that this is the uncertainty for a single race. The uncertainty for the entire set of contested races, i.e. the fluctuation in the Meta-Margin, appears to be smaller. See this discussion.]

Tags: 2014 Election · Senate

37 Comments

  • ArcticStones

    Sam, you wrote: “In 2012, the Meta-Margin ended at D+2.8% and the actual election meta-margin (i.e. tipping-point state) was D+5.4%, suggesting a 2012-Presidential Meta-Margin systematic error of 2.6% in the same direction as 2010-Senate. That’s actually rather large.”

    Do we have any reason to believe there might be a similar systematic error today – and in the same direction – in the Senate polls/forecasts?

    What would be the post-2014 Senate map if there is such a large Meta-Margin error or shift?

    PS. I hope those are fair questions to ask.

  • bks

    Six weeks to go!

    Fewer than four in 10 Americans can identify which political party controls the Senate and which controls the House, according to a Gallup poll released Monday.

    http://www.washingtonpost.com/blogs/post-politics/wp/2014/09/22/just-36-percent-of-americans-can-name-which-parties-control-house-senate/?wpmm=AG0003407
    –bks

  • Amitabh Lath

    Sometimes we use a sum of two gaussians to model distributions with long tails. A narrow one to fit the core, and a wide one for the tails. There are two sigmas, and the relative amplitudes. If there are asymmetric tails one can offset the mean of the wide gaussian.

    • Sam Wang

      The application to asymmetric movement would be useful in instances where we have a precise enough fundamentals-based model – or if we had good evidence for directional movement from other sources. That’s hard to tell with so few Senate races in play – the Meta-Margin is so noisy.

      My reading of the Linzer paper is that the 1-sigma on Presidential models is 3.5% of opinion. If that could be done for Senate (a big if), it still does not seem accurate enough to merit inclusion in a close race like this year’s. It also raises the possibility that a fundamentals-based model can accidentally drag predictions in the wrong direction.

      At some level, there is a divergence as to whether one’s original motivation is sporting or theory-oriented (the other guys), vs. policy issues like predicting the binary outcome of who will win (me). I think this leads to different emphasis.

  • MarkS

    Can you explain your reasoning behind using the t-distribution with nu=2? This part just baffles me, it seems to come out of nowhere.

  • Vicki Vance

    Hi Sam,
    Not directly related to this post, but I want to be sure that you still think the best places to invest are the four elections that come up on the left sidebar (IA, AK, AR and LA) before I donate again.
    Vicki

  • Jay

    Here you use an error term of 3%. Previously you’ve mentioned an error term (not including the movement term) of 0.7% or 1.0%. But you say your actual 2010 error was 3.3%. DailyKos finds a 4.0% average error in 2010 and 2012 combined. It still seems like your error term is too small. Maybe by only a point, but still too small. It is true that if you take the Upshot polls-only model and exclude house effects, you get a 59% chance of Democrats holding the Senate, similar to your 70% chance. But you still seem to be underestimating uncertainty somewhat.

    • Sam Wang

      I think your point is worthwhile but look at my previous writeup again please? My current thoughts:

      (1) I have discussed two kinds of error term. The first is the single-race uncertainty, which is about 3%. That’s the topic of this post we’re commenting on. It’s based on 2010 data, where I have 11 close races to draw from. 2012 looks similar. By the way, DailyKos may not be taking into account that the error is not symmetrically distributed: the tendency is for front-runners to expand their leads. So the average error is not quite the right number.

      (2) The second is the error term in the Meta-Margin. As I wrote the other day, it has two components. One (sigma_movement) is the movement in the Meta-Margin, which is smaller than the single-race uncertainty because races are not correlated much. If you look at the time-series plot, the standard deviation is about 0.7% (you have to correct for the Orman event on September 3rd to get that one accurately). Think of it as scaling like 1/sqrt(N), where there are N close Senate races.

      The other contributor to the Meta-Margin uncertainty is election-day error (sigma_systematic), i.e. how will final polls and final outcomes differ on average? I have had a hard time estimating this parameter. In 2012 it was quite small. In 2010 it was about 2.6% favoring the Democrats, which makes me think pollsters had a tough time with their likely-voter models. My current thought is to estimate it at 1.0%, but with long tails on both sides, i.e. don’t assume the error will favor either side.

      That would give an overall sigma of sqrt(0.7*0.7+1.0*1.0)=1.2%. I would like an open discussion of this parameter…October 1 is a logical time to introduce any corrections without giving the false appearance of movement. On that date I start phasing in a random walk component, so explanations will be necessary anyway.

    • Amitabh Lath

      Adding in quadrature is ok if the individual distributions are gaussian. But, if sigma_systematic has long tails you may have to convolve by hand.

    • Sam Wang

      Yes, though my goal is to estimate size of the uncertainty. The tails can be added at the end, so to speak.

    • Jay

      I see what you are doing better and I really appreciate you taking your time to put all this work into this, but if you have a sample size of 2 elections and one had systematic error of 0% and the other about 2.6%, the most likely systematic error term for the upcoming election would be 1.3%. I don’t believe there was actually 0% systematic error in 2012 and I think Nate Silver makes the point that there has usually been significant systematic error in elections over the last 50 years. Per http://sbronars.wordpress.com/2012/11/11/nate-silvers-value-added-and-systematic-forecast-errors/ , the polls underestimated Democratic performance in competitive races for President by 2%. Per your final prediction: http://election.princeton.edu/2012/11/06/presidential-prediction-2012-final/ , you underestimated Obama’s share of the two-party vote by 0.7%. By my calculation looking at your 10 competitive Senate projections, http://election.princeton.edu/2012/11/06/senate-prediction-final-election-eve/ you underestimated the Democratic performance by a median of 2.9% and mean of 3.2%. Your 5 competitive 2010 Senate seats ended up 2.8% median more democratic (3.8% by mean).

      This all suggests to me that the systematic error term should be at least 2% and possibly closer to 3%, unless I am misunderstanding what it represents. I think it represents the chance that polling averages across states (and especially the competitive states), will have errors in the same direction. Some of the error may be included in sigma_movement, but my understanding is that that contributes relatively little on election day.

    • Sam Wang

      Again, there is a difference between single-race predictive uncertainty (from now to November) and Meta-Margin systematic uncertainty (error in final polls, aggregated to estimate the entire Senate). Your information suggests my estimate of the former (3%) is OK. For Meta-Margin systematic uncertainty, closer analysis of 2006/2008/2012 Senate races is in order.

      2012-Senate: a comparison of final poll margins and outcomes had no missed calls, suggesting a systematic error of less than 1%. However, in close races, the Democratic-minus-Republican margin outperformed polls by an average of 2.3%.

      2008-Senate: this and this are useful, and are suggestive of a systematic error of less than 1%.

      2006-Senate: I haven’t analyzed it yet, but the RealClearPolitics archival data (note that you have to calculate your own medians) suggests a systematic error of less than 0.5%. That year is actually kind of impressive: calculate medians for VA and MT to see what I mean.

      In summary: <0.5%, <1%, <1%, 2.3%, 2.6%. I still like the 1.0% estimate…though I concede that increasing it to 1.3% and a fat tail might be good. So few data points…too bad we don’t run elections every quarter.

      Note that Presidential errors may not reflect Senate errors. In regard to 2012-Presidential, the 2% in the article you cite can be recast directly in terms of the Meta-Margin. In 2012, the Meta-Margin ended at D+2.8% and the actual election meta-margin (i.e. tipping-point state) was D+5.4%, suggesting a 2012-Presidential Meta-Margin systematic error of 2.6% in the same direction as 2010-Senate. That’s actually rather large.

  • 538 Refugee

    Where does Harry Reid get his polling data from? I find the 6 week remark interesting. I guess he is a ‘fundamentalist’? Will we start seeing polling shifts now?

    “The early part of the elections, [it’s all about] national issues, Obama being popular or not popular,” Reid said. “Six weeks out of an election, all they care about are things that affect them personally.”

    Reid added: “If the election were held today, we would unquestionably maintain control of the Senate.”

    Read more: http://www.politico.com/story/2014/09/harry-reid-senate-elections-2014-111115_Page2.html#ixzz3E3D360jW

    • Sam Wang

      He’s being hyperbolic, which is his job…but I do think it’s probable. Why not? The average of the poll-based snapshot is currently 50.5 Democrats plus Independents. In a substantial fraction of scenarios, Orman would not even be the 50th vote.

  • Steve Scarborough

    Sam — this is great! Thanks for sharing the information.

    Since I am rusty on my statistics, I am having trouble understanding how to work with the polynomial (http://election.princeton.edu/faq/). Can you point me to some sources with some simple examples to help me better understand how to work it?

    Also, what is your view on this person’s view that the binomial distribution is not the way to go (http://andrewgelman.com/2008/11/27/dont_use_the_bi/)?

    You are doing great work and I always look forward to reading what you have to say.

    Regards,

    Steve

    • Sam Wang

      The basic problem with the binomial distribution would be the question of whether the probabilities are independent of one another, i.e. if there’s a systematic polling error, or swing of opinion, across the board. The binomial distribution will not capture that.

      This is why I use the Meta-Margin, which allows such correlated moves to be modeled. In that case the binomial is OK. If you are working without software, simply keep in mind that if you are adding up Senate seats across the board, in 2014 a swing of 1.0 percentage point in opinion will move 1.0 Senate seat across the aisle, more or less. That number is different in different years.

      In regard to working the binomial…erm, look up some things on Pascal’s Triangle? Not sure, let me think about that.

    • Steve Scarborough

      Sam: Thanks for your response on the binomial and the polynomial.

      In my opinion, the probabilities are independent and indeed the binomial is the right way. To be frank, I simply do not follow Mr. Gelman’s thesis, which seems to focus on voting power.

      I do understand how to use Pascal’s triangle to help out on polynomials; however, I am struggling with how to practically work with the X in the polynomial. Obviously I am exposing my ignorance here. Sorry for the dumb question, but what is the X? Also, in my old eyes, I read that — in the latter portion of the P1 part — you have P1*X^EV1. In other words you have X to the EV1 power. Is that right?

      Thanks for indulging an old guy on this, as the last statistics course I took on probability was in the 1960s.

      Regards,

      Steve

    • Sam Wang

      I wrote the equation as a means of turning rules for combining probabilities into a more familiar format. In that sense, the X doesn’t really “do” anything…it’s a placeholder to help when writing computer code to do the calculation. (MATLAB has a built-in function to multiply polynomials called conv).

      For example, Ohio has 18 electoral votes and Florida has 29 electoral votes. If each one is perfectly 50-50, then the D (or R) candidate has a 25% chance each of getting 0 EV, 18 EV, 29 EV, or 47 EV. Using the polynomial notation, that becomes

      (0.5 + 0.5 x^18)(0.5 + 0.5 x^29) = 0.25 + 0.25 x^18 + 0.25 x^29 + 0.25 x^47. The coefficients give the probabilities.
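      A plain-Python sketch of that polynomial multiplication (a stand-in for MATLAB’s conv), using the same hypothetical 50-50 states:

```python
def poly_multiply(p, q):
    # Multiply two probability polynomials; the coefficient at
    # index k is the probability of winning exactly k electoral votes.
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Each state at exactly 50-50: half the weight at 0 EV,
# half at the state's full electoral-vote count.
ohio = [0.0] * 19
ohio[0], ohio[18] = 0.5, 0.5        # 18 electoral votes
florida = [0.0] * 30
florida[0], florida[29] = 0.5, 0.5  # 29 electoral votes

dist = poly_multiply(ohio, florida)
print([(ev, p) for ev, p in enumerate(dist) if p > 0])
# → [(0, 0.25), (18, 0.25), (29, 0.25), (47, 0.25)]
```

      Repeating the multiplication across all states builds up the full distribution of outcomes, like the histogram in the sidebar.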

    • Steve Scarborough

      Thanks Sam, for taking your time to show the simple example for Ohio and Florida. Most helpful!

      Regards,

      Steve

  • wendy fleet

    *How* worried am I to be that Sept 21 meta-margin went from yesterday’s 1.8 to 1.7?

    Is it GGB (Golden Gate Bridge) time or is it “take a deep breath & eat more 85% chocolate” time?

  • NP

    Electoral-vote.com uses the most recent 3 polls. However, the 51R prediction on his site boils down to the fact that Quinnipiac recently polled Colorado and Iowa as R by 8- and 6-point margins respectively. These are likely outliers, and more recent polls bear that out. Colorado and Iowa should be considered real toss-ups.

    Control of the Senate will essentially be based on which way these go. For the Republicans to gain control, they have to win both. Unless, that is, Orman does not caucus with the Democrats – that would essentially mean that Chad Taylor dropping out was pointless, but I guess it wouldn’t be the first time someone got stabbed in the back in politics!

    • Avattoir

      Taylor knew he would lose. How is his leaving the contest to improve the chances of the nation going a bit less batty remotely “stabbed in the back”?

    • NP

      But Orman has to caucus with the Democrats otherwise Taylor dropping out made no sense. One has to assume that Taylor made a “deal” with Orman or at the very least asked about Orman’s intentions. If Orman now caucuses with the Republicans then he would have changed his mind (hence the stabbed in the back comment).

  • A New Jersey Farmer

    And at HuffPost, Mark Blumenthal does his best to explain what the tiff is all about. The concern about the Quinnipiac sauce is interesting.

    http://www.huffingtonpost.com/2014/09/19/senate-models_n_5848920.html?utm_hp_ref=@pollster

  • A New Jersey Farmer

    I was just over at the Votemaster. He’s got Iowa and Colorado going to the Republicans. Looks like he just uses the most recent polls as opposed to an aggregate.

    http://www.electoral-vote.com/

    • Amitabh Lath

      Could be one bad Quinnipiac poll that pulls the means (but not the medians).

    • 538 Refugee

      Quinnipiac has a good track record but this year they are decidedly more Republican leaning compared to other polls. Time will tell.

  • SFBay

    Thanks for the lesson. If you have a simple, straightforward method of plotting polls and calculating the odds of a winner, there’s no reason not to put it out there for everyone to use.

    I haven’t done this much homework in decades.

  • whirlaway

    Thanks for the tutorial. Nate Silver of course cannot provide any info of this sort. His corporate affiliation means that he has to claim he has a “secret sauce” that he cannot reveal.

    Well, we will see in about 7 weeks! If I remember, you did better than Silver in 2010 and even in 2012, when he was the one who grabbed all the limelight.

  • Frances Smith

    Thank you for this explanation. I will trust your calculations since I haven’t the time to do them myself. It is gratifying to understand how you make them.