Princeton Election Consortium

A first draft of electoral history. Since 2004

Senate 2014 Election Day Model (geeky version)

September 9th, 2014, 6:20pm by Sam Wang

[Today I wrote about the forecast at The New Yorker. Below, the math... - Sam]

The Election Day prediction for Senate control is now fully live. There are parts that may require further tinkering, especially concerning how the results are displayed. But it’s basically in place.

Those of you who followed PEC in 2012 will recognize all the components. Basically, I have adapted the Presidential model for the Senate. Here I will document key components that are needed for the Election Day prediction.

First, let me summarize the core principles of any model at the Princeton Election Consortium.

  1. Polls are the only inputs, and are never “corrected.” Median-based statistics are used to reduce the influence of outliers.
  2. Fundamentals based in political science have a minimal role.
  3. The most important output of the model is the Meta-Margin (and not the win probability).
  4. Polls from earlier in the year are used to predict future outcomes.
  5. PEC’s results let you direct your attention and activism to the races that matter most.
  6. All the code is open-source.

Today I will focus on principles 2, 3, and 4. They involve assumptions that should be examined critically.

Principles 2 and 4 are the biggest differences between us and NYT/FiveThirtyEight, which use models that are based on polls, but also draw upon non-polling fundamentals to nudge expectations for where the race ought to be. The general gist of the fundamentals is that political conditions this year favor the Republicans. Since Democrats are currently outperforming those expectations, the other models predict that in the next two months, Senate polls will drift toward the Republicans. At the core, this is why those models lean more than we do toward the possibility of Republican control of the Senate in 2015.

However, this year a fundamentals-based correction is fraught with difficulty. The reason is that the Senate is likely to split fairly close to 50-50 between Democrats+Independents and Republicans. In this circumstance, even a small amount of imprecision in the non-polling-based estimates can send a prediction awry. In fact, I estimate that FiveThirtyEight’s consideration of fundamentals has the effect of shifting the polls toward Republicans by about 2.0 percentage points. That may not seem like much, but it is enough to flip the sign of the prediction.

To illustrate the difficulties associated with a fundamentals-based prediction, let’s take the House of Representatives as an example. In midterm elections, on average, the Congressional ballot preference moves away from the President’s party from Labor Day to Election Day. However, across the last four midterm elections (2010, 2006, 2002, and 1998), the ballot preference has variously moved toward the President’s party, moved away from it, or not changed at all. This shows that fundamentals-based predictions are quite variable, perhaps too much so to help.

This year, the data so far indicate that the fundamentals aren’t matching the polls. On average, in midterm elections, opinion moves against the President’s party. But here is what the generic Congressional ballot looks like.

Most of this graph is not very far from the blue line, which labels the national House vote in the 2012 election. In that respect, the generic Congressional ballot is looking rather different from the campaign of 2010, when opinion favored Republicans all year. In other words, the average fundamentals-based expectation for the House is not being met in 2014.

Now let’s look at Senate polls. Here our best source is PEC’s own Senate Meta-Margin. The Senate Meta-Margin is defined as how much polls would have to be shifted, across all states, to bring the probability of Democratic/Independent control to exactly 50%, i.e. even odds for the two parties. Since the Meta-Margin is in units of public opinion, it’s a convenient quantity to think about, and to model.
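
The definition can be made concrete with a minimal Python sketch. This is not PEC's MATLAB code: the state margins, their uncertainties, the count of safe seats, and all function names are hypothetical, and the races are treated as independent with normally distributed errors purely for illustration. The sketch builds the distribution of seats won, then uses bisection to find the uniform poll shift that puts control at even odds. The Meta-Margin is that shift with the sign flipped, so a positive value favors Democrats+Independents.

```python
import math

def win_prob(margin, sigma):
    """P(D/I candidate wins) for a poll margin in points, assuming normal error."""
    return 0.5 * (1.0 + math.erf(margin / (sigma * math.sqrt(2.0))))

def control_prob(margins, sigmas, safe_seats, needed, shift=0.0):
    """P(D+I hold at least `needed` seats) after shifting every margin by `shift`."""
    dist = [1.0]  # dist[k] = probability of winning exactly k contested races
    for m, s in zip(margins, sigmas):
        p = win_prob(m + shift, s)
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)   # this race is lost
            new[k + 1] += q * p       # this race is won
        dist = new
    must_win = max(needed - safe_seats, 0)
    return sum(dist[must_win:])

def meta_margin(margins, sigmas, safe_seats, needed, lo=-20.0, hi=20.0):
    """Uniform shift that yields 50% control odds, sign-flipped (positive = D+I lead)."""
    for _ in range(60):  # bisection; control_prob increases with shift
        mid = (lo + hi) / 2.0
        if control_prob(margins, sigmas, safe_seats, needed, shift=mid) > 0.5:
            hi = mid
        else:
            lo = mid
    return -(lo + hi) / 2.0

# Hypothetical inputs: five contested races, 47 safe D+I seats, 50 needed.
example_margins = [3.0, 1.0, -0.5, -2.0, 4.0]   # D/I minus R, in points
example_sigmas = [2.0] * 5                      # assumed uncertainty per race

print("control probability:", control_prob(example_margins, example_sigmas, 47, 50))
print("meta-margin (points):", meta_margin(example_margins, example_sigmas, 47, 50))
```

Because the shift is applied to every race at once, the result is in the same units as the polls themselves, which is what makes it convenient to model.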

Here is this year’s graph for the Senate Meta-Margin.

The most apparent quality of the Meta-Margin is that it has been above the red line, i.e. favoring Democrats+Independents, for most of this year. (The last big jump at the start of September reflects the withdrawal of Chad Taylor (D) in the Kansas Senate race, which opened the way for independent candidate Greg Orman.) Also, it isn’t moving strongly in either direction. To put it simply: Democratic candidates have steadily outperformed the fundamentals-based models.

Considering the lack of obvious directional drift, I have chosen to model changes in opinion over the coming two months as a random variable that can move in either direction, toward Republicans – or toward Democrats.

The range over which the Meta-Margin is likely to roam can be estimated from the graph above. Since June the Senate Meta-Margin has fluctuated between R+1.0% and D+1.7%. Its average has been D+0.3%, with a standard deviation of 0.6%. Based on this, we would expect polls in late October to have a Meta-Margin of D+0.3±0.6%.

The next question is how to convert this predicted Meta-Margin to a probability of control by one party or the other. The model (MATLAB code here) uses the following assumptions. I note that these assumptions are standard and reasonable, and that they are neutral: they do not bias the outcome toward either party.

Systematic bias in polls. On average, polls will differ from true election outcomes. This difference arises from pollster biases and misjudgments (“house effects”) and other sources of noise. Systematic bias is the main source of uncertainty in interpreting the Senate Meta-Margin. I have assumed that the systematic bias has an average (rms) value of 0.7%.

The Orman offset. Since Greg Orman (I-KS) was not a factor from June through August, his candidacy is not reflected in the history of the Senate Meta-Margin. His viable candidacy effectively shifts the Senate Meta-Margin toward the Democrats+Independents. If we give him a 50-50 probability of a November win, this effectively moves the Meta-Margin toward Democrats+Independents by 0.47%.

What if the fundamentals-based models are right after all? Despite the absence of evidence thus far, future movement remains a possibility. I would loosely refer to this as a black-swan event that is not predictable from past polling data. The exit of Chad Taylor (D) from the Kansas Senate race was a black-swan event (though note that I did anticipate this event). The possibility of movement that matches the FiveThirtyEight bias is also a black-swan effect, and can be modeled using a t-distribution.

To put outlier events into perspective, the conversion between Meta-Margin and Senate seats is about 1.0 percent per Senate race. So an outlier event would be the equivalent of two or more seats flipping beyond what one would normally expect. For example, Jay Cost at The Weekly Standard says “a true repeat of the 2010 wave should therefore give the Republicans 54 seats.” According to the PEC model, that would be 4 sigma away from our prediction, a clear outlier event.

What’s the bottom line? With these assumptions, the probability of Democratic+Independent vs. Republican control comes down to one question: will the Senate Meta-Margin be positive or negative? Its historical mean is D+0.3% and its sigma is 0.92% (the combination in quadrature of the standard deviation, 0.6%, and the systematic error, 0.7%).

Including the Orman offset, Pr(Meta-Margin>0) is given by the MATLAB expression tcdf((+0.3+0.47)/0.92,1)=0.722, which I round to 0.70. It’s even possible to put an error bar on this probability. Using the systematic error (0.7%) as sigma, shifting the Meta-Margin 1 sigma lower gives a probability of 0.524, and 1 sigma higher gives a probability of 0.822.
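
The arithmetic above can be checked with a short Python sketch. It is a stand-in for the MATLAB expression, not PEC's code, and the variable names are illustrative. It relies on the fact that a t-distribution with one degree of freedom is the Cauchy distribution, whose CDF has a closed form, so no statistics library is needed. Sigma is kept unrounded here, so the results should land close to, but not exactly on, the figures quoted above.

```python
import math

def t_cdf_1dof(x):
    """CDF of Student's t with 1 degree of freedom (the Cauchy distribution)."""
    return 0.5 + math.atan(x) / math.pi

mean_mm = 0.3        # historical mean of the Meta-Margin (D+0.3%)
sd_mm = 0.6          # standard deviation of the Meta-Margin since June
systematic = 0.7     # assumed systematic polling error (rms)
orman_offset = 0.47  # shift toward D+I from Orman's candidacy

# Combine the two uncertainties in quadrature: sqrt(0.6^2 + 0.7^2)
sigma = math.hypot(sd_mm, systematic)

center = mean_mm + orman_offset
p_mid = t_cdf_1dof(center / sigma)                   # central estimate
p_low = t_cdf_1dof((center - systematic) / sigma)    # 1 sigma lower
p_high = t_cdf_1dof((center + systematic) / sigma)   # 1 sigma higher

print(f"sigma = {sigma:.2f}")
print(f"Pr(Meta-Margin > 0) = {p_mid:.3f} (range {p_low:.3f} to {p_high:.3f})")
```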

In other words, without equations: the probability that Democrats and Independents will control 50 or more seats is 70%. Because of uncertainties, this probability has a likely range of 50 to 80%.

When converted to a seat count, the Senate Meta-Margin corresponds to a caucus of 50.3±0.9 Democrats and Independents. The likeliest outcome is a 50-50 split, with Greg Orman having to decide how he will caucus. This prediction is plotted at the top of this post, with the 1-sigma “strike zone” plotted in red, and the 2-sigma zone plotted in yellow.

As the election nears… Fluctuations in the Senate Meta-Margin suggest that the daily snapshot starts to be predictive of the final outcome about 5 weeks in advance. When that day comes, we will start using the daily snapshot to weight the Election Day prediction.

Tags: 2014 Election · Senate

44 Comments so far ↓

  • Jeremiah

    No disrespect at all, Dr. Wang. I admire the prognosticators and mathematicians, much smarter than myself. And for the record I agree with your predictions more than anyone else at the moment as, indeed, an accurate indicator.

    However I’ve been in this game a long time. Humans can be impossible to even fall within the margin of error. Weather can be a factor. Ground-game can be a factor. October surprises can be a factor. ANYTHING can happen in ANY race. So mathematical predictions can be close. But all it takes is 1 or 2 or 3 surprise upsets that can make the math look like a joke. Maybe that’s why I love politics. The built-in chaos.

  • Ethan

    The only probability that matters is election day. But I think Sam is on to something. There has been way too much special sauce lately.

    It was to the point that *sauciers* were including if a candidate owned a cat or dog.

    Let the pure data, alone, speak for themselves.

  • Jeremiah

    I love the deep scientific calculations of incredibly smart and talented people such as Dr. Wang.

    However. This is politics. You can make excellent mathematical predictions… and then Majority Leader Cantor loses his primary. And then Sen. Jeffords switches parties. And then Chad Taylor drops out. And then Sen. Zell Miller lurches hard right. And then Charlie Crist and Parker Griffith switch parties 10 times. And all the math becomes moot.

    • Sam Wang

      Lots of truth in your statement. However, the math is still useful. Look at what we do here as a thermometer. It gives you a reading that you may use in any way you see fit. I can’t tell you what Orman will do as a deciding vote in the Senate…but I can tell you the probability that he will get that chance.

      I assure you that this kind of information is available to very well-financed entities such as the national parties. PEC gives it to you for free.

  • Nathanael

    “fundamentals” in 538’s terms == assuming that past trends will continue.

    I have been studying history. A lot of history. We have just hit a sociopolitical change period, similar to the destruction of the Whig Party, or the party platform flip-flops in the 1913-1932 period, or the Dixiecrats joining the Republican Party in the 1960s. Past trends are crap for prediction during such a sociopolitical change period. (For example, midterms now favor the President’s party, but there are plenty of other such “trend breaks”.)

    If you corrected using the right “fundamentals” you might get a better prediction, but the “fundamentals” which held true through the 20th century don’t hold true any more. So if you don’t want to make big historical predictions, just sticking to the polls works better.

  • Tom

    I’ve read your justification a few times for the projection–but I just don’t get how assuming a random walk is the thing to do to carry the “snapshot” into a “projection.”

    The other thing that I’m not that satisfied with is your covariance arguments. I’ve read them several times, but it seems that you are missing the objection–*some* things you might reasonably expect to covary with all states similarly; however, *some* states are much more like each other than the rest, and bias there, if true, would amount to a partial but systematic drift–not at all like adding +/-1% or +/-2% all-51 bias.

    Otherwise I appreciate the clarity and simplicity of your methods, and particularly enjoy the cute EV trick (mod the reservations above, which I know you feel you’ve addressed!)

    • Sam Wang

      Actually, covariance is added at the end, in the variable called “systematic.” The architecture of the analysis is to acknowledge that covariance can happen, and let it come up at the end so as to avoid unwanted blurring. Covariance can happen from co-movement of candidates in different states, or if pollsters are off as a community. All of this is contained in the analysis. It’s worked well in the past.

      The way to fit your thinking into this calculation is to ask how much that covariance would add, net, across the board. If you don’t like systematic=0.7-1.0%, let me know with a reasoned argument, ideally with some math…

    • some dood

      not sure how relevant this simple example is (though I’d like to know how you know it’s irrelevant if it is), or if I’m somehow misunderstanding what you’re doing, but imagine there are two states (A and B) and democrats win if they win either. Using your snapshot the democrats have a 33.3% chance of winning each and so, based on an assumption of independence have a 55.55% chance of winning the whole thing. If, however, you know the polls in A and B are perfectly correlated (not sure if it’s possible to know something like this beforehand), then the democrats’ chance of winning is actually 33%. Not sure how big an effect that would have on the meta-margin, but if I’m understanding it right the sign would change with the correlation, which makes me a little leery.

    • Tom

      Okay, thought some more about the treatment of covariance. I get the justification now, and seems pretty reasonable overall.

  • 538 Refugee

    538 headline is “Senate Update: A Big Swing In North Carolina Improves Democrats’ Odds”. I was wondering when this “Big Swing” happened. The article goes on to explain the swing based on the last two Rasmussen polls and a SurveyUSA poll from Thursday that moved Hagan ahead. If they have a graph of the polls they are following in each race I sure can’t find it in that article or on the site. At least not easily. I find very little in terms of supporting documentation in fact. Is that ‘special’ or ‘secret’ sauce? The HuffPo link Sam provides has had Hagan ahead since about mid-May. So is this how “the comeback” begins?

  • Marc

    There is a new SurveyUSA poll (#21567) commissioned by KSN-TV of Wichita, KS. It shows 36% Roberts, 37% Orman and 10% Taylor. The question reminded respondents that Taylor does not want to run, but assumed his name was on the ballot. This poll hasn’t made it to Pollster yet. 550 LV. 9/4 to 9/7.

    • Froggy

      You might want to email the folks at Pollster and send them a link to the poll results — they miss polls sometimes, and they do put them up when informed about them. I’d do it myself, but I went off the deep end about this in 2012, to the point where I’m sure they were thoroughly sick of hearing from me, and I’ve vowed not to get sucked into doing this again, not to do things like point out to them the Kansas poll or the North Carolina poll reported here:

  • Liang Q

    Sam, when are you going to release your forecast for the gubernatorial races? I’m sure it would be very exciting to see. What do you think of the chances for some of the states with two competitive marquee races, such as Alaska, Arkansas, Colorado, Michigan, and Kansas, to see their two races diverge?

    • Sam Wang

      That’s interesting. Not that much has changed since my New Yorker piece on governors’ races, except that Scott Walker (R) in Wisconsin is now behind by a few points, as I indicated might become the case.

    • Liang Q

      I noticed in Michigan, as Peters pulled away from Land, Schauer also has caught up with Snyder, and in some polls, even moved ahead. In Arkansas, we have observed a similar trend, except that the races are moving towards the GOP, not the Democrats. In Iowa, they are moving in the opposite direction. It would be very interesting to lay out, or better yet, quantify, the impact of a gubernatorial race on the same-state senatorial race, or vice versa. Sam, do you think that’s something that can be done?

  • W.W.

    Updates state by state?

    • 538 Refugee

      24% chance of Pat Roberts losing his seat? They put Republican takeover at 57%. That’s little more than a coin flip. Once the Kansas polls catch up to the new reality even that margin may disappear in their analysis.

    • Sam Wang

      They do read PEC, actually. But once a model is developed, it has to stay pretty much the same.

      At the NYT, mouse over “Kansas” and you’ll see their background model (i.e. fundamentals) predict Roberts +28% in that race. If they had Orman’s poll margin at +5.5+/-4.5%, which is where his polls are today, their GOP control probability would probably drop to about 30%.

  • MarkS

    I used to think it mattered which party would be in power, and breathlessly followed this site and others. But now that our Democratic President has decided to be warmonger-in-chief, and since both major parties are more and more just the governmental arm of the 1%, I really don’t care anymore.

    • Amitabh Lath

      The prominent issue of our times is the response to anthropogenic global warming. In this, the two parties are far apart, and always have been.

    • mdavid.s

      Bad mistake. The choice between bad and worse in politics is so much more critical than the choice between good and better, or even good and bad. You can’t afford not to care. Otherwise, to dredge up an old truism, you wind up with precisely the government you deserve.

  • Matt McIrvin

    In the UK, people seem to have freaked out over a poll on Scottish independence that had Yes ahead. It looks like it was just an outlier, though. Somebody needs to be doing some good statistical aggregation…

  • Bernd

    Honestly, I have trouble getting this Kansas thing. As I understand it, you’re tracking the Orman/Roberts matchup on the HuffPo site and treat Orman as a D (at least this is my understanding from the file python/races.csv). Then, you have an additional Orman offset in the file matlab/Senate_November_prediction.m. But why is this Orman offset positive and not negative? Shouldn’t it account for the probability of Orman caucusing with the Republicans instead of your assumption treating him as a D?

    Additionally, there is a much more recent poll from SurveyUSA listed under the three-way matchup between Orman, Roberts, and Taylor. Wouldn’t it be more accurate to choose this one instead of the ancient PPP one?

    • Sam Wang

      The calculation is for D+I seats. Whichever way he goes, Orman is an I.

      The Orman offset is relative to the historical average, which is pre-Orman-v-Roberts. As more post-O-v-R data enters the history, I will compensate. [11-Sep 9:26am: done.]

      Kansas is hard-coded at Orman +1 sigma until HuffPost feed stabilizes. Counting SUSA he’s actually at +5.5+/-4.5%=1.2 sigma.

      (No more fake email addresses please.)

  • SJWangsnesss

    Boy, I sure hope you’re right about this. You nailed the 2012 election, so I’m hopeful!

  • Sam Wang

    Sachin – it’s all posted. Move your eyes to the left. No extra services, sorry!

  • shma

    Sam, do you have a link which explains how to use t distributions for predictions?

    Also, why do you increase the systematic error to 1% starting in October? Shouldn’t house effects remain constant over time and the uncertainty from undecideds go down over time?

    • Sam Wang

      You read my code to come up with that question. Which is great!

      I have no tutorial on t-distributions. If you know MATLAB, run this:
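      foo=0:0.02:4; houseavg=1; % x-axis in percentage points; houseavg sets the scale
      tprobs=tcdf(foo/houseavg,1); % t-distribution, 1 degree of freedom (fat tails)
      normprobs=(erf(foo/houseavg/sqrt(2))+1)/2; % gaussian CDF, on the same 0-to-1 scale as tprobs
      plot(foo,tprobs,'-k');hold on;plot(foo,normprobs,'-r'); % black: t, red: gaussian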

      Basically the t-distribution is a kludge to make the tails fat. As the second parameter in tcdf() gets higher, the distribution gets more gaussian. It’s basically a judgment call how black-swan-y to get.

      As for the systematic error…in principle it’s a constant, unchanging all year. The prediction is more resistant to that parameter than you might think. The only time to revise it is the moment when we get to the random-walk period in October. Think of that as a pit stop.

      Above, houseavg is how much we think the mean house effect will vary from year to year. In 2004/2008/2012-Pres and 2008/2012-Sen, it was no more than 0.5%. In 2010-Senate, it was 2% (Dems outperformed polls across the board).

    • shma

      Alright, that’s simpler than I thought.

      Thanks Sam.

  • HeyHuey

    These results more closely mirror which was spot on in the 2012 election cycle. I’m more comfortable with Mr Wang’s calculus than I am 538’s.

    • Pinkybum

      But Nate Silver was also spot on in 2012 so who to believe? I have been a big believer in 538 ever since the 2008 elections but in this cycle it looks like Nate is giving too much sway to his “fundamentals” compared with the presidential election forecast. I believe Dr Wang and Dr Tanenbaum are more on the money than Nate is this cycle.

  • Michael K

    Sam, you write (in the New Yorker article):
    “Indeed, a swing of opinion of two percentage points toward Republicans in all races would be enough to account for the difference between my predictions and Silver’s.”

    Nate previously wrote:
    “We estimate that on average in midterm years since 1990, registered voter polls have had a 2.6 percentage-point Democratic bias — compared against likely voter polls, which have been unbiased.”

    So how much of the difference between the PEC and 538 predictions is attributable to 538’s R+2.6 adjustment to polls that include all registered voters without regard to their likelihood of voting?

    I assume that portion of the difference between the models will disappear as we get closer to November and polls consistently apply likely-voter filters, no?

    • 538 Refugee

      We are now dealing with likely voter polls from what I have seen so the difference would be moot at this point.

    • Michael K

      “We are now dealing with likely voter polls”

      Then I stand corrected. In that case it would be interesting to see if the historical (2.6%) bias that Nate claims was evident this time around during the transition from RV to LV polls.

      If it was, then I would have expected the PEC and 538 forecasts to have converged somewhat. It seems to me that if anything they have diverged over time up until now.

    • 538 Refugee

      I’ve read that Republicans turn out a higher percentage of registered voters, especially in mid terms, and this has been referred to as the “enthusiasm gap”. Any divergence would be the “sauce”. One has to wonder if there is an ethanol component in the formulation. We will know in two months.

  • Amin

    I’m glad I can get my daily fix now! Thanks a lot!

  • Karl Hudnut

    Ahem. You seem to have posted the wrong graph in what I shall refer to as “fig. 3” above. (Sorry, I love “nit picking”.) Hope you see this soon and fix.

  • Chuck

    This is like going back to school! thanks

    It seems what you are saying is that the ‘fundamentals’ are actually in the polling data?

    You cannot simply add fundamentals to polling data.

    it is not a ‘black swan’ if you know it but it is for other people who don’t.

  • SFBay

    Thank you for the explanation. My statistics is rusty, but it’s logical and pleasantly opinion free.

  • Kenny

    The likeliest outcome is a 50-50 split, with Greg Orman having to decide how he will caucus.

    So 50 on one side is D+I? which include Orman? If Chad Taylor’s name stays on the bill and Orman loses, that would put the likeliest scenario at 49-51 with the Republicans in the majority?

    • FlyInTheOintment

      The latest polls show Orman would still beat Roberts even with Taylor’s name on ballot, but I am guessing it won’t be after the courts take a look at it.

  • J. R. Mole

