Princeton Election Consortium

A first draft of electoral history. Since 2004

A quick note on the PEC Senate model

September 4th, 2014, 9:06pm by Sam Wang


[Update, Friday 1:00pm: The Election Day prediction now takes into account the new Kansas Senate race, scoring it as a tie.]

Welcome, new readers! Washington Post, Reddit, Krugman readers…great to see you all. I just wanted to make a few quick notes to orient everyone as to what’s going on here at PEC. We’re a bit topsy-turvy because of the Kansas Senate race. We hope to recover soon.

The main thing to know about PEC’s calculations is that we only use polling data. This approach led us to have a perfect record of Senate forecasting in 2012. In 2010, we missed one race – the Nevada Reid-vs.-Angle race. Our track record is excellent.

We do not use “fundamentals” at all, as practiced by The Upshot, The Monkey Cage, or FiveThirtyEight. To learn more about why this matters, read my June piece in POLITICO. In 2012 I argued that fundamentals are useful research tools, but may be unsuitable for everyday forecasting.

The banner above lists two probabilities. The first number is a “snapshot” view of current polling conditions. It states how an election held today would turn out. It takes into account the new Orman(I)-vs.-Roberts(R) matchup in the Kansas Senate race. In an election held today, Democrats+Independents would control the chamber with 90% probability.
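Roughly speaking, per-race win probabilities combine into a control probability along these lines. This is a bare-bones sketch with made-up numbers, assuming independent races; it is an illustration, not the production code.

```python
import numpy as np

def seat_distribution(win_probs):
    """Exact distribution of seats won, assuming each race is independent.
    Built by convolving one two-outcome race at a time (polynomial multiplication)."""
    dist = np.array([1.0])                  # probability of winning 0 seats so far
    for p in win_probs:
        dist = np.convolve(dist, [1.0 - p, p])
    return dist                             # dist[k] = P(win exactly k contested seats)

# Hypothetical per-race Dem+Ind win probabilities (illustrative only)
win_probs = [0.95, 0.90, 0.80, 0.65, 0.55, 0.50, 0.40, 0.30]
safe_seats = 47   # hypothetical seats not up or considered safe
needed = 50       # 50 seats suffices with the Vice President's tie-break

dist = seat_distribution(win_probs)
p_control = sum(p for k, p in enumerate(dist) if safe_seats + k >= needed)
print(f"P(Dem+Ind control) = {p_control:.2f}")
```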

The second number is an Election Day probability – a real prediction of Democratic+Independent control in the November election. It is based on treating the snapshot as a random walk that has fluctuated from day to day since June. I won’t get into it now, but here’s how it worked in the 2012 Presidential race. Based on this approach, the Election Day probability of Democratic/Independent control is 65%.
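Very roughly, the random-walk step looks like this sketch (illustrative numbers, not the production code): estimate the day-to-day wobble of the snapshot since June, let it diffuse forward to Election Day, and ask how often the margin ends up on the Democratic+Independent side.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily snapshot margins since June (percentage points, Dem+Ind minus R)
snapshot_history = np.array([0.2, 0.5, 0.3, 0.8, 1.1, 0.9, 0.6, 1.0, 0.7, 0.4])

daily_sigma = np.std(np.diff(snapshot_history))   # typical day-to-day movement
days_left = 60
current = snapshot_history[-1]

# Let the snapshot diffuse as a random walk out to Election Day
n_sims = 100_000
final = current + rng.normal(0.0, daily_sigma * np.sqrt(days_left), size=n_sims)
print(f"P(Dem+Ind control on Election Day) ~ {np.mean(final > 0):.2f}")
```

The spread grows like the square root of the number of days remaining, which is why a prediction made 60 days out is so much less certain than the snapshot.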

However, the Election Day prediction does not take into account yesterday’s developments in the Kansas race. If Kansas were redefined as a tossup race, the Election Day probability of Democratic/Independent control would rise to between 70% and 85%. I am thinking about how (and whether) to implement that.

Tags: 2014 Election · Senate

32 Comments so far

  • Lojo

538 and Upshot are saying the same thing as Sam – namely that the Senate is a toss-up. The MSM – not Nate – is saying that 60% means certainty rather than a toss-up. Sam is just a bit on the other side.

Is this still correct, Sam? Or is your model now moving away from toss-up status into a truly lean-Dem situation?

  • ArcticStones

    Question: Are there any legitimate studies of GOTV efforts in past elections? I would really like to see some reflections on the upper/lower potential impact of Democratic and Republican GOTV efforts, respectively.

  • Bob Grundfest

    New NYT Upshot data has the GOP winning 8 states. The big shift in their new poll is away from Begich in Alaska. We’ll see if online panel data from YouGov is as reliable as traditional polling methods. Slather on the sauce.

    http://www.nytimes.com/2014/09/08/upshot/shift-in-alaska-helps-republicans-retain-senate-edge.html?hp&action=click&pgtype=Homepage&version=HpSum&module=second-column-region&region=top-news&WT.nav=top-news&abt=0002&abg=0

    • Kenny Johnson

Their polls-only option (with the other “secret sauce” removed in their “Make Your Own Forecast”) comes up with Dems staying in control at 66%.

It will be interesting to see whether a) the polls shift to match the baked-in fundamentals, b) the fundamentals and other secret sauce are removed from the predictions before the election, and/or c) the polls-only model that Sam uses is a better “predictor” than all these other methods.

    • Joseph

      “Fundamentals” is clearly the latest gimmick that permits twisting poll results into the desired direction. Thank you, Dr. Wang, for giving us an alternative! We owe you our sanity.

  • Dean

This morning I saw Nate Silver on one of the MSM Sunday news shows, and he said that Republicans are favored to win the Senate with a probability of 64%. He was talking about fundamentals, like presidential approval rating.

    Other pundits were also saying this morning that Republicans are favored to win. They were using fundamentals like an energized Republican base, red states that Romney won, etc.

Fundamentals, if I understand correctly, are things that are already “baked into the cake” of the polls-only approach. Fundamentals like presidential approval are expressed in people’s opinions as measured in current polls, and in the poll aggregation needed to make accurate current forecasts.

    I sure hope that Dr. Wang’s model prevails. It gives me a happier outlook.

  • Bert

    I don’t think it looks good for Dems right now in Arkansas, Louisiana, and Alaska. Which means NC is probably the make or break race for Dems to have a shot at maintaining control. And that is assuming that Orman wins and doesn’t caucus with the GOP.

  • Wendy Fleet

    C’mon Sam you sleep?! We want/need articles every day, every hour — only The Future of HumanKind depends upon it. (My fear that Human UnKind will win the Senate sets off Boschian movies in my head.) I am the Beloved Planet’s most optimystic person except about the 2014 elections in which I fear being in my Usual Joy lest I jinx something anything whoknowswhat. Vitamin Sam is the only dissuader of despair. Sleep less, write more, dude . . .

  • RPF

I think I see the problem. You are thinking that the errors are all due to sampling. They are not. Part of the prediction error is due to the underlying thing being predicted moving in unexpected ways.

    • Pinkybum

I feel like you are obsessing about a macro event pushing the polls all one way. However, there is no way this could be predicted with any great accuracy, and I don’t think Sam should try. His Election Day forecast is better – assume a random walk using variances seen so far (or historically).

  • RPF

I did not mean the Rob Ford example to suggest correcting for single events; it was my example of a macro event that would move votes in all or almost all states. My point is that there are hundreds of unexpected macro events in 60 days. On average such events may zero out with respect to the point estimate, but they will balloon the variance of the 60-day estimator – because of covariance effects.

  • RPF

The covariances issue is NOT about the mean point estimates. It’s about their distribution. I fear Mr Wang is understating the variance of his forecasts. The problem is that if something happens, as it often does, it could affect all the estimates. Like make them all too high. That’s a positive covariance of the errors. Sam’s formula for the variance from which he gets his 65% number integrates a distribution in which he has assumed the covariances to be zero. But if there is a macro event the covariances will be positive. The variance of the sum of 33 events (elections) is the sum of the 33 individual event variances plus the 1,056 covariances. For portfolios of events, the covariances are likely much more important than the individual variances – that’s why Finance is all about covariances. Regardless, Mr Wang seems to be doing well.
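Spelling out the identity I am leaning on here:

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
  \;=\; \sum_{i=1}^{n} \operatorname{Var}(X_i)
  \;+\; \sum_{i \neq j} \operatorname{Cov}(X_i, X_j)
```

so for n = 33 races there are 33 variance terms and 33 × 32 = 1,056 covariance terms; a common positive covariance inflates the total variance well beyond the independent case.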

    • Sam Wang

      On the other hand, I do not correct for individual pollsters, thus leaving in some variance.

      I really think this is best fixed at the last step, in a lumped constant.

      Final advice: even when I show you all that goes into the sausage (which I will), don’t feel obliged to look at it too hard.

      (by the way, I believe it’s 36 races this year)

    • Amitabh Lath

Hi RPF, the effect you are discussing — a single event moving all the state estimates — should rightly be called an offset. Examples from the 2012 presidential race would be Romney’s 47% remark, or the aftermath of the first debate.

A covariance is two variables (states?) moving in concert: say, AK and LA moving up or down together beyond statistical jitter. This could happen if both states were thinly polled and a single pollster with a heavy bias polled both.

      But, given there are plenty of polls, and several that are state-specific, I fail to see how this could be a significant effect.

  • bks

    The problem as I see it is that Sam is now becoming a major node in the network of minds that constitute the election prediction gemish. I can just see a board meeting at Rasmussen Reports with a powerpoint slide on the screen showing the difference between the last Rasmussen prediction and the day’s PEC numbers and the President deciding to put his other thumb on the scale. –bks

  • RPF

The pollsters need not be biased. I was actually thinking of some sort of country-wide event that shifts votes from R to D or vice versa, e.g., Canada begins bombing NY. The point is that all the polling-error covariances will be positive, and there are a LOT of them – for 33 races that’s 33 individual variances and over 1,000 covariances. Over short periods the chances of a Canada-invasion-type event are small. But over 60 days? I fear Rob Ford may attack at any time. (I’ll shut up)

    • Sam Wang

      Ah yes. The black swan. I cover that too. More soon.

    • Amitabh Lath

      I am not getting the point about covariances. Is the issue that if the same pollster is polling several different states, any “house bias” that pollster has will affect several states in the same way?

      Please explain!

  • Joseph

OK. I’m going to go out on a limb and suggest why strictly using polling versus “fundamentals” might be working better at real-world prediction. I’ve felt for a while now that some of these pollsters (Rasmussen comes to mind) have their fingers on the scale. They do that to build enthusiasm and help GOTV. So by NOT filtering them, you in effect create a built-in weighting of the polls in favor of the Republicans.

    • Joseph

      Just to clarify: Weighting with “fundamentals” assumes that GOTV is going to favor the Republicans. If the sum of the polls favors the Republicans (because of fingers on the scale), then “fundamentals” force those forecasters to overshoot the mark in favor of the Republicans. Ergo, not using “fundamentals” comes closer to the actual results.

  • RPF

Last year when I asked, you said you did not bother to account for error covariances. Since you got ’em all right, I guess that was not important. Regardless, your multi-step random walk, for example, might have a different spread, and therefore different probabilities, with the covariance terms included.

    • Sam Wang

      It was a good question then, and it is now too. Here is how I think about it.

      Basically, I think that one state’s polls contribute only very modestly to information for another state. For practical purposes, to first order it is sufficient to treat them all as independent variables.

      One exception to this is the possibility that as a community, pollsters are biased by some average amount. For Senate control, this is where the Meta-Margin can be useful. The Meta-Margin can be used to model across-the-board swings — or covariance of the type you suggest. One can model the effects of covariance in the following manner: (a) estimate how much the bias or covarying error is; (b) compare it with the Meta-Margin. That’s actually what I do to calculate the Election Day win probability. I’m currently playing with a pollster bias of +/-0.5% or +/-0.7% (the win probability’s not affected much by which choice I make). Finally, I note that because we have the Meta-Margin from June until now, we actually have a lot of implicit information about covariance.
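As a bare-bones illustration of steps (a) and (b) — invented numbers, a plain normal approximation, not the production calculation — the bias term just widens the spread around the Meta-Margin before it is converted to a probability:

```python
from math import erf, sqrt

def win_probability(meta_margin, drift_sigma, bias_sigma):
    """P(Meta-Margin > 0 on Election Day), treating random-walk drift and a shared
    pollster bias as independent normal errors (a simplification)."""
    total_sigma = sqrt(drift_sigma**2 + bias_sigma**2)
    return 0.5 * (1.0 + erf(meta_margin / (total_sigma * sqrt(2.0))))

# Illustrative numbers only
meta_margin = 1.0      # percentage points
drift_sigma = 2.0      # projected random-walk spread out to Election Day
for bias_sigma in (0.5, 0.7):   # the two pollster-bias values mentioned above
    print(bias_sigma, round(win_probability(meta_margin, drift_sigma, bias_sigma), 2))
```

With numbers in this range, the two bias choices barely move the answer, which is why the choice between +/-0.5% and +/-0.7% matters so little.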

      Back to single-state win probabilities. There I would take a different approach. In that case, I would add the uncertainty to the single-state margin, then calculate the single-state probability. As you can see, this sort of dodges the covariance question. Basically I think covariance is an interesting can of worms, which I am deliberately leaving out of the analysis. It’s not professional statistics, man!

  • Scott Supak (@ssupak)

    Oh how I miss Intrade…

    The play money prediction markets are putting this at about 75% chance for the GOP…

    I’m shorting as much as I can.

  • W.W.

    Are you going to be putting out individual race likelihoods at all?

  • ArcticStones

    “…the Election Day prediction does not take into account yesterday’s developments in the Kansas race”

    Yes, I think you should make adjustments to fully account for that development.

    • Sam Wang

Adding Orman v. Roberts as a toss-up has the effect of shifting the Meta-Margin by 0.47%. That is now added to the November prediction.

      Although this is defensible, I will be glad when it’s not necessary. Such “fixes” make me itch…

  • tfitznc

As I understand it, the PEC model’s use of polling data does not include so-called fundamentals, which I have always thought of as ‘opportunities’ to introduce history (including previous voting patterns) and overt bias, such as may have affected Republicans with Romney.

    While I do understand the necessity of including controls on house effects, I don’t understand how we can assume that the error variance due to plummeting contact rates/cell phone use is randomly distributed. No doubt Sam has already covered this.

BTW, do we have a hypothesis for why PEC missed the Reid-Angle race, e.g., was there relative under-polling?

    • Sam Wang

My miss in the 2010 Nevada Senate race was quite unexpected at the time. Nevada had three factors that made polling hard: (a) a lot of people moving into and out of the state, (b) a high cell-phone-only population, and (c) a high Hispanic population. In other races as well, when pollsters have missed, one or more of these factors seem to be at play. Since 2010, pollsters have developed and adopted cell-phone and internet methods. The problems should diminish somewhat.

In regard to house effects, in my view the question is what the overall average bias is. This year there’s a lot of chatter among polling nerds about the low quality of Senate polls. In the past I would have said the average bias was less than 0.5%. What is it now? This is worth investigating. One approach would be to do it the easy way, i.e., measure it after the election. But can it be determined beforehand? Hmmm.
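The “easy way” would look something like this sketch (made-up margins, just to show the bookkeeping):

```python
# Post-election check of average pollster bias, with made-up margins (D minus R, in points)
races = [
    # (final poll-aggregate margin, actual result margin)
    (+2.0, +1.0),
    (-3.0, -5.5),
    (+0.5, -1.0),
]

errors = [poll - actual for poll, actual in races]
average_bias = sum(errors) / len(errors)
print(f"Average signed poll error: {average_bias:+.1f} points (positive = polls leaned Democratic)")
```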

    • ArcticStones

Sam, I have a suggestion: the key is situations where you have two, three, or more polls within a short space of time that deviate from one another. In these cases, I believe one should be able to take a close look at each poll to determine whether its deviation, if any, is caused by noise, poor quality, or bias.

  • 538 Refugee

    In science if your data doesn’t support your hypothesis then you generally throw out your hypothesis. You might collect new data (take another poll) to make sure the data was good. Making the data fit the hypothesis (let’s call them fundamental adjustments) is somewhat frowned upon.

    I think what Sam is showing is that individual pollsters may make errors in their sampling but overall/collectively they do a decent job getting the data right. Making adjustments to their work is not needed. It will be a sad day when this model doesn’t work.

  • SFBay

Thanks for the explanation of your model. I know you don’t use fundamentals in your prediction, which keeps opinion out of the forecast. There is a psychological effect on voters, I think, regardless of how the prediction is made. I just hope that your strictly poll-driven model gets as much play in the news as other predictors do. It will help counteract those making predictions who are at least as interested in their own notoriety as they are in their predictions.