Technical notes on the House prediction

October 6, 2012 by Sam Wang

To the reader: This post focuses on technical notes regarding the House prediction. It is not a popular essay, but is for diehard geeks. Additional notes will be added here.

All error bars below are 1-sigma values. Underline indicates a parameter that is used for the calculation.

Part 1: Converting national vote share to seat count.

I have broken this question down into (i) the relationship between national House popular vote, 1946-2010, and seat count; (ii) effects from the immediately preceding Congress (“incumbency effects” and other historical effects); and (iii) the effect of redistricting for the 2012 election.

(i) Seat count as a function of popular vote.

This is calculated using a linear fit of the form

(seat margin) = a0 + a1 * (%vote margin) + a2 * (previous Congress seat margin)

where margins indicate the Democratic-minus-Republican difference. Both a0 and a2 are needed to effectively correct the generic Congressional poll margin.

The addition of a2 decreases the residuals considerably, and leads to a modest increase in parameter uncertainties. As I have written before, adding more parameters fails to meet these criteria, and may constitute overfitting.
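For readers who want to try a fit of this form themselves, here is a minimal sketch using ordinary least squares. It is not the code behind the numbers in this post; the function name fit_seat_model and its argument names are made up for illustration, and it assumes NumPy is available and that each input has one entry per election.

```python
import numpy as np

def fit_seat_model(vote_margin, prev_seat_margin, seat_margin):
    """Least-squares fit of
        seat_margin = a0 + a1 * vote_margin + a2 * prev_seat_margin

    Returns (coefficients, one_sigma_errors), each ordered (a0, a1, a2).
    All margins are Democratic-minus-Republican.
    """
    x1 = np.asarray(vote_margin, dtype=float)
    x2 = np.asarray(prev_seat_margin, dtype=float)
    y = np.asarray(seat_margin, dtype=float)

    # Design matrix: constant column, vote margin, previous seat margin.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

    # Parameter uncertainties from the residual variance (standard OLS formula).
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))
```

Dropping the last column of the design matrix gives the two-parameter version, which makes it easy to compare residuals and parameter uncertainties with and without a2.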

From 2002-2010, a0 = -3.3 +/- 8.2 seats and a1 = 6.2 +/- 1.1 seats/%vote.

From 1992-2010, a0 = -0.5 +/- 6.2 seats and a1 = 6.8 +/- 1.0 seats/%vote. The ratio a0/a1 is R+0.1+/-0.9%.
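The ratio a0/a1 converts the seat offset into units of vote margin. A rough sketch of that conversion follows, using the 1992-2010 values quoted above. It assumes the errors on a0 and a1 are uncorrelated, which is a simplification (in a real fit they generally are correlated); the helper name ratio_with_error is hypothetical.

```python
import math

def ratio_with_error(num, num_err, den, den_err):
    """Return (num/den, 1-sigma error), assuming uncorrelated errors."""
    ratio = num / den
    # First-order error propagation for a quotient.
    err = math.hypot(num_err / den, ratio * den_err / den)
    return ratio, err

# 1992-2010 fit values quoted in the text: a0 in seats, a1 in seats/%vote.
offset, offset_err = ratio_with_error(-0.5, 6.2, 6.8, 1.0)
print(f"a0/a1 = {offset:+.2f} +/- {offset_err:.2f} % (negative = toward R)")
```

To one decimal place this reproduces the quoted R+0.1+/-0.9%. Nearly all of the uncertainty comes from a0, because a0 itself is close to zero.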

From 1948-2010, a0 = +5.9 +/- 4.8 seats and a1 = 8.0 +/- 0.5 seats/%vote.

The parameter a1 appears to be smaller over the last 20 years compared with post-WWII. This might be a reflection of increased incumbent advantage and/or redistricting.

(ii) Historical effects (“incumbency”). An incumbent’s advantage has been estimated to be as high as 5-8%. This could affect both a1 and a2. The generic Congressional ballot is a direct measurement of opinion, and therefore is likely to already capture the effects of this advantage. For this model, the question is how to estimate the macro-level advantage.

Because I previously referred to a2 as reflecting incumbency, I will continue to refer to it that way. The macro-incumbency advantage for 2012, based on recent data, gives estimates that go all over the place when even one data point is added or removed. It is not a stable parameter, suggesting other effects that require district-by-district analysis. Here, I use as much data as possible to get the error down. For 1948-2010, a2=0.2+/-0.1, which in units of generic Congressional ballot translates to a macro-incumbency advantage of R+1.2+/-0.4%.
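One way to see where a number of this size comes from is to push the outgoing Congress's seat margin through a2 and divide by a1. This is a sketch of one reading, not necessarily the exact calculation used here. It assumes the 1948-2010 values of a1 and a2 quoted in this post and the 2010 election result of 193 Democratic and 242 Republican seats.

```python
A1 = 8.0  # seats per % vote margin, 1948-2010 fit (from the text)
A2 = 0.2  # seats per seat of previous-Congress margin (from the text)

# D-minus-R seat margin after the 2010 election (193 D, 242 R).
prev_seat_margin = 193 - 242

# Seats attributed to the previous Congress, expressed as a % vote margin.
incumbency_seats = A2 * prev_seat_margin
incumbency_pct = incumbency_seats / A1
print(f"{incumbency_pct:+.2f} % (negative = toward R)")
```

The central value comes out near R+1.2%. The quoted +/-0.4% is not reproduced by these rounded inputs, so treat the error bar in the text as the authoritative one.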

(iii) Redistricting. From 2010 to 2012, the net overall shift in PVI distribution is R+0.62 +/- 0.06%. Because the seats-vs.-vote data above have a similar slope to the PVI distribution, I assume that this shift will translate fully to an effective change to the seats-vs.-vote relationship. Therefore the relationship in (i) requires a redistricting correction of R+1.2+/-0.1%.


Part 2: Estimating the national Congressional vote.

This is done by taking a median of all post-RNC/DNC convention generic Congressional preference polls. Aggregated-poll performance from RealClearPolitics suggests that these polls do a good job of predicting the final national vote. They are not perfect: a discrepancy of up to 2-3% can arise in the home stretch. Therefore the nominal error bar on a polls-now snapshot must include +/-2% uncertainty.
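A minimal sketch of the median step is below. The poll entries are placeholders invented for illustration and are NOT real polls; the function name post_convention_median and the (end date, margin) tuple layout are also assumptions. The cutoff is set to September 6, 2012, the last day of the Democratic convention.

```python
from datetime import date
from statistics import median

def post_convention_median(polls, cutoff):
    """Median D-minus-R margin (%) of polls that ended after `cutoff`.

    `polls` is an iterable of (end_date, margin) pairs.
    """
    recent = [margin for end_date, margin in polls if end_date > cutoff]
    if not recent:
        raise ValueError("no polls after the cutoff date")
    return median(recent)

# Placeholder numbers for illustration only; these are NOT real polls.
example_polls = [
    (date(2012, 8, 20), -1.0),  # before the cutoff, so excluded
    (date(2012, 9, 12), 3.0),
    (date(2012, 9, 25), 1.0),
    (date(2012, 10, 2), 2.0),
]
snapshot = post_convention_median(example_polls, cutoff=date(2012, 9, 6))
```

A median is used rather than a mean so that a single outlying poll has little effect on the snapshot.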


Part 3: Estimating future movement by Election Day.

Movement should be at least comparable to Presidential movement, which at >20 days from the election I have estimated as +/-1.8%. Congressional movement is likely to be greater because of low attention to local Congressional races. I make a baseline assumption that the movement in opinion is +/-2%.

Possible corrections:

  • In a Presidential year, movement tends to be toward the Presidential winner. In a midterm year, movement tends to be away from the incumbent President. This would suggest that I should assume movement toward President Obama, by about D+2% to D+3%.
  • The Meta-Margin is currently above its average for the season. If House polls followed Presidential preference (coattails), this would give an average R+0.5%.
  • As of October 6, national House undecided voters are 10.5+/-0.6%, considerably higher than undecideds in the Presidential race (5%). This is a likely source of the break toward/away from the President’s party. If it were to break in proportion to Obama/Romney preference, it would give a net D+0.5%.
  • A recent event, the debate…to quote the Rude Pundit, “Obama may have done more to depress voter turnout than all the i.d. laws combined.”

Taking into account these and other possibilities I have not thought of, it would seem safe to stay with a symmetric assumption. I will assume +/-2% movement in either direction, symmetric around zero.

The combined errors from Parts 2 and 3 above are sqrt(2*2+2*2) = 2.8%, which I round to 3%. Therefore the estimate of Election Day generic Congressional preference is the post-convention median, with an error bar of +/-3%.
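The combination above is a sum in quadrature, which treats the snapshot error and the future-movement error as independent. A short check, with variable names chosen only for this example:

```python
import math

snapshot_err = 2.0  # % uncertainty on the polls-now snapshot (Part 2)
movement_err = 2.0  # % assumed movement by Election Day (Part 3)

# Independent errors add in quadrature: sqrt(2^2 + 2^2).
combined_err = math.hypot(snapshot_err, movement_err)
print(f"{combined_err:.2f} %")
```

The unrounded value is about 2.8%, so using +/-3% is slightly conservative.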

This is converted to an “effective” margin that takes into account incumbency and redistricting as follows:

(effective margin) = (predicted true generic Congressional preference) + a0/a1 + (incumbency advantage) + (redistricting advantage)

Currently, that is

(D+2.5 +/-3.0) + (R+0.1+/-0.9) + (R+1.2+/-0.4) + (R+1.2+/-0.1) = D+0.0 +/-3.2%.

Converted to seat margins, this gives a seat margin of D+0 +/- 22 seats. 1-sigma prediction: median D 217.5 +/- 11 seats, R 217.5 +/- 11 seats.
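The arithmetic of the last few steps can be checked with a few lines. This sketch makes two assumptions that are not stated explicitly above: the slope used for conversion is the 1992-2010 value a1 = 6.8 seats/%vote (chosen because the a0/a1 term comes from that fit), and the uncertainty on the slope itself is ignored. The House has 435 seats.

```python
import math

# (value, 1-sigma error) in % margin; positive = toward D, negative = toward R.
terms = {
    "generic ballot": (2.5, 3.0),
    "a0/a1 offset": (-0.1, 0.9),
    "incumbency": (-1.2, 0.4),
    "redistricting": (-1.2, 0.1),
}

# Values add linearly; independent errors add in quadrature.
effective = sum(value for value, _ in terms.values())
effective_err = math.sqrt(sum(err ** 2 for _, err in terms.values()))

A1 = 6.8           # seats per % vote (assumption: 1992-2010 fit)
TOTAL_SEATS = 435

seat_margin = A1 * effective
seat_margin_err = A1 * effective_err

# A D-minus-R margin m with D + R = 435 means D = (435 + m) / 2.
dem_seats = (TOTAL_SEATS + seat_margin) / 2
dem_seats_err = seat_margin_err / 2
```

With unrounded intermediate values the seat-margin error comes out slightly under the 22 quoted above, which corresponds to multiplying the rounded 3.2% by 6.8. Either way the per-party error rounds to 11 seats.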

Predictions: D+2.5+/-3.0% popular vote, D 217 +/- 11 seats R 218 +/- 11 seats. Democratic control: 50%.
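The 50% figure follows directly from a seat margin centered on zero. A sketch, assuming the seat margin is normally distributed (an assumption of this example, not something the post states) and defining control as a positive D-minus-R margin:

```python
import math

def prob_positive(mean, sigma):
    """P(X > 0) for X ~ Normal(mean, sigma)."""
    return 0.5 * (1.0 + math.erf(mean / (sigma * math.sqrt(2.0))))

# Seat margin of D+0 +/- 22 seats from the text.
p_dem_control = prob_positive(0.0, 22.0)
```

Because the distribution is symmetric around zero, the result is 0.5 regardless of the width of the error bar. The width matters only once the central estimate moves away from zero.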


Jack Rems says:

I noticed the top post headline changed from “Predictions 10/6: House of (un)Representatives ” to “Predictions 10/6: House of (mostly)Representatives ” and then back. Or I hallucinated it.

Sam Wang says:

You’re losing it. 😉 erm, which seems more accurate?

William Ockham says:

It is a mistake to assume that there is actually a higher percentage of undecideds who will actually end up voting in the congressional elections. In presidential election years, there are always more votes for the race at the top of the ballot than for the U.S. House races.
The higher percentage of undecideds reflects the fact that there are a huge number of voters who won’t vote in their House race (>10 million, 8-9% of the electorate in the presidential race).

Sam Wang says:

Good thing I didn’t make that assumption.

Craig Barber says:

Could gerrymandering manifest itself either as a correction (i.e. R+1.2) or as an “inelasticity” impacting the effect of other corrections? (Hmmm, I’m being influenced by the CW that the GOP in fact tried to simply lock in their present numerical advantage.)

Sam Wang says:

If you look at the slope (seats per % vote), it has decreased in recent years. That might be a consequence of incumbent protection. If it continues, it would tend to make all effects smaller.
