If we want to forecast House control in 2014 without drilling into individual Congressional districts, we need to know two things:
1. What national popular vote margin is needed to flip House control; and
2. What the likely range for the national popular vote margin will be.
For #1, last week here at PEC I estimated that a margin of D+4% to D+5% (i.e. Democrats win the popular vote by 4-5%) would be necessary. My estimate is not far from those of other analysts.
However, a notable outlier is Alan Abramowitz at the Crystal Ball, who claims D+13% is needed. This appears to be an error of overfitting, which I have previously mentioned in relation to The Monkey Cage* and FiveThirtyEight. That is not bad company…but seeing as how this problem is a recurring one, I would like to get into the details. It might reduce the possibility of similar future missteps.
First, let me show you the nature of the problem:
This is a graph (previously explained here) of House elections from 1946-2012, with the Democratic-Republican seat margin plotted as a function of the Democratic popular-vote margin. The shaded gray zone marks the region spanned by all the data, not including the Great Gerrymander of 2012. The X’s indicate the PEC analysis (red, with error bar) and the Abramowitz prediction (green).
How could Abramowitz have come up with a prediction so far from post-WWII norms? He reports that his result comes from a best-fit equation of the form
PCRHS = 127.0 – (.54*CRHS) – (1.35*PRPM) + (1.73*RGBM)
This is a fit to predict PCRHS = predicted change in Republican House seats, based on 17 midterm elections. The three input variables are CRHS = current Republican House seats, PRPM = previous Republican presidential margin, and RGBM = Republican generic ballot margin. The free parameters of the fit are the intercept and the three coefficients.
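To see how a number like D+13% can fall out of this equation, here is a minimal sketch that inverts the fit for RGBM. The coefficients are the published ones; CRHS = 234 is the seat count used elsewhere in this post. The other two inputs are my assumptions, not figures Abramowitz states here: a 2012 Republican presidential margin of about -3.9 points, and a net Republican loss of 17 seats as the threshold for losing control. The function names are hypothetical.

```python
def predicted_seat_change(crhs, prpm, rgbm):
    """Published fit: predicted change in Republican House seats."""
    return 127.0 - 0.54 * crhs - 1.35 * prpm + 1.73 * rgbm


def generic_ballot_needed(crhs, prpm, target_change):
    """Solve the same equation for RGBM, given a target seat change."""
    return (target_change - 127.0 + 0.54 * crhs + 1.35 * prpm) / 1.73


CRHS = 234     # current Republican House seats
PRPM = -3.9    # ASSUMPTION: 2012 Republican presidential margin, in points
TARGET = -17   # ASSUMPTION: net Republican seat loss that flips control

rgbm = generic_ballot_needed(CRHS, PRPM, TARGET)
print(f"Republican generic ballot margin needed: {rgbm:+.1f}")
```

Under those assumed inputs the required margin comes out close to 13 points in the Democrats' favor, which is consistent with the D+13% claim. Note that nothing in this calculation carries an uncertainty.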
All those coefficients give the appearance of precision. However, there are no uncertainties (error bars) given. And as a general rule, adding more free parameters captures more of the variation…but also increases the uncertainty of the prediction. I don’t know what his error bar is, but I imagine it’s at least 6%. In other words, I think this elephantine fit is waving its trunk a bit.
(Update, to reflect comments: We have to know what the error bars are on the coefficients to allow an estimate of the uncertainty in the “D+13%” estimate. For example, if the coefficient on the most sensitive parameter, CRHS (0.54), has an uncertainty of +/-0.07, that contributes an error of +/-0.07*234 = +/-16 seats. If you look at Abramowitz’s table, that corresponds to a +/-9% error in his estimate of the necessary value for the generic House ballot estimate. And “D+13+/-9%” is not what readers think when they see “D+13%.”)
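The arithmetic in that update can be written out in a few lines. The +/-0.07 is the hypothetical coefficient uncertainty from the example above, not a reported value. I read the seats-to-ballot conversion off Abramowitz's table; dividing by the RGBM coefficient, as done here, is an assumed shortcut that gives roughly the same answer.

```python
COEF_CRHS_ERR = 0.07   # hypothetical uncertainty on the CRHS coefficient
CRHS = 234             # current Republican House seats
COEF_RGBM = 1.73       # seats per point of generic ballot margin, from the fit

# Error in predicted seats contributed by the CRHS coefficient alone.
seat_error = COEF_CRHS_ERR * CRHS

# ASSUMPTION: convert seats to generic-ballot points via the RGBM coefficient.
ballot_error = seat_error / COEF_RGBM

print(f"+/-{seat_error:.0f} seats, or about +/-{ballot_error:.0f} points")
```

This counts only one coefficient. Uncertainties on the intercept and the other two coefficients would add to it.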
Uncertainties also get worse when a model is driven out of the range of data used to generate the fit. The parameter CRHS is in a highly unnatural place this year relative to PRPM, due to the Great Gerrymander of 2012.
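The growth of uncertainty outside the fitted range can be illustrated with the textbook formula for single-variable linear regression, where the standard error of a new prediction at x0 is the residual standard deviation times sqrt(1 + 1/n + (x0 - mean)^2 / Sxx). This is a simplified one-variable sketch, not Abramowitz's three-variable model, and the predictor values are made up purely for illustration.

```python
import math


def prediction_se_factor(xs, x0):
    """Multiplier on the residual std. dev. for a new prediction at x0
    (simple linear regression with an intercept)."""
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(1 + 1 / n + (x0 - mean) ** 2 / sxx)


# HYPOTHETICAL predictor values for 17 elections (not real data).
xs = list(range(170, 255, 5))

for x0 in (210, 250, 300):  # center of the data, edge of the data, outside it
    print(x0, round(prediction_se_factor(xs, x0), 3))
```

The multiplier is smallest at the center of the data and keeps rising as x0 moves beyond the observed range. With several correlated predictors the same effect applies to unusual combinations of values, even when each value on its own looks ordinary.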
I think this goes to show the difficulty of using linear regression, which sounds simple but has hidden problems. My view is that for this kind of data, fits are of limited use. I never do anything more complicated than single-variable regression if I can help it. Even then, I only do a linear fit if I have a clear and fairly simple idea of the reason for the relationship. And, of course, error bars are a must.
Long story short: Democrats face an uphill climb in 2014…but it’s not the north face of the Eiger.
*In comments, Sides points towards this reply from last year regarding the uncertainties arising from multiparameter models.