I’m preparing a long-form piece (for elsewhere) on the topic of partisan House gerrymandering. We’re cooking up some graphs to drive home some basic points. Your immediate reactions and critical questions will be welcome.

This graph shows what fraction of the two-party vote would have been needed for Democrats to control the House of Representatives.

The procedure was:

- Calculate the % two-party vote for all 435 districts.
- Calculate the shift in vote needed to make an outcome of exactly 218 Democratic seats.
- Add this shift to the national % Democratic vote.
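
The three steps above can be sketched in MATLAB roughly as follows. This is a minimal illustration, not the code behind the graph: the vector `dshare` holds placeholder values rather than real returns, the variable names are hypothetical, the swing is assumed to be uniform across districts, and the national share is approximated by the unweighted district mean (that is, equal turnout everywhere).

```matlab
% Placeholder data: Democratic % of the two-party vote in 435 districts.
% These values are invented for illustration, NOT real election returns.
% The skew packs Democratic votes into a smaller number of districts.
dshare = 25 + 55 * (linspace(0, 1, 435)').^1.5;

% National Democratic share. ASSUMPTION: equal turnout in every district,
% so the national share is the plain mean of the district shares.
national = mean(dshare);

% Under a uniform swing, Democrats reach 218 seats when their
% 218th-best district is brought up (or down) to exactly 50%.
sorted = sort(dshare, 'descend');
shift  = 50 - sorted(218);

% National share at which control would change hands.
needed = national + shift;

fprintf('Uniform shift needed: %+.2f points\n', shift);
fprintf('National two-party share needed for 218 seats: %.2f%%\n', needed);
```

With real data, `dshare` would be replaced by the actual district-level percentages, and `national` by the vote-weighted national share.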

The colored horizontal line segments indicate which party was in control. Generally, the out-party needs a bit more than 50% of the two-party vote to gain control. This extra barrier is an advantage for the incumbent party.

*Note 1:* Dealing with uncontested races is a challenge. For instance, the 2006 data point is distorted by the fact that there were 47 uncontested races won by Democrats (versus only 10 won by Republicans). Forty-seven is an unusually high number. With other ways of handling uncontested races, this data point is more comparable to 1996-2004.

*Note 2:* I came into this analysis expecting the 2012 value to be unusually high because of partisan gerrymandering. It is indeed high – but it is only on a par with 2004. I am pondering if there is a problem I am missing.

This post will self-destruct in 12 hours.

The code is a bit of a mess: mysterious variable names, bad structure, that kind of thing. I’ll clean it up later.

If you have 2010 or earlier House voting data in tabular form, let me know. It will allow additional tests.

I miss my commenters! Let’s see if Facebook-based threads are sustainable. Open discussion thread for the Presidential race. Ro-mentum, early voting, whatever…**have at it!**

I’ve identified the districts – now I need a way to display them conveniently. The ideal tool would be a compact app that uses a ZIP code to return the nearest three swing CDs, along with links to resources such as Pollster.com and campaigns (both D and R). For example, in California the swing districts are CA-07, 09, 10, 24, 26, 41, and 52. These are places where Get-Out-The-Vote (GOTV) activity would be most effective – for either side.

The swing districts are listed after the jump. Write me directly (left sidebar, About Us).

**Update for the very knowledgeable:** in one solution, the key missing piece of information is GIS-friendly Congressional district boundaries. If you have those…swoon!

**Pacific Coast states**

CA-07

CA-09

CA-10

CA-24

CA-26

CA-36

CA-41

CA-52

WA-01

**Arizona/Nevada/Utah/Colorado**

AZ-01

AZ-09

CO-03

CO-06

NV-03

NV-04

UT-04

**Midwest**

IA-03

IA-04

IL-10

IL-11

IL-12

IL-13

IL-17

IN-08

KY-06

MI-01

MI-11

MN-08

OH-06

OH-16

WI-07

**South, including Texas**

FL-10

FL-18

FL-22

FL-26

GA-12

NC-07

TX-23

**New England**

CT-05

MA-06

NH-01

NH-02

RI-01

**Northeast**

NJ-03

NY-01

NY-11

NY-18

NY-19

NY-21

NY-24

NY-27

PA-08

PA-12

All error bars below are 1-sigma values. **Underline** indicates a parameter that is used for the calculation.

**Part 1: Converting national vote share to seat count.**

I have broken this question down into (i) the relationship between national House popular vote, 1946-2010, and seat count; (ii) effects from the immediately preceding Congress (“incumbency effects” and other historical effects); and (iii) the effect of redistricting for the 2012 election.

*(i) Seat count as a function of popular vote.*

This is calculated using a linear fit of the form

(seat margin) = *a0* + *a1* * (%vote margin) + *a2* * (previous Congress seat margin)

where margins indicate the Democratic-minus-Republican difference. Both *a0* and *a2* are needed to effectively correct the generic Congressional poll margin.

The addition of *a2* decreases the residuals considerably, and leads to a modest increase in parameter uncertainties. As I have written before, adding more parameters fails to meet these criteria, and may constitute overfitting.
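
A fit of this form can be sketched with ordinary least squares. The example below is only an illustration under stated assumptions: the eight "elections" are made-up numbers, not the historical series, and the names `vote`, `prev`, `seats`, `noise`, and `covar` are hypothetical. It shows how *a0*, *a1*, *a2* and their 1-sigma uncertainties could be obtained.

```matlab
% Made-up example elections (NOT historical data).
vote  = [-6; -3; 1; 4; 8; -2; 5; -7];          % % vote margin, D minus R
prev  = [20; -30; -10; 40; 60; 80; 30; 70];    % previous Congress seat margin
noise = [3; -3; 2; -2; 1; -1; 4; -4];          % arbitrary scatter
seats = 2 + 6*vote + 0.2*prev + noise;         % synthetic seat margins

% Design matrix: the columns multiply a0, a1, a2 respectively.
X = [ones(size(vote)), vote, prev];

% Least-squares estimate of [a0; a1; a2].
a = X \ seats;

% 1-sigma parameter uncertainties from the residual variance.
resid = seats - X*a;
dof   = numel(seats) - numel(a);               % degrees of freedom
covar = (resid'*resid / dof) * inv(X'*X);      % parameter covariance matrix
sigma = sqrt(diag(covar));

fprintf('a0 = %.1f +/- %.1f seats\n',       a(1), sigma(1));
fprintf('a1 = %.1f +/- %.1f seats/%%vote\n', a(2), sigma(2));
fprintf('a2 = %.2f +/- %.2f\n',             a(3), sigma(3));
```

Dropping the third column of `X` gives the two-parameter version, which makes it easy to compare residuals and parameter uncertainties with and without *a2*.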

From 2002-2010, *a0* = -3.3 +/- 8.2 seats and *a1* = 6.2 +/- 1.1 seats/%vote.

From 1992-2010, **a0 = -0.5 +/- 6.2** seats and

From 1948-2010, *a0* = +5.9 +/- 4.8 seats and *a1* = 8.0 +/- 0.5 seats/%vote.

The parameter *a1* appears to be smaller over the last 20 years compared with post-WWII. This might be a reflection of increased incumbent advantage and/or redistricting.

*(ii) Historical effects (“incumbency”).* An incumbent’s advantage has been estimated to be as high as 5-8%. This could affect both *a1* and *a2*. The generic Congressional ballot is a direct measurement of opinion, and therefore is likely to already capture the effects of this advantage. For this model, the question is how to estimate the macro-level advantage.

Because I previously referred to *a2* as reflecting incumbency, I will continue to refer to it that way. The macro-incumbency advantage for 2012, based on recent data, gives estimates that go all over the place when even one data point is added or removed. It is not a stable parameter, suggesting other effects that require district-by-district analysis. Here, I use as much data as possible to get the error down. For 1948-2010, *a2*=0.2+/-0.1, which in units of generic Congressional ballot translates to a **macro-incumbency advantage of R+1.2+/-0.4%**.

*(iii) Redistricting.* From 2010 to 2012, the net overall shift in PVI distribution is R+0.62 +/- 0.06%. Because the seats-vs.-vote data above have a similar slope to the PVI distribution, I assume that this shift will translate fully to an effective change to the seats-vs.-vote relationship. Because PVI is expressed in vote share, the shift is doubled when expressed as a margin. Therefore the relationship in (i) requires a **redistricting correction of R+1.2+/-0.1%**.

>>>

**Part 2: Estimating the national Congressional vote.**

This is done by taking a median of **all** post-RNC/DNC convention generic Congressional preference polls. Aggregated-poll performance from RealClearPolitics suggests that these polls do a good job of predicting the final national vote. They are not perfect – a discrepancy can arise in the home stretch of up to 2-3%. Therefore the nominal error bar on a polls-now snapshot must include +/-2% uncertainty.

>>>

**Part 3: Estimating future movement by Election Day.**

Movement should be at least comparable to Presidential movement, which at >20 days from the election I have estimated as +/-1.8%. Congressional movement is likely to be greater because of low attention to local Congressional races. I make a baseline assumption that the movement in opinion is +/-2%.

Possible corrections:

- In a Presidential year, movement tends to be toward the Presidential winner. In a midterm year, movement tends to be away from the incumbent President. This would suggest that I should assume movement toward President Obama, by about D+2% to D+3%.
- The Meta-Margin is currently above its average for the season. If House polls followed Presidential preference (coattails), this would give an average R+0.5%.
- As of October 6, national House undecided voters are 10.5+/-0.6%, considerably higher than undecideds in the Presidential race (5%). This is a likely source of the break toward/away from the President’s party. If it were to break in proportion to Obama/Romney preference, it would give a net D+0.5%.
- A recent event, the debate…to quote the Rude Pundit, “Obama may have done more to depress voter turnout than all the i.d. laws combined.”

Taking into account these and other possibilities I have not thought of, it would seem safe to stay with a symmetric assumption. I will assume **+/-2% movement in either direction, symmetric around zero**.

The combined errors from Parts 2 and 3 above are sqrt(2^2+2^2) = 2.8%, which I round to 3%. Therefore **the estimate of Election Day generic Congressional preference is the post-convention median, with an error bar of +/-3%**.

This is converted to an “effective” margin that takes into account incumbency and redistricting as follows:

(effective margin) = (predicted true generic Congressional preference) + *a0/a1* + (incumbency advantage) + (redistricting advantage)

Currently, that is

(**D+2.5 +/-3.0**) + (R+0.1+/-0.9) + (R+1.2+/-0.4) + (R+1.2+/-0.1) = **D+0.0 +/-3.2%**.

Converted to seat margins, this gives a seat margin of D+0 +/- 22 seats. 1-sigma prediction: **median D 217.5 +/- 11 seats, R 217.5 +/- 11 seats.**
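
The arithmetic above can be checked with a short MATLAB sketch. It assumes the four terms are independent, so that their errors add in quadrature, and it takes the seats-per-percent conversion from the figures quoted above (22 seats for 3.2%) rather than from a fitted *a1*. That slope, and all variable names, are assumptions made for illustration.

```matlab
% Each row: [margin, 1-sigma error] in %, Democratic minus Republican.
% R+ values are entered as negative numbers.
terms = [ 2.5  3.0;    % predicted generic Congressional preference (D+2.5)
         -0.1  0.9;    % a0/a1 term (R+0.1)
         -1.2  0.4;    % macro-incumbency advantage (R+1.2)
         -1.2  0.1];   % redistricting correction (R+1.2)

eff_margin = sum(terms(:,1));              % effective margin, %
eff_err    = sqrt(sum(terms(:,2).^2));     % independent errors in quadrature

% ASSUMPTION: seats per % of margin, implied by "22 seats for 3.2%".
slope = 22 / 3.2;

seat_margin = slope * eff_margin;          % D-minus-R seat margin
seat_err    = slope * eff_err;

% Split 435 seats between the parties; the error on each is half the margin error.
d_seats = (435 + seat_margin) / 2;
r_seats = (435 - seat_margin) / 2;
side_err = seat_err / 2;

fprintf('Effective margin: %+.1f +/- %.1f %%\n', eff_margin, eff_err);
fprintf('D %.1f +/- %.0f seats, R %.1f +/- %.0f seats\n', ...
        d_seats, side_err, r_seats, side_err);
```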

**Predictions: D+2.5+/-3.0% popular vote, D 217 +/- 11 seats, R 218 +/- 11 seats.** Democratic control: 50%.

The Democratic candidate is a physicist, Bill Foster. He is one of only three physicists to have ever served in Congress (the others were Vern Ehlers, R-MI, and Rush Holt, my own Congressman). Foster has been involved in research relating to the top quark and to the Supernova 1987A neutrino burst, and in his youth created a company that manufactures theater lighting equipment.

Foster is also a former Congressman from the Illinois 14th District. He is attempting to make a comeback against Judy Biggert (R), a longtime Congresswoman. Therefore their records can be compared directly, from the DREAM Act to Fermilab. They are running in the new 11th District, and this race is right on the edge. I’ll be seeing Foster today, along with Representative Holt, at our local Triumph Brewing Co. Here’s the invitation. (And here is Judy Biggert’s site.)

(1) Set a Bayesian prior for the Meta-Margin by calculating average and SD for June-September 2012, using a t-distribution (3 d.f.) to generate the shape. In practice the tails do not matter, but leave them in. Result: Obama +3.26 +/- 1.02 %.

(2) Calculate the distribution of forward-going change in June-September 2012 to estimate the probable amount of divergence by November 6th. The approximate expression for the divergence is *d* = 0.4*sqrt(N) for N<=20 days, and *d* = 1.8% for N>20 days. The sqrt(N) indicates random walk-like behavior. Calculate a Gaussian with width parameter *d*.

(3) Multiply the distributions in (1) and (2) to get a final predicted distribution of Meta-Margins. From this calculate the mean, 1-sigma (68%), and 2-sigma (95%) confidence intervals. Convert all three to units of EV using 2012 data to interpolate.

(4) For the red zone, plot the 68% confidence interval. Plot as a diverging zone from today’s snapshot.

(5) For the yellow zone, plot the union of the snapshot 95% CI (gray zone today) and 95% predicted CI (step 3 above). Plot as a zone starting from today’s 95% CI.

And here is the MATLAB script.

>>>>>>>>

% First, input parameters (pass MM to it or leave the first line)

%

% Where are we today?

MM=5.06 % today’s Meta-Margin

MMdrift=1.8

N = 38 % days until election

%N=max(N,1) % seat belt

%N=datenum(2012,11,6)-today; % assuming date is set correctly in machine

%MMdrift=min(0.4*sqrt(N),1.8) % random-walk drift as seen empirically

%MMdrift=max(MMdrift,0.2) % just in case something is screwy with date

% cover range of +/-4 sigma

Mrange=[MM-4*MMdrift:0.02:MM+4*MMdrift];

% What is near-term drift starting from conditions now?

now=tpdf((Mrange-MM)/MMdrift,3); % long-tailed distribution. you never know.

now=now/sum(now);

% What was long-term prediction? (the prior)

M2012=3.26; M2012SD=2.2; % parameters of long-term prediction

prior=tpdf((Mrange-M2012)/M2012SD,1); %make it really long-tailed, df=1

prior=prior/sum(prior);

% Combine to make prediction

pred=now.*prior; % All hail Reverend Bayes

pred=pred/sum(pred);

plot(Mrange,now,'-k') % drift from today

hold on

plot(Mrange,prior,'-g') % the prior

plot(Mrange,pred,'-r') % the prediction

grid on

% Define mean and error bands for prediction

predictmean=sum(pred.*Mrange)/sum(pred)

for i=1:length(Mrange)

cumulpredict(i)=sum(pred(1:i));

end

Msig1lo=Mrange(min(find(cumulpredict>normcdf(-1,0,1))))

Msig1hi=Mrange(min(find(cumulpredict>normcdf(+1,0,1))))

Msig2lo=Mrange(min(find(cumulpredict>normcdf(-2,0,1))))

Msig2hi=Mrange(min(find(cumulpredict>normcdf(+2,0,1))))

% Now convert to EV using data from mid-August and some added points at the

% ends. If the race swings far, these endpoints need to be re-evaluated.

mmf=[-1.48 -.74 0 .74 1.4800 1.8125 2.1383 2.5667 3.3200 3.7400 4.2000 4.6600 5.1050 6 7 8 9 10 11 12];

evf=[247 258 269 280 290 299.25 304.1667 310.0000 321.6667 328 343 347 347 347 347 347 347 358 369 383];

bands = interp1(mmf,evf,[predictmean Msig1lo Msig1hi Msig2lo Msig2hi],'spline');

bands = round(bands)

ev_prediction = bands(1);

ev_1sig_low = bands(2);

ev_1sig_hi = bands(3);

ev_2sig_lo = bands(4);

ev_2sig_hi = bands(5);

bayesian_winprob=sum(pred(find(Mrange>=0)))/sum(pred)

drift_winprob=tcdf(MM/MMdrift,3)
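
The script above calls `tpdf`, `tcdf`, and `normcdf`, which come from MATLAB's Statistics Toolbox. For readers without it, the sketch below reproduces the core of steps (1)-(3) using only base functions. It is an approximation under stated assumptions, not a replacement for the script: the t-distribution shape is written out unnormalized (the normalization cancels when each curve is rescaled to sum to 1), the normal CDF is built from `erfc`, and the helper names `tshape`, `Phi`, and `q` are hypothetical. The input values are the ones used in the script.

```matlab
MM = 5.06;                        % today's Meta-Margin (from the script)
N  = 38;                          % days until the election (from the script)
d  = min(0.4*sqrt(N), 1.8);       % drift from step (2), capped at 1.8%

M = MM-4*d : 0.02 : MM+4*d;       % grid covering +/-4 drift widths

% Unnormalized Student t shape with nu degrees of freedom.
tshape = @(x, nu) (1 + x.^2/nu).^(-(nu+1)/2);

drift = tshape((M - MM)/d, 3);      drift = drift/sum(drift);   % drift from today
prior = tshape((M - 3.26)/2.2, 1);  prior = prior/sum(prior);   % long-term prior

pred = drift .* prior;              % combine the two distributions
pred = pred/sum(pred);

pmean = sum(pred .* M);             % mean of the predicted Meta-Margin
cdf   = cumsum(pred);               % cumulative distribution on the grid

Phi = @(z) 0.5*erfc(-z/sqrt(2));    % standard normal CDF without the toolbox
q   = @(z) M(find(cdf > Phi(z), 1));  % first grid point past that quantile

band68 = [q(-1) q(1)];              % 1-sigma (68%) interval
band95 = [q(-2) q(2)];              % 2-sigma (95%) interval
winprob = sum(pred(M >= 0));        % probability that the Meta-Margin is >= 0

fprintf('mean %.2f, 68%% [%.2f, %.2f], 95%% [%.2f, %.2f], P(win) %.3f\n', ...
        pmean, band68(1), band68(2), band95(1), band95(2), winprob);
```

The conversion from Meta-Margin to EV is unchanged from the script, since `interp1` is already a base function.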