Dear PEC readers, I have a math puzzle. It relates to my gerrymandering project. If you are good at working with probability distributions, take a look. Can you solve it?

Here is the puzzle. It is basically a closed-form calculation of the numerical simulations I did for that NYT piece. It is for a peer-reviewed paper I am writing on how to establish criteria for fair Congressional districting.

**Partitioning of voters in a state with randomly selected districts.** Imagine a state with N districts, and a two-party winner-take-all system (i.e. the U.S. system for electing House members). Select districts at random from a distribution whose vote share v (for party #1) follows a near-Gaussian distribution whose average is A and standard deviation is S.

Now add the condition that the statewide two-party vote yields a fraction F_{1} of votes for political party #1 (and of course the other party gets F_{2}=1-F_{1}). Therefore districts v_{1..N} must satisfy the constraint sum(v_{1..N})/N=F_{1}.

What is the probability distribution of k, where k is defined as the number of districts in which v_{i}>0.5? Give the mean and SD of the expected number of seats to be won by party #1. Also describe the degree to which the distribution resembles a Gaussian.

*P.S. When F_{1} is close to A, I believe the answer is approximately <k> = N*p and std(k) = sqrt[N*p*(1-p)], where p = normcdf(F_{1},0.5,S). If you can do better, let me know!*

P.P.S. Here is a rephrasing of the problem: *Consider a normally distributed variable with mean mu and standard deviation sigma. Draw from it k times. You only accept sets of draws whose average is constrained to be mu’, which is unequal to mu. What is the distribution of the draws?*

*P.P.P.S. Probably solved. It’s as above, except instead of normcdf(F_{1},0.5,S) we have normcdf(F_{1},0.5,S*sqrt((k-1)/k)). This arises in a semi-obvious way from the derivation of the standard error of the mean.*
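A minimal sketch for checking the proposed answer numerically, assuming the district vote shares are exactly Gaussian. Under that assumption, drawing N values and shifting them so their average equals F_{1} reproduces the constrained distribution, because the deviations from the sample mean are independent of the mean itself. The function names and the example parameters are illustrative choices, not taken from the original simulations.

```python
import math
import random


def normal_cdf(x, mean, sd):
    """Gaussian CDF, the same quantity as normcdf(x, mean, sd) in the text."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def predicted_seats(n_districts, f1, s):
    """Closed-form guess from the P.P.P.S.: mean N*p and binomial-style SD."""
    sd_eff = s * math.sqrt((n_districts - 1) / n_districts)
    p = normal_cdf(f1, 0.5, sd_eff)
    return n_districts * p, math.sqrt(n_districts * p * (1.0 - p))


def simulate_seats(n_districts, a, s, f1, n_trials=20000, seed=0):
    """Monte Carlo mean and SD of k, the number of districts with v > 0.5."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        draws = [rng.gauss(a, s) for _ in range(n_districts)]
        # Recenter so the statewide average equals f1 exactly (Gaussian assumption).
        shift = f1 - sum(draws) / n_districts
        counts.append(sum(1 for v in draws if v + shift > 0.5))
    mean = sum(counts) / n_trials
    var = sum((c - mean) ** 2 for c in counts) / (n_trials - 1)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Illustrative parameters only: 18 districts, A = 0.50, S = 0.15, F1 = 0.55.
    print("closed form:", predicted_seats(18, 0.55, 0.15))
    print("simulation: ", simulate_seats(18, 0.50, 0.15, 0.55))
```

The simulated mean should agree with N*p. The simulated SD may come out somewhat smaller than sqrt[N*p*(1-p)], since the constraint makes the districts slightly negatively correlated while the binomial formula treats them as independent.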


The gift I have in mind is kind of small: a signed copy of either (or both) of my books. I will see if I can think of something nicer to send…

I am traveling, and will reply soon. You’ll want to read it. Check back soon!

I’m preparing a long-form piece (for elsewhere) on the topic of partisan House gerrymandering. We’re cooking up some graphs to drive home some basic points. Your immediate reactions and critical questions will be welcome.

This graph shows what fraction of the two-party vote would have been needed for Democrats to control the House of Representatives.

The procedure was:

- Calculate the % two-party vote for all 435 districts.
- Calculate the shift in vote needed to make an outcome of exactly 218 Democratic seats.
- Add this shift to the national % Democratic vote.
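The three steps above can be sketched as follows, assuming the shift is applied uniformly to every district and that the pivotal district only needs to be brought up to 50%. The function name and the five-district toy example are hypothetical; how the national % is computed (vote-weighted or averaged over districts) is left to the caller.

```python
def vote_needed_for_control(dem_shares, national_dem_share, seats_needed=218):
    """Estimate the national two-party % Democrats would need for control.

    dem_shares: Democratic % of the two-party vote in each district.
    national_dem_share: national Democratic % of the two-party vote.
    Assumes a uniform shift across all districts (an illustrative assumption).
    """
    if not 1 <= seats_needed <= len(dem_shares):
        raise ValueError("seats_needed must be between 1 and the number of districts")
    ranked = sorted(dem_shares, reverse=True)
    # The district that delivers the last seat needed for control.
    pivotal_share = ranked[seats_needed - 1]
    # Shift that brings the pivotal district to the 50% threshold.
    shift = 50.0 - pivotal_share
    return national_dem_share + shift


if __name__ == "__main__":
    # Toy example: 5 districts, 3 seats needed for control.
    toy_shares = [62.0, 55.0, 48.0, 45.0, 30.0]
    print(vote_needed_for_control(toy_shares, 49.0, seats_needed=3))
```

In the toy example the third-ranked district sits at 48%, so a 2-point uniform shift is needed on top of the 49% national share.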

The colored horizontal line segments indicate which party was in control. Generally, the out-party needs a bit more than 50% of the two-party vote to gain control. This extra barrier is an advantage for the incumbent party.

*Note 1:* Dealing with uncontested races is a challenge. For instance, the 2006 data point is distorted by the fact that there were 47 uncontested races won by Democrats (versus only 10 won by Republicans). Forty-seven is an unusually high number. With other definitions, this data point is more comparable to 1996-2004.

*Note 2:* I came into this analysis expecting the 2012 value to be unusually high because of partisan gerrymandering. It is indeed high – but it is only on a par with 2004. I am pondering if there is a problem I am missing.

This post will self-destruct in 12 hours.

The code is a bit of a mess: mysterious variable names, bad structure, that kind of thing. I’ll clean it up later.

If you have 2010 or earlier House voting data in tabular form, let me know. It will allow additional tests.

I miss my commenters! Let’s see if Facebook-based threads are sustainable. Open discussion thread for the Presidential race. Ro-mentum, early voting, whatever…**have at it!**

I’ve identified the districts – now I need a way to display them conveniently. The ideal tool would be a compact app that uses a ZIP code to return the nearest three swing CDs, along with links to resources such as Pollster.com and campaigns (both D and R). For example, in California the swing districts are CA-07, 09, 10, 24, 26, 41, and 52. These are places where Get-Out-The-Vote (GOTV) activity would be most effective – for either side.

The swing districts are listed after the jump. Write me directly (left sidebar, About Us).

**Update for the very knowledgeable:** in one solution, the key missing piece of information is GIS-friendly Congressional district boundaries. If you have those…swoon!
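One rough way to prototype the lookup without district boundary files is to rank districts by distance from a ZIP code's location to a single representative point per district. This is only a sketch under that simplifying assumption: the two lookup tables, the function names, and the placeholder coordinates are all hypothetical, and a point-to-point distance is a crude stand-in for true boundary data.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_swing_districts(zip_code, zip_coords, district_coords, count=3):
    """Return the `count` districts whose reference point is closest to the ZIP.

    zip_coords: dict mapping ZIP code -> (lat, lon)       (hypothetical table)
    district_coords: dict mapping district -> (lat, lon)  (hypothetical table)
    """
    lat, lon = zip_coords[zip_code]
    ranked = sorted(
        district_coords,
        key=lambda d: haversine_km(lat, lon, *district_coords[d]),
    )
    return ranked[:count]


if __name__ == "__main__":
    # Placeholder data only; these are not real ZIP codes or district locations.
    zips = {"00000": (40.0, -75.0)}
    districts = {
        "XX-01": (40.1, -75.0),
        "XX-02": (41.0, -75.0),
        "XX-03": (45.0, -75.0),
        "XX-04": (40.5, -75.0),
    }
    print(nearest_swing_districts("00000", zips, districts))
```

Links to polling and campaign resources could be attached by keying another dictionary on the same district labels.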

**Pacific Coast states**

CA-07

CA-09

CA-10

CA-24

CA-26

CA-36

CA-41

CA-52

WA-01

**Arizona/Nevada/Utah/Colorado**

AZ-01

AZ-09

CO-03

CO-06

NV-03

NV-04

UT-04

**Midwest**

IA-03

IA-04

IL-10

IL-11

IL-12

IL-13

IL-17

IN-08

KY-06

MI-01

MI-11

MN-08

OH-06

OH-16

WI-07

**South, including Texas**

FL-10

FL-18

FL-22

FL-26

GA-12

NC-07

TX-23

**New England**

CT-05

MA-06

NH-01

NH-02

RI-01

**Northeast**

NJ-03

NY-01

NY-11

NY-18

NY-19

NY-21

NY-24

NY-27

PA-08

PA-12

All error bars below are 1-sigma values. **Underline** indicates a parameter that is used for the calculation.

**Part 1: Converting national vote share to seat count.**

I have broken this question down into (i) the relationship between national House popular vote, 1946-2010, and seat count; (ii) effects from immediately preceding Congress (“incumbency effects” and other historical effects); and (iii) the effect of redistricting for the 2012 election.

*(i) Seat count as a function of popular vote.*

This is calculated using a linear fit of the form

(seat margin) = *a0* + *a1* * (%vote margin) + *a2* * (previous Congress seat margin)

where margins indicate the Democratic-minus-Republican difference. Both *a0* and *a2* are needed to effectively correct the generic Congressional poll margin.
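A fit of this form can be sketched as an ordinary least-squares problem. The code below is a minimal illustration that solves the normal equations directly; the function names and the toy numbers are made up for the example and are not the election data or the code behind the values reported in this post. It also does not compute parameter uncertainties.

```python
def solve_linear(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_seat_model(vote_margin, prev_seat_margin, seat_margin):
    """Least-squares estimates (a0, a1, a2) for
    seat_margin = a0 + a1 * vote_margin + a2 * prev_seat_margin."""
    rows = [(1.0, v, p) for v, p in zip(vote_margin, prev_seat_margin)]
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, seat_margin)) for i in range(3)]
    return solve_linear(xtx, xty)


if __name__ == "__main__":
    # Synthetic, noise-free toy data generated from a0=2, a1=7, a2=0.2.
    votes = [-6.0, 3.0, 8.0, -2.0, 5.0, -7.0]
    prev = [50.0, -30.0, 20.0, 60.0, -10.0, 40.0]
    seats = [2.0 + 7.0 * v + 0.2 * p for v, p in zip(votes, prev)]
    print(fit_seat_model(votes, prev, seats))
```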

The addition of *a2* decreases the residuals considerably, and leads to a modest increase in parameter uncertainties. As I have written before, adding more parameters fails to meet these criteria, and may constitute overfitting.

From 2002-2010, a0 = -3.3 +/- 8.2 seats and a1 = 6.2 +/- 1.1 seats/%vote.

From 1992-2010, **a0 = -0.5 +/- 6.2** seats and

From 1948-2010, *a0* = +5.9 +/- 4.8 seats and *a1* = 8.0 +/- 0.5 seats/%vote.

The parameter *a1* appears to be smaller over the last 20 years compared with post-WWII. This might be a reflection of increased incumbent advantage and/or redistricting.

*(ii) Historical effects (“incumbency”).* An incumbent’s advantage has been estimated to be as high as 5-8%. This could affect both *a1* and *a2*. The generic Congressional ballot is a direct measurement of opinion, and therefore is likely to already capture the effects of this advantage. For this model, the question is how to estimate the macro-level advantage.

Because I previously referred to *a2* as reflecting incumbency, I will continue to refer to it that way. The macro-incumbency advantage for 2012, based on recent data, gives estimates that go all over the place when even one data point is added or removed. It is not a stable parameter, suggesting other effects that require district-by-district analysis. Here, I use as much data as possible to get the error down. For 1948-2010, *a2*=0.2+/-0.1, which in units of generic Congressional ballot translates to a **macro-incumbency advantage of R+1.2+/-0.4%**.

*(iii) Redistricting.* From 2010 to 2012, the net overall shift in PVI distribution is R+0.62 +/- 0.06%. Because the seats-vs.-vote data above have a similar slope to the PVI distribution, I assume that this shift will translate fully to an effective change to the seats-vs.-vote relationship. Therefore the relationship in (i) requires a **redistricting correction of R+1.2+/-0.1%**.


**Part 2: Estimating the national Congressional vote.**

This is done by taking a median of **all** post-RNC/DNC convention generic Congressional preference polls. Aggregated-poll performance from RealClearPolitics suggests that these polls do a good job of predicting the final national vote. They are not perfect – a discrepancy of up to 2-3% can arise in the home stretch. Therefore the nominal error bar on a polls-now snapshot must include +/-2% uncertainty.


**Part 3: Estimating future movement by Election Day.**

Movement should be at least comparable to Presidential movement, which at >20 days from the election I have estimated as +/-1.8%. Congressional movement is likely to be greater because of low attention to local Congressional races. I make a baseline assumption that the movement in opinion is +/-2%.

Possible corrections:

- In a Presidential year, movement tends to be toward the Presidential winner. In a midterm year, movement tends to be away from the incumbent President. This would suggest that I should assume movement toward President Obama, by about D+2% to D+3%.
- The Meta-Margin is currently above its average for the season. If House polls followed Presidential preference (coattails), this would give an average R+0.5%.
- As of October 6, national House undecided voters are 10.5+/-0.6%, considerably higher than undecideds in the Presidential race (5%). This is a likely source of the break toward/away from the President’s party. If it were to break in proportion to Obama/Romney preference, it would give a net D+0.5%.
- A recent event, the debate…to quote the Rude Pundit, “Obama may have done more to depress voter turnout than all the i.d. laws combined.”

Taking into account these and other possibilities I have not thought of, it would seem safe to stay with a symmetric assumption. I will assume **+/-2% movement in either direction, symmetric around zero**.

The combined errors from Parts 2 and 3 above are sqrt(2*2+2*2) = 2.8%, which I round to 3%. Therefore **the estimate of Election Day generic Congressional preference is post-convention median, with an error bar of +/-3%**.

This is converted to an “effective” margin that takes into account incumbency and redistricting as follows:

(effective margin) = (predicted true generic Congressional preference) + *a0/a1* + (incumbency advantage) + (redistricting advantage)

Currently, that is

(**D+2.5 +/-3.0**) + (R+0.1+/-0.9) + (R+1.2+/-0.4) + (R+1.2+/-0.1) = **D+0.0 +/-3.2%**.

Converted to seats, this gives a seat margin of D+0 +/- 22 seats. 1-sigma prediction: **median D 217.5 +/- 11 seats, R 217.5 +/- 11 seats.**
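The arithmetic in this section can be retraced with a short sketch. It assumes, as the quadrature sums in the text do, that the individual errors are independent. The sign convention (D+ positive, R+ negative) and the function names are illustrative choices. The seat margin of +/-22 is taken as given from the text rather than recomputed from *a1*.

```python
import math


def combine_in_quadrature(errors):
    """Combine independent 1-sigma errors: sqrt of the sum of squares."""
    return math.sqrt(sum(e * e for e in errors))


def effective_margin(terms):
    """Sum (margin, error) pairs in % points; D+ is positive, R+ is negative."""
    terms = list(terms)
    total = sum(m for m, _ in terms)
    return total, combine_in_quadrature(e for _, e in terms)


def seats_from_margin(seat_margin, seat_margin_error, total_seats=435):
    """Split a D-minus-R seat margin into per-party seat counts.

    A margin error of E seats corresponds to E/2 seats for each party.
    """
    dem = (total_seats + seat_margin) / 2
    rep = (total_seats - seat_margin) / 2
    return dem, rep, seat_margin_error / 2


if __name__ == "__main__":
    # Parts 2 and 3: +/-2% polling error and +/-2% future movement.
    print("poll error:", combine_in_quadrature([2.0, 2.0]))
    # Generic ballot, a0/a1, incumbency, and redistricting terms from the text.
    terms = [(+2.5, 3.0), (-0.1, 0.9), (-1.2, 0.4), (-1.2, 0.1)]
    print("effective margin:", effective_margin(terms))
    # Seat margin of D+0 +/- 22, as stated in the text.
    print("seats:", seats_from_margin(0.0, 22.0))
```

Run as written, this should give about 2.8% for the combined polling error, an effective margin of essentially zero with an error near 3.2%, and 217.5 +/- 11 seats for each party.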

**Predictions: D+2.5+/-3.0% popular vote, D 217 +/- 11 seats, R 218 +/- 11 seats.** Democratic control: 50%.