Princeton Election Consortium

A first draft of electoral history. Since 2004

Help with district-partitioning calculation?

April 25th, 2015, 2:15pm by Sam Wang

(Considering that this is a fairly narrow-appeal post, I will pipe it over to the right-hand “Meta-Analysis” column shortly.)

Dear PEC readers, I have a math puzzle. It relates to my gerrymandering project. If you are good at working with probability distributions, take a look. Can you solve it?

Here is the puzzle. It asks for a closed-form calculation of what I obtained by numerical simulation for that NYT piece. It is for a peer-reviewed paper I am writing on how to establish criteria for fair Congressional districting.

Partitioning of voters in a state with randomly selected districts. Imagine a state with N districts, and a two-party winner-take-all system (i.e. the U.S. system for electing House members). Select districts at random from a population of districts in which the vote share v (for party #1) follows a near-Gaussian distribution with mean A and standard deviation S.

Now add the condition that the statewide two-party vote yields a fraction F1 of votes for political party #1 (and of course the other party gets F2=1-F1). Therefore the district vote shares v_1..v_N must satisfy the constraint sum(v_1..v_N)/N=F1.

What is the probability distribution of k, where k is defined as the number of districts in which v_i>0.5? Give the mean and SD of the number of seats won by party #1. Also describe the degree to which the distribution resembles a Gaussian.

P.S. When F1 is close to A, I believe the answer is approximately <k> =  N*p and std(k) = sqrt [N*p*(1-p)], where p = normcdf(F1,0.5,S). If you can do better, let me know!
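A minimal sketch of this approximation in Python, assuming normcdf(x, mu, sigma) means the normal CDF with mean mu and standard deviation sigma, as in MATLAB. The function names and the example values (N=10, F1=0.55, S=0.15) are illustrative assumptions, not numbers from the post.

```python
import math

def normcdf(x, mu, sigma):
    """Normal CDF with mean mu and SD sigma (same argument order as MATLAB's normcdf)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def seat_approximation(n_districts, f1, s):
    """Binomial-style approximation for the seats won by party #1.

    Treats each district as an independent win with probability p.
    """
    p = normcdf(f1, 0.5, s)
    mean_k = n_districts * p
    sd_k = math.sqrt(n_districts * p * (1.0 - p))
    return p, mean_k, sd_k

# Hypothetical example values, chosen only for illustration.
p, mean_k, sd_k = seat_approximation(n_districts=10, f1=0.55, s=0.15)
print(f"p = {p:.3f}, mean seats = {mean_k:.2f}, SD of seats = {sd_k:.2f}")
```

When F1 is exactly 0.5, p is 0.5 and the expected seat count is N/2, which is a quick sanity check on the argument order.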

P.P.S. Here is a rephrasing of the problem: Consider a normally distributed variable with mean mu and standard deviation sigma. Draw from it k times (k here counts draws, not seats). You only accept sets of draws whose average is constrained to be mu’, which differs from mu. What is the distribution of the draws?

P.P.P.S. Probably solved. The answer is as above, except that instead of normcdf(F1,0.5,S) we have normcdf(F1,0.5,S*sqrt((k-1)/k)), where k is the number of districts (N above). This arises in a semi-obvious way from the derivation of the standard error of the mean.
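One way to see where the factor comes from, sketched under the assumption that the k draws are independent and exactly Gaussian with mean A and SD S (the post says only near-Gaussian). The symbols x_i for a single draw and \bar{x} for the average of the k draws are notation introduced here.

```latex
\begin{align*}
\operatorname{Var}(x_i) &= S^2, \qquad
\operatorname{Var}(\bar{x}) = \frac{S^2}{k}, \qquad
\operatorname{Cov}(x_i, \bar{x}) = \frac{S^2}{k}, \\
\operatorname{E}(x_i \mid \bar{x} = F_1)
  &= A + \frac{\operatorname{Cov}(x_i, \bar{x})}{\operatorname{Var}(\bar{x})}\,(F_1 - A) = F_1, \\
\operatorname{Var}(x_i \mid \bar{x} = F_1)
  &= \operatorname{Var}(x_i) - \frac{\operatorname{Cov}(x_i, \bar{x})^2}{\operatorname{Var}(\bar{x})}
   = S^2 - \frac{S^2}{k} = S^2\,\frac{k-1}{k}.
\end{align*}
```

Under these assumptions each constrained draw is Gaussian with mean F1 and SD S*sqrt((k-1)/k), so the chance that it exceeds 0.5 is normcdf(F1,0.5,S*sqrt((k-1)/k)). The same calculation gives Cov(x_i, x_j | \bar{x}) = -S^2/k for i ≠ j, so the constrained draws are not independent and the binomial formula for std(k) should be read as an approximation.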


The gift I have in mind is kind of small: a signed copy of either (or both) of my books. I will see if I can think of something nicer to send…

7 Comments so far ↓

  • Sam Wang

    I “solved” this by reformulating the question so that the answer falls out of the derivation of the standard error of the mean. To test that, I calculated the answer explicitly for k=2, then ran some numerical simulations, which showed that the SD of the constrained draws is in fact a factor of sqrt((k-1)/k) times that of the unconstrained distribution.

    Intuition-building code:
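    k=5; foo2=[];                          % assumed setup, not shown in the original
    for i=1:1000000
        foo=randn(k,1); mfoo=mean(foo);    % assumed: k standard-normal draws and their mean
        if mfoo>1.5                        % keep only sets whose mean exceeds 1.5
            foo2=[foo2 foo'];              % pool the accepted draws
        end
    end
    [k std(foo2) sqrt((k-1)/k)]            % pooled SD next to the predicted factor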
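The same check can be sketched in Python. Instead of accepting or rejecting sets based on their mean, this version shifts each set of draws so that its average is exactly mu'. For independent Gaussian draws that reproduces the conditional distribution directly, because the deviations from the sample mean are independent of the sample mean. This is a swapped-in technique, not the code from the comment, and the function names and parameter values are illustrative assumptions.

```python
import math
import random

def constrained_draws(k, mu, sigma, mu_prime, n_sets, seed=0):
    """Pool draws from n_sets sets of k Gaussians, each set shifted to average mu_prime."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_sets):
        draws = [rng.gauss(mu, sigma) for _ in range(k)]
        shift = mu_prime - sum(draws) / k   # force the set's mean to mu_prime
        pooled.extend(x + shift for x in draws)
    return pooled

def pooled_sd(values):
    """Population standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

# Print k, the simulated SD of the constrained draws, and sqrt((k-1)/k).
for k in (2, 5, 10):
    pooled = constrained_draws(k, mu=0.0, sigma=1.0, mu_prime=1.5, n_sets=20000)
    print(k, round(pooled_sd(pooled), 3), round(math.sqrt((k - 1) / k), 3))
```

With sigma=1 the second and third printed columns should agree closely if the sqrt((k-1)/k) factor is right.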

  • George Fleming

    Each district has the same number of voters?

  • Keith S

    If we have N independent events that occur with probability p and don’t occur with probability q = 1 – p, we get the probability generating function f(x) = (q+px)^N.

    So f’(x) = pN (q+px)^(N-1) and f”(x) = N(N-1) p^2 (q+px)^(N-2).

    Thus E[k] = f’(1) = Np and Var[k] = f”(1) + f’(1) – (f’(1))^2 = Np(1-p) and this is obviously the binomial distribution.

    Since this matches your answer and seems to be a simpler problem than the one you’re actually trying to solve, you might have more work to do. However, since I’m a little bit unclear on what you’re trying to do, this may be sufficient.
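A quick numerical check of the two moments derived above, summing the binomial probabilities directly rather than using the generating function. The values N=12 and p=0.6 are arbitrary illustrative choices, and the function name is hypothetical.

```python
import math

def binomial_moments(n, p):
    """Mean and variance of the success count k, summed directly from the binomial pmf."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

n, p = 12, 0.6  # illustrative values only
mean, var = binomial_moments(n, p)
print(mean, n * p)            # direct sum next to Np
print(var, n * p * (1 - p))   # direct sum next to Np(1-p)
```

This only confirms the independent-trials case; it says nothing about the constrained problem.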

    • Sam Wang

      Good try! However, it doesn’t answer my question. Your answer contains two approximations: (1) that the vote share is the same nationwide and in the state, and (2) that the trials are independent. With those assumptions, the problem can be solved using introductory probability.

      As per my original NYT piece, the idea is that for a state with k districts, we randomly select k districts from the national sample, then impose the additional criterion that their votes sum to the actual statewide popular-vote total.

      This is not a simple probability problem. It is more of a partitioning problem, where the sum of the k districts constrains what is an allowable combination.
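To illustrate how the constraint couples the districts, here is a rough Python sketch that simulates seat counts when the district vote shares are forced to average F1. It assumes exactly Gaussian vote shares and equal-population districts, and it imposes the constraint by shifting each set of draws rather than by selecting from a real national sample. All names and parameter values are illustrative assumptions, not taken from the NYT analysis.

```python
import math
import random

def seat_counts(n_districts, a, s, f1, n_trials, seed=0):
    """Seat counts for party #1 when district vote shares are forced to average f1.

    Assumes exactly Gaussian shares and equal-population districts.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        shares = [rng.gauss(a, s) for _ in range(n_districts)]
        shift = f1 - sum(shares) / n_districts   # impose the statewide constraint
        counts.append(sum(1 for v in shares if v + shift > 0.5))
    return counts

def mean_sd(values):
    """Mean and population standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

n, a, s, f1 = 10, 0.5, 0.15, 0.55  # illustrative values only
sim_mean, sim_sd = mean_sd(seat_counts(n, a, s, f1, n_trials=50000))

# Per-district win probability using the narrowed SD, S*sqrt((N-1)/N).
s_eff = s * math.sqrt((n - 1) / n)
p = 0.5 * (1 + math.erf((f1 - 0.5) / (s_eff * math.sqrt(2))))
print("simulated mean and SD:", round(sim_mean, 3), round(sim_sd, 3))
print("N*p and sqrt(N*p*(1-p)):", round(n * p, 3), round(math.sqrt(n * p * (1 - p)), 3))
```

Under these assumptions the simulated mean should track N*p. Because the constrained shares are negatively correlated, one would expect the simulated SD to come out below the binomial value, which is one way of seeing that the trials are not independent.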

  • Jason Dick

    Hmm, I don’t quite understand the constraint here. Certainly over many realizations, A and F_1 must be the same. So in practice they also would have to be very close to identical. What is the motivation in separating this into two separate parameters?