*(Considering that this is a fairly narrow-appeal post, I will pipe it over to the right-hand “Meta-Analysis” column shortly.)*

Dear PEC readers, I have a math puzzle. It relates to my gerrymandering project. If you are good at working with probability distributions, take a look. Can you solve it?

Here is the puzzle. It asks for a closed-form version of the numerical simulations I did for that NYT piece. It is for a peer-reviewed paper I am writing on how to establish criteria for fair Congressional districting.

**Partitioning of voters in a state with randomly selected districts.** Imagine a state with N districts and a two-party, winner-take-all system (i.e. the U.S. system for electing House members). Select districts at random from a national pool in which the vote share v for party #1 follows a near-Gaussian distribution with mean A and standard deviation S.

Now add the condition that the statewide two-party vote gives a fraction $F_1$ of votes to party #1 (and of course the other party gets $F_2 = 1 - F_1$). Because every district has the same number of voters, the district vote shares $v_1, \dots, v_N$ must satisfy $\frac{1}{N}\sum_{i=1}^{N} v_i = F_1$.

What is the probability distribution of k, the number of districts in which $v_i > 0.5$? Give the mean and SD of the number of seats won by party #1, and describe how closely the distribution resembles a Gaussian.
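A Monte Carlo sketch of the puzzle, under two assumptions that go beyond the post: district shares are exactly Gaussian (not just near-Gaussian), and districts are equal in size. For independent Gaussian draws, the sample mean is independent of the deviations from it, so drawing N values, subtracting their average, and adding $F_1$ samples exactly from the constrained distribution. Under this exact-Gaussian assumption, A drops out entirely. The helper name `seats_distribution` and the parameter values are illustrative, not taken from the NYT simulations.

```python
import random
import statistics

def seats_distribution(n_districts, f1, s, trials, seed=0):
    """Simulated seat counts k for party #1 (hypothetical helper).

    Assumes exactly Gaussian district shares with SD s and equal-sized
    districts, conditioned on the statewide average being exactly f1.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        z = [rng.gauss(0.0, s) for _ in range(n_districts)]
        zbar = sum(z) / n_districts
        # Recentre the draws so the district average is exactly f1.
        shares = [f1 + zi - zbar for zi in z]
        counts.append(sum(1 for v in shares if v > 0.5))
    return counts

ks = seats_distribution(n_districts=10, f1=0.55, s=0.15, trials=20_000)
mean_k = statistics.fmean(ks)
sd_k = statistics.pstdev(ks)
# Sample skewness: one rough gauge of departure from a Gaussian shape.
skew = sum((k - mean_k) ** 3 for k in ks) / (len(ks) * sd_k ** 3)
print(mean_k, sd_k, skew)
```

Comparing a histogram of `ks`, or its skewness, against a Gaussian with the same mean and SD is one way to address the last part of the question.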

*P.S. When $F_1$ is close to A, I believe the answer is approximately $\langle k \rangle = Np$ and $\mathrm{std}(k) = \sqrt{Np(1-p)}$, where $p$ = normcdf($F_1$, 0.5, S). If you can do better, let me know!*
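The P.S. approximation is easy to evaluate numerically. In this sketch, Python's `statistics.NormalDist(mu, sigma).cdf(x)` stands in for MATLAB's `normcdf(x, mu, sigma)`. The function name is hypothetical, and the optional `constrained` flag applies the $\sqrt{(N-1)/N}$ shrinkage of S proposed in the P.P.P.S.

```python
from statistics import NormalDist

def binomial_seat_approx(n_districts, f1, s, constrained=False):
    """Seat-count approximation from the P.S.: k ~ Binomial(N, p).

    p = normcdf(f1, 0.5, s) in MATLAB notation. With constrained=True,
    s is shrunk by sqrt((N-1)/N) as suggested in the P.P.P.S.
    """
    sigma = s * ((n_districts - 1) / n_districts) ** 0.5 if constrained else s
    p = NormalDist(mu=0.5, sigma=sigma).cdf(f1)
    mean_k = n_districts * p
    sd_k = (n_districts * p * (1 - p)) ** 0.5
    return p, mean_k, sd_k

# Illustrative values: 10 districts, 55% statewide share, S = 0.15.
print(binomial_seat_approx(10, 0.55, 0.15))
print(binomial_seat_approx(10, 0.55, 0.15, constrained=True))
```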

P.P.S. Here is a rephrasing of the problem: *Consider a normally distributed variable with mean $\mu$ and standard deviation $\sigma$. Draw from it k times, and accept only those sets of draws whose average equals $\mu'$, where $\mu' \neq \mu$. What is the distribution of the draws?*

*P.P.P.S. Probably solved. It’s as above, except that instead of normcdf($F_1$, 0.5, S) we have normcdf($F_1$, 0.5, $S\sqrt{(k-1)/k}$), where k is the number of draws in the rephrased problem (that is, the number of districts N). This arises in a semi-obvious way from the derivation of the standard error of the mean.*
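A sketch of the standard-error argument behind the $\sqrt{(k-1)/k}$ factor, in the notation of the rephrased problem: $k$ independent draws $x_i$ from $N(\mu, \sigma^2)$ with sample mean $\bar{x}$. It relies on a standard property of Gaussian samples, namely that $\bar{x}$ is independent of the deviations $x_i - \bar{x}$, so fixing $\bar{x} = \mu'$ leaves the deviations' distribution unchanged.

$$
\begin{aligned}
\operatorname{Var}(x_i - \bar{x}) &= \operatorname{Var}(x_i) - 2\operatorname{Cov}(x_i, \bar{x}) + \operatorname{Var}(\bar{x}) \\
&= \sigma^2 - \frac{2\sigma^2}{k} + \frac{\sigma^2}{k} = \sigma^2\,\frac{k-1}{k}, \\
x_i \mid \bar{x} = \mu' &\sim N\!\left(\mu',\ \sigma^2\,\frac{k-1}{k}\right).
\end{aligned}
$$

With $\mu' = F_1$ and $\sigma = S$, each district exceeds 0.5 with probability $p = \Phi\!\left(\frac{F_1 - 0.5}{S\sqrt{(k-1)/k}}\right)$, so by linearity of expectation the expected number of seats is exactly $kp$ (with $k$ here counting districts). The constrained draws are not independent, though: any two deviations have correlation $-1/(k-1)$, so the binomial SD should be read as an approximation.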


The gift I have in mind is kind of small: a signed copy of either (or both) of my books. I will see if I can think of something nicer to send…

Jason Dick // Apr 25, 2015 at 11:03 pm

Hmm, I don’t quite understand the constraint here. Certainly over many realizations, A and $F_1$ must be the same, so in practice they would also have to be very close to identical. What is the motivation for separating this into two parameters?

Sam Wang // Apr 26, 2015 at 2:34 am

One is the national margin; the other is a single state’s margin.

Keith S // Apr 27, 2015 at 12:41 pm

If we have N independent events that each occur with probability p and don’t occur with probability $q = 1 - p$, we get the probability generating function $f(x) = (q+px)^N$.

So $f'(x) = Np(q+px)^{N-1}$ and $f''(x) = N(N-1)p^2(q+px)^{N-2}$.

Thus $E[k] = f'(1) = Np$ and $\mathrm{Var}[k] = f''(1) + f'(1) - (f'(1))^2 = Np(1-p)$, and this is obviously the binomial distribution.
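Keith’s moments can be checked numerically. The coefficients of $f(x) = (q+px)^N$ are the binomial probabilities, so $f'(1)$ and $f''(1)$ are weighted sums over those coefficients. A short sketch; the function name and the values N = 10, p = 0.3 are illustrative.

```python
from math import comb

def pgf_moments(n, p):
    """Mean and variance of k via the probability generating function.

    The coefficients of f(x) = (q + p x)^n are the binomial probabilities
    P(k), so f'(1) = sum k P(k) and f''(1) = sum k (k - 1) P(k).
    """
    q = 1 - p
    coeffs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    f1 = sum(k * c for k, c in enumerate(coeffs))            # f'(1)
    f2 = sum(k * (k - 1) * c for k, c in enumerate(coeffs))  # f''(1)
    return f1, f2 + f1 - f1**2                               # E[k], Var[k]

mean, var = pgf_moments(10, 0.3)
print(mean, 10 * 0.3)             # both approximately 3.0
print(var, 10 * 0.3 * (1 - 0.3))  # both approximately 2.1
```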

Since this matches your answer and seems to be a simpler problem than the one you’re actually trying to solve, you might have more work to do. However, since I’m a little bit unclear on what you’re trying to do, this may be sufficient.

Sam Wang // Apr 27, 2015 at 1:24 pm

Good try! However, it doesn’t answer my question. Your answer contains two approximations: (1) the vote is the same nationwide and in a state, and (2) the trials are independent. With those assumptions, the problem can be solved using introductory probability.

As per my original NYT piece, the idea is that for a state with k districts, we randomly select k districts from the national sample and then impose the additional criterion that their votes sum to the actual statewide popular-vote total.

This is not a simple probability problem. It is more of a partitioning problem, where the sum over the k districts constrains which combinations are allowable.
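The partitioning formulation can also be simulated by brute force, following the procedure described above: draw k districts from the national distribution and keep only sets whose average lands near the statewide share. This sketch assumes Gaussian national shares N(A, S), equal-sized districts, and a small acceptance window `tol` standing in for exact equality (a continuous average never hits a target exactly). The function name and all numbers are illustrative.

```python
import random
import statistics

def partition_by_rejection(n_districts, a, s, f1, tol, trials, seed=0):
    """Brute-force version of the partitioning problem (hypothetical helper).

    Draws n_districts shares from a national N(a, s) distribution and keeps
    the set only if its average is within tol of the statewide share f1
    (equal-sized districts assumed). Returns party #1's seat count for each
    kept set.
    """
    rng = random.Random(seed)
    seats = []
    for _ in range(trials):
        v = [rng.gauss(a, s) for _ in range(n_districts)]
        if abs(sum(v) / n_districts - f1) < tol:
            seats.append(sum(1 for vi in v if vi > 0.5))
    return seats

seats = partition_by_rejection(10, a=0.5, s=0.15, f1=0.55, tol=0.005,
                               trials=100_000)
print(len(seats), statistics.fmean(seats), statistics.pstdev(seats))
```

Shrinking `tol` tightens the constraint but lowers the acceptance rate, and the rate also falls the further $F_1$ sits from A.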

George Fleming // Apr 27, 2015 at 2:01 pm

Each district has the same number of voters?

Sam Wang // Apr 27, 2015 at 3:46 pm

Yes.

Sam Wang // Apr 28, 2015 at 12:52 am

I “solved” this by reformulating it so that the answer should come from the derivation of the standard error of the mean. To test that, I calculated the answer explicitly for k=2, then ran numerical simulations confirming that the answer involves a factor of $\sqrt{(k-1)/k}$ relative to the unconstrained distribution.
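For k = 2 the calculation can be done by hand. One way to see it (a sketch, not necessarily the route taken here) is to write the first draw in terms of the pair’s mean $m$ and half-difference $d$. Because $m$ and $d$ are jointly Gaussian and uncorrelated, they are independent, so fixing $m$ leaves the distribution of $d$ unchanged.

$$
\begin{aligned}
m &= \tfrac{x_1 + x_2}{2}, \qquad d = \tfrac{x_1 - x_2}{2}, \qquad x_1 = m + d, \\
\operatorname{Cov}(m, d) &= \tfrac{1}{4}\left(\operatorname{Var}(x_1) - \operatorname{Var}(x_2)\right) = 0, \qquad \operatorname{Var}(d) = \tfrac{2\sigma^2}{4} = \tfrac{\sigma^2}{2}, \\
x_1 \mid m = \mu' &\sim N\!\left(\mu',\ \tfrac{\sigma^2}{2}\right), \qquad \sigma\sqrt{\tfrac{1}{2}} = \sigma\sqrt{\tfrac{k-1}{k}}\ \text{ at } k = 2.
\end{aligned}
$$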

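Intuition-building code (MATLAB):

```matlab
% Draw k values from N(0,1); keep only sets whose mean exceeds 1.5,
% a rough stand-in for conditioning on a mean far from the true mean of 0.
k = 8;
foo2 = [];                      % pooled accepted draws (row vector)
for i = 1:1000000
    foo = randn(k, 1);          % k draws from N(0,1), same as normrnd(zeros(k,1),1)
    mfoo = mean(foo);
    if mfoo > 1.5
        foo2 = [foo2 foo'];     % append this set to the pooled draws
    end
end
% k, SD of the pooled accepted draws, and the predicted factor sqrt((k-1)/k)
[k std(foo2) sqrt((k-1)/k)]
```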
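A hypothetical Python translation of the snippet above, for readers without MATLAB. One deliberate change: instead of keeping sets whose mean exceeds 1.5, it keeps sets whose mean lies within `tol` of a target, which is closer to conditioning on the mean exactly and accepts many more sets per trial. The function name and the values target = 0.5, tol = 0.02 are illustrative choices.

```python
import random
import statistics

def conditioned_draws_sd(k, target, tol, trials, seed=0):
    """Pool k-draw sets from N(0, 1) whose mean is within tol of target.

    Returns the SD of all pooled accepted draws and the number of sets kept.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(trials):
        draws = [rng.gauss(0.0, 1.0) for _ in range(k)]
        if abs(sum(draws) / k - target) < tol:
            pooled.extend(draws)
    return statistics.stdev(pooled), len(pooled) // k

k = 8
sd, n_sets = conditioned_draws_sd(k, target=0.5, tol=0.02, trials=100_000)
# Compare the pooled SD with the predicted factor sqrt((k-1)/k).
print(k, sd, ((k - 1) / k) ** 0.5)
```

If the constrained picture is right, `sd` should land near $\sqrt{7/8} \approx 0.935$ rather than the unconstrained value of 1.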