Princeton Election Consortium

A first draft of electoral history. Since 2004

A Contest: Hack the Gerrymandering Standards!

May 20th, 2017, 5:28pm by Sam Wang

June 1: Thank you for your entries. We are evaluating them, and will post the results – as well as an explainer for what we learned – shortly.

Today I did an Ask Me Anything on Reddit Politics. In conjunction with the AMA, we have a contest! The deadline is Wednesday, May 31st.

I have developed multiple statistical tests (Stanford Law Review) to detect partisan gerrymandering. These tests are focused on the principle of partisan symmetry, a phrase that appears in Supreme Court writings. In the past, a majority of justices of the Court has expressed interest in partisan symmetry as a standard, but there was not agreement on how to identify it. This year two cases, one in Wisconsin and one in North Carolina, give them an opportunity to address this task.

We need your help to see if the simple standards can be hacked. Many standards look reasonable on paper, but a clever person may be able to come up with ways to “game” the standards to benefit their side. In this contest, we are looking for talented hackers who can evade the rules and find ways to slip through the net. If you can do that, you can help us either rule out standards that are too loose or that need to be combined with other standards to close the net. Think of it as helping us build a spam filter for gerrymandering!

The contest is here. Use this worksheet [UPDATED 5/25 10:30am] to construct your entry, and then mail it to

If there are problems with the contest, please let us know in the comments section here.

Tags: Redistricting

32 Comments so far ↓

  • Mark

    Are you going to give an update on the contest? Did it yield any lessons about the proposed standards? Any impact on the Supreme Court cases?

  • RN

    Does it make sense to use a chi-squared test to measure the goodness of fit between the distribution of one party’s vote percentages in the gerrymandered districts and the expected distribution of votes for that party?

  • LondonYoung

    Philosophical question – In court decisions I read a lot about the “intent” of the map designers.

    This contest seems to start with a target, i.e. a disproportionately high number of Republican seats, and then tests mathematical algorithms to see if they are closely correlated with the target.

    So, is there a danger that the result is just a mathematically complex way of asking for the more simply stated outcome of proportional seating?

    Another way of saying this is: if the court wants to use these tests, shouldn’t they derive some constitutional validity from something other than proportional seating?

  • Kevin Baas

    Mean minus median is automatically “hacked” in highly partisan states: a measure at the middle (50/50) gives the opposite result of a measure at the ends.

    • Sam Wang

      Thank you! We are aware of this issue. It’s in my Election Law Journal article. This is why the contest specifies a 50-50 vote split.

  • Ken Miller

    Also, I have a question about criterion 1 vs criterion 3. #1 defines a safe seat as 55/45. #3 says the outcome shouldn’t change if turnout is 45/55, but that would turn a 55/45 district into 50/50, i.e. a tossup. So is 55/45 considered “durable” under #3? And are higher numbers like 60/40 considered better under #3, or does “The best partisan gerrymanders are impervious to a uniform statewide swing of at least 10%” mean that 10% is a ceiling, so that doing better than that — better than 55/45 (or does it have to be 55.1/44.9?) does not gain any more credit under #3?

    • Brian Remlinger

      One of the weaknesses of the contest is that we’re using hypothetical 100 person districts. In reality, 55/45 districts would be 55+epsilon/45-epsilon, so they’d still be slightly Republican leaning after a 5% shift. So you can consider 55/45 districts to be durable under #3.

      10% can be considered a ceiling. If a redistricter could make all of their districts 55/45 and all of their opponent’s districts 15/85 geographically feasibly and without tripping any alarms, they’d be ecstatic. Every vote over the 55% “safe” threshold looks like a waste to them.
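The durability question in this thread can be checked with a few lines of Python. This is only a sketch under stated assumptions: the 18-district plan below is hypothetical (it is not taken from the contest worksheet), shares are Republican percentages, and a seat counts as held only if the share stays strictly above 50 after the swing.

```python
def seats_won(rep_shares, swing=0.0):
    """Count districts whose Republican share stays strictly above 50
    after adding a uniform swing (in percentage points) to every district."""
    return sum(1 for share in rep_shares if share + swing > 50.0)

# Hypothetical plan: 12 districts at exactly 55/45 and 6 packed districts at 40/60,
# which averages to a 50/50 statewide vote.
exact_plan = [55.0] * 12 + [40.0] * 6

# Same plan with the "55 + epsilon" margin described above (epsilon = 0.1 here).
padded_plan = [55.1] * 12 + [39.8] * 6

print(seats_won(exact_plan))         # seats at the baseline vote
print(seats_won(exact_plan, -5.0))   # exact 55/45 districts become ties
print(seats_won(padded_plan, -5.0))  # padded districts survive the swing
```

In the 100-person model a 55/45 district lands exactly on 50/50 after a 5-point swing, so the strict comparison drops it, while any small padding keeps it. That is the sense in which 55/45 can be read as durable in practice.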

  • Ken Miller

    Re your new criterion (efficiency test): since vote margin is restricted to 50% +/- 0.5%, the 2*(V-0.5) term can be at most 1%. The other term is 5% for 10 Republican seats and goes up by 5.5% for each additional Republican seat. So to pass the 3rd test you automatically can’t have more than 10 R seats. Seems like that defeats the purpose of the exercise? So we have two choices — do the best on the other two tests with 10R seats, thus passing all 3 tests; or do well on the other two tests with as many R seats as possible, but blow the 3rd test. Which one are you looking for? Maybe you should split the contest in two, one for each of these two choices??

    Also, re your 3rd criterion: it seems to me this should be modified to say a +/- 5 point swing should not lower the number of Republican seats compared to the 50/50 case. If going to 55R/45D adds to the number of R seats, that makes it an even better gerrymander, not a worse one; so that shouldn’t count against the solution, even though it means the swing to 55R/45D changes the outcome. As the criterion is stated, that would be penalized.

    • Brian Remlinger

      Hi Ken,
      I work with Sam on this project. Your issues with the Efficiency Gap are spot on — it places an upper bound of 10 on the number of seats. We’ve amended the contest rules to point out that the other two tests (the t-test and the mean-median difference) are more important for hacking. Feel free to ignore the Efficiency Gap, so long as you evade the other standards.

      Your point about the third criterion is well-taken. I’ve modified the rules as per your suggestion to make that clear.
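The arithmetic behind the seat ceiling discussed in this thread can be sketched as follows. The code assumes the simplified equal-turnout form of the efficiency gap, (S - 0.5) - 2(V - 0.5), where S is the Republican seat share and V the Republican vote share, and it assumes 18 districts, a number inferred from the 13:5 and 12:6 splits mentioned elsewhere in the comments rather than stated in the post. No pass/fail threshold is applied, since the worksheet's cutoff is not given here.

```python
def efficiency_gap(rep_seats, n_districts, rep_vote_share):
    """Simplified efficiency gap, assuming equal turnout in every district:
    (seat share - 0.5) - 2 * (vote share - 0.5)."""
    seat_share = rep_seats / n_districts
    return (seat_share - 0.5) - 2.0 * (rep_vote_share - 0.5)

N_DISTRICTS = 18  # assumption: inferred from seat splits quoted in the comments

# The vote share is restricted to 50% +/- 0.5%, so the vote term moves the gap
# by at most one point in either direction.
for seats in range(9, 14):
    low = efficiency_gap(seats, N_DISTRICTS, 0.505)
    high = efficiency_gap(seats, N_DISTRICTS, 0.495)
    print(f"{seats} R seats: gap between {low:+.3f} and {high:+.3f}")
```

With 18 districts each additional seat adds 1/18, or roughly 5.6 points, to the gap, while the vote term can offset at most 1 point. That is why the seat count, not the vote split, dominates this test.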

  • ColinMcAuliffe

    I’m not sure how much this kind of hacking really tells us about the robustness of a given standard. Aside from ignoring geographical constraints it also ignores the uncertainty in future election outcomes. Voter preferences are fairly stable but they are not known with perfect certainty over the course of an entire decade.

    The mean median difference and t test are sensitive to perturbations in the outcome of a single district. This means it is possible that the strategies for gaming the metrics as well as the overall gamability of the metrics could be quite different when uncertainty in future elections is considered.

    • Brian Remlinger

      The geographic constraints are meant to be captured by the 85/15 criterion. In reality, districts can be made 90/10 or even more lopsided, but that’s not sustainable at a state-wide level.

      As for the uncertainty in future elections, that’s where the 45/55 criterion comes in. Congressional seats that are routinely won by 10% or more are usually considered safe rather than competitive because they’re insulated from most uncertainty. Republicans wouldn’t have spent tens of millions of dollars creating dozens of 10-12 point wins in 2010 if they weren’t essentially guaranteed those wins for the next decade. This is also part of the reason it’s predicted that Democrats would need to win the popular vote by 7% in order to have a chance of retaking the House in 2018.

    • ColinMcAuliffe

      I wasn’t referring to the effect of uncertainty on the outcome of an election in terms of number of seats won. As you’ve mentioned, gerrymanders are designed to result in a number of safe seats that are robust against a realistic statewide partisan swing.

      The uncertainty I am referring to is in precisely knowing the future vote percentages in each district, which is important for determining if these standards can be gamed. Changing the outcome in a single district will affect the p-values of either test, and so while we could test the stability of the gerrymander in terms of seats with a uniform partisan swing, we really need to consider nonuniform swings in order to determine the stability of the p-values of tests through a redistricting cycle.

    • Sam Wang

      This is a minor point. In actual fact, straight-ticket voting is at a high, and election-to-election correlation is also extremely high. Finally, we now have three election cycles, and the 2012 gerrymanders are going strong.

    • Colin McAuliffe

      It could be a minor issue in the end but it seems relevant when evaluating metrics since sensitivity to nonuniform swings is something that will tend to vary from metric to metric.

      Since you mentioned the durability of recent gerrymanders, this reminded me of a separate point. The asymptotic p-value for mean-median assumes independent and identically distributed districts, but districts are neither of these. This makes the test vulnerable, since the p-value can be increased by increasing the dispersion of the district percentages. It seems unlikely that the overall dispersion of the results in a state would have much effect on the magnitude of asymmetry that could have occurred by chance in actual elections, but this is exactly what you get with the IID assumption.
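The dispersion effect described in this thread can be illustrated with a small sketch. It assumes the standard normal-theory result that, for IID draws, the standard deviation of (mean - median) is approximately sqrt(pi/2 - 1) * sigma / sqrt(n). Whether the contest worksheet uses exactly this form is not confirmed here, and both 18-district plans are hypothetical.

```python
import math
from statistics import mean, median, stdev

def mean_median_z(shares):
    """Mean-median difference divided by its asymptotic standard error
    under an IID normal assumption: sqrt(pi/2 - 1) * s / sqrt(n)."""
    diff = mean(shares) - median(shares)
    std_err = math.sqrt(math.pi / 2 - 1) * stdev(shares) / math.sqrt(len(shares))
    return diff / std_err

# Two hypothetical plans (Republican % per district), both with mean 50 and median 55.
tight = [55.0] * 12 + [40.0] * 6
spread = [85.0] * 2 + [55.0] * 10 + [30.0] * 6

for name, plan in (("tight", tight), ("spread", spread)):
    diff = mean(plan) - median(plan)
    print(f"{name}: mean-median = {diff:+.1f}, z = {mean_median_z(plan):+.2f}")
```

Both plans have the same 5-point mean-median gap, but the more dispersed plan has a much smaller standardized statistic (roughly -1.6 versus roughly -3.9), and therefore a larger p-value, even though the asymmetry itself is unchanged.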

  • Kevin

    This paper by Greg Warrington contains some critiques of your methods and suggests a declination method instead (better described by the mathiness in the paper than by me). See Section 6.3, e.g. “Additionally, it is quite possible for packing and cracking to occur without having any effect on the mean-median difference. For example, consider the 2012 Indiana congressional election shown in Fig. 7.C.”


    More from Greg:

  • Arbitrary

    Your 85/15% split limit is a significant problem to actually implementing this–in the real world some districts in some states wind up uncontested by a major party. If you include these 100-0 districts, Massachusetts, for instance, suddenly looks enormously gerrymandered just because the Republicans didn’t run candidates in some of the districts (5 out of 9 in 2016, so the median was 100-0, even as the mean was about 86-14 with a standard deviation of around 17; the t-test method, of course, cannot be used when one party wins all the districts). But if you don’t include districts where one party doesn’t run a candidate, a party can game the test by not running candidates in districts they have packed with their opponents.

    At the same time, we have California, which determines district boundaries by independent commission. Including the districts where there was no second round Republican (i.e., there were two Democrats left after the primary, or a Democrat and an independent), the 2016 results fail the t-test with a p-value under 0.0002. Without them, California’s t-test p-value only improves to under 0.0007. Is it your intention to claim that California is gerrymandered?

  • Lorem

    I think I had a passing result at 13:5 with 6 safe seats. And 14:4 should be doable if you’re okay with failing one test. Alternatively, you could probably make the test numbers look less suspicious and/or pad your margins a bit more at 11:7 (though I haven’t explicitly tried – this is speculation).

    Really depends on what trade-offs you want to take. But yes, it seems like if you allow completely fictitious “maps”, the tests are not as restrictive as we’d like.

  • Pramod

    So how did you like the AMA?

    I saw that a bunch of aggressive posters tried to claim you have no expertise to talk about gerrymandering based on the results of the election. It reminded me of this radio interview I heard last night with Tom Nichols, the author of “The Death of Expertise.”

    Any idiot with a keyboard can now trash an expert, and apparently expert now means someone who has never been wrong about anything ever. I’m coming round to the view that this is the real problem we have in today’s world. There’s absolutely no respect for hard-earned expertise and specialization.

    • Sam Wang

      It was fun!

      I don’t know what to do about aggressive trolls like that. I agree, this is what Tom Nichols wrote about so well.

    • 538 Refugee

      You could point out that someone said “Let he who is without sin….” NOT “Let he who is without THAT sin…” Of course, it may just get you stoned so maybe not. ;)

  • sc

    A brute force approach to hacking gerrymandering standards would be viable. I live in PA, so I’d take a precinct-level map of partisan voting patterns in the most recent election (maybe use straight ticket selections, which are an option here, to filter out independent/swing voters who might be more likely to behave differently from election to election). Randomly sample contiguous collections of precincts to form district maps, throw out any maps that would run afoul of your tests, and maximize seats for the preferred party in remaining maps.

    This is a generalizable approach that can function in any state for which a redistricter has access to fine-scale voting patterns, requires relatively little specialized technical expertise to implement, and is robust to any conceivable statistical anti-gerrymandering standard. One can always get an advantageous map in this manner and create a post hoc public rationale for the boundaries. The interesting question is whether statistical gerrymandering tests can put absolute bounds on the size of the partisan advantage gained. If the bounds are tight enough, perhaps the benefit of gerrymandering becomes too small to be worth the effort.
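The generate, filter, and maximize loop described in this comment can be sketched in a few lines. This is a toy under loud assumptions: it ignores contiguity entirely (precincts are simply shuffled into equal-size groups), the precinct data are synthetic, and `passes_tests` is a stand-in filter with a hypothetical 2-point mean-median tolerance rather than the contest's actual tests.

```python
import random
from statistics import mean, median

def random_plan(precinct_shares, n_districts, rng):
    """Shuffle precincts and cut them into equal-size districts.
    Contiguity is ignored; a real redistricter would sample contiguous maps."""
    shuffled = precinct_shares[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // n_districts
    return [mean(shuffled[i * size:(i + 1) * size]) for i in range(n_districts)]

def passes_tests(district_shares, tolerance=2.0):
    """Stand-in filter: mean-median gap below a hypothetical tolerance (points)."""
    return abs(mean(district_shares) - median(district_shares)) < tolerance

def seats(district_shares):
    """Districts where the preferred party's share exceeds 50%."""
    return sum(1 for share in district_shares if share > 50.0)

def best_passing_plan(precinct_shares, n_districts, n_samples, rng):
    """Keep the passing plan with the most seats; None if no sample passes."""
    best = None
    for _ in range(n_samples):
        plan = random_plan(precinct_shares, n_districts, rng)
        if passes_tests(plan) and (best is None or seats(plan) > seats(best)):
            best = plan
    return best

rng = random.Random(0)
precincts = [rng.uniform(15.0, 85.0) for _ in range(90)]  # synthetic precinct shares
best = best_passing_plan(precincts, 9, 2000, rng)
print(seats(best), "of 9 seats in the best passing plan")
```

Any statistical standard can be dropped in as `passes_tests`, which is the point of the comment: the filter bounds the advantage but does not stop the search.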

  • ColinMcAuliffe

    I took a look at how to manipulate mean-median a little while ago. One strategy is to pack a few of your own districts, which pulls the mean closer to the median and increases the standard deviation, giving you a higher p-value. You would need a fairly polarized state for this to work in practice.

  • Lorem

    Wait, which side should we gerrymander for?

    The relevance is: the mean-median statistic should probably be -0.3<x<0.3. Right now with only the <0.3 side, the sheet lets the Democrats get away with it =)

    • Lorem

      Also, here’s a submission:


      12 safe wins while skirting the edge of both tests (-0.270 mean-median, 0.0911 t-test p-value).

      Also, not sure how you want to trade off test closeness vs seats. I guess one reasonable thing would be to look for the least close tests at every seat number–I think this submission should be it for 12+ (modulo trading off between the two tests).
      (And of course I’m kind of cheating by skirting the edge on 15%/85%, which would be difficult to pull off in practice.)

      I guess I’d like a precise utility function of how you do the trade-offs, unreasonable as that may be!

    • sc

      Yep, I came up with something similar after a minute of tinkering. Getting a 12:6 seat split on a 50:50 vote split while remaining within the bounds of these tests is easy enough. Would have to play around a bit more to get to the actual split (13:5) but on first glance it looks like these standards would have limited the ability to gerrymander by one seat. Still totally worth it to rig the redistricting process in that case. For purposes of the contest criteria 3 and 4 will be key – both your gerrymander and mine die in a wave election, and are close enough to the bounds of “acceptable” to arouse some suspicion.

    • Sam Wang

      Yes, getting something that only meets some of the criteria is easy. That’s not what we’re looking for!

  • Sohier

    In case anyone else feels like mucking about with this in Python, I’ve replicated the tests in a script posted here:
