Princeton Election Consortium

Innovations in democracy since 2004


Analyzing Iran 2009: Part 2, The Official Returns

June 21st, 2009, 2:56am by Sam Wang

Current events have completely overwhelmed the relevance of any statistical analysis. But a critical look can still point us toward a better understanding of what happened on Election Day.

Analysis of fraud in Iran 2009 is an unfolding story. In this post I focus on Election Day returns themselves, which suggest: Votes may have been, in some sense, transferred from the minor candidates (Karroubi and Rezaee) to one of the major candidates. This could have been legitimate (that is, voters changing their minds when it came time to vote) or illegitimate (for example, minor-candidate votes being counted for Ahmadinejad). It is currently not known whether enough fraud occurred to flip the election.

Obviously, fraud is not necessarily confined to that suggested by this analysis. For summaries and updates, see David Shor and (start with basic returns and poll analysis).

> > >

Some crude types of fraud can be identified entirely from the numbers used to commit the fraud. This is a principle known in financial auditing, and relies on a phenomenon called Benford’s Law. The basic idea is that a wide variety of data have first-digit distributions that tilt toward low values. In other words, measurements such as 1345, 198, or 229 (first digits 1, 1, and 2, respectively) are much more likely to occur than 397, 745, or 9618 (first digits 3, 7, and 9). A similar observation even applies to second digits and so on, though the effect is much weaker.
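
For reference, the usual statement of Benford’s Law puts the probability of a leading digit d at log10(1 + 1/d). Here is a minimal Python sketch of that formula; the function and variable names are made up for illustration.

```python
import math

def benford_first_digit(d):
    """Probability that the leading digit equals d (1-9) under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Table of expected first-digit frequencies.
first_digit_probs = {d: benford_first_digit(d) for d in range(1, 10)}

for d, p in first_digit_probs.items():
    print(f"{d}: {p:.3f}")
```

Under this formula a leading 1 turns up about 30% of the time, while a leading 9 turns up less than 5% of the time.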

This somewhat counterintuitive phenomenon arises in many situations: areas of islands, financial numbers, physical constants, and more. An excellent popular article comes from Ted Hill, a mathematician who provided a rigorous proof for how Benford’s Law could arise when one takes random samples from distributions that themselves vary randomly.

A likely mathematical consequence is that observations can be uniformly distributed – on a log scale. To illustrate what a logarithmic distribution implies, imagine an investment that doubles every ten years. It would take ten years to get from $1000 to $1999 (first digit = 1 all the way), but also only ten years to get from $4000 to $7999 (first digit = 4, 5, 6, or 7). So the investment spends as much time with a leading 1 as with leading digits 4 through 7 combined, and 1 is by far the most common leading digit. This is just one example of a logarithmic distribution – many financial quantities do the same thing. But when people cook books, they often don’t pick numbers that follow Benford’s Law. Failure of numbers to match this expectation is often interpreted as evidence for possible fraud. A similar principle can be used to analyze elections.
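
The doubling-investment example can be checked numerically. The sketch below tracks a balance that doubles every ten years and tallies how much of the time each leading digit is showing. The starting balance and doubling period come from the example; the time horizon and sampling resolution are arbitrary choices for illustration.

```python
from collections import Counter

DOUBLING_YEARS = 10    # from the example: the value doubles every ten years
START_VALUE = 1000.0   # starting balance in dollars
STEPS_PER_YEAR = 100   # sampling resolution (arbitrary)
TOTAL_YEARS = 1000     # long horizon so many powers of ten are covered (arbitrary)

def leading_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

counts = Counter()
for step in range(TOTAL_YEARS * STEPS_PER_YEAR):
    years = step / STEPS_PER_YEAR
    value = START_VALUE * 2 ** (years / DOUBLING_YEARS)
    counts[leading_digit(value)] += 1

# Share of elapsed time spent at each leading digit.
total = sum(counts.values())
time_share = {d: counts[d] / total for d in range(1, 10)}

for d in range(1, 10):
    print(f"leading digit {d}: {time_share[d]:.3f} of the time")
```

The share of time spent with a leading 1 should come out close to 30%, in line with Benford’s Law, even though nothing random is involved.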

Roukema’s claim. Boudewijn Roukema, an astronomer at Nicolaus Copernicus University in Toruń, Poland, made waves last week when he analyzed data from the Iranian Ministry of the Interior for the four candidates (Ahmadinejad, Mousavi, Karroubi, and Rezaee) in 366 voting areas. A PDF of his report is here. In brief, he found several deviations, including too many leading 2’s and not enough leading 1’s in Ahmadinejad’s reported numbers. He also reports quite an odd phenomenon: too many leading 7’s for Karroubi’s totals.

A problem with Roukema? A political scientist at the University of Michigan, Walter Mebane, specializes in the application of Benford’s Law to the detection of electoral fraud. He has pointed out that first-digit distributions often don’t follow Benford’s Law well. Indeed, this was originally noticed by Benford himself (see Table IV of his classic 1938 paper).

In the case of election returns, one big reason Benford’s Law can fail is that voting areas aren’t necessarily randomly sized (recall that Hill assumed that distributions varied randomly; here a “distribution” can be construed as being a voting area). For example, if voting areas are set up to be uniformly sized to contain 100,000 people each, then close races will produce lots of 4’s and 5’s as leading digits. So Roukema’s claim is not yet overwhelmingly convincing.
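
A toy simulation makes the uniform-size problem concrete. It assumes, purely for illustration, two candidates, full turnout, and a vote share for candidate A drawn uniformly between 45% and 55% in every area. None of these assumptions come from the Iranian data.

```python
import random
from collections import Counter

random.seed(0)       # fixed seed so the sketch is repeatable

AREA_SIZE = 100_000  # voters per area, as in the example above
N_AREAS = 366        # illustrative; matches the number of official voting areas

digit_counts = Counter()
for _ in range(N_AREAS):
    # Hypothetical close two-way race in an area of fixed size.
    share_a = random.uniform(0.45, 0.55)
    votes_a = round(AREA_SIZE * share_a)
    votes_b = AREA_SIZE - votes_a
    for votes in (votes_a, votes_b):
        digit_counts[int(str(votes)[0])] += 1

print(sorted(digit_counts.items()))
```

Every total lands between 45,000 and 55,000, so the only leading digits that appear are 4 and 5. A first-digit Benford test would flag this perfectly clean election.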

Beyond first digits? Mebane advocates the use of second digits (for instance, the 5 in 4567; technical article here). Second digits don’t show as strong a trend and therefore require careful statistical analysis. But they are less likely than first digits to violate the ideal-case prediction of Benford’s Law. Mebane has begun to analyze the 2009 returns.
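
To see how weak the second-digit trend is, the expected frequencies can be computed by summing the Benford probability of each two-digit prefix over all possible first digits. This is a sketch of the standard formula, with names chosen for illustration.

```python
import math

def benford_second_digit(d2):
    """Probability that the second digit equals d2 (0-9) under Benford's Law."""
    if not 0 <= d2 <= 9:
        raise ValueError("second digit must be between 0 and 9")
    # Sum over every possible first digit d1 of P(number starts with d1 d2).
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))

second_digit_probs = [benford_second_digit(d) for d in range(10)]

for d, p in enumerate(second_digit_probs):
    print(f"{d}: {p:.4f}")
```

The values run from about 12% for a second digit of 0 down to about 8.5% for 9. That is a much flatter curve than the first-digit one, which is why large samples and careful statistics are needed.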

At the level of the 366 districts, he found that the results fell within expectations. However, he has more recently obtained data from 12 provinces including over 11,000 ballot boxes. At this much finer-grained level he has found deviations from expected second-digit distributions for Karroubi and Rezaee (see pages 21-22 of Mebane’s report).

What happened? So did something funny happen to Karroubi and Rezaee’s votes? Maybe, but not necessarily. Over 80% of ballot boxes showed single-digit numbers of votes (i.e. fewer than 10 votes) for Karroubi, and over 70% showed single-digit numbers for Rezaee. For these boxes there is no second digit to test. In such a situation one might imagine exceptions to the rule that the second-digit distribution should obey Benford’s Law.
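
As a rough sketch of what a second-digit test involves, the code below computes a Pearson chi-square statistic against the Benford second-digit expectation, skipping any count below 10 because it has no second digit. This is an assumed, generic version of such a test and not necessarily the exact procedure in Mebane’s report. The example counts are made up.

```python
import math

# Expected second-digit probabilities under Benford's Law.
EXPECTED = [
    sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
    for d2 in range(10)
]

# 95th percentile of the chi-square distribution with 9 degrees of freedom.
CHI2_CRITICAL_DF9_5PCT = 16.92

def second_digit_chi2(vote_counts):
    """Return (statistic, boxes_used) for a chi-square test of second digits.

    Counts below 10 have no second digit and are skipped.
    """
    observed = [0] * 10
    for count in vote_counts:
        if count >= 10:
            observed[int(str(count)[1])] += 1
    n = sum(observed)
    if n == 0:
        raise ValueError("no counts with a second digit")
    statistic = 0.0
    for d in range(10):
        expected = n * EXPECTED[d]
        statistic += (observed[d] - expected) ** 2 / expected
    return statistic, n

# Made-up ballot-box counts for illustration only, not real data.
example_counts = [3, 7, 12, 0, 5, 41, 9, 108, 2, 6, 57, 1, 230, 4, 8]
stat, used = second_digit_chi2(example_counts)
print(f"boxes with a second digit: {used} of {len(example_counts)}")
print(f"chi-square statistic: {stat:.2f} (5% critical value {CHI2_CRITICAL_DF9_5PCT})")
```

In this toy list only 5 of 15 boxes contribute anything to the test. This mirrors the concern here: when most boxes hold single-digit counts, the test rests on a small and possibly unrepresentative subset.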

In email, Prof. Mebane has expressed faith in the analysis’s applicability in this situation. My own inspection of the ballot box-level data, which he very kindly provided, suggests that the minor-candidate data have a “scale-invariant” distribution, a condition that appears in proofs of Benford’s Law. I may have more thoughts on this later.

But another line of evidence can approach the question independently: pre-election polls.

Pre-election polls. I previously pointed out that in Tehran, Ahmadinejad outperformed all six pre-election polls by a median of 16 percentage points. Now let’s look at the minor candidates.

In 12 polls taken since May 1st, polled support for minor candidates was Karroubi 7±1% and Rezaee 8±4% (median ± SEM). They showed at least 3% each in Iranian polls (though less in a Western-commissioned poll that showed a large fraction of undecided voters).

In official election returns, Karroubi obtained 0.9% of the vote and Rezaee obtained 1.7%. Where did their support go? Maybe it was never there, though that would require assuming a significant failure of Iranian polling. A second possibility is that supporters of the minor candidates reverted to Ahmadinejad or Mousavi when it came time to vote. Certainly this is a common occurrence in U.S. politics. But it’s been pointed out elsewhere that such a “coming home” does not fit with Iranian politics, and notably it did not happen in Iran’s 2005 presidential race.

A third possibility is that minor-party votes were fraudulently transferred to a major candidate. This possibility is consistent with the statistical analysis of Prof. Mebane. Also, the total support in opinion polls for Karroubi and Rezaee is not far from the 16-point discrepancy in Tehran I identified earlier.
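
The arithmetic behind that comparison is simple enough to lay out explicitly. The sketch below uses only the figures quoted in this post; the variable names are illustrative.

```python
# Figures quoted in the post; all values are percentages of the vote.
poll_median = {"Karroubi": 7.0, "Rezaee": 8.0}
official = {"Karroubi": 0.9, "Rezaee": 1.7}
tehran_gap = 16.0  # Ahmadinejad's median overperformance relative to Tehran polls

# Points of polled support that did not show up in the official returns.
shortfall = {name: poll_median[name] - official[name] for name in poll_median}
total_polled = sum(poll_median.values())
total_shortfall = sum(shortfall.values())

print(f"combined polled support:  {total_polled:.1f} points")
print(f"combined shortfall:       {total_shortfall:.1f} points")
print(f"Tehran poll discrepancy:  {tehran_gap:.1f} points")
```

Combined polled support is 15 points and the combined shortfall is 12.4 points. Both are in the neighborhood of the 16-point Tehran discrepancy, though the poll medians carry sizable uncertainty, especially for Rezaee.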

What does this mean? If minor-party candidate votes were transferred to Ahmadinejad, it’s conceivable that he did not originally get 50% of the vote, which would have forced a runoff. However, I would not go so far as to say that the election was stolen. My reading of this evidence is that fraud may have occurred, but not enough to flip the election.

That having been said, the threat of fraud undermines faith in free elections, whether or not it influences the outcome. In Iran, we’re now seeing the bloody aftermath of that loss of faith.

A final note: none of this analysis rules out other forms of fraud that are harder to trace. For that one can turn to other sources of information such as the 2005 election…

Tags: Politics
