A technical note: Non-independence among states

August 7, 2008 by Sam Wang

The comments on the last thread were quite instructive, and led me to look over Silver’s methods documentation in detail. Wow, that’s quite a complex procedure he has. I should probably address your questions about it before commencing with further description of the Meta-Analysis (which is not a prediction).

Many individual components of the FiveThirtyEight model seem reasonable. But I also get the impression that Silver has added assumptions one at a time, sometimes at the suggestion of his readers. I do like the idea of one of the early steps, trend adjustment, which helps compensate for the relative infrequency of state polls. I was just discussing how to do this with my colleague Ed Witten. This step is potentially terrific, especially while polls are sparse.

Overall, the procedure appears to have grown piece by piece. Based on my own experience with getting floods of mail in 2004, I’ll guess that his code is somewhat jury-rigged, and therefore not in a condition to be used by anyone else. It suffers from a lack of overall advance design, testability, and transparency. It might be time to simplify and streamline. Every garden needs to be weeded now and then.

* * *

Today let’s deal with one frequently asked question in the last thread – non-independence of state results. A number of you wanted to know how one should deal with the fact that if sentiment in one state moves, others will tend to move in the same direction and by a related amount.

Before embarking on any sort of correction, one should stop and ask whether the correction will increase the accuracy of the ultimate prediction, and by a detectable amount. In the case of taking a snapshot, as I do, second-order corrections should alter the snapshot very little because Election Eve polls do so well (for example see Tanenbaum and me) in predicting Election Day outcomes. To put it another way: outcomes among states are coupled, and this is already contained in the primary polling data.

But for making a distant prediction, how should one proceed?

Go examine Charles Franklin’s Presidential polling trends at Pollster.com for 2000 and 2004. As you can see, the amplitude of swing in sentiment can be large – up to six percentage points. To guess what will happen on Election Day, one would need to offset all the state margins by a randomly selected 0-6 percentage points, then calculate the EV outcomes. Silver has a strategy for doing this.

Now, let’s go back to my 2004 analysis. For those of you who were not here then, I put a variable in the MATLAB script named bias that performed the function of shifting voter sentiments in all states at once. Varying bias allowed me to give median EV estimates for no swing, 2% swing to Kerry, 2% to Bush, and so on. I gave the results so you could see what the effect would be for your own favorite swing. A remnant of this calculation is still visible under “Interactive Maps” on the right sidebar (see “+2% for Obama” and “+2% for McCain”).

Does this complication make it necessary to do all those simulations on FiveThirtyEight? No. Let’s consider a simple example. Imagine that you think that national opinion could swing 2% to Obama, 2% to McCain, or not at all. Okay, then do my Meta-Analytic calculation for the three corresponding sets of win probabilities. Then average the three histograms. You’re done.

Now, that’s only three cases. If you want to space it more finely, you can do the Meta-Analysis at 0.1% intervals (2.0%, 1.9%, 1.8%…) and weight the sum according to your swing model. It’s still far easier.
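
For readers who want to tinker, here is a bare-bones sketch of the shift-and-average idea in Python. This is not my production code: the per-state win probability uses a simple normal approximation with an assumed effective uncertainty, and the poll margins are whatever you feed in.

```python
import numpy as np
from scipy.stats import norm

SIGMA = 2.5   # assumed effective polling uncertainty per state, in points

def ev_histogram(margins_ev, sigma=SIGMA):
    """Exact EV distribution for independent states: convert each poll
    margin (Dem minus Rep, in points) to a win probability, then convolve
    the per-state outcomes.  Returns h with h[k] = P(Dem wins k EV)."""
    h = np.array([1.0])
    for margin, ev in margins_ev.values():
        p = norm.cdf(margin / sigma)        # Dem win probability in this state
        state = np.zeros(ev + 1)
        state[0], state[ev] = 1 - p, p      # lose all of the state's EV, or win all of it
        h = np.convolve(h, state)
    return h

def swing_averaged_histogram(margins_ev, swings, weights):
    """Shift every state's margin by each candidate national swing,
    recompute the histogram, and average with the swing-model weights."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    total = np.zeros(sum(ev for _, ev in margins_ev.values()) + 1)
    for s, wi in zip(swings, w):
        shifted = {st: (m + s, ev) for st, (m, ev) in margins_ev.items()}
        total += wi * ev_histogram(shifted)
    return total

# Three-case example from above, each swing weighted equally:
#   hist = swing_averaged_histogram(polls, swings=[+2, 0, -2], weights=[1, 1, 1])
# Finer spacing:
#   swings = np.arange(-2.0, 2.001, 0.1), with weights taken from your swing model.
```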

Some of you think that state-state correlations can account for the spikiness of the FiveThirtyEight simulation histogram. That’s probably not true. Basically, the simulations are now drawing from a range of Gaussians instead of just one Gaussian, and the sum of similar smooth functions is still a smooth function resembling the original. Besides, now the simulations are undersampling many distributions, not just one.

But let’s take one more step back – a big step. Under the current win probabilities listed at FiveThirtyEight, even the calculation of probability distributions is unnecessary. When most probabilities are between 5% and 95%, the result is a sum of many uncertain outcomes. The swing model then adds variation. Such a situation is in the regime of the Central Limit Theorem and related concepts. In general, when many variable outcomes are summed, the resulting distribution will look approximately Gaussian. One consequence is that you can get an accurate result simply by calculating a sum of state EV weighted by win probability. No simulations, no fancy swing model – just a weighted sum. It could all be done in “closed form,” i.e. one could write a formula for it. It wouldn’t be as intuitive, but it would be about as accurate.
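
To make the closed-form point concrete, here is a minimal sketch. It treats states as independent given their listed win probabilities, so the spread it reports is only a rough guide; the mean, though, is exactly the weighted sum just described.

```python
import numpy as np
from scipy.stats import norm

def gaussian_ev_summary(win_probs, evs):
    """Normal approximation to the EV total: each state contributes its
    electoral votes with its win probability, treated independently."""
    p = np.asarray(win_probs, float)
    ev = np.asarray(evs, float)
    mean = np.sum(p * ev)                     # expected EV: the weighted sum
    sd = np.sqrt(np.sum(p * (1 - p) * ev**2)) # spread of the independent sum
    p_win = 1 - norm.cdf(269.5, loc=mean, scale=sd)   # rough P(at least 270 EV)
    return mean, sd, p_win

# e.g. mean_ev, sd_ev, p_win = gaussian_ev_summary([0.95, 0.60, 0.40], [21, 27, 20])
```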

19 Comments

Andrew Foland says:

A single number I’ve never seen anywhere: looking back (since that’s all we can do) over the past “n” elections, how much “eventyness” happens in elections? That is, given a perfect snapshot at time t, and a perfect snapshot on Election Eve, how much do they differ (in RMS)?
You’d imagine that function looks basically like k sqrt(t_election – t); and adding a term like that in quadrature to the snapshot uncertainty (I believe you quote about 1%) would be a number one could usefully use as a proxy for projecting final election probabilities. In a rational world “k” would represent the impact of events and new information happening between t and t_election.
Maybe you’ve estimated k? I thought at first when I read 538 that Silver was doing exactly this, but instead he does an explicit mean reversion. Numerically, they have the same effect I imagine (sort of like out-of-the-money options), but mechanistically they differ and I find it unlikely that many people are actually explicitly voting for the loser.
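
To illustrate the combination I have in mind, a minimal sketch in Python, with placeholder values for the snapshot uncertainty and k (neither is fitted to anything):

```python
import math

def projected_uncertainty(days_to_election, snapshot_sigma=1.0, k=0.3):
    """Combine the snapshot uncertainty with random-walk drift from future
    events, in quadrature.  snapshot_sigma (in %) and k (in % per sqrt(day))
    are placeholder values, not estimates."""
    drift = k * math.sqrt(days_to_election)
    return math.sqrt(snapshot_sigma**2 + drift**2)

# e.g. projected_uncertainty(90)  ->  roughly 3% with these placeholder numbers
```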

Sam Wang says:

Andrew, I like the way you have posed the question. Another way to put it would be to ask: what are the frequency and amplitude distribution of “event” sizes? But there are some problems facing such an analysis and what to do with it.
1) Dense state data are only available for 2004 and maybe 2000. So the number of campaigns for which we have information is small. Something might also be done with national data – though not the smoothed version shown in Charles Franklin’s display.
2) The biggest events, the ones we care the most about, are infrequent. See this crude graph from 2004 (I know, I need to revise it). You can see some apparent swing events: the opening of Fahrenheit 9/11, the Democratic and Republican National Conventions, the addition of John Edwards to the ticket, the Swift Boat campaign, and the first debate. Eyeballing it, each represents up to a 2-point swing (converting EV to %LV), spaced an average of ~25 days apart.
Then one could turn this all into some canned function F that could be used to calculate H(bias) * F(bias), where H is the probability histogram and * indicates convolution.
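
As a rough sketch (with an invented event model F, and reusing an EV-histogram routine like the one in the post above), the convolution step could look like this:

```python
import numpy as np

def event_smeared_histogram(hist_at_bias, bias_grid, event_pdf):
    """Form the mixture sum_b F(b) * H(b): for each candidate bias b,
    re-run the snapshot with all state margins shifted by b, then weight
    by the assumed probability of that cumulative event-driven swing."""
    w = np.asarray(event_pdf, float)
    w = w / w.sum()
    return sum(wi * hist_at_bias(b) for b, wi in zip(bias_grid, w))

# Invented event model: Gaussian swings with a 2-point width.
bias_grid = np.arange(-6.0, 6.01, 0.25)
F = np.exp(-0.5 * (bias_grid / 2.0) ** 2)
# smeared = event_smeared_histogram(
#     lambda b: ev_histogram({st: (m + b, ev) for st, (m, ev) in polls.items()}),
#     bias_grid, F)
```
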
The net effect of all this might be a little more accurate than what Silver is doing. But I think it’s overkill. For now, I am more interested in using the Meta-Analysis as a tool for understanding the current campaign’s twists and turns.

Sam Wang says:

One more thought – let’s do a very rough estimate for illustrative purposes.
Assuming that my interpretation of the graph is correct, we would expect about 4 “events” between now and the election, partly because this year both conventions are late. In order to win the election, McCain needs both of the non-convention events to be in his favor. If the events have random sign, the probability of this occurring is 25%.

Sam Wang says:

Wow, I keep on getting comments from people who want a probability of a November outcome.
A firm probability estimate creates a false sense of certainty. But okay, here’s one for you:
Given my comment about the likelihood of two favorable events happening to McCain, one way to estimate the probability of an Obama win is that it’s today’s win probability multiplied by the probability that it won’t flip. In other words, 99% * 75% ≈ 74%. Assuming the EV estimator doesn’t move, this number will move upward by about 2% per week between now and Election Day.

Andrew Foland says:

FWIW if I take your “crude graph from 2004” and look at the RMS of serial point-to-point deviations over the sqrt of elapsed time between points, I find that k, as measured in (EV/538)/sqrt(days) is about 0.017. I believe elsewhere you state that 35 EV ~ 1% PV, so if I did all the math right, this value of k corresponds to about 0.3% / sqrt(days) in Meta-Analysis PV.
Or put another way, the expected haziness 100 days out from the election is about 3%. That’s actually rather less than I might have guessed.
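
For transparency, the arithmetic in code form, taking the 35 EV ~ 1% PV conversion as given:

```python
import math

k_ev = 0.017 * 538             # k in EV per sqrt(day): about 9.1
k_pv = k_ev / 35               # using 35 EV ~ 1% PV: about 0.26 %/sqrt(day)
print(round(k_pv * math.sqrt(100), 1))   # drift 100 days out: about 2.6%, call it 3%
```
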
Of course, dynamics of each race could be different, k in Kerry-Bush may be very different from Obama-McCain, etc etc. But it’s interesting (to me at least) to get any quantitative measure at all, even if many caveats are associated with it.

James says:

Is there a way to leave a generic comment outside of a pre-determined thread?
I’m a bit surprised by the very large values in the “voter influence” tables. Is a voter in NV really 130,000 times more influential than one in NJ??
Maybe something went haywire in the calculations…

David Shor says:

Andrew Foland,
I’ve looked at distributions of day-to-day shifts in national polling, which should in theory mirror state polling (and has larger sample sizes and more frequent polling as well).
So, by the central limit theorem, it does indeed follow a square-root rule.
I have a pretty good estimator for the random-walk parameters (basically your k), but I’m still working on coding it.
[Shameless plug] Of course, I talk about it on my blog [Shameless plug]

David Shor says:

Andrew,
We already have plenty of data for *this* election, enough for accurate estimation.
But actual estimators are a bit complicated. Weeding out poll variation from changes in voter opinion complicates things somewhat.

Sam Wang says:

Andrew, your estimate of 3% sounds about right to me. It’s not all that surprising. Surveys have shown that even voters who call themselves independent usually turn out to have a strong preference. Historically, even Barry Goldwater got nearly 40% of the vote, and his defeat was legendarily crushing.
The difficulty is that it’s not clear how the dynamics will play out in the next few months. I would have thought the gap between the candidates to be considerably wider than it is now. We’ll see what happens.
James (and everyone else): no, there is no way to comment generically. I am adjusting to having this site in weblog format. It’s far more work than the 2004 site, which was simply hand-coded. This year I want to make things interesting, orderly, and not too high-maintenance. I’m still working on that strategy.
In regard to the jerseyvotes calculation, it really did take off today. Values like that were more typical in 2004. It’s because the denominator is the power of a vote in New Jersey, a unit prone to fluctuation. Today the NJ margin jumped to Obama +10%, making our votes here particularly feeble. I may have to think of a different valuation unit. In the meantime, just take ratios between the other states.
And by the way: the lack of rounding is embarrassing! What kind of person reports numbers like that with four digits after the decimal? The kind of person whose code guru is off in New Orleans with Habitat for Humanity. I can’t wait until he gets back.

James says:

Pursuing the “voter influence” numbers a bit:
I understand you are looking at the marginal effect of one voter, but extrapolated to even fairly small numbers in NV, the results seem outlandish.
E.g., assume someone could convince 100 people in NV to vote for McCain. A linear extrapolation would seem to claim that’s equivalent to getting 13 million people to vote for McCain in NJ!
On the other hand, getting 100 people in NV to vote for Obama might well be as significant as getting 13 million more to vote for him in NJ if NJ is already a total lock.
I promise to stop nit-picking now. Great site–I like your approach of just computing the expected values instead of running simulations.

Sam Wang says:

James, that’s not nitpicking at all. Jerseyvotes made more sense in 2004, when the race was closer. The concept needs revisiting, and you are driving at an important point.
The original motivation in defining this quantity in 2004 was an attempt to figure out where local get-out-the-vote (GOTV) and other outreach would be most effective. This year, any such effort will only make sense if the race narrows at a national level.
The current definition of voter power is exactly what you say. If the election were today, McCain could only win if voting diverged from current polls in multiple states. The marginal voter power measure is calculated in terms of moving the win probability in one state, which means that it focuses on cases where at least one other state has flipped.
The extreme values have to do with the current poll medians in NJ (Obama +10%) and Nevada (Obama and McCain tied). NJ is in the long tail of the normal distribution, which is currently what I am using. I really ought to be using a t-distribution. Fixing that will resolve the extreme absurdity you have detected (and de-spike the histogram a little bit).
This is the literal fix to the problem. But the measure needs more revision. Here are two possible ways:
1) Calculate the measure under the assumption that all state polls are offset by the same constant amount, one that makes the win probabilities 50-50. This is equivalent to assuming that some large-scale campaign event occurs in the very near future, making things a toss-up. (By the way, this is equivalent to the non-independence idea discussed so intensely by election projection hobbyists.)
2) Normalize to the most powerful voters, not the least powerful ones. This would give results that do not offend intuition. I would be disappointed to let go of the word “jerseyvotes,” but it *is* the voting equivalent of a hyperinflated German mark circa 1923.
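
For anyone who wants to see the shape of the calculation, here is a crude finite-difference stand-in (not the actual jerseyvotes code): nudge one state’s margin, see how much the national win probability moves, and normalize to the same quantity for New Jersey. The function national_win_prob is a placeholder for the Meta-Analysis win probability as a function of the state margins.

```python
def voter_power(margins, national_win_prob, nudge=0.1, ref_state="NJ"):
    """Crude finite-difference illustration of jerseyvotes: the change in
    national win probability from nudging one state's margin, expressed
    relative to the same nudge applied in New Jersey.
    margins: dict of state -> poll margin (points);
    national_win_prob: callable returning P(win) for a margins dict."""
    base = national_win_prob(margins)

    def delta(state):
        shifted = dict(margins)
        shifted[state] += nudge
        return national_win_prob(shifted) - base

    ref = delta(ref_state)   # when NJ is a lock this is tiny, so the ratios
                             # blow up -- exactly the instability described above
    return {state: delta(state) / ref for state in margins}
```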

Mark S. says:

This is a purely technical comment concerning the question: does the electoral vote distribution from FiveThirtyEight contain mathematical errors, given his assumptions?
Prof. Wang appears to say Yes. He wrote that the FiveThirtyEight calculation is imprecise and that an exact mathematical solution would yield different results. Prof. Wang seems to conclude that the FiveThirtyEight result is an error, which he attributes to undersampling (being based on a Monte Carlo simulation with 10,000 random cases).
I hope to convince Prof. Wang otherwise. I think that the spikiness in the FiveThirtyEight histogram is due to the high correlation among state results, and not due to imprecision or undersampling. Prof. Wang did consider this and concluded: “That’s probably not true.” I think that some specific examples would show that it very likely IS true.
FiveThirtyEight separates uncertainties into two components: state and national. State uncertainties are represented by independent random variations in each state. National uncertainty is represented by an additional random factor that is applied simultaneously to ALL states.
Consider what would happen to the EV distribution if either of these components is allowed to go to zero.
First, zero out the national uncertainty. In this case the states vary independently and each state can be assigned a specific Obama win probability. The resulting EV distribution can be calculated exactly using Prof. Wang’s formula. I assume this is what Prof. Wang did in his calculation, which showed a smooth almost-pure-Gaussian distribution with little spikiness.
Now take the other extreme. Assume that state-by-state uncertainty is zero and the uncertainty is only due to trends in the national popular vote. Now, the vote in each state remains the same relative to other states. For example, if Ohio votes for Obama in a given scenario, then every state that is more Obama-friendly than Ohio also votes for Obama. If Ohio votes for McCain, then all the more McCain-friendly states also vote for McCain.
The result is a very spiky EV distribution in which most EV values have zero probability. Suppose, for example, that the states more Obama-friendly than Ohio represent a total of 273 electoral votes. Then, the national-uncertainty case would include a spike at 293 votes for Obama (Obama wins Ohio and everything more friendly) and a spike at 273 votes (Obama loses Ohio but wins all the more friendly states), but nothing in between. If the next most Obama-friendly state is Colorado (9 ev’s), then the next spike would be at 264 votes with nothing between 264 and 273.
Since Fivethirtyeight uses both state and national uncertainty (and currently has similar magnitudes for each in projections from August to November), I would expect that the resulting distribution is somewhere in between the two extremes of smooth Gaussian (state uncertainty only) and extreme spikiness (national variation only). That is exactly what he gets.
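
To make the two extremes concrete, here is a small Monte Carlo sketch in Python (the margins, EV totals, and noise sizes are invented, and this is not FiveThirtyEight’s actual code): state noise alone gives a smooth histogram, national noise alone produces only a handful of possible EV totals, and the combination falls in between.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented example: state margins (Dem minus Rep, points) and EV totals.
margins = np.array([-10.0, -2.0, -1.0, 0.5, 1.5, 3.0, 6.0, 12.0])
evs     = np.array([   34,    9,   20,  27,  21,  11,  15,   31])
N = 10_000

def simulate(state_sigma, national_sigma):
    """Draw N elections: each state gets its own noise plus one national
    shift shared by all states; tally EV wherever the margin is positive."""
    state_noise = rng.normal(0.0, state_sigma, size=(N, len(margins)))
    national_noise = rng.normal(0.0, national_sigma, size=(N, 1))
    simulated = margins + state_noise + national_noise
    return ((simulated > 0) * evs).sum(axis=1)   # Dem EV total in each election

ev_state_only    = simulate(state_sigma=3.0, national_sigma=0.0)  # smooth histogram
ev_national_only = simulate(state_sigma=0.0, national_sigma=3.0)  # only a few EV totals occur
ev_both          = simulate(state_sigma=3.0, national_sigma=3.0)  # somewhere in between
```
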
What about Prof. Wang’s argument that the combined state and national uncertainties can be viewed as a probabilistic sum of individual state-only EV distributions, and that the sum of individual Gaussian distributions is also Gaussian? Prof. Wang has found that the individual-state uncertainties from FiveThirtyEight result in a near-Gaussian distribution, so shouldn’t any sum of these uncertainty cases also be near-Gaussian?
That is true. But if the uncertainties used by FiveThirtyEight were separated properly into state and national components, then the EV distribution based on state component alone might no longer be near-Gaussian.
Look at the “Today’s snapshot – all possible outcomes” from Prof. Wang. His distribution is somewhat Gaussian, but is also quite spiky. I suspect that his all-possible-outcomes EV distribution becomes increasingly smooth and near-Gaussian as the uncertainties become larger. (Again, consider the limiting cases of very small and very large uncertainty.) This would explain why Prof. Wang calculates a near-Gaussian distribution using FiveThirtyEight uncertainties, but not with his own (smaller) uncertainties. (Smaller because Prof. Wang’s uncertainties are for an election held today. Fivethirtyeight’s results are a projection for November 4, so the uncertainties are much higher.)
I would guess the following: the state component of the uncertainty in FiveThirtyEight is similar in size to the total uncertainty in Prof. Wang’s ‘snapshot’, and generates a similarly spiky distribution. The final result (combining state and national uncertainties) can be thought of as a probabilistic sum of individual state-component-only EV distributions. The state-only distributions are spiky, and the combined distribution can also be spiky.
This explanation seems much more likely than the notion that Fivethirtyeight undersampled. It is certainly more elegant to use an exact mathematical solution rather than a bunch of Monte Carlo simulations, but it seems very unlikely that a 10,000-case Monte Carlo simulation would show such a large difference from the true solution.

Sam Wang says:

Mark S., you’ve made some good points. Let me give a quick reaction. I will probably come back to this later, and perhaps publish a more polished response as a regular blog post.
First, I think you’ve given an excellent, clear description of the difference between (a) state-level uncertainty and (b) national fluctuation. It is not only clear but also corresponds to the components of my calculation using (a) state polls and (b) the bias variable I introduced in 2004.
Since, in any reasonable model, factor (b) tends to increase with longer time intervals dT from polls to Election Day, it could make rather a large contribution when dT is over two months, as it is now.
Consider the extreme cases Mark describes. Let’s use as an example the following four states:
TX — CO — IA — NJ
where TX is the most reliably Republican, NJ is the most reliably Democratic, and the others are somewhere in between. If there is only national fluctuation and states are constant relative to one another, then McCain could win {TX}, {TX, CO}, {TX, CO, IA}, or {TX, CO, IA, NJ}, but no other combinations. This would lead to a spiky distribution.
Conversely, if states fluctuated a lot relative to one another, then many more combinations would be possible. Given the tendency of states to rank-order, the distribution of EV therefore tends to be more spiky than under a purely independent model.
But that wasn’t my point (though some readers seemed to think I did not understand it). My point specifically has to do with the fact that the reported state-by-state win probabilities on Silver’s site are intermediate, i.e. not close to 0% or 100%. Such a situation is very different from probabilities that are calculated purely from current-poll snapshots. When probabilities are intermediate, the compound distribution has a very strong tendency toward Gaussian. And, as I have said before, the sum of smooth distributions is also smooth. This is inescapable.
There are two ways to get spikiness back that I can think of.
1) If the step of introducing state-state correlations comes after calculating win probabilities, then those correlations have to be extremely strong. In this case I think it is reasonable to say that the model is poorly formed at an intermediate step, and needs to be re-formulated explicitly in the terms Mark has outlined.
2) It may be that the “win probabilities” listed at FiveThirtyEight are not the actual probabilities used as inputs to the step of introducing state-state fluctuation. For instance, to help lay readers understand his model, Silver could be listing win probabilities that have already been filtered through Mark’s factor (b), while his actual calculations use more purely poll-based probabilities as the intermediate step.
I should note that this description is basically the same as Mark’s guess in his second-to-last paragraph. It’s a reasonable interpretation of what’s going on, but raises a question having to do with “elegance.”
I am not wedded to elegance as a criterion for modeling; sometimes one has to put ugly patches in to get a model to work. However, I do think that a model should be clear, and described so that people can clearly understand its strengths and weaknesses.
In this case, what I see is that the model has been built piece by piece, eventually yielding something that can easily be misinterpreted – and could contain errors. Given what I know about Silver, I think the pieces are all probably OK. But really, only a few aficionados can follow what’s going on.
Mark, I invite a reply!

Mark S says:

Thanks for your kind response.
In reply: I mostly agree with what you said. Thinking of the final distribution as the sum of two factors – state uncertainties and national trends – the final result is “spiky” only if the state component alone would generate a spiky pattern. Otherwise, the sum of Gaussians is a Gaussian.
So the questions are: how many of the state results are near-certain and how many are intermediate? And how much of the uncertainty is due to state-by-state uncertainty as opposed to national trends?
Next: the “win probabilities” on the FiveThirtyEight site? I have no “inside” information, but they seem to be composite win probabilities. All the other results on the site are composite results that include both state and national uncertainties. In his FAQ he says that his win probabilities are “the number of times that a candidate wins a given state… based on 10,000 daily simulation runs.” (It sounds like he just counts it up.)
When I read his method I imagined it like this: He generates a median percent-vote and a percent-vote uncertainty in each state, and a second national uncertainty which adds to all states. Then he uses random numbers (state and national) to get an adjusted popular vote in each state for each of his 10,000 cases, and if the result comes in over 50% he counts it for Obama. No use of win percentages.
One other thing: after making all these guesses about Gaussians and spikiness, I decided that I really ought to see how some actual calculations turn out. I used the current win probabilities (8-16-08) from the FiveThirtyEight site, treated them as state variations only (though I think they are state and national combined), and used both your method and his method. The result was a very smooth Gaussian with your method and a smooth Gaussian but with greater variation (20% for neighboring EVs) using 10,000 simulations. Also, in doing this I found that I needed a good random number generator. At first I used a bad one and got a much more spiky and uneven pattern – but I still got the same national win percentage, even with just 1000 simulations. (There was a large difference in top-line EV, but the peak was very broad.)
Also interesting: The national win percentage was .72 from your analytic solution, .73 from the 10,000 simulations – but the win percentage on the web site is 58%. I expect this difference occurs because the intermediate win probabilities on the FiveThirtyEight site are driven largely by national trends.
I tried a case with arbitrarily lower state uncertainties. Both the analytic solution and the 10,000 simulations became spikier (the simulations more so). Overall win probabilities were still similar, but the spikes in the simulations did not match the spikes in the analytic solution. This seems to be a problem with the random number generator. I hope I can fix this, because I think this is really the test of FiveThirtyEight.
So, I’d guess that FiveThirtyEight is running simulations with relatively low state uncertainties (generating a spiky pattern that is not smoothed out by the additional national uncertainty). Or maybe he just has a bad random number generator.

Mark S says:

Short addition: sometimes the answer is simple. The EV peaks did not match up between your method and the Monte Carlo simulations because the EV was erroneously shifted in the Monte Carlo output. Now that is fixed, and the EV distributions match quite well.
The original win probabilities generate a smooth EV distribution. When I reduce these to smaller values, I get more spikes. When I then add national uncertainty, the spikes are preserved – spikiness remains about the same as in the case with no national uncertainty.
So far, at least, this seems to say that FiveThirtyEight is OK.

Sam Wang says:

I am glad to see that your calculations give spiky distributions when win probabilities are extreme, and smooth distributions when they are not. The persistence of spikes in the distribution with national variation is exactly what I was describing.
Overall your findings are consistent with either interpretation (1) or (2) above. In regard to whether FiveThirtyEight is “OK”: As I have said repeatedly, it’s a reasonable numerical model that attempts a prediction. But its design includes unnecessary imprecision. In the end, this may be acceptable, since future predictions are intrinsically imprecise. This is in contrast to a simple snapshot of current polls, which has the potential to be very precise.

M. Pace says:

“But for making a distant prediction, how should one proceed?”
Following this question in the original 7:28 a.m. Aug. 7 post above, I don’t not see any serious attention paid to the quite likely non-zero correlations between states.

M. Pace says:

Er, I didn’t mean “don’t not”. I meant “do not”.

Jay says:

The claim made regarding the central limit theorem, i.e.:
“One consequence is that you can get an accurate result simply by calculating a sum of state EV weighted by win probability.”
is incorrect. It is true that the central limit theorem implies that, assuming the distributions being summed have finite variance, the limiting distribution is Gaussian. However, the variance of that Gaussian is not the weighted sum of the variances of the distributions being summed.
Put more simply, the variance of the sum of two (or more) correlated Gaussian variables depends explicitly upon the correlation of those variables.
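
In symbols, Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y), so positive correlation widens the distribution even though it remains Gaussian. A tiny numerical check with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two unit-variance "state swing" variables with an assumed correlation of 0.8.
cov = [[1.0, 0.8], [0.8, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T
print(np.var(x + y))   # close to 3.6 = 1 + 1 + 2*0.8, not 2.0
```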
