Princeton Election Consortium

A first draft of electoral history. Since 2004

About the Meta-Analysis (FAQ)

(Updated on 19 September 2012 by Sam Wang)

The right-hand sidebar features a meta-analysis directed at the question of who would win the Electoral College in an election held today. Meta-analysis provides more objectivity and precision than looking at one or a few polls, and in the case of election prediction gives a highly accurate current snapshot. In 2004, the median decided-voter calculation captured the exact final outcome. In 2008, the final-week decided-voter calculation was within 1 electoral vote.

Calculations are based on recent available state polls, which are used to estimate the probability of a Democratic/Republican win, state by state. These are then used to calculate the probability distribution of electoral votes corresponding to all 2.3 quadrillion possible combinations. For a popular article about this calculation, read this article and the follow-up.

Is this Meta-Analysis a prediction of what will happen on Election Day?

No, the basic analysis is not a prediction; it is a snapshot of conditions today. Between now and Election Day, think of the Meta-Analysis as a precise snapshot of where the race stands at any given time. In late October the Meta-Analysis should come quite close to the actual outcome.

Starting in 2012, this site also provides a prediction (see essay 1, essay 2) based on the current year’s polls and the amount of variation observed in similar past races. This is a true prediction for November. It has the specific advantage of not relying on poorly-justified assumptions such as econometric conditions. It relies only on polls, which are the only direct measure of opinion. The approach taken in both popular and political science models introduces more noise than signal, as discussed in this essay.

What’s different about this analysis in 2012 compared with 2008?

The main difference is the addition of a prediction for Election Day as described above.

What was different about this analysis in 2008 compared with 2004?

In 2008, three major changes were made.

First, the Meta-Analysis relies entirely on the well-established principle that the median of multiple state polls is an excellent predictor of actual voter behavior. On Election Eve 2004, a calculation based on this principle made a correct prediction of the electoral vote outcome. Additional assumptions were unnecessary and unwarranted. In 2008 the calculation is kept simple – and therefore reliable.

Second, the calculation is automated to allow tracking of trends over time. This allows the Meta-Analysis to be used to identify changes in voter sentiment as seen through the lens of actual electoral mechanisms.

Third, instead of focusing on battleground states, we are tracking all 50 states and the District of Columbia.

In the Meta-Analysis, how can you possibly go through 2.3 quadrillion possibilities? Wouldn’t that take forever?

The Meta-Analysis doesn’t actually calculate the probability of every combination of states one at a time. At a rate of going through a million combinations per second, that process would take over 71 years. Yet repeated simulation is exactly what other sites do – though they only do thousands of simulations, not quadrillions. Such a laborious approach means that they can only approximate the expectation based on a set of win probabilities.
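The arithmetic behind those figures can be checked in a few lines. This is only a back-of-the-envelope sketch: the rate of one million combinations per second comes from the text above, while the variable names and the 365.25-day year are assumptions made for illustration.

```python
# Every one of the 50 states plus DC can go one of two ways.
N_CONTESTS = 51
combinations = 2 ** N_CONTESTS

# Enumeration speed quoted in the text: one million combinations per second.
RATE_PER_SECOND = 1_000_000
# Assumed year length for the conversion (365.25 days).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = combinations / RATE_PER_SECOND / SECONDS_PER_YEAR
print(f"{combinations:,} combinations, roughly {years:.1f} years to enumerate")
```

The count works out to about 2.25 quadrillion, which is the "2.3 quadrillion" quoted above after rounding.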

Instead, the Meta-Analysis uses an overlooked method to calculate the probability of getting an exact number of electoral votes, covering all ways of reaching that number given the individual state win probabilities. This is a much easier problem – it can be solved in less than a second. Here is a simple example.

Imagine that there are just two states. State 1 has EV1 electoral votes and your candidate has a probability P1 of winning that state; in state 2, EV2 electoral votes and a probability P2. Assume that EV1 and EV2 are not equal. Then the possible outcomes have the following probabilities:

EV1+EV2 electoral votes (i.e. winning both): P1 * P2.
EV1 electoral votes: P1 * (1 – P2).
EV2 electoral votes: (1 – P1) * P2.
No electoral votes: (1 – P1) * (1 – P2).

In general, the probability distribution for all possible outcomes is given by the coefficients of the polynomial

((1 – P1) + P1 * x^EV1) * ((1 – P2) + P2 * x^EV2) * … * ((1 – P51) + P51 * x^EV51)

where 1…51 represent the 50 states and the District of Columbia. This polynomial can be calculated in a fraction of a second.
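One way to expand such a polynomial is to multiply in one state at a time while keeping a running list of coefficients. The sketch below is a minimal illustration, not the site's own code: the function names `ev_distribution` and `median_ev` are hypothetical, and the two-state inputs are made-up numbers chosen only to mirror the P1/EV1, P2/EV2 example above.

```python
def ev_distribution(win_probs, electoral_votes):
    """Return a list dist where dist[k] is the probability of exactly k EV.

    Multiplies out ((1 - P) + P * x^EV) one state at a time; the list
    holds the polynomial coefficients, indexed by the power of x.
    """
    dist = [1.0]  # before any state is counted: 0 EV with certainty
    for p, ev in zip(win_probs, electoral_votes):
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)  # state lost: total stays at k
            new[k + ev] += prob * p     # state won: total moves to k + ev
        dist = new
    return dist


def median_ev(dist):
    """Smallest EV total whose cumulative probability reaches 50%."""
    cumulative = 0.0
    for k, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= 0.5:
            return k
    return len(dist) - 1


# Illustrative two-state case (made-up values): P1 = 0.6, EV1 = 3, P2 = 0.3, EV2 = 5.
dist = ev_distribution([0.6, 0.3], [3, 5])
for k, prob in enumerate(dist):
    if prob > 0:
        print(k, round(prob, 4))
print("median EV:", median_ev(dist))
```

With 51 entries the inner loops touch at most a few tens of thousands of coefficients in total, which is why the full distribution is so cheap to compute.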

Why don’t other projection sites use your approach?

Three reasons.

First, the Meta-Analysis is unlike, say, fantasy baseball, where a lot of enjoyment comes from thinking about individual scenarios. We take no interest in specific scenarios; we want the median outcome that takes into account all possibilities. This gives the most precise possible answer, but it lends itself poorly to color commentary.

Second, the treatment is somewhat mathematical. Hobbyists at other sites may not have the expertise to take the polynomial shortcut, which is made possible by the fact that the Electoral College follows a relatively simple system in which EV are added up. Certain aspects of the Meta-Analysis are original and may someday be published.

Third, a properly done calculation reduces noise. This, in turn, reduces variation – and opportunities for commentary. Most media organizations want more commentary, not less.

What polls do you use? When do you exclude a poll?

We use all available state polls, with a preference for likely-voter polls over registered-voter polls when both are released. We do not exclude any poll.

For the current snapshot, the rule for a given state is to use the last 3 polls, or 1 week’s worth of polls, whichever is greater. A poll’s date is defined as the middle date on which it took place. In some cases 4 polls are used if the oldest have the same date. At present, the same pollster can be used more than once for a given state. From these inputs, a median and estimated standard error of the median are used to calculate a win probability using the t-distribution.
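The selection rule and the conversion to a win probability might be sketched as follows. Several pieces here are assumptions rather than the site's documented method: margins are taken to be Democratic minus Republican in percentage points, the standard error of the median is estimated from the median absolute deviation (scaled by 1.4826), a floor of 1 point is applied to that error, and the degrees of freedom are set to the number of polls minus one. All function and parameter names are hypothetical. The Student t CDF is computed by numerical integration so that only the standard library is needed.

```python
import math
from datetime import date, timedelta
from statistics import median


def select_polls(polls, today, min_polls=3, window_days=7):
    """polls: list of (start_date, end_date, margin). Returns the margins used."""
    # A poll is dated by the middle of its field period.
    dated = [(start + timedelta(days=(end - start).days // 2), margin)
             for start, end, margin in polls]
    dated.sort(key=lambda item: item[0], reverse=True)  # newest first
    cutoff = today - timedelta(days=window_days)
    recent = [item for item in dated if item[0] >= cutoff]
    if len(recent) >= min_polls:
        chosen = recent  # a week's worth is the larger set
    else:
        chosen = dated[:min_polls]
        if chosen:
            # Keep any further polls tied with the oldest selected date.
            oldest = chosen[-1][0]
            chosen = chosen + [item for item in dated[min_polls:] if item[0] == oldest]
    return [margin for _, margin in chosen]


def t_cdf(t, df, steps=2000):
    """Student t CDF via Simpson's rule on the density (steps must be even)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + math.copysign(total * h / 3, t)


def win_probability(margins, sem_floor=1.0):
    """Probability that the true margin is positive (assumed estimator, see lead-in)."""
    n = len(margins)
    med = median(margins)
    mad = median(abs(m - med) for m in margins)
    sem = max(1.4826 * mad / math.sqrt(n), sem_floor)  # assumed SEM estimate with floor
    return t_cdf(med / sem, df=max(n - 1, 1))
```

A call would look like `win_probability(select_polls(state_polls, date.today()))`, where `state_polls` is a hypothetical list of `(start_date, end_date, margin)` tuples for one state.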

What do you think of my favorite/despised pollster?

Appropriate aggregation methods remove the need to dissect individual polls. For example, median-based statistics correct for outliers. Also, human beings engage in motivated reasoning, and look more critically and closely at polls with which they disagree. Avoiding this bias leads to more accurate results. For these reasons, commenting on individual pollsters is usually not productive.

Indeed, good aggregation has the potential to free up mental (and media) space for information about more important topics than individual polls.

When do updates occur?

Every day at 8:00am, noon, 5:00pm, and 8:00pm.

Your estimate fluctuates less than other sites. Why is that?

It’s the power of meta-analysis. Even though individual polls may vary, aggregating multiple polls per state reduces uncertainty. Calculating the entire distribution of outcomes does an even better job. As a result, the Electoral Vote estimator on this site typically doesn’t move much. This is in fact the point of the analysis – to get past the vagaries of day-to-day poll reports. Rigorous meta-analysis is sometimes less exciting to watch than a site that varies every day, but in our view it’s the best way to present polling data.

Your calculation could be used to give a win probability. Why don’t you show this?

The uncertainty at any given moment is small enough that the results of an election held today would not be in much doubt. At any given moment, a current-poll win probability is typically greater than 95% for either candidate. Because this quantity is the wrong one to focus on, it is not given.

The greater uncertainty comes from changes that may happen over time between now and Election Day. This can be used to derive a true November-win probability. Some challenges in estimation are discussed here.

From day to day, a very useful quantity is the Popular Meta-Margin, defined as how much swing would have to take place to generate a near-exact electoral tie. The Popular Meta-Margin is equivalent to the two-candidate difference found in most single polls. It has many uses because it tells you where the race stands in units of voters. Errors in polling such as cell phone user undersampling and third party candidates are in these units, and therefore can be compared directly to the Meta-Margin.

For those who still insist upon getting a probability from the Meta-Analysis, it can be computed by pasting current histogram data into a spreadsheet and summing rows 270 through 538.
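The same sum can be done in code instead of a spreadsheet. The sketch assumes the histogram has been loaded into a list whose index equals the electoral vote total (entry 0 through entry 538); the site's actual file layout is not specified here, and the toy histogram below is made up purely to show the call.

```python
def win_probability_from_histogram(histogram, threshold=270):
    """Sum the probabilities of all outcomes at or above the threshold.

    histogram[k] is assumed to hold the probability of exactly k EV.
    """
    return sum(histogram[threshold:])


# Made-up toy histogram, not real data: three spikes only.
toy = [0.0] * 539
toy[260] = 0.25
toy[300] = 0.50
toy[330] = 0.25
print(win_probability_from_histogram(toy))
```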

Why should I believe the Meta-Analysis? In 2004, didn’t it predict a narrow Kerry victory?

Actually, the method was fine, but its inventor, Prof. Sam Wang, made an error. In the closing weeks of the campaign, he assumed that undecided voters would vote against the incumbent, a tendency that had been noticed in previous pre-election polls. Compensating for the “incumbent rule” had the effect of putting a thumb on the scales, lightly – but unmistakably – biasing the outcome.

Leaving out this assumption, the prediction in 2004 was exactly correct: Bush 286 EV, Kerry 252 EV. In retrospect, it’s clear that the incumbent rule is subjective and cannot be relied upon. You can read about the confirmation of the prediction in the Wall Street Journal (pre-election story here). A second confirmation came in 2006, when, using a related but simpler method, Sam estimated that the odds of a Democratic takeover of the US Senate were 50-50, a higher chance than predicted by either pundits or electronic markets. Indeed, that event did end up occurring. Finally, in 2008, the Presidential and Congressional calculations did extremely well.

Overall, the analysis is kept as simple as possible as a means of avoiding unintended bias. Both data and the code for doing the calculations are freely available. That way, anyone can check the results. Everything was open in 2004 as well; readers provided lots of useful feedback, such as this exchange.

State polls are done less often than national polls. Does that introduce a delay into your analysis?

Yes. As of early August this delay is about two weeks in key states. The delay will diminish dramatically as the campaign season progresses. A correction based on national polls is possible, but adds considerable uncertainty to the estimate.

What is the Popular Meta-Margin?

The Popular Meta-Margin is the amount of opinion swing that is needed to bring the Median Electoral Vote Estimator to a tie. It helps you think about how far ahead one candidate really is. For example, if you think support for your candidate is understated by 1%, this can overcome an unfavorable Meta-Margin of less than 1%. If you think that between now and Election Day, 1% of voters will switch from the other candidate to your dude, this is a swing of 2% and can compensate for a Meta-Margin of 2%.
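A rough sketch of how such a swing could be found numerically: shift every state margin by the same amount, recompute the median electoral vote total, and bisect until the median crosses the tie line. The normal-distribution conversion from margin to win probability, the per-state standard errors, the search range, and all function names are assumptions for illustration, and the three-state inputs are invented.

```python
from statistics import NormalDist


def ev_distribution(win_probs, electoral_votes):
    """dist[k] = probability of exactly k EV (polynomial coefficients)."""
    dist = [1.0]
    for p, ev in zip(win_probs, electoral_votes):
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + ev] += prob * p
        dist = new
    return dist


def median_ev(dist):
    cumulative = 0.0
    for k, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= 0.5:
            return k
    return len(dist) - 1


def median_ev_after_shift(margins, sems, evs, shift):
    """Median EV once every margin is moved against the candidate by `shift`."""
    # Assumed margin-to-probability model: normal with per-state standard error.
    probs = [NormalDist().cdf((m - shift) / s) for m, s in zip(margins, sems)]
    return median_ev(ev_distribution(probs, evs))


def meta_margin(margins, sems, evs, lo=-30.0, hi=30.0, tol=0.01):
    """Uniform swing (in points) at which the median EV drops to a tie or below."""
    tie = sum(evs) / 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if median_ev_after_shift(margins, sems, evs, mid) > tie:
            lo = mid  # candidate still ahead: a larger swing is needed
        else:
            hi = mid
    return (lo + hi) / 2


# Invented three-state example: margins in points, 10 EV each, standard error 2.
print(round(meta_margin([5.0, 1.0, -4.0], [2.0, 2.0, 2.0], [10, 10, 10]), 2))
```

A positive result means the candidate whose margins are entered is ahead by that many points; a negative result means the candidate trails.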

What if I think that polls are biased against my candidate? Do you provide a tool for me to see how a bias changes things?

One tool is the Popular Meta-Margin (see above). Another tool is the map in the right-hand column, which comes in flavors that show single-state probabilities with a 2% swing toward either candidate.

What are jerseyvotes? And can you explain the “Power Of Your Vote” table?

Jerseyvotes, invented at this site in 2004, are a way to measure the power of individual votes to sway the election. Conceptually, jerseyvotes are distantly related to the Banzhaf Power Index, but normalized to the power of one individual. If you have ten times as much influence over the national win probability as a voter in New Jersey, your vote is worth 10 jerseyvotes. Sadly for the hosts of this site, one jerseyvote is not worth very much.

The Voter Influence table in the right-hand sidebar displays information about the ten states currently with the highest jerseyvotes, plus New Jersey for comparison. The jerseyvote statistic for each state is listed in the “Power” column, and they are normalized so that the most powerful state has power equal to 100. (Originally, this power statistic was normalized so that NJ voters had power equal to 1, hence the term “Jerseyvotes”.) The current polling margin, as determined by the meta-analysis, is also displayed for each state. For example, if the meta-analysis indicates that NJ is currently polling 50% for Obama and 44% for McCain, then NJ’s “Margin” column would read “Obama +6%.”
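For readers who want to see the mechanics, here is a loose sketch of a per-voter power statistic along the lines Sam describes in the comments below: shift all margins by the Meta-Margin to force a toss-up, nudge one state, measure the change in the overall win probability, and divide by that state's turnout. The normal-distribution win probabilities, the size of the nudge, the function names, and the three-state toy inputs are all assumptions; this is not the site's code.

```python
from statistics import NormalDist


def ev_distribution(win_probs, electoral_votes):
    """dist[k] = probability of exactly k EV (polynomial coefficients)."""
    dist = [1.0]
    for p, ev in zip(win_probs, electoral_votes):
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + ev] += prob * p
        dist = new
    return dist


def national_win_prob(margins, sems, evs):
    """Probability of winning a majority of the electoral votes."""
    # Assumed margin-to-probability model: normal with per-state standard error.
    probs = [NormalDist().cdf(m / s) for m, s in zip(margins, sems)]
    dist = ev_distribution(probs, evs)
    needed = sum(evs) // 2 + 1  # 270 when the total is 538
    return sum(dist[needed:])


def voter_power(margins, sems, evs, turnout_thousands, meta_margin, nudge=0.1):
    """Relative power per voter, scaled so the strongest state scores 100."""
    tossup = [m - meta_margin for m in margins]  # force a near 50-50 race
    base = national_win_prob(tossup, sems, evs)
    raw = []
    for i in range(len(margins)):
        bumped = list(tossup)
        bumped[i] += nudge  # vary one state's margin by a small amount
        delta = national_win_prob(bumped, sems, evs) - base
        raw.append(delta / turnout_thousands[i])
    top = max(raw)
    return [100.0 * r / top for r in raw]


# Invented toy inputs: three states, 10 EV each, equal turnout.
powers = voter_power([5.0, 1.0, -4.0], [2.0, 2.0, 2.0], [10, 10, 10],
                     [1000.0, 1000.0, 1000.0], meta_margin=1.0)
print([round(p, 1) for p in powers])
```

In this toy case the state sitting closest to a tie after the shift comes out on top, which matches the intuition that votes matter most where the race is closest.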

In your future prediction / current snapshot, the probability is very different from the InTrade price. Why is that?

It is wrong to interpret InTrade prices as true probabilities. Those prices reflect what a number of bidders think to be the win probability. InTrade bidders tend to be underconfident in evaluating polling data. On Election Eve, even a 5 +/- 1 point lead for a candidate is often insufficient to drive a share price above $0.90. However, it is true that the candidate with an InTrade price above $0.50 is usually the leading candidate. The issue is analyzed further here.

I wrote a comment but it does not appear. What happened?

The site is moderated to shape the discussion. Our audience includes a wide range of numerically-oriented professionals and academics. Many are also partisans. Statements that appear to be false are deleted. Comments that veer far from some kind of evidence (quantitative arguments favored) might find a better home at sites such as HotAir or DailyKos.

Several other sites emphasize the possibility that states tend to vary together, so that if one poll is off, then others will be off in the same direction. Why don’t you include that in your model?

Assumptions should only be added to a model if they make a difference in the outcome.

For a snapshot, adding covariance makes very little difference. For instance, let us make the assumption that all polls move together between the last day of polling and Election Day by a random small amount (up to 1%, say), and the random amount is unbiased. In this situation the median EV estimator does not change by a measurable amount. The uncertainty in the final outcome, as measured by 95% confidence interval, gets a little wider. But that’s it. Thus, since covariance has no substantive effect, it is left out.
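Anyone who wants to try this experiment can do so by averaging the electoral vote distribution over a range of common shifts. The sketch below is one possible setup under stated assumptions: the shift is drawn from an evenly spaced grid between -1 and +1 points, win probabilities come from a normal model with per-state standard errors, and the function names are hypothetical. It computes the quantities being compared (median and 95% interval) without asserting what any particular data set will show.

```python
from statistics import NormalDist


def ev_distribution(win_probs, electoral_votes):
    """dist[k] = probability of exactly k EV (polynomial coefficients)."""
    dist = [1.0]
    for p, ev in zip(win_probs, electoral_votes):
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + ev] += prob * p
        dist = new
    return dist


def quantile_ev(dist, q):
    """Smallest EV total whose cumulative probability reaches q."""
    cumulative = 0.0
    for k, prob in enumerate(dist):
        cumulative += prob
        if cumulative >= q:
            return k
    return len(dist) - 1


def distribution_with_common_shift(margins, sems, evs, max_shift=1.0, n_shifts=21):
    """Average the EV distribution over an unbiased grid of shared shifts."""
    shifts = [-max_shift + 2 * max_shift * i / (n_shifts - 1) for i in range(n_shifts)]
    mixed = [0.0] * (sum(evs) + 1)
    for shift in shifts:
        # Every state moves together by the same amount (assumed normal model).
        probs = [NormalDist().cdf((m + shift) / s) for m, s in zip(margins, sems)]
        for k, p in enumerate(ev_distribution(probs, evs)):
            mixed[k] += p / len(shifts)
    return mixed


def summarize(dist):
    """Median and 95% interval endpoints of an EV distribution."""
    return quantile_ev(dist, 0.5), quantile_ev(dist, 0.025), quantile_ev(dist, 0.975)
```

Comparing `summarize(distribution_with_common_shift(...))` against `summarize(ev_distribution(...))` for the same inputs shows how much the median and the interval move once the shared shift is included.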

For a prediction, the answer is more nuanced. In this situation, the change between polling day and Election Day could be considerable. Now, the way that the change is modeled affects the shape of the distribution. However, the median is still the same. In this case a simple and effective way to vary long-term change is to covary all states together by a random amount. This is at the core of the prediction, a feature that was introduced starting in 2012.

This site has discussed the subject of non-independence here and, most recently, here.

What do you do with third-party candidates? Can such a candidate shift the outcome?

We take whatever the pollster gives us as the margin between the two leading candidates. Third-party candidates tend to fade in the finish in a system with two dominant parties. Some pollsters give third-party results and some don’t. This kind of detail might help, especially for analyzing local/state races where third-party local candidates run strong, such as Maine.

 

67 Comments


  • Frank

    I appreciate your analysis. In addition to the meta-margin, it would be helpful to me to see the % of undecideds that would equalize the median EVs. Can you show both?

    Also, am I correct that Obama’s median EV is holding at 309 but his meta-margin is increasing slightly, and if so what would the interpretation be?

  • jcie

    I think the presentation would be a bit clearer with extra parentheses in the polynomial:
    ((1-P1) + P1*x^EV1) * ((1-P2) + P2*x^EV2) …

  • Sam Wang

    jcie, thank you.

    Frank: If we assumed 10% undecided, they would have to break about 2-1 in McCain’s favor. However, this is a very unreliable estimate. Some pollsters push harder than others in forcing respondents to choose a candidate. The recent undecided numbers in Pollster.com’s national polls range from 3 to 15% (some pollsters do not even report the figure). At the low end of the range, even 100% of the undecideds voting for McCain wouldn’t do it; at the high end, little more than a 3-2 break would be sufficient.

    I noticed the same thing that you did about the meta-margin. Perhaps a state (or states) in the “safe” range for Obama or McCain recently reported a result more favorable to Obama, leading to a subtle shift in the distribution. Maybe the impact of the recent McCain attacks (Obama is a celebrity and/or the anti-Christ) has peaked. However, that would be a lot to hang on this result. We are unfortunately in a period when state polls are sparse and changes take several weeks to be fully seen in the meta-analysis.

  • Pat

    Since you have the probability distribution of electoral votes, it might be nice to display a probability of victory for each candidate. Currently, with the distribution almost entirely above 270 Obama EV, I assume there would be a probability of victory of at least 95% for Obama.
    It could be a useful indicator to plot over time. And since you made a point of differentiating your approach from that of fivethirtyeight.com, it would be interesting to see the extent to which the two probabilities differ.

  • Sam Wang

    Pat, a probability measure of the type you describe (currently >99%) would only be a snapshot of today. To get a true probability, it would have to be multiplied by the probability that the polls don’t move far enough to flip the outcome. That’s a highly uncertain number. My reading of the situation is that the true probability is about 75% or so. I’ll write about this later.

    In the meantime, the history graph in the right sidebar gives something I think you will like, namely an indicator that can be plotted over time.

  • JoeA

    Even though I know it’s a bit of a lagging indicator in August, it would be great if we could get a graph of the metamargin and how it changes over time. Would be a nice way to see how the campaign’s evolving, and until the # of polls picks up, might also be a good indicator of trends likely to continue.
    Love the site!

  • Richard Gilman

    I’m not a scholar so I basically read this site for the forecasts in the most basic sense. Right now, at the top of the home page, it says Obama 300, McCain 238. Where is the list of states that you project for each candidate? Thanks.

  • Sam Wang

    Mr. Gilman, there is no such list. The EV totals are the center of the range of the preponderance of likely outcomes. So they do not guarantee any particular outcome. However, it is possible to say with high confidence that if the election were held today, the total EV outcome would be at or near the numbers given.

    If you really insist on definite assignment of states based on polls, you can click on the maps at right, which give probabilities. Click the map, at which point every state is forced to be assigned to Obama or McCain. This is more like what sites like electoral-vote.com or Pollster.com provide.

  • Oliver

    With all respect, Sam, the assumption in 2004 that undecideds would vote against the incumbent–an assumption that, by the way, makes sense and empirically is proven valid– was not wrong. The evidence is good that Kerry did win, as the investigative report by Congressman John Conyers says.

    But let’s step back a moment. It’s an important tenet of science that we let the evidence lead us to conclusions, not the other way around. You will concede, I hope, that there are legitimate questions about the outcome of the 2004 election. If there are doubts about a piece of data, then one should not make conclusions based on either including or excluding the questionable data point. It’s bad science to do so. The proper approach is to suspend judgment, and let new data prove the matter one way or the other.

    Personally, I think elections are mostly cooked by a terrible news media and the alienation and gullibility of less affluent/less educated voters. But there is plenty of evidence of outright cheating around the margins. What the US Civil Rights Commission documented in 2000, the racial disparities that emerged in the Lopategui case in New Mexico in 2004, and the March 2007 conviction of two Cuyahoga election board workers on felony counts for rigging the 2004 election are incontrovertible facts that make it clear that our elections are tilted. How much is still unclear. Until we know, I suggest letting your analytical system be guided purely by data and not by assumptions.

  • Pat

    Currently, the site fivethirtyeight.com projects a composite electoral vote total of 272 for Obama and 268 for McCain. On the other hand, they observe that McCain wins more often in the simulations (52% of the simulations).
    You explained before that Nate Silver’s simulations were not only useless, but also inaccurate: indeed the distribution they get is far from a regular gaussian. Could these factors cause that paradox of one candidate getting a larger median EV total and the other winning more often? In principle, if they did not only perform 10,000 simulations, but an arbitrarily large number of simulations, would these situations be avoided? (in other words, can we expect the distribution of outcomes to be totally symmetrical in all cases?)
    Thanks for your comments.

  • Sam Wang

    Pat – first, let me say that I have moderated my thoughts of Silver’s site. His commentary is excellent and he is quite knowledgeable about opinion polling. All prediction is intrinsically inaccurate, and singling him out is beside the point. His approach is a bit complex but the assumptions are not radically wrong. As you know, my preference is for the current snapshot.

    The fact that the most likely specific EV total doesn’t always favor the probable winner is not the fault of numerical simulation. It is an intrinsic oddity of the Electoral College.

    It is true that the spikiness of the probability distribution is closely linked to the fact that the most likely specific EV total (i.e. the mode) doesn’t have to match the median outcome. The median is by definition the most probable outcome since it represents the middle of the probability distribution.

    If this is confusing, consider the following simplified example. Imagine that your favorite candidate has a 49% chance of winning in both OH and in VA. The most likely specific combination (the mode) is that he will lose both. Yet it is probable that he will win at least one.

  • John

    Would you please consider adding a new chart to present the “all possibilities” chart?

    A cumulative distribution chart would be easier to understand. It could still have the same axis, Electoral Votes.

    This would show the closeness of the race nationwide in an enlightening way.

  • Steve Roth

    Dear Sam:

    I’d be very interested to see a blog post from you on the following.

    538 has said repeatedly that if Obama wins VA or OH or CO, there’s a very high probability of his winning the election.

    Question: can your model give the odds of Obama winning at least one of those three states?

    Can it give the odds of his winning the election if he wins at least one of those three states?

    Love to hear your thoughts.

    Thanks,

    Steve

  • Steve Roth

    Sam:

    Thanks! I’m kind of confused by the wording in the post though.

    “Steve asked if the model can give the odds *if* Obama wins CO, OH, or VA.”

    The “if” confused me. My first question was about the odds *of* Obama winning at least one of these.

    I *think* you said that the odds of that are 89%.

    The second question, about Obama’s odds of winning *if* he takes at least one of these. Am I accurate in subtracting the McCain numbers you provide and giving the remainder as the odds for Obama, as follows?

    * CO – with it, 99%
    * OH – with it, 99.9%
    * VA – with it, 99.9%

    Finally, is there a statistical reason why these wildly Obamomistic numbers are meaningless, even as a snapshot?

    Thanks again,

    Steve

  • Sam Wang

    Steve, I took the liberty of rewording your question to give me an opening to address the setup of your question.

    Yes, the win probability of one is 100% minus that of the other. However, your “Obamomistic” interpretation (now there’s a wild word) is wrong. Please read the following.

    I usually refrain from quoting probability measures because the uncertainty between now and Election Day is very large. The true probability has to be multiplied by the probability that polls won’t move far enough to reverse the outcome. This latter event is highly uncertain.

    The reason the probability looks so high is that the Meta-Analysis is a very precise snapshot of what is happening right now (“now” = polling dates). For example, in 2004 the final outcome was extremely close, a 34 EV margin. Yet the polling snapshot gave a high win probability for Bush, around 90%.

    So please, please, please focus on the Meta-Margin. It is far more to the point.

  • John W

    Sam: Thanks for the explanation re: anti-incumbent bias in the votes of undecideds. Are there specific data suggesting that these voters broke with higher probability toward the incumbent (Bush) in 2004 than in years past? For example, exit pollsters could have found this out by asking voters who they voted for and when they made their final decisions.

    Finally, any comments re: apparent discrepancies between exit-poll results and voting results in 2004?

  • Sam Wang

    The incumbent rule isn’t all that reliable. Read this and this.

    Exit polling from 2004 supports the idea that undecideds split about equally for Bush and Kerry: see this piece at Pollster.com.

    In regard to “discrepancies,” it’s probably the raw exit poll results that are at fault. Exit polls are prone to inaccuracy because they involve in-person interviews, which are prone to selection bias. The proper use of exit polls is to identify blowouts and to look at demographic data after they are normalized to fit actual results.

    Relevant to this topic, read the obituary of Warren Mitofsky, inventor of the exit poll. It’s fascinating. I think he was unfairly vilified by online commentators. It’s too bad; he was such an innovator.

  • Andrea Moro

    For conditional probabilities, PA is more interesting than the ones you are computing. My site has reported them since before the summer; I am currently forecasting McCain winning if he wins PA.
    http://presidentforecast.andreamoro.net

  • Sam Wang

    PA does not show signs of being competitive this year.

  • Sam Wang

    Comments are moderated. All comments are read. Since this is a permanent page, only those with a long shelf life will end up being posted. We also have email.

  • Stat

    Many thanks for doing this!

    I think your FAQ section on “what if states tend to vary together?” is a little off the mark. Elsewhere you have explained that this is a snapshot, so I think a discussion about states moving together between the polling and election day is already outside of purview.

    Instead, I think the question to address is whether the errors in the polls are correlated. For example, when polls rate McCain too high in NH (as compared to the results of an actual election in NH held at the time of the poll) are they likely to also be rating McCain too high in OH? My thinking is that, if there is any such correlation, it is probably dominated by a nationwide principal component. Thus, you are already doing an excellent job of quantifying this effect with the meta-margin!

    I am glad you adopted the polynomial approach that I suggested back in 2004. In case you want to give someone credit … I believe Joseph Fourier (1768-1830) is the one to thank. The polynomial approach is equivalent to efficiently achieving convolutions on finite integer distributions via discrete Fourier transforms.

    Thanks again!

  • David Kline

    Great site — thanks for doing it!

    If I’m reading you correctly, Jerseyvotes are proportional to the influence of a single vote in a given state on the probability Obama gets at least 270 electoral votes.

    That probability is the sum of all the coefficients of x^k in your polynomial for k>= 270.

    Am I remembering this right, that somewhere in the nerd section you posted about the formula for the derivative of that probability with respect to pk?

    Thanks.

  • Sam Wang

    David, there is no explicit description of the definition given. To translate the code into prose, Jerseyvotes are calculated in the following way.

    Starting with current state polls, move all the margins down by the Meta-Margin, i.e. force a perfect 50-50 toss-up. Then vary the margin in one state S by a small amount and recalculate the probability you describe. Call the probability shift delta(S). Finally, define voter power, VP(S), as delta(S) divided by the number of persons casting votes in that state in 2004, in thousands.

    VP(S) is the relative power of one voter in state S to influence the overall outcome of the election, assuming things are close. It measures the effectiveness of persuading or transporting one voter in state S to the polls.

    I currently normalize VP(S) to the most powerful state that day. That state has been New Hampshire lately; today it’s Virginia. I can say with confidence that it will not be New Jersey this year.

  • S. DeDeo

    Hi Sam –

    I had a question on your work (which I think is terrific — it is always pleasing to find such a combinatorial simplification.)

    The first is how the probability P for each state is calculated. You can get the “expected” vote percentage from the state poll median, but how do you turn that into a probability (which would amount to determining the error on the estimate of the median.)

    Is it jack-knifed from subsets of the individual polls? Or is there an assumption of some intrinsic distribution which you then fit out?

    I say this in part because (as a scientist) I find the poll results very problematic — if you look at twenty polls, not one will be more than a sigma or so (estimated from 1/Sqrt(N)) from the mean. Put another way: pollsters cheat and often “renormalize” their results whenever they get an outlier. (Usually with some demographic claim.)

    A conservative thing to do, I feel, would be to use the average N of the polls and assume a variance of 1/Sqrt(N). Problematic would be to use the total N of the polls, or measure the intrinsic spread of each poll.

    (You are helped, of course, by the fact that using different states enforces greater honesty among pollsters who won’t cheat and make Delaware closer to Alaska.)

    And of course somewhere on your site you may have already measured the poll covariance. I believe your discussion of covariance above measures a different effect (the Delaware-Alaska coupling)? But perhaps I got that wrong.

    All the best,

    Simon

  • Sam Wang

    Simon, the methods are described in some detail here.

    For each state I use polling data to calculate a median and estimated standard error of the mean (SEM), then use these to infer a win probability using the z-score, mean/SEM. A floor value is placed on SEM to account for coincidentally similar results. This floor is approximately equal to the error predicted by sampling statistics.

    I have never noticed the anomaly you suggest exists. I am under the impression that the amount of variation among polls is what it should be, i.e. a bit more than expected by sampling error. More because of variation in methods among pollsters.

    Professional pollsters seek an accurate and honest result. They do weight data by age and other variables, but they do this consistently. By the way, the heterogeneity of the polled population poses a challenge that isn’t commonly discussed. I haven’t looked into the theory of that – yet. I am sure it has been written about.

  • Chris

    Sam, Thanks for the site and all your hard work. I have a question about the probability distribution of outcomes you compute from the polynomial equation you’ve described. Have you done any work to propagate the polling uncertainty through that equation, i.e., to quantify the uncertainty of the computed probability distribution itself due to sampling uncertainties in the underlying polling data? I don’t think I’ve seen that mentioned on the site, but perhaps I’ve missed it. Thanks.

  • Sam Wang

    Chris, that’s an interesting question. No, I haven’t. It would be manifested as uncertainty in the height of the peaks, which would still be in the same positions due to the granularity of the Electoral College.

    However, there is a small chance of a misunderstanding here. Polling uncertainty contributes to calculating a win probability, which is by definition a measure of uncertainty. Are you suggesting a calculation of the uncertainty in the win probability? Certainly it’s possible to do. But practically speaking, I think a global systematic error such as polling bias may be a larger source of error.

  • Chris

    Sam, Sorry if my question wasn’t entirely clear. Your calculation of the probability distribution of outcomes assumes that the individual state probabilities P_1,…,P_51 are known exactly, and that the computed distribution is the “true” one. But those probabilities have some uncertainty, and there will therefore be some uncertainty in the computed distribution. Any quantity derived from that distribution (a win probability, a 50th percentile expected outcome, etc.) will also have some uncertainty arising solely from uncertainties in the probabilities P_i. There’s presumably a reasonably simple procedure for propagating errors through your polynomial equation/convolution, or one could compute an ensemble of “exact probability distributions” based on different instances of the input probabilities P_i (drawn from the appropriate distributions). You are right that this would be manifested as uncertainty in the height of the peaks, and in practice, other sources of error may dominate. Not having digested all of your methodology, I was curious if that sort of uncertainty quantification was lurking somewhere in your analysis.
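
    The ensemble idea in this comment can be sketched in Python as follows. This is only an illustration: the choice to perturb each state's median margin with Gaussian noise of width equal to its SEM, the normal CDF, and the three-state toy inputs are all assumptions, and none of the function names come from the site.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ev_distribution(probs, evs):
    """Exact EV distribution: expand prod((1 - p_i) + p_i * x**EV_i)."""
    dist = [1.0]
    for p, ev in zip(probs, evs):
        new = [0.0] * (len(dist) + ev)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)   # state lost: EV count unchanged
            new[k + ev] += mass * p      # state won: add its electoral votes
        dist = new
    return dist


def ensemble_win_probabilities(medians, sems, evs, needed, draws=1000, seed=0):
    """Recompute the overall win probability for many perturbed inputs."""
    rng = random.Random(seed)
    results = []
    for _ in range(draws):
        # Assumption: each state's true margin is drawn around its median.
        probs = [normal_cdf(rng.gauss(m, s) / s) for m, s in zip(medians, sems)]
        dist = ev_distribution(probs, evs)
        results.append(sum(dist[needed:]))
    return results


# Hypothetical three-state example: 12 electoral votes, 7 needed to win.
runs = sorted(ensemble_win_probabilities([1.0, -2.0, 4.0], [2.0, 2.0, 2.0],
                                         [3, 4, 5], needed=7))
print(runs[len(runs) // 20], runs[-len(runs) // 20])  # rough 5%-95% range
```

    The spread of the resulting values is one way to express the uncertainty in the height of the peaks, while the peak positions stay fixed.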

  • Eddie

    Would it be better to allocate undecideds based on a distribution than a fixed figure (I recall 50%?)? Maybe such a distribution could be uniform or normal with a somewhat naive and unbiased (by favor or history) assumption.

    There are perhaps other places to add ‘real world’ uncertainty to the model, but this could maybe be a softball.

    Maybe for the ObamaVPalin election in 2012 you could offer a series of snapshot distributions: the basic could be the current model, then added variation in another, then quantification of more wild and interesting subjective thoughts in one further, etc.

  • Sam Wang

    Using the uncertainty of how undecided voters will break would not alter the median of the EV distribution. See this calculation. It would broaden the distribution slightly if the uncertainty were +/-2% or greater. It doesn’t strike me as a strong improvement. But I do agree that some ability to consider such possibilities could be useful for Obama vs. Huckabee. Functions of a loosely similar sort can be found by right-clicking the pop-up maps.

  • patrick mcguire

    How may we find analyses of state polls – for instance, where are the changes in MO occurring (geographic, voter type, etc.)? As a native of the Show-Me State, I follow it closely! Do you consider some polling orgs to be more reliable than others? Thanks for your thorough website and your impressive work.

  • Lee

    In computing the meta-margin what do you do with the probability that falls to a 269-269 tie? Are you looking for the shift such that the total for 0-268 exactly balances the total for 270-538, as would be appropriate if the tie were resolved by a coin flip?

    My understanding is that the House of Representatives gets to decide in the case of an electoral college tie that is not resolved by electoral college defectors. Each state’s delegation votes to assign its state’s one vote. You could assume that each representative votes along party lines, and thus determine which candidate would win such a tie breaker. If it is, say, Obama, then I think the proper meta-margin is one that gives exactly 50% probability to the event that Obama has greater than or equal to 269 electoral college votes.

    But it could be more complicated. For instance, is it reasonable to assume that each representative of a state votes in accordance with the state’s popular vote? That case can be solved with a bivariate polynomial:

    f(x,y) = \prod_{i=1}^{51} ((1-p_i) + (p_i * x^{EV_i} * y))

    (Though if the District of Columbia does not participate in the tie breaker then it does not get the factor of y like the states do.) Assuming Cheney would vote for McCain, the polynomial terms for x^{269}*y^{k} for k at least 26 contribute to an Obama win, and the x^{269}*y^{k} terms for k no more than 25 contribute to a McCain win.
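
    The bivariate polynomial above can be expanded numerically by keeping a two-dimensional table of coefficients, indexed by electoral votes (the power of x) and by the number of House delegations won (the power of y). The sketch below is in Python; the tuple layout, the function names, and the toy inputs are hypothetical.

```python
def expand_bivariate(states):
    """Expand prod((1 - p_i) + p_i * x**EV_i * y**h_i).

    states: list of (p, ev, in_house) tuples, where in_house is False
        for a jurisdiction (such as DC) that gets no factor of y.
    Returns coeff[e][k] = probability of e electoral votes and k delegations.
    """
    total_ev = sum(ev for _, ev, _ in states)
    n_house = sum(1 for _, _, in_house in states if in_house)
    coeff = [[0.0] * (n_house + 1) for _ in range(total_ev + 1)]
    coeff[0][0] = 1.0
    for p, ev, in_house in states:
        dk = 1 if in_house else 0
        new = [[0.0] * (n_house + 1) for _ in range(total_ev + 1)]
        for e in range(total_ev + 1):
            for k in range(n_house + 1):
                mass = coeff[e][k]
                if mass == 0.0:
                    continue
                new[e][k] += mass * (1.0 - p)
                new[e + ev][k + dk] += mass * p
        coeff = new
    return coeff


def tie_breakdown(coeff, tie_ev, states_needed):
    """Split the tie probability by who would win the House vote."""
    row = coeff[tie_ev]
    return sum(row[states_needed:]), sum(row[:states_needed])


# Toy example: two 1-EV states (p = 0.5 each) and a 2-EV district that is
# always won but has no House vote.
toy = [(0.5, 1, True), (0.5, 1, True), (1.0, 2, False)]
print(tie_breakdown(expand_bivariate(toy), tie_ev=3, states_needed=1))  # (0.5, 0.0)
```

    With real inputs the call would use `tie_ev=269` and `states_needed=26`, matching the split between k at least 26 and k no more than 25.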

  • Sam Wang

    I don’t “do” anything with it. The Meta-Margin is the shift that would bring the median EV count to 269. The probability of that particular event is quite small even when the Meta-Margin is zero. This is pretty much all I have to say on the topic!
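
    One way to read that definition is as a root-finding problem: shift every state margin by the same amount and find the shift at which the median EV count crosses the tie value. The Python sketch below uses bisection. It is an illustration under stated assumptions (normal CDF, a fixed SEM per state, a search bracket of plus or minus 50 points), not the site's actual code.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ev_distribution(probs, evs):
    dist = [1.0]
    for p, ev in zip(probs, evs):
        new = [0.0] * (len(dist) + ev)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)
            new[k + ev] += mass * p
        dist = new
    return dist


def median_ev(dist):
    """Smallest EV count whose cumulative probability reaches 0.5."""
    cumulative = 0.0
    for ev, mass in enumerate(dist):
        cumulative += mass
        if cumulative >= 0.5:
            return ev
    return len(dist) - 1


def meta_margin(medians, sems, evs, tie_ev, tol=1e-4):
    """Uniform shift against the candidate that brings the median to tie_ev."""
    def median_after_shift(shift):
        probs = [normal_cdf((m - shift) / s) for m, s in zip(medians, sems)]
        return median_ev(ev_distribution(probs, evs))

    lo, hi = -50.0, 50.0  # assumed wide enough to bracket the crossing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if median_after_shift(mid) >= tie_ev:
            lo = mid  # median still at or above the tie value
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical example: three 1-EV states, each led by 2 points.
print(round(meta_margin([2.0, 2.0, 2.0], [1.0, 1.0, 1.0], [1, 1, 1], tie_ev=2), 2))
```

    Because the median moves in whole electoral votes, the crossing point is sharp, and a positive result means the candidate is ahead by that many points.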

  • Brian

    How big a factor do you think voter purges will be in the upcoming election? I’m rather alarmed by today’s NYT article:

    http://www.nytimes.com/2008/10/09/us/politics/09voting.html?partner=rssuserland&emc=rss&pagewanted=all

  • Sam Wang

    Voter rights is an important topic. But I think that effectively, purges will not be a factor. All the maneuvering with regard to voter rolls and challenges is really useful only at the margins, no more than 1 percent. That can make the difference in a very close race (think Florida 2000), but it’s rare.

  • Lee

    Given the relative insensitivity of the median EV to the changing meta-margin at this point, do you have time to supply us with an ongoing historical meta-margin graphic? Thanks!

  • Chris

    I noticed that in your blurb (right sidebar) you claim, “I suggest that the Meta-Analysis is a useful tool for gauging what moves voters.”

    Do you think your data are suitable for econometric time series analysis? If so, have you considered any empirical models (cointegration/multiple time series or ARIMA-style intervention analysis)?

    Not to sound too much like the cynical social scientist (that perhaps I am) . . . Elections are well and good, but what I find really interesting is your proposition that meta-analysis provides a tool to gauge what moves voters (in the aggregate, at any rate).

  • Mark

    Your tools for letting the reader adjust for bias are the “Popular Meta-Margin” at the top of the page, and the +/- 2% maps for single state probabilities. Since you are a self-described geek and these use Java… you should replace that tool with a slider bar so the reader can adjust the bias to whatever level they wish. It would also link that presentation to the “Popular Meta-Margin”.

    I think it would be a very interesting presentation, and certainly nothing that the other poll aggregation sites have.

  • Bill

    Yet more evidence that your “split undecideds up 50:50” assumption is applicable in this election cycle as well (and builds credibility for its general applicability).

    http://www.pollster.com/blogs/undecided_voters_and_racial_at.php

    Though I wonder if its applicability is limited to the general election. In the Democratic primaries, the undecideds seemed to break very significantly for the same candidate in each state: Clinton.

  • Peter Gozinya

    Sam Wang,

    Great Site!

    Can you post all 50 states for the Jersey votes..not just the top 10? I live in Cali and was going to vote for McCain but wanted to know if it’d be better if I just stay home.

    Thanks,
    -PG

  • bjorn akerman

    I value this site because you care enough to show us your past performance (2004), and because you extract sensible advice for any US citizen, Democrat or Republican. Mathematics for democracy!!
    As a Swedish citizen I cannot follow your advice, and out of this frustration I voted in the world-wide poll run by “The Economist”
    http://www.economist.com/vote2008/?a=true&cid=134&v=true
    Does this poll make any statistical sense, or is it just a pacifier for the rest of us 5.7 billion world citizens who cannot vote in the election that will affect us more than our own elections?

  • Sam Wang

    Bjorn – pacifier.

  • Lee

    In the current swing in the polls, I bet that some states’ polls have moved more than others. My intuition says, for instance, that highly partisan states probably had smaller percentage swings than swing states. Can this be quantified in a way that is useful for honing the predictions?

    That is, can you look at the data from this season and measure how much beta — see https://en.wikipedia.org/wiki/Beta_(finance) — each state’s poll has with respect to the benchmark of the national poll? Then you can compute the meta-margin not by assuming that each state’s polls move by the same amount, but by assuming that the national poll moves by some amount, and each state does its usual beta multiplier of that move.
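
    As a rough illustration of the beta idea in Python: regress changes in a state's poll margin on changes in the national margin, then scale a national move by each state's slope. The use of period-to-period changes, ordinary least squares, and the sample numbers are assumptions made for this sketch, and the two series are assumed to be sampled on the same dates.

```python
def poll_beta(state_margins, national_margins):
    """OLS slope of state margin changes on national margin changes."""
    ds = [b - a for a, b in zip(state_margins, state_margins[1:])]
    dn = [b - a for a, b in zip(national_margins, national_margins[1:])]
    mean_s = sum(ds) / len(ds)
    mean_n = sum(dn) / len(dn)
    cov = sum((n - mean_n) * (s - mean_s) for n, s in zip(dn, ds))
    var = sum((n - mean_n) ** 2 for n in dn)
    return cov / var  # requires some variation in the national series


def shift_states(margins, betas, national_shift):
    """Apply a national move to each state, scaled by that state's beta."""
    return [m + b * national_shift for m, b in zip(margins, betas)]


# Hypothetical series: this state moves twice as much as the nation.
national = [0.0, 1.0, 3.0, 2.0]
state = [5.0, 7.0, 11.0, 9.0]
print(poll_beta(state, national))
print(shift_states([1.0, 10.0], [1.5, 0.5], national_shift=-2.0))  # [-2.0, 9.0]
```

    A beta-weighted meta-margin would then search over `national_shift` rather than over a uniform shift applied equally to every state.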

  • LDE

    Hi, I hope I haven’t missed an explanation for this, but I notice that there is a discrepancy between meta-margins and predicted EVs on the main page versus those on the “history-of-electoral-votes-for-obama” page. On the main page, as of writing at 5.32 pm on 10/13/12, I see Obama 277, Romney 261, and MM at +0.66, whereas on the history page I see Obama 286, Romney 252, MM at +0.96.

    Which of these is the most up to date?

    This is a great site–thanks for all the work and insight.

    Thanks, L

  • Frank

    Sam, I’m looking in after a while. Please remind me where on your site I can read an explanation for the difference in methodology that leads to an Obama EV distribution today that for Nate Silver peaks at (what appears to be) 332, which would include VA and FL, instead of (what appears to be) your 290, which would exclude them.

  • Thomas Russo

    Prof. Wang:
    How can you not screen for allegedly inaccurate state polls and yet arrive at an accurate result? As an example, if all three recent state polls were by Strategic Vision and were widely considered fraudulent, how would using such data arrive at a likely correct result? Screening polls for accurate methodology may well insert bias, but do statistical standards for polling methods not provide some reliable level of screening?

    • Neus

      I believe Prof Wang has written or alluded to this in some of his writing and even in the review of past EVs.
      Prof Wang, correct me if I’m wrong, your model has factored in these suspect polls indirectly by including them as possible “combinations” in the simulations. In other words, whether they are Nate’s numbers or (frankly) obscure polls, they are, as indicated somewhere on this website, “baked” into the “quadrillions” of combinations. And it is especially this attribute, more than anything else, that has “wedded” me to this PEC site.

      Neus!

  • Frank

    PS. Your EV peak (290) makes more sense to me than Nate’s (332) because the latter has Obama winning Virginia and Florida despite Nate’s estimate of Obama’s chances of winning those states being respectively 47% and 33% so they will more often be in the Romney column.

  • Benjamin

    New reader to PEC. Excellent site.

    Given probabilities of carrying different states, how do you model correlation between state behaviors?

    Thanks.

  • JohnW

    It appears that Barack’s chances jumped from ~90.5% to ~97% today. Is this normal volatility or did he have a really good polling day? Thanks.

  • Neus

    Dr. Wang—I need to make an upfront confession:
    After I have my morning prayers, and morning cup of tea, I go to Nate’s 538 blog, and then I hurry back to PEC to check your numbers. So far I like the numbers I’m seeing. But even then, before getting to Nate’s numbers and yours, my sighs and breaths are so deep! It is really like waiting for a doctor’s diagnosis of sorts.
    Before I hit my pillow, can you tell me the difference or the relationship between the Meta-Margin and the spread between Random Drift % and Prediction %?
    And thank you so much for being so professional. Your analytical rigour is an open secret!

    Neus!
