Princeton Election Consortium

Innovations in democracy since 2004


Stat-urday: How to pinpoint the Ryan bounce

August 17th, 2012, 4:05pm by Sam Wang

I don’t own a cat. This rules out catblogging. Instead I will do something geekier: statblogging. (Stat-urday, nyuck nyuck.)

Today I show you how I look at a simple data set. It’s a timely example: the Ryan “bounce.” Did it happen at all? How big was it?

I will show you how I think about such a question. The process involves some of the first lessons I use when introducing my students to basic statistics. Along the way, we’ll put some of Nate Silver’s statistics under the microscope. My conclusion is that Ryan probably did yield a tiny bounce of 1-2%, and that our EV snapshot of state polls helps greatly in seeing it properly.

If you find this kind of geekery boring, see my older post, without square roots and whatnot. The bottom line: Ryan is more likely to affect the balance of power in the Senate!

When I teach my freshmen to be statistically literate consumers of news, I tell them that there is a difference between (a) how large an effect is, and (b) proving that there’s an effect at all. In today’s example, doing the latter is surprisingly hard.

The first concept, (a), is what we ultimately care about. It refers to things like how much longer we’ll live if we give up smoking, or how much John F. Kennedy benefited from having Lyndon Johnson as a running mate. In these two examples, the effect was large.

However, for an academic to publish a result, he/she only needs to meet standard (b). And an effect can be quite small yet still pass statistical tests. For example, the billion-dollar drug Tarceva was approved even though it prolongs the life of pancreatic cancer patients by only 14.7 days on average. The one advantage of standard (b) is that if it’s not met, then the idea lacks support and it is time to stop babbling about it. (As you might guess, most pundits have very little background in practical statistics.)

So before we can talk about how large an effect is, for instance the Ryan bounce, we first have to determine whether there was a bounce at all, i.e. is criterion (b) above met? As it turns out, Paul Ryan is the Tarceva of VP candidates – at least based on one type of gold-standard evidence.

Recently Nate Silver helpfully updated his compilation of all the cases where the same pollster has done before-and-after-Ryan surveys. Statisticians call these paired measurements. Here is the key table:

But did Romney really get a 1-point bounce? On these grounds alone, no.

To analyze these paired measurements, we need what is called a “paired t-test,” which involves the following steps:

(1) Calculate the mean and standard deviation of the before-and-after differences (calling a Romney gain positive and an Obama gain negative) to get mean=+0.80%, SD=3.28%.

(2) Divide the SD by the square root of the number of comparisons (n=15) to get the standard error of the mean: SEM=3.28%/sqrt(15)=0.85%. The SEM measures how closely we have pinned down the actual size of the Ryan bounce, whatever it is. Because of the sqrt(n), more comparisons give us a smaller SEM.

(3) Calculate the Z-score, which is (average effect)/SEM = (Romney +0.80%)/(0.85%) = 0.95. Colloquially, Romney +0.80% is a 0.95-sigma difference.
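Steps (1) through (3) fit in a few lines of Python. This is a minimal sketch, not the code behind this post: since the individual poll pairs live in the table rather than in the text, it starts from the summary numbers quoted above (mean=+0.80%, SD=3.28%, n=15). The helper names `sem_and_z` and `paired_summary` are hypothetical, and `paired_summary` shows how the same steps would run on raw before-and-after differences.

```python
import math
import statistics

def sem_and_z(mean_diff, sd_diff, n):
    """Standard error of the mean and Z-score (mean / SEM) for n paired differences."""
    sem = sd_diff / math.sqrt(n)  # step (2): SD divided by sqrt(n)
    z = mean_diff / sem           # step (3): average effect in units of SEM
    return sem, z

def paired_summary(diffs):
    """Hypothetical helper: steps (1)-(3) from raw differences.

    Each entry is (after - before) for one pollster, Romney gain positive.
    """
    mean_diff = statistics.mean(diffs)   # step (1): mean of the differences
    sd_diff = statistics.stdev(diffs)    # step (1): sample SD (n - 1 denominator)
    sem, z = sem_and_z(mean_diff, sd_diff, len(diffs))
    return mean_diff, sd_diff, sem, z

# Summary statistics quoted in the post, in percentage points.
sem, z = sem_and_z(0.80, 3.28, 15)
print(f"SEM = {sem:.2f}%, Z = {z:.2f}")
```

With these rounded inputs Z comes out at about 0.94, essentially the 0.95 quoted above.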

Now we have to calculate (using a stats table or program) the probability that Z=0.95 arose by chance, using a one-tailed test. “One-tailed” means we are trying to rule out one tail, in this case the possibility that Romney lost ground. If we can rule that out, Romney must have gained ground.

As it turns out, Z=0.95 for n=15 gives a one-tailed probability of 18% (or, as research scientists would put it, p=0.18). That falls well short of the usual publication standard in experimental sciences, p<0.05. Put another way, there is roughly a 1 in 5 chance that Romney lost ground! In fact, the 95% confidence interval runs from a 2.5% gain for Romney to a 0.9% gain for Obama. That’s right: based on the data above, we couldn’t rule out the possibility that Obama gained nearly 1 point!
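Here is one way to reproduce those two numbers, as a sketch under stated assumptions. Python's standard library has no Student's t distribution, so the one-tailed p-value is computed by integrating the t density numerically (Simpson's rule) with n-1=14 degrees of freedom; the function names are hypothetical. The confidence interval uses the normal approximation (1.96 × SEM), which is what matches the O+0.9% to R+2.5% range above; using the t critical value for 14 degrees of freedom (about 2.14) would widen it slightly.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of math.gamma for large df.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def one_tailed_p(t_stat, df, steps=2000):
    """P(T > t_stat) for t_stat >= 0, via Simpson's rule on [0, t_stat].

    steps must be even for Simpson's rule.
    """
    h = t_stat / steps
    area = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 - area  # symmetric density: half the mass lies above 0

n = 15
mean_diff = 0.80
sem = 3.28 / math.sqrt(n)
t_stat = mean_diff / sem
p = one_tailed_p(t_stat, df=n - 1)

# 95% interval with the normal approximation (about 1.96 x SEM).
z975 = NormalDist().inv_cdf(0.975)
low, high = mean_diff - z975 * sem, mean_diff + z975 * sem
print(f"one-tailed p = {p:.2f}, 95% CI = {low:+.1f}% to {high:+.1f}%")
```

A negative lower bound means an Obama gain, matching the sign convention in step (1).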

To be fair, the other way of looking at the data is that there’s an 82% probability that Romney gained, however little. This gets us back to point (a); the gain is so small that we have to struggle to find it.

What Z-score does it take to get a significant result? Here are some rules of thumb.

If we are expecting the result to go in a particular direction, do a one-tailed test. Ballpark, for a one-tailed test it takes a Z-score of 1.71 to get to p<0.05 significance. (If we don’t know which direction the effect will go, then we need a two-tailed test. The Z-score needs to get to about 2.0.)
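These rules of thumb come from quantiles of the standard normal distribution, which Python's `statistics.NormalDist` provides. The short sketch below prints the large-sample thresholds for p<0.05. The one-tailed value is about 1.645; the 1.71 quoted above is a slightly more conservative ballpark, close to what Student's t gives with about two dozen degrees of freedom.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Z needed for p < 0.05 when only one direction counts.
one_tailed = std_normal.inv_cdf(1 - 0.05)
# Z needed for p < 0.05 when either direction counts (0.025 in each tail).
two_tailed = std_normal.inv_cdf(1 - 0.05 / 2)
print(f"one-tailed: {one_tailed:.3f}, two-tailed: {two_tailed:.3f}")
```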

In the case of the Ryan bounce, if the effect really were Romney +0.8% (and the SD of the differences stayed at 3.28%), a one-tailed test would need a total of about 49 paired polls, more than three times as many as are available in the example above. Practically speaking, such a small bounce is unlikely to be resolved using paired polls alone.
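The 49 comes from turning the Z-score formula around: Z = effect × sqrt(n) / SD, so reaching a threshold Z requires n = (Z × SD / effect)². A minimal sketch, assuming the spread of the differences stays at 3.28% as more pairs arrive (`pairs_needed` is a hypothetical name):

```python
import math

def pairs_needed(effect, sd, z_crit):
    """Number of paired polls at which effect / (sd / sqrt(n)) reaches z_crit.

    Solves z_crit = effect * sqrt(n) / sd for n, assuming sd does not change.
    """
    return (z_crit * sd / effect) ** 2

n_exact = pairs_needed(effect=0.80, sd=3.28, z_crit=1.71)
print(f"about {n_exact:.0f} pairs (round up to {math.ceil(n_exact)} to clear the bar)")
```

With the large-sample threshold of 1.645 the figure drops to about 45. Note that this is the sample size at which the *expected* Z just reaches the threshold; because the observed mean scatters around the true value, roughly half of such sets of polls would still fall short.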

However, a second source of information is available.

* * *

That source would be us – the Meta-analysis here at the Princeton Election Consortium. In this analysis, we (Andrew Ferguson and I) have combined close to 100 state polls to get a high-resolution snapshot.

Today’s snapshot puts an initial estimate of the Ryan bounce at about 2.0%. Note that the input data overlap somewhat with the FiveThirtyEight posting, so the two are not entirely independent estimates.

Estimating statistical confidence for the Meta-analysis is complicated, and a long story. Bottom line: I estimate that the Meta-margin is usually accurate to about 0.5%, perhaps a bit worse when polls are sparse. Treating that 0.5% as one standard error, our estimate spans R+1.0% to R+3.0% (2.0% plus or minus two standard errors). Combining that with the estimate from the paired polls (O+0.9% to R+2.5%), Ryan probably did give a 1.0-2.0% bounce to Mitt Romney.
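One standard way to formalize “combining” two estimates is inverse-variance weighting, sketched below. This is a swapped-in textbook technique, not necessarily how the range above was obtained. It assumes the 0.5% is the Meta-margin’s standard error and that the two estimates are independent; since the two sources share some polls, the true uncertainty is larger than this calculation reports. `inverse_variance_combine` is a hypothetical helper.

```python
import math

def inverse_variance_combine(estimates):
    """Combine (mean, standard_error) pairs by inverse-variance weighting.

    Assumes the estimates are independent; correlated inputs make the
    returned standard error too small.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / total
    return mean, math.sqrt(1 / total)

paired = (0.80, 3.28 / math.sqrt(15))  # paired polls: mean shift and SEM
meta = (2.0, 0.5)                       # Meta-margin shift, 0.5% assumed to be its SE
mean, se = inverse_variance_combine([paired, meta])
print(f"combined: R+{mean:.1f}% +/- {se:.1f}% (one standard error)")
```

The more precise Meta-analysis gets more weight, so the central estimate lands near R+1.7%, inside the 1.0-2.0% range.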

Update and caveat: The fact that post-Ryan state polls are so dominated by Rasmussen and Purple Strategies, both GOP-leaning outfits, is a real issue. Time will tell.

History of Popular Meta-Margin for Obama

Tags: 2012 Election · Meta-analysis · President

19 Comments so far

  • Stuart Levine

    Query: Is that “average,” “better than average,” or “worse than average”?

    • Sam Wang

      Despite the large amount of media gas on the topic, it’s too early to tell. Ask again Monday.

      As of today it’s equivalent to 24 EV, somewhat less than the approximately 36 EV for John Edwards in 2004. Before that, much less information was available.

      The bottom line is that VP picks are usually important for the VP’s home state, and that’s about it. For Wisconsin, Scott Walker would have been a better choice, but he would have said no. Read my previous posts for what really matters about Ryan: the yoking of the Presidential race with Senate/House races.

  • Matt McIrvin

    Isn’t it usually the convention, not the actual VP announcement, that produces the bigger bounce? Since frequently these are nearly contemporaneous events, I imagine it’s hard to tease them apart.

    I fully expect this to happen again: Ryan will appear at the convention before a sympathetic audience, give the kind of carefully scripted speech that politicians usually give at conventions, and come off surprisingly well, just like Sarah Palin did in 2008.

    But since this year the Democratic convention will come immediately afterward, it may not be easy to see the resulting bounce in isolation.

  • Bill N

    So given your previous analyses, what would you estimate the half-life of a bounce such as this to be? I suspect your answer will be like so many answers to questions about dynamic systems…it depends.

    • Sam Wang

      I don’t think we’ll be able to tell. Matt McIrvin has it about right. Also, see Nate Silver’s comment that even a subtle shift changes the mood, mainly through media coverage I think.

      Honestly, I think these teeny shifts don’t actually matter. But what else is there to discuss? Policies?

  • wheelers cat

    “I don’t think we’ll be able to tell.”
    or care. i predict the nano-bounce won’t flip the curves.

    “What else is there to discuss?”
    You can’t separate out all the interactions and dependencies. Everything is connected.
    For example I believe Romney asked Rubio first, or at least felt him up heavily.
    It’s logical from Nate’s VP post that Rubio could help deliver FL, which Romney desperately needs. Ryan can hurt Romney with seniors in FL.
    But Rubio refused, and more to the point, the GOP elites didn’t force him onto the ticket. That means (i think) that the GOP elites expect Romney to lose.
    Ryan was the most telegenic option left. The least worst choice. The Ryan plan is not an advantage.
    Krauthammer already told Ryan to walk away from it.
    Rubio was spared for 2016– The Battle for the Browns.
    Where he will have to go up against Julian Castro, current keynote speaker for the Democratic convention, just as Obama was in 2004.

    • Sam Wang

      Mostly I agree. Strong candidates sat out the primaries. Now strong running mates seem to have faded into the woodwork, either because they are keeping their powder dry, or because Romney is playing a defensive game.

      The ones likeliest to say yes to the VP slot are those whose careers would be helped by it. Think of Jack Kemp in 1996, accepting second position to Bob Dole. In this context the choices of Portman, Pawlenty, and Ryan make sense. Romney was probably correct to throw the Hail Mary – but as I wrote the other day, he also threw Congress under the bus.

      Rubio 2016, huh? That does require the GOP to become more tolerant of heterodoxy.

  • Jacob Hartog

    Could we run an interrupted time series on the meta-analysis?

  • Jacob Hartog

    I’m not sure we should distinguish “real” changes from potential draws from a probability distribution in this way. Another way of looking at this: if we believe that the day-to-day movement of the meta-analysis prediction is a random draw from a Gaussian distribution, where in that Gaussian distribution is the day-to-day break following the Ryan pick? In the 95th percentile of predicted day-to-day movement, or the 70th?

    • Sam Wang

      The EV estimator has complex properties for several reasons: pollster bias, irregular data availability, and lumpy distribution of EV among states, to name three big ones.

      The first one could be corrected with effort. For example, many recent polls come from Purple Strategies, an outfit that leans GOP by about 2%. This is a weakness in today’s estimate, one that will resolve itself as more data come in.

      Rather than get into complicated modeling, it is better to look at past years empirically and see how the snapshot wanders. Even better than that…wait until next Monday or Tuesday and see what happens.

  • Matt McIrvin

    2008 was a case where the timing of the VP announcement itself may have been particularly consequential, at least in the short term: it came the day after the Democratic convention ended, and by picking Palin they made such a big media splash that it may have squashed any Obama/Biden convention bounce instantly. Of course it did them no good in the long run.

  • Bill N

    When does the next big “dump” of polls occur that can help clarify the apparent Ryan bounce in terms of magnitude and durability? Especially from Ohio and Florida? If I understand the characteristics of the polls the estimated Ryan bounce is based on, they tend to have a Republican bias. Also, have you given any more thought to a sensitivity or perturbation analysis on the potential impact of the voter ID laws in such important states as Pennsylvania?

  • wheelers cat

    Dr. Wang, was Palin a black swan event?
    She got an unprecedented response, and she did flip the curve for a while.

    • Sam Wang

      I hate to be unsentimental about these events, but if you ignore their content, they are all simply “events” with characteristics: size, duration, and unknown long-term effects. When you look at waves on the ocean, some seem larger and some seem smaller, but they follow an orderly distribution. There’s a book coming out this fall by Erikson and Wlezien that shows timelines of presidential elections. Detailed analysis of that should reveal the distribution…if the authors haven’t done that already.

      Palin was a most unusual choice, and did yield a brief but large bump for McCain. But not quite black-swan in the sense that I mean, i.e. driving the final outcome to (say) 3 SD from the mean.

  • wheelers cat

    …and, I think it’s totes Rubio. The subtext of this election is the demographic timer, but conservatives aren’t going to talk about that. I think we have already passed the tipping point.
    Media saturation and the extended campaign season have nearly eliminated undecideds. Almost everyone has already made up their mind.
    This election is dependent on turnout– not GOP turnout, that is already maxxed. It’s dependent on Democratic turnout, and whether GOP methods like redistricting and voter ID laws can suppress enough Democratic voters while the party tries to begin to appeal to Latino and Hispanic demographics.
    There are simply more democratic voters in the electorate now. Surely you have heard the screams of “dem oversampling” from conservative pundits. 2008 was the first year grouped minorities plus white liberals began to achieve electoral parity with white conservatives….but the black swan event of the Econopalypse obscures the data.
    That reminds me…..Nate never reposted his hastily retracted sampling post like he promised.
    I wonder why….

  • Olav Grinde

    @Wheeler’s Cat: “Nate’s hastily retracted sampling post…” Would you be so kind as to elaborate?

  • Jacob Hartog

    “They are all simply “events” with characteristics: size, duration, and unknown long-term effects.”

    This expresses something I’ve been thinking about: why distinguish between “random” variation in political opinion and “real” events; apart from measurement or sampling error (people lying to pollsters or a non-representative sample being surveyed), all changes in polls express something real, although only a portion of it is worthy of note (as being larger than the day-to-day change for any given day) and still less will potentially alter the electoral outcome in November or the policy outcome in 2013.

  • wheelers cat

    Nate tweeted a new post on sampling on 8/4
    but when you go to that link its something else entirely.
    Nate said:
    An earlier post in this space about poll oversampling was published in error and will be updated and published later this week.

    It’s the 18th, TWO weeks later, and still no sampling post.
    I read the sampling post before he pulled it– it seemed fully vine-ripe to me.
    I call shenanigans!
