Some of you ask why various models are currently giving different probabilities. At the moment, everybody’s above 50% probability for a Clinton win, but with varying degrees of certainty.

Here at the Princeton Election Consortium we follow a relatively simple procedure to generate (1) a snapshot of conditions today (see the top line of the banner), and (2) a prediction of November outcomes (“Nov. win probability” in banner).

- use the median and distribution of state polls to estimate each state’s current margin and win probability (see the right sidebar),
- combine the probabilities to make an exact snapshot of today (the EV count above),
- calculate how far national sentiment is from an electoral tie (the Meta-Margin),
- calculate what would happen if the nation were to drift in either direction by November (gives the random-drift probability), and finally
- impose a prior that the final outcome will be somewhere in the vicinity of where it’s been so far, a.k.a. regression to the mean (gives the Bayesian probability).
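
The first step in the list above can be sketched roughly as follows. This is a minimal illustration, not PEC's actual code: the function name `state_snapshot`, the use of the median absolute deviation to estimate spread, the normal approximation, and the `min_sem` floor are all assumptions made for the example.

```python
from statistics import NormalDist, median


def state_snapshot(margins, min_sem=1.0):
    """Return (median margin, Dem win probability) for one state's recent polls.

    margins: Dem-minus-Rep poll margins in percentage points.
    min_sem: hypothetical floor on the standard error, so that a handful of
             identical polls does not produce a probability of exactly 0 or 1.
    """
    med = median(margins)
    # Median absolute deviation, scaled by 1.4826 to approximate a standard deviation.
    mad = median(abs(m - med) for m in margins)
    sem = max(1.4826 * mad / len(margins) ** 0.5, min_sem)
    # Probability that the true margin is above zero (assumed normal).
    win_prob = NormalDist().cdf(med / sem)
    return med, win_prob


if __name__ == "__main__":
    print(state_snapshot([-1, 1, 2, 2, 2, 2, 6]))
```

Using the median and a median-based spread means a single extreme poll barely moves either number, which is the property the procedure relies on.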

Polls only, no “fundamentals,” no pollster corrections. All of the above steps have a history and a justification. It’s a transparent process. I’ll populate with links later.

Note that all Senate and House analysis on this page is snapshots only, no predictions for November.

PEC’s model works well when applied retrospectively to 2004-2008, and was extremely good in 2012. It is accurate and stable – but it does not generate that much news. There is not much pressure to attract eyeballs around here. It responds to changing opinion, but only after state-level data is available.

Other sites use national polls and other factors. Josh Katz at The Upshot gives a detailed comparison.

Sam, thanks so much for your clear, simple and accurate polling. I truly hate the massaged polls, the special sauce, the whatever.

Why on earth would we not listen to the answers given by people to the question, who do you want to be the next president?

I have an odd feeling that after somewhat misjudging the primaries, Nate is hedging his bets a little bit with all the different models.

He has indeed become very popular and likewise the stakes have grown. It’s a pity because a lot of people liked his predictions better when they were simple and straightforward and with less “sizzle” and mystery.

Having said that, it’s quite possible that he might be forecasting something with his “special sauce” that Sam’s model might eventually end up concurring with.

As long as one has enough polls, choosing the median will eliminate polls with consistent biases.

However, as Drew Linzer showed (and Nate Silver knows) Rasmussen and Gravis are obvious outliers with a Republican bias and could be removed from the pool of pollsters to give an accurate forecast (just a different method than here at PEC – which is obviously very good.)

One could eventually defeat a median-based system by spamming it with enough ridiculously biased polls. But it would be hard to do it without the tactic becoming too obvious to ignore.

Nate’s instinct in the general election has always been to hedge a lot. That was true even in 2008. Most of what he does just serves to fuzz out his probability distribution to encompass more possibilities, including some really unlikely ones.

There’s clarity in simplification. Even if it removes “entertainment.”

Is it possible to see the jerseyvotes expanded on another page for all the states? Or does it just stop if a state’s power is 0? For at least the presidency, would you consider reporting information on the changes in jerseyvote power?

I, too, prefer the absence of the arbitrage in the Wang model vs. Upshot and 538.

Cleaner and unbiased relative to the operator.

Sam, can you contrast the PEC with Drew Linzer’s votamatic? It does not seem to be active now, but in 2012 it was relatively stable from early July all the way to November.

I’m pretty sure he’ll issue his forecast when Q2 GDP is in on Jul 29th.

Very early in the cycle, Linzer seemed to be interested in fundamentals-only models that favored Trump to win.

I understand and appreciate the comments of SF Bays, but I think that s/he misses an important point. I have a table that I call Election-At-A-Glance. It is, in essence, a compendium of all of the various EV predictors, polling aggregators, and wagering/odds sites I can find.

Yes, the numbers from different sources vary, but looking at all these sources together gives a very full picture of where the election is going.

I update about twice a week.

Stuart, can you clarify your point? Virtually every forecasting model still predicts a Clinton win, though the margin has narrowed. Are you suggesting that a Clinton win is still likely, or that Trump is on the path to overtaking her?

An aggregator-aggregator! I wish there were a site that aggregated all the people aggregating aggregators.

PollyVote is a similar site that combines multiple prediction models, including polls, to produce a supposedly better aggregate model.

“An aggregator-aggregator! I wish there were a site that aggregated all the people aggregating aggregators.”

Matt: The New York Times’ coverage leans toward the one or two data-based poll statisticians. PEC is its only prognosticator described as all data.

In 2008, 3BlueDudes collected dozens of aggregators on their site. That year was the Burgess Shale of polling analysis.

If very old polls are retained in the sample, the results will lag the trend significantly. I think there should be some cutoff when there is a trend in the data. However, I would not know how one can draw a cutoff between the older polls to be discarded and the polls to be included.

I know you’ve explained the meta margin, but I just don’t get it. If someone is for Hillary, is a higher meta margin better, or a lower meta margin? I need to know what to root for! Thanks

High positive Meta-Margin is good for Democrats.

Actually, the PEC analysis uses only the newest polls (as detailed in “The Methods” link on the left). On the other hand, if you look at the state-by-state data at 538 you will see that some very old polls still have significant weight in their models.

The meta-margin doesn’t lag, it just requires a pronounced and sustained trend before it is strongly affected. An exaggerated example:

Suppose the state Springfield is in has the following poll results: -1, +1, +2, +2, +2, +2, +6

Average: +2%

Median: +2%

Now suppose that a new poll comes out at -57 and the +6 poll is dropped.

Average: -7%

Median: +2%

The stability of the meta-margin is a feature, not a flaw and, in my opinion at least, makes it much more reliable than any other analysis I’ve seen.
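
The Springfield example can be checked in a few lines. This is only a sketch of the arithmetic in the comment above, using the same made-up poll numbers.

```python
from statistics import mean, median

# Dem-minus-Rep margins from the example above.
before = [-1, 1, 2, 2, 2, 2, 6]
# The +6 poll is dropped and an extreme -57 poll arrives.
after = [-1, 1, 2, 2, 2, 2, -57]

# The mean swings from +2 to -7; the median stays at +2 in both cases.
print("mean:", mean(before), "->", mean(after))
print("median:", median(before), "->", median(after))
```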

The 538 model uses old polls with a weight that declines with time, but then it tries to adjust them for the national poll trendline. And the polls themselves are also adjusted for what 538 thinks is their house bias. So once it’s through chewing on the numbers, what the model says for a state can actually be quite different from what the polls say there.

Hi Sam,

I was wondering how the drift process works. When you say that you model what would happen if the nation were to drift, do you mean that you model the drift of all the states together (e.g. a +1 point shift in the nation would move all state polls by ~1 point)? I’m specifically wondering if the relative confidence of your model is at all related to the correlations between states as a function of drift, and whether you’ve looked into the matter. Thanks so much!

My understanding is that a key difference between Sam’s model and Nate’s is that Sam’s prior is that the race is stable, while Nate’s prior is based on economic fundamentals, which in this case would predict a close race with a Republican edge.

I think that difference is also reflected in the commentary. Sam tells us not to fret over small deviations, while Nate tells readers to expect a tight race.

Good point. I guess, however, I’m still confused by the difference between that apparently small 2.4% and the 80% Bayesian probability of Clinton winning.

On Sam’s site, Ohio and Florida just went white while on Nate’s site they show pink. I’d say those two states are going to be interesting to watch as both convention bounces dissipate.

Why DID they go to white (i.e. tied) today? I haven’t seen any new state polls for them, and HuffPost still has them slightly blue.

Might be something in the model that I just don’t understand…

@Adam: it’s possible that an older poll dropped out of the time window that Sam uses.

Adam: Florida changing to white doesn’t mean tied; it means that it isn’t currently over 60% probability on either side.

Am I right, that soon, PEC’s header will start to reflect the tightening shown by other pollsters, or is the model so distinct that it excludes much of what they feature? So, the recent “tightening”, either hyped, biased, or manufactured, or not, will remain outside of PEC’s numbers?

It’s already tightened quite a bit, from 4 points Meta-Margin to just above 2. If Trump is really ahead nationally, whether that shows up here probably depends on how long it lasts.

I think one positive aspect is that state polls eliminate the really strange polls like the USC tracking one – which seems to have the most bizarre trends of any group out there. For example, they have Trump up 7, now (with Clinton barely leading among women). Alright cool. But then most respondents still say Clinton will win in November. I haven’t heard many true Trump supporters/voters that also say they think Clinton will win. Something doesn’t smell right.

My view is that if Mr. Trump wins FL, OH, NC and PA , he will win the election. But he has to win all 4.

He could do it without NC, if he holds the other three plus both Iowa and Nevada. It would be close to the minimum possible win, 270 electoral votes.

He could lose Omaha and make it the 269-269 tie with a win in the House of Representatives.
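
The electoral arithmetic in these three comments can be checked with a short script. It is a sketch under one loud assumption: that Trump starts from the 206 electoral votes Romney won in 2012 (a baseline that already includes North Carolina and Nebraska's 2nd district, i.e. Omaha). The helper name `total` is made up for the example.

```python
# Electoral votes under the 2010 apportionment, in force for the 2016 election.
EV = {"FL": 29, "OH": 18, "PA": 20, "NC": 15, "IA": 6, "NV": 6, "NE-2": 1}

# Assumed baseline: every state Romney carried in 2012 (includes NC and NE-2).
ROMNEY_2012 = 206


def total(gains=(), losses=()):
    """Electoral votes after adding flipped states and subtracting lost ones."""
    return ROMNEY_2012 + sum(EV[s] for s in gains) - sum(EV[s] for s in losses)


# Holds NC and flips FL, OH and PA.
four_states = total(gains=["FL", "OH", "PA"])
# Loses NC but adds Iowa and Nevada.
without_nc = total(gains=["FL", "OH", "PA", "IA", "NV"], losses=["NC"])
# Same map, but also loses the Omaha district.
omaha_tie = total(gains=["FL", "OH", "PA", "IA", "NV"], losses=["NC", "NE-2"])

print(four_states, without_nc, omaha_tie)
```

Under that baseline the three scenarios come to 273, 270 and 269 electoral votes, matching the totals quoted in the comments.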

…many more possibilities open up if Trump wins Virginia, but Trump is not gonna win Virginia.

I live in NC, and this state is very interesting. There’s a lot of animosity against the Governor, obviously. But as we all know, compared to the others listed, it’s the more conservative state.

Clinton has been running ads here (including the awesome “Our Children are Listening” spot) since June. Ad plays are telling, because campaigns, with their deep polling that the public never sees, run ads where they feel they have a chance to either win or lose. If a GOP candidate struggles to win NC, they will really struggle to win ALL of Florida, Ohio and Pennsylvania.

I think with Kaine, VA is Clinton’s. So she just needs to win one of OH, PA or FL.

I think she will hold on to Michigan and Wisconsin. Nevada is strange. But I also have heard Clinton just pulled TV ads out of Colorado. That is a tell, indicating they are feeling pretty good about that state.

For what it’s worth, I have only seen one Trump ad here (by a PAC). Trump did say a day ago that “I will be in NC so much you will be sick of me” (paraphrasing…we are already sick of him FWIW). So he is having to fight here.

I look for Kaine to camp out in Florida, Virginia and NC. I look for Biden in OH, MI, and PA, and Clinton to hit them all. The Kasich-Trump feud can only help.

In short: NC won’t be the deciding state, I truly believe. But as long as Clinton and Trump are fighting here, I feel better about the chances. If Clinton starts to pull away in either PA or Florida (the two more likely), it’s good.

Ohio suddenly turned quite blue on Sam’s map!! That is quite a change!! Was there a NEGATIVE bounce for Trump in that state as a consequence of the RNC? And perhaps the feud with Kasich? Just wondering!!!

Yes, there’s something a little odd with the feed. I think it’s counting the sub-tabs as polls. Diagnosing…

I think the hardest get for Trump is Pennsylvania. If he loses Pennsylvania and Virginia, he needs to win Iowa, Nevada, and New Hampshire (in addition to Florida, Ohio, and North Carolina) and that gets him a 269-269 tie which he’d probably win in the House.

That seems to me to be his easiest path to the White House, difficult as it is. Although Sam does have Pennsylvania as significantly closer than Ohio, New Hampshire, and North Carolina, most other sources list it as a lower probability. A new poll just came out with Clinton up by 9% in PA from Suffolk that I don’t think is reflected by the model here yet.

Again, people are naming and adding up states Trump needs, but I still don’t get how these states are being selected. The masthead already shows that if this election changes a lot, Trump has up to a 35% chance at victory.

How could that happen, really?

Let’s seriously examine how PEC’s editable EV map has color-coded the states.

To be YUGE-ly unbelievably fair to Trump, let’s count as red ALL states which are now

•Red

•Coral

•Pink

•White AND

•Pale Blue.

That’s five of the 7 color types, only 3 of which actually lean toward Trump. So that throws Trump every state with under an 81% probability of going blue, including NC, Florida, ever-strobing Nevada, Iowa and New Hampshire.

Even so, Trump still doesn’t win.

There are 3 medium blue states left, each roughly 81-95% likely to stay blue: Oregon, Michigan and Ohio. Trump still loses if we give him Oregon or Michigan. But okay, okay, we know you want it. So just for the Art of the Deal [title by ghostwriter] we’ll throw in the one medium blue state with the MOST electoral votes: Ohio.

Aaaand… with that Trump finally only ties Clinton at 269 to 269. That’s happened 3 times in history, all in the 1800s. One of those was a real four-way race. But a tie is likelier than _that_ since 1964, when DC was awarded 3 EVs. Even so, a tie hasn’t happened since the 19th century.

To actually outright win, Trump needs two out of three of these states.

BUT: the House of Representatives decides ties, and the House skews Don-ward. Still not a sure thing though, not this year.

Yeah, yeah Pennsylvania could eventually pink up, but now at over 95% likely Dem, today she’s Blue Safe and not in play.

So. That’s what’d have to happen and it’s pretty outrageous.

Unless of course Sam knows something I don’t know. Which he does, every day.

I am beginning to suspect that Nate Silver’s proprietary (non-transparent) “special sauce” has a hallucinogenic ingredient in its mix.

He has not only Iowa and Nevada leaning towards Trump but Ohio, Florida and North Carolina. Perhaps most surprisingly, he even has New Hampshire colored pink!

I’ll have my polls-only election prediction, hold the “special sauce”, to go, please.

Is it too early to say that Trump’s bump has run its course and we’re about set for a Clinton rebound? The fact that Hillary is still ahead in the meta-margin and some late polls tells me so.

Polls are going to keep trickling out from the earlier period. NH just turned white because of an oddball R-convention-week poll from “InsideSources/New Hampshire Journal” giving Trump a huge lead there. Might be the Trump bump, might be an outlier.

Thanks, Sam. This is helpful.

I have two questions:

(1) when you say “combine the probabilities to make an exact snapshot of today,” how do you determine correlations between state outcomes?

(2) are you going to put out some of the results of the Google Correlate analysis that someone else (sorry, can’t remember the name) did in the primaries?

He doesn’t. Each state is treated as independent, and the coefficients of the probability generating function terms give the probability for each total EV outcome.
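
As a rough illustration of the generating-function idea: each state contributes a factor (1 - p) + p·x^EV, and multiplying the factors together gives a polynomial whose coefficient on x^k is the probability of exactly k electoral votes. The sketch below assumes independent states, as the comment says; the function names and the toy two-state input are hypothetical.

```python
def ev_distribution(states):
    """Exact distribution of Dem electoral votes for independent states.

    states: list of (electoral_votes, dem_win_probability) pairs.
    Returns a list where entry k is the probability of exactly k Dem EV.
    """
    dist = [1.0]  # Start with the polynomial "1": zero EV with certainty.
    for ev, p in states:
        # Multiply the running polynomial by (1 - p) + p * x**ev.
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # state is lost: total unchanged
            new[k + ev] += prob * p    # state is won: total rises by ev
        dist = new
    return dist


def win_probability(dist, threshold):
    """Probability of reaching at least `threshold` electoral votes."""
    return sum(dist[threshold:])


if __name__ == "__main__":
    toy = [(3, 0.5), (4, 0.5)]  # two made-up states, coin flips
    print(ev_distribution(toy))
```

For the real map the same loop runs over all states plus DC, and `threshold` would be 270.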

The metamargin is an attempt to address a systematic correlation: by finding how much every poll would have to be off to induce a tie, you can approximately bound the effect of correlation.
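
One way to picture that calculation is a search for the uniform shift that produces a tie. The sketch below is a simplification under stated assumptions: it ties the expected electoral vote count rather than whatever summary PEC actually uses, it assumes a single fixed polling error `sem` for every state, and the function names are invented for the example.

```python
from statistics import NormalDist


def expected_dem_ev(states, shift, sem=3.0):
    """Expected Dem EV after moving every state margin by `shift` points.

    states: list of (electoral_votes, dem_margin) pairs.
    sem: assumed per-state uncertainty in points (hypothetical value).
    """
    return sum(ev * NormalDist().cdf((margin + shift) / sem) for ev, margin in states)


def meta_margin(states, lo=-30.0, hi=30.0, tol=1e-6):
    """Dem lead, in points, that a uniform swing would have to erase to tie."""
    half = sum(ev for ev, _ in states) / 2
    # expected_dem_ev rises monotonically with shift, so bisection works.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_dem_ev(states, mid) < half:
            lo = mid
        else:
            hi = mid
    # The tying shift is negative when Democrats lead, so flip the sign.
    return -(lo + hi) / 2


if __name__ == "__main__":
    toy = [(10, 5.0), (10, -1.0)]  # two made-up states
    print(round(meta_margin(toy), 3))
```

With this sign convention a positive value means Democrats are ahead, consistent with the reading of the Meta-Margin given earlier in the thread.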

I’ve been thinking of making a homegrown model that would impute polling values to unpolled states from other sources, rather than last election, but mixing models like that makes my head a little sore.

The only strange / questionable part of the process seems to be this:

“impose a prior that the final outcome will be somewhere in the vicinity of where it’s been so far, a.k.a. regression to the mean (gives the Bayesian probability)”

First, because it leaves open the question of how strong the prior is. It’s easy to make it virtually impossible for the race to shift more than 2% from the March-June average, but that would be a mistake since the actual odds aren’t that low.

Second, because it assumes that voters have already basically decided who to vote for, so that the majority of changes in the polls don’t reflect changes in public opinion, they’re just assumed (by the Bayesian prior) to be statistical flukes. What makes a prior non-arbitrarily chosen in a political poll? How do you avoid the problem where after a decided shift in public opinion (e.g. +3 meta-margin for Trump), the Bayesian win probability for Clinton doesn’t reflect that?

No, the prior is fairly weak. It is quite broad around the season’s average… +/-6%, if I recall correctly. It exerts a gentle bias.

As the election draws closer, the random-drift range becomes quite narrow and then the prior does not matter any more.
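
A toy version of how a broad prior interacts with a narrowing drift range might look like this. It is a sketch only: normal distributions are assumed throughout, the drift rate `drift_per_sqrt_day` and the half-point floor are made-up parameters, and only the roughly 6-point prior width comes from the comment above.

```python
from statistics import NormalDist


def november_win_prob(mm_today, days_left, season_avg,
                      prior_sd=6.0, drift_per_sqrt_day=0.4):
    """Return (random-drift probability, Bayesian probability) of a Dem win.

    mm_today: today's Meta-Margin in points (positive favors Democrats).
    season_avg: average Meta-Margin so far this season (center of the prior).
    prior_sd: width of the prior, about 6 points per the comment above.
    drift_per_sqrt_day: hypothetical drift rate; spread grows as sqrt(days).
    """
    # Random drift: today's margin plus a random walk until Election Day.
    drift_sd = max(drift_per_sqrt_day * days_left ** 0.5, 0.5)
    random_drift = 1 - NormalDist(mm_today, drift_sd).cdf(0)

    # Combine drift and prior as a precision-weighted product of two normals.
    w_drift = 1 / drift_sd ** 2
    w_prior = 1 / prior_sd ** 2
    post_mean = (w_drift * mm_today + w_prior * season_avg) / (w_drift + w_prior)
    post_sd = (w_drift + w_prior) ** -0.5
    bayesian = 1 - NormalDist(post_mean, post_sd).cdf(0)
    return random_drift, bayesian


if __name__ == "__main__":
    print(november_win_prob(mm_today=2.4, days_left=100, season_avg=4.0))
    print(november_win_prob(mm_today=0.5, days_left=1, season_avg=4.0))
```

Far from the election the drift range is wide, so the prior pulls the estimate toward the season average. Close to the election the drift range is much narrower than the prior, and the two probabilities nearly coincide.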

In the past two weeks, we have had judges throw out part of the laws in NC and WI designed to lower voter turnout. Do you figure that into any of your predictions? If so, how?

While it is quite significant to those that would have otherwise been prevented from voting, I’d have to believe the percentages are quite small. This may be more significant ‘down ticket’ at this point. The only thing to watch here is if 3rd party candidates start pulling significant numbers.