#### May 1st, 2016, 3:08pm by Sam Wang

Note: I have found a problem with this calculation. The national-poll-based Clinton win probability is closer to 70%. I will have an update and explanation soon.

General-election matchup polls (e.g. Clinton v. Trump) started to become informative in February. In May, they tell us quite a lot – and give a way to estimate the probability of a Hillary Clinton victory.

First, let us examine the primary evidence. Wlezien and Erikson have gathered presidential preference polls from 1952-2008:

These graphs show that during the year of the general election, polls gradually converge to a point that is close to the actual November outcome.

Wlezien and Erikson expressed their findings in terms of correlation coefficients. In early February (about 280 days from the election), the correlation between polls and November outcomes is +0.2, where 0.0 corresponds to no relationship and +1.0 indicates a perfect relationship. The correlation rises to +0.9 by October. However, this measure is not easily used by consumers of polls.

Instead, a more intuitive measure is how far polls tend to move over time.

To calculate this box-and-whisker plot I also included 2012 data (spreadsheet here). Positive values indicate that the Democratic candidate did worse in November than in polls. The box indicates the interquartile range, i.e. the middle 50%, and the whiskers indicate the range. The red points indicate two outliers: the elections of 1964 (Johnson v. Goldwater) and 1980 (Carter v. Reagan v. Anderson). In both of those years, May polls overestimated support for the Democratic candidate by over 10 percentage points. For obvious reasons, Republican-leaning pundits like to write about 1980. But that is one case out of 16 elections.

Instead of such cherrypicking, it is more accurate to include the outliers as part of an analysis of all 16 elections. The full range and estimated standard deviation of poll-outcome differences look like this:

On average, polls have little or no bias relative to November, but have some variation, which is what we care about. That variation is quantified by the standard deviation (SD). I estimated SD using median absolute deviation (MAD), and verified this approach using interquartile range divided by 1.35. For March and April, the standard deviation is around 4 percentage points.
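The two robust SD estimates mentioned above can be sketched in Python. This is only an illustration, not the calculation behind the plot: the `diffs` values are made up rather than taken from the spreadsheet, and the 1.4826 scaling of MAD is the usual normal-distribution convention, which the post does not state explicitly.

```python
import statistics

def sd_from_mad(values):
    """Robust SD estimate: 1.4826 * median absolute deviation from the median.

    The 1.4826 factor is the standard normal-consistency constant (an assumption here).
    """
    values = list(values)
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return 1.4826 * mad

def sd_from_iqr(values):
    """Robust SD estimate: interquartile range divided by 1.35."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 1.35

# HYPOTHETICAL poll-minus-outcome differences (percentage points) for 16 elections.
# These are invented for illustration, including two large outliers.
diffs = [-6, -4, -3, -2, -2, -1, 0, 0, 1, 1, 2, 3, 4, 5, 12, 14]

print("SD from MAD:", round(sd_from_mad(diffs), 2))
print("SD from IQR:", round(sd_from_iqr(diffs), 2))
print("Ordinary sample SD:", round(statistics.stdev(diffs), 2))
```

Both robust estimates describe the middle of the distribution and are barely moved by the two outliers, while the ordinary sample SD is pulled upward by them.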

The November outcome should be within 1 SD of current polls approximately two-thirds of the time. Hillary Clinton’s polling margin over Donald Trump is currently +8% (median of 19 pollsters since mid-March) – twice the standard deviation. Based on past years, how likely is it that Trump can catch up? It is possible to convert Clinton’s lead to a probability using the t-distribution*, which can account for outlier events like 1964 and 1980. Using this approach, the probability that Trump can catch up by November is 9%, and the probability that Clinton will remain ahead of Trump is 91%**. This probability doesn’t take into account Electoral College mechanisms. But since the bias of the Electoral College is quite small, it does not make a difference in the calculation.

I should note that the polls have been telling us this information for some time. In the first half of March, Clinton led Trump by a median of 9 percentage points. Using an SD of 4.5 percentage points, her win probability would come out as 93%. So today’s estimate has been knowable for several months.

This is a result that may excite Democrats. However, it is subject to change. For example, the SD increases to about 7% in June, which combined with a lead of Clinton +8% corresponds to an 83% win probability, less certain than today. And of course the polls could change. I don’t know why polls would be less predictive in summer. Maybe general election campaign events drive polls away from where they would naturally go otherwise. Post-convention bounces would be examples of such events.

This estimate is also independent of other factors, such as the state of the economy and Clinton and Trump’s net favorability/unfavorability. Most such factors should already be partially baked into the polls, and therefore might not add much information. Now that polls are predictive, they give us a more direct measure of what will happen in November.

*In MATLAB: prob=tcdf(clinton_trump_margin/4.5,3). In Excel: =1-TDIST(clinton_trump_margin/4.5,3,1)

**Modified to allow for the possibility of systematic error in polls. I assumed that polls will be off systematically by +/-2%, even on Election Eve. Calculating the effective SD as sqrt(SD^2 + 2^2) gives an effective standard deviation of about 4.5% instead of 4%.
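For readers without MATLAB or Excel, the two footnotes can be sketched in Python using only the standard library. This is an illustrative sketch, not the code used for the post. The function names are made up for this example, and the closed-form expression is the standard CDF of a t-distribution with exactly 3 degrees of freedom, so it does not generalize to other values.

```python
import math

def t3_cdf(x):
    """CDF of Student's t-distribution with 3 degrees of freedom (closed form)."""
    u = x / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi

def effective_sd(sd, systematic=2.0):
    """Combine historical poll movement with an assumed systematic polling error."""
    return math.sqrt(sd ** 2 + systematic ** 2)

def win_probability(margin, sd):
    """One-tailed probability that the leader is still ahead in November."""
    return t3_cdf(margin / sd)

print("Effective SD:", round(effective_sd(4.0), 2))            # sqrt(4^2 + 2^2)
print("Lead +8, SD 4.5:", round(win_probability(8, 4.5), 2))   # current estimate
print("Lead +9, SD 4.5:", round(win_probability(9, 4.5), 2))   # early March
print("Lead +8, SD 7:", round(win_probability(8, 7.0), 2))     # June-sized SD
```

With these inputs the sketch gives values that round to the 91%, 93%, and 83% figures quoted in the text.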

• Commentor

If you prefer horse race lingo, that is about 10 to 1.

• E L

Larry Sabato of UVA made a prediction with great certainty last night that Clinton would win based on current polls. He said it was extremely early for him to predict. What’s different about this year is both candidates have about a 99% name recognition and the public has 20+ years of experience with each. In every other presidential contest since 1980, one candidate had much greater name recognition than the other so the race was more subject to change as time went by. This year the margin between Trump and Clinton may be frozen already.

• I agree with him and with you. However, that involves adding more facts to the argument: (1) Public opinion during general election campaigns is more stable than it used to be. (2) Hillary Clinton and Donald Trump are extremely well-known, and there is less room for opinion to move. Those are perfectly valid points, but I am trying to show how far one can get without using that information.

• Probably the more relevant fact is Trump’s net approval of about -35% among general voters. (Update: actually -28%. See this.)

• Matt McIrvin

Hillary Clinton is herself about -13 or -14%, though, so the Trump number is less impressively bad than it sounds.

(Note: that graph’s negative slope looks scarier than it is because the horizontal scale is extremely compressed; it goes all the way back to January 2009. Clinton had very high public approval as Secretary of State and reverted rapidly to being a divisive figure when it became clear she was running for President again.)

• Donnie

Agree. My point is simply:

Trump started at -32% among Republican voters. No one covering the race at the time thought he could improve that number in a meaningful way. Now he’s at +20% among Republican voters.

Trump is currently at -35% among general election voters. Why are people covering the race still so confident that he can’t improve that number in a meaningful way?

• E L

Sam: I’m just looking for the warm comfort of assurances that Trump will lose. Sabato thinks Clinton would win all 2012 Obama states and add N. Carolina plus Arizona, Georgia, and probably (please make sure you’re seated) Utah.

• Truthy

Donnie, because in the end Republican voters have to rally around the likely Republican candidate. The rest of the population do not, and as the polls will tell you, will not.

• Marc

@Donnie @Truthy

Trump also benefited from a very divided field in the primary. In the general he’s likely to face only one opponent. Had he faced fewer opponents for less time early in the primary process, he might not be in the position he is now.

• Matt McIrvin

Sabato’s projection sounds like it’s just a straight reading of recent “if the election were held today” state preference polls. That’s what they say; there’s a Wikipedia page that collects them.

I think that if the election were held today, the chance of Clinton beating Trump would indeed be pretty much 100%. The question is just how much things can change between now and the fall.

• Matt McIrvin

Polls have moved substantially in Hillary Clinton’s direction since six months ago.

Obviously a new recession would be bad for the incumbent party, but Trump is currently disliked enough that it might be possible to spin even big crises against him in a “whose hand do you want on the tiller?” sense.

• There is a qualitative difference to the polls in the 21st century. The lines tend to bunch up and cluster tighter around zero. Unlike, say 1964, when it appears the polls were useless until a few days before the election.

One can imagine it going the other way — the middle decades of the 20th century being the golden age of polling when everyone had a land line and picked up the phone to talk to the nice men from Gallup.

• Sean Patrick Santos

I can’t decide whether I agree with you. It looks like the craziest years were 1964, 1980, and 1992, so maybe there just tends to be such an event about every 4 elections or so, and we’re just overdue as a statistical fluke.

On the other hand, you’re right that the more recent years look much “tighter”. They’re not just pretty close to unbiased, they barely shift at all. I wonder if it has to do with the degree to which people are already committed to a party in January.

Despite the large number of independents out there, the polarization of American politics suggests a pretty significant fraction of people have a party they mistrust/fear/hate, and so have almost zero chance of voting for. On the other hand, the three elections I mentioned all are known for being cases that shuffled demographics (and in particular, cover the time over which Democrats consolidated the black vote and the Republicans locked down white Southerners). The race is more likely to have big shifts if there’s some events or particular candidate (or both) that are swaying people who used to be party loyalists and making them into swing voters.

Maybe elections are going to stay a lot more predictable until the next time one of the candidates manages to win over a lot of the other party’s voters again (or else alienate an entire demographic that their own party counts on). Which, to be fair, Trump would definitely be moving voters between parties, if it weren’t for the fact that the demographics he’s insulting are already mostly Democrats. It may still be the case that this year will be less “tight” due to this.

• Perhaps certain years stand out because they were blowouts (1964, 1980, 1992) and what’s being plotted is the absolute difference, rather than the relative difference. For instance, Johnson won 61% of the popular vote against Goldwater.

If they plotted (polling-actual)/actual, that might mitigate the tendency of polls to go nonlinear when they get too far away from 50/50.

• Matt McIrvin

Flukes of timing are still going to happen occasionally. Consider what would have happened if the September 11th attacks had been in the year that George W. Bush was up for reelection. It would have been a Reagan-sized landslide at minimum; a sweep of 538 electoral votes would have been possible. Maybe 535: DC would be a hard sell.

(I actually can’t conceive of any scenario whatsoever in which today’s Democrats could win a landslide like that. Crises make people rally around the President, but they also make people more conservative. I suspect that a Democrat could not have benefited from 9/11 to the degree that Bush did.)

• Joel

How predictive are favorable/unfavorable ratings after all? I get the sense that candidate preference polls are the only reliable form of polling. I think this is especially true when you consider how negative messaging appears to be particularly “sticky”.

• That is a good point. This season in the GOP race, Trump’s favorable/unfavorable rating among Republican voters went up along with his success. However, I do not see an obvious way for that to apply to the general election. As today’s post shows, head-to-head matchups are not that movable.

• Matt McIrvin

I get the impression that a Clinton v. Trump matchup would be extremely unusual in that both candidates have high negatives with the general population.

• Ken

I’m looking forward to the 2016 Median EV Estimator. Will it make its appearance and replace the 2012 estimator soon?

• Need state polls for that. June maybe.

• There have been lots of state polls already of course. Now, they are much sparser than they will be, but they are there. If you want to use any of what I’ve collected on that please feel free. I update several times a week as I find new polls. I’ve made the raw list of state level polls I use to drive the charts and graphs on my site available as an easily machine parseable pipe delimited text file here: http://electiongraphs.com/2016ec/polldatasorted.txt

• whirlaway

There appear to be several wildcards particularly this time though.

1. Economy and stock market
2. The FBI investigation re: Clinton’s email server
3. Possibility of a terrorist attack (in the past attacks in Europe didn’t seem to matter much to the voters but that is changing now)
4. Donald Trump himself

• E L

Sam: I understand you don’t factor in these factors when making your statistical analysis but, to me, they are vital in a 2016 race. 1. I can’t see young tech guys going to Trump to set up a sophisticated voter database operation like Obama had. 2. He does not have a solid foundation for a money raising operation.

• George

Sam or some of you other wonky people can correct me if I am wrong, but the preference polls on the R side NEVER predicted a Trump implosion. That was always the wishful thinking of the R establishment. So if the polls have been relatively accurate in assessing his actual popularity, I think they will continue to accurately reflect that. And the reality of that is outside of the Republican primary voting electorate, he is not a strong candidate (I refrained from quoting Lindsey Graham).

• Mr. Wang: Thank you for the interesting analysis.

• BrianTH

I suspect the timing of when the presumptive nominees are determined (in the eyes of the public and in the behavior of the parties) may be relevant, such that merely counting back from the general election is not a perfectly reliable way of making apples to apples comparisons in different cycles.

That said, I doubt that will help Trump, since if anything the Democrats are in a much better position when it comes to using the normal mechanisms to promote party unity. But there could be an upcoming period in which Clinton ends up with something of an “artificial” advantage if she is the presumptive nominee and Trump still is not.

• Some Body

My thoughts exactly. Another point is that the extent to which both probable nominees are already known to the public may eliminate an important source of error in early polls.

• Matt McIrvin

It could actually be the other way around, if the Republicans become resigned to a Trump nomination, but the Bernie Sanders campaign refuses to stay down and a significant number of his supporters still believe in some kind of superdelegate miracle at the convention.

• Matt McIrvin

My comment from May 2nd appears to have been prophetic, if only by less than 48 hours.

• BrianTH

I was just coming here to post the same thing, Matt. You definitely made the right call on the sequencing of the presumptive nominees.

So I will in fact reverse that point and suggest it might be Trump who now gets an artificial advantage for a while, assuming Sanders continues on (my guess is he will concede and start campaigning for Clinton after June 7, but that is indeed just a guess).

• How important is turnout for these estimates? For the Ds, at least, shifting R or “undecided” voters is less important than GOTV. In particular, there are 5-10M Hispanic citizens who are unregistered or don’t vote, and Trump makes them a pure D goldmine.

• Josh

Most Hispanics in the US currently live in states that probably won’t be in play during this Presidential election: California, Texas, Illinois, New York, Arizona, New Mexico. Florida is an exception, as is Colorado. If current polling holds up, though, these states will go for Clinton even if Hispanic voting tracks historically.

• Matt McIrvin

Current polling suggests that Arizona is in play.

Running up the score in Florida is a useful thing to do: it’s historically been on a knife edge, and it has enough electoral votes that, as Chris Cillizza recently pointed out, winning it basically guarantees a Democratic win. If you take all the deep-blue states that have voted Democratic in the last six election cycles, then add Florida, you get 270 electoral votes.

(It’s very similar to, though slightly worse than, the counterfactual map in which Al Gore gets and wins his recount in 2000.)

• Matt McIrvin

Also, even if it’s not gettable, I want to frighten them in Texas.

• Josh

Democrats will win other states that put them over 270 EV before they get to Florida. Virginia, Colorado, Ohio–all of these are to the left of Florida. Winning Florida will pad the lead, so to speak, but it won’t provide the winning score.

And while we’re on sports metaphors: I don’t understand why “running up the score” in Florida is a “useful thing to do”; to whom is it useful, and how is it useful?

Finally, with all respect, I am skeptical that Arizona will actually be in play come November. Hillary would probably have to win the popular vote by 12-15% to win a state like Arizona. It’s certainly possible, but not especially likely.

• Matt McIrvin

I’m not sure the ordering of the states in terms of winnability is the same this year as it has been in past cycles. Things could change and there aren’t many polls yet, but I’ve seen some suggesting that Clinton might be stronger in Arizona than in Ohio, which is very close, whereas Virginia and Florida have gotten way bluer.

I’d like to see some newer polls in Iowa and Colorado; the last ones taken there were kind of disturbing but they’re old, taken at a time when Trump was doing better nationally.

• Josh

Virginia and Florida have definitely moved leftward over the last few cycles. I suppose it’s possible that Florida is now within a point or two of the median, but even if that were the case, Colorado and Virginia are still probably to the left of Florida.

Obama won Ohio in 2012 by 3% and lost Arizona by 10%. It would be unprecedented in modern Presidential politics if one state moved thirteen points to the right in less than 4 years.

• Borderpeaks

Colorado like Washington and Oregon is a modern western democracy with all mail in balloting. Obama carried the state twice. I do not see any worry that it is a swing state now unless too many Chicanos have moved to Mexico!

• Andrew

“In particular, there are 5-10M Hispanic citizens who are unregistered or don’t vote, and Trump makes them a pure D goldmine.”

You are making the wild assumption that “saving” and legalizing illegal Mexican immigrants is something of significant interest to Puerto Ricans, Dominicans, and legal Mexicans. These sort of assertions need evidence.

• Hi, Andrew. Courtesy is highly valued here!

• Andrew

“I’m not sure the ordering of the states in terms of winnability is the same this year as it has been in past cycles.”

I don’t think they will be. New Yorker Trump has significantly different areas of popularity and support than Republican candidates from Texas (Bush, Bush) and the Plains/Mountain West (Dole, McCain, Romney).

I believe there is good evidence that Trump is significantly weaker than a past typical Republican in VA, WI, MN, and CO, for example, and stronger than normal in NJ, PA, OH, and MI.

• Andrew

“Hi, Andrew. Courtesy is highly valued here!”

Hi Dr. Wang, I wasn’t aware I was not being courteous. I apologize for any offense from stating my thought.

• Well, “wild assumptions.” And there was something else, I forget what. Anyway, keep in mind the scale I use: I basically think that every other website’s comments section is radioactive.

• Andrew

Josh:

“Virginia and Florida have definitely moved leftward over the last few cycles.”

VA has been drifting leftward since 1988 (13 point Republican lean) to 2012 (pure tossup with no lean).

FL has not drifted at all. Since 1996 it has averaged a 2.5% Republican lean.

“Colorado and Virginia are still probably to the left of Florida.”

CO now has a 1.5% Democrat lean, so it is to the left of VA and FL.

“It would be unprecedented in modern Presidential politics if one state moved thirteen points to the right in less than 4 years.”

West Virginia from 1996 to 2000 did something like that. It went from a 6 point Democrat lean in 1996 to a 7 point Republican lean in 2000. It was one of the most reliably Democratic states from 1960 to 1996 and is now one of the most reliably Republican.

That isn’t the only example. VT from 1976 to 1980 did something similar. So did NJ from 1992 to 1996, and Louisiana from 1996 to 2000.

• What margin of Clinton victory equals a Blue House of Representatives?

• BrianTH

Probably hard to say for sure, since if the presidential margin is looking bad enough, early enough, you will likely see a lot more resources put into congressional races, and more or less explicit calls for ticket splitting. On the other hand, you could maybe see a geographic concentration of support for Trump that would ill-serve the Republicans in some states/districts.

As I recall, the Democrats need to be something like +7 in the House vote to be likely to take back the House. So at a guess, at something like the Democratic President +10, you might be looking at a solid chance, but it could be less.

• Josh

I think I saw this here not too long ago but if House Dems get 53% of the generic House vote, they should flip it. Not sure what the exact correlation is between President and down-ballot choices but it’s probably pretty high. I’d guess that if Hillary gets 55% of the popular vote the House will flip. Anyone care to add their thoughts?

• MikerW

Kudos again for thoughtful analysis over punditry.

That said, I suspect the media, which is driven by business objectives, will produce numerous stories about how Trump can use his unconventional status to win in November. But, that cuts both ways. Helicoptering in for a rally then leaving may work in a fractured primary field. The lesson of the most recent election cycles is that “technology” and powerful ground games carry the day. So if Hillary has absorbed and advanced what Obama did and Trump really doesn’t have a ground machine and the R’s are somewhat fractured how does he win?

• Some Body

Possible worries (and factors that might introduce a large systematic error into election-eve polling) are the so-called Bradley and Shy Tory effects. Trump is an explicit misogynist running against a woman. He is also often portrayed in the media as an illegitimate choice for voters to make. Both facts may easily induce Trump voters to, e.g., claim they are undecided when being interviewed by pollsters. An overly complacent attitude from Dems might slightly exacerbate this error.

A significant discrepancy between live and automated poll results may be an indicator of something brewing in this direction. Also, if Senate polls are showing clear GOP advantages relative to Presidential election results, it might mean something.

So, personally, I’d be looking less at HRC’s margin over Trump, and more at whether she’s polling above 50% in enough states.

• Kevin King

I tend to agree that the polls are reflecting my own historical, non-poll view of the situation, based on Mr. Wang’s analysis. There are circumstances that can change the nature of the race, and I agree generally with whirlaway about the foreseeable ones: a recession, which economists don’t seem to think is in the cards (although there is some current weakness around the world which could spill into the United States), or a major terrorist attack against the United States homeland or a significant foreign interest. Unless one of those happens, Clinton will probably win.

The other foreseeable moving part here is Sanders. If Sanders ends up submitting and getting his delegates to vote for Clinton by acclamation, that will strengthen the Democratic hand. His unexpected strength will end up damaging party unity and therefore Clinton if something like that does not happen. If the former happens, Trump will probably need both a recession and a terrorist attack to win. If, as seems likely, Sanders battles into the convention, Clinton’s victory will be much more tenuous.

I don’t think that the e-mail matter is going to hurt Clinton, based on those familiar with Federal investigations and the information that has come out so far. Nor do I think that Trump’s or Clinton’s unfavorables matter, either. At least under my hypothesis that presidential elections are almost completely referenda on the in-parties.

In the end, it seems that multiple lines of analysis are pointing in the same direction. If things stay more or less the way they are now, a 5-10% two party vote victory by Clinton seems likely, and that is what the polls are saying, too. However, the fundamental character of the race can change.

• I’m sure the 55,000 in negative, anti-Trump ads have had an adverse effect on the polling thus far and should the GOP decide to embrace Donald, could there be an after-effect that would have a gradual and cumulative improvement over the future probability or outcome?

Thank Ü in advance for the good news!

• Brian

I haven’t examined the data closely enough to confirm, but I have always suspected that favorables should improve for the candidates as you get closer to November. My reasoning is that I would expect people’s responses to just reflect how they would respond if asked whether they *support* a given candidate. On the Democratic side, I think you will see Clinton’s favorables improve as Sanders supporters “come around”. I do also expect to see this on the GOP side for Trump; ultimately Republican voters will adopt a “favorable” view of him as they come around and decide to vote for him.

• 538 Refugee

If something unforeseen were to happen to Hillary would you still feel comfortable with this model if “Comrade Bernie” became the nominee? (Kinda nice of Obama to come up with that so Trump won’t have to. ;) ) As has been discussed earlier, he’s been given a relatively free pass in terms of scrutiny by the Republicans.

• Mark F.

I actually think that Trump is enough of a wild card that he has more chance to be elected than a Ted Cruz, but I agree it’s a low chance at this point. Still…

• Jim Crowell

Does it matter when the poll screening turns from RV to LV later this fall?

• I don’t distinguish between the two, except to use LV data in preference to RV data when a pollster makes both available. Evidence suggests it does not matter.

• MSk

Sam,

Why do you use the one-tail tdist function and not the 2-tail? I would have thought the error could be on either side. Can you help me brush up on what must be a silly statistics misunderstanding?

• Only one tail flips the outcome. In the other tail, Hillary Clinton wins by an enormous margin.

• MSk

Thanks Sam. Very clear. One more if you will humor me, why 3 degrees of freedom? Student t degrees of freedom is typically (n1-1)+(n2-1) degrees of freedom but I couldn’t figure out how to arrive at 3.

• mediaglyphic

It appears Clinton underperformed polls in Indiana; if memory serves, she also did so in Michigan.

Is there any danger that she will under-perform the Trump v Clinton polls also? It appears there is a large margin of safety. I wonder if there is a way of determining any persistent Clinton bias.

• Tamara Baker

Michigan and Indiana have open primaries – at least Indiana does for the Democratic side – and indies who can’t be bothered to actually pick a party (or in the case of Michigan at least, NRA members and people looking to pay back the Dems for 2008’s “Democrats for Mitt” campaign) are hijacking those primaries. If you look at the votes of those who are actually registered Democrats, Hillary won those in both states.

By the way: Indiana is going to vote GOP in November, just like South Carolina. Remember how Hillary’s primary wins in Southern states didn’t count according to Sanders supporters because those states won’t be voting Democratic in the general election?

• Tamara Baker

I should also say that states with a high proportion of white voters and which have open primaries seem to be where Bernie does the best.

Now California has open primaries, but it’s also got a large and growing Latino population as well as large numbers of Asian and African-American residents. Ergo, the polls predicting a Hillary blowout are quite likely to be accurate.

• Matt McIrvin

She underperformed primary polls in a few states, and overperformed them in others. I don’t see any systematic effect across all states.

Primaries (and caucuses) are historically much harder to poll than general presidential elections, because turnout is lower in a primary, so differential turnout effects are much more important. And, as others mentioned, some primaries are open and get votes from groups who did not necessarily pass the pollster’s party screen.

I remember that in 2008, there was a lot of speculation that Obama would underperform his polls because of the Bradley Effect: supposedly white people harboring racist feelings would lie to pollsters about supporting a black candidate because they knew those feelings were socially unacceptable, but then they wouldn’t actually vote for the candidate. But no such effect surfaced: some people undoubtedly opposed Obama because of his race, but they didn’t lie about supporting Obama, they just found some other rationale.

People are already speculating on some similar “social desirability bias” that might lead Trump to overperform his polls. I think the jury is out on that: he actually seemed to be mildly underperforming his polls early in the primary race, then started overperforming toward the end.

• Andrew

“Primaries (and caucuses) are historically much harder to poll than general presidential elections, because turnout is lower in a primary, so differential turnout effects are much more important. ”

This year so far, 25.7 million voters voted in the GOP primaries (not caucuses) vs. 43.7 million GOP voters in the 2012 general election for the same states.

The two sets don’t entirely overlap, but a simply enormous number of people are coming out to vote Republican this year, so the primary GOP electorate being polled is far more similar to the general electorate, since it makes up 60% of it. Considering that many GOP primaries exclude independents and prevent crossover voting, this is a fairly remarkable real result.

It’s additionally remarkable that Trump will get more primary votes than any Republican in history (probably close to 14 million), and yet he may not even obtain a majority of the total votes (which will be close to 30 million).

On the Democrat side, the numbers are lower – 21.6 million voters in primaries where Obama got 45 million votes, so 48% turnout compared to the general.

I’d be interested in Sam’s thoughts on the levels of participation in the two primaries and what they might indicate for the general, especially comparing the results to 2008 and 2000.

It’s also interesting that Republicans outvoted Democrats in every swing state except Pennsylvania and Minnesota. By that I mean IA, NH, NV, VA, ME, MI, FL, MO, NC, OH, AZ, WI all had more Republican primary voters than Democrat primary voters. I’m curious if there is any correlation between swing state voting in contested primaries and results in November.

• I don’t think level of primary turnout is predictive of much in the general election. I believe this has been analyzed elsewhere. I doubt that more than one or two of the states listed will end up going into the GOP column in November. At present, every general-election poll that I am aware of in those states currently favors Clinton over Trump.

At the risk of seeming obvious, when people think that their vote makes a difference or that the race is exciting, they vote. The GOP race has been nothing if not exciting.

• David Shannon

I’m engaged in an online dialog about your findings regarding the probability that Clinton will remain ahead. I find them persuasive, but my opponent claims that your use of a t-distribution to analyze 16 average polling errors is invalid because the sample is too small.

To be clear, he sees the sample size as 16 because what is under study are not the polls themselves, but the average polling error (of which there is only one per election in the relevant time period).

Can you help me respond to this?

• If I estimate s.d. using the interquartile range, this certainly captures the midrange of possibilities (it would be hard not to). But assuming a normal distribution using that s.d. then gives a serious underestimate of outlier events. So a normal distribution will not work at all. One needs some way to capture the outside possibilities.

The long-tailed problem comes up all the time in this kind of work. I have used a t-distribution with 3 d.f. before in my state-level Meta-Analysis with satisfactory results. It also fits the 16-election dataset reasonably well, by eye anyway.
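The difference between the two tail assumptions can be illustrated with a short Python sketch. It is not part of the original analysis: it simply compares a normal distribution with a t-distribution with 3 d.f. at the same SD, using the closed-form t CDF that holds only for 3 d.f. The function names are made up for the example.

```python
import math

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t3_cdf(x):
    """CDF of Student's t-distribution with 3 degrees of freedom (closed form)."""
    u = x / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi

def upset_probability(lead_in_sd, cdf):
    """One-tailed probability that the trailing candidate erases the lead."""
    return 1.0 - cdf(lead_in_sd)

# Compare the two models for leads of increasing size, measured in SDs.
for lead in (1.0, 2.0, 3.0, 4.0):
    p_normal = upset_probability(lead, normal_cdf)
    p_t3 = upset_probability(lead, t3_cdf)
    print(f"lead = {lead:.0f} SD: normal {p_normal:.4f}, t(3 d.f.) {p_t3:.4f}")
```

For leads of 2 SD or more, the t-distribution assigns noticeably more probability to an upset than the normal distribution does, which is the extra allowance for outlier years.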

Is the counterargument that somehow I am underestimating the probability of some freakish dynamics like a 20-point swing? We’re talking about measurements of public opinion here. Trump may be weird, but the rate at which people change their allegiances in a general election campaign is unlikely to change that much. Assuming otherwise is an exercise in nihilism – just walk away from that argument.

• David Shannon

Is the article entitled “Trump expands the battleground…to Utah and the Deep South” the explanation of the problem with this analysis? (It seems to rely on a different SD than you used here.) Or is the promised update yet to come?