Site improvements (and Bayesian jealousy)
Some improvements from Andrew Ferguson: The update date/time above now links to the history graph, and the Obama/Romney numbers link to the map.
Senate: 49 Dem | 51 Rep (range: 47-51)
Control: (R+0.4%) from toss-up
Generic polling: R+2.0%
Governor/SoS: NV AZ WI
Supreme Courts: OH NC
(click for more information)
Detailed explanation to come. Basically, it’s the same as the 2016 calculation – a simple snapshot of polls to give a sharp picture of where the race is on any single day, to allow optimization of resource allocation.
The November prediction (red zone, one-sigma; yellow zone, two-sigma or 95% confidence band) comes from estimating the likely amount of drift between now and Election Day. The major change is putting a higher floor on the minimum level of uncertainty in the home stretch. Increasing the floor (to 2 percentage points) prevents overconfidence in the home stretch, while retaining the sharp time resolution that we get from day to day from now until then.
Most of the details are here and in an older article. More explanation to come.
Contributors to this feature: Lucas Manning, Ben Deverett. The code is at https://github.com/Princeton-Election-Consortium/data-backend. Outputs: tables and charts.
Thanks so much for your work, Dr. Wang.
> Basically, it’s the same as the 2016 calculation – a simple snapshot of polls to give a sharp picture of where the race is on any single day, to allow optimization of resource allocation.
If I’m reading you correctly as saying that the only thing that has changed is the November prediction, does that mean that no change has been made to charts like this one https://election.princeton.edu/election2020/outputs/ev_histogram.png (or others that predict the election outcome) to correct for correlated polling errors between states? Does this part of the model still assume that the errors are independent between states?
I think I recall you saying in the aftermath of 2016 that this was one of the things the 538 folks (and maybe others) had been right about, so I was interested in what changes you thought were appropriate here.
By the way, the comment form is currently broken, because the https redirect that was just added is also broken. If you edit the URL for this page to be http instead of https, you’ll see that it redirects to an https version of the page, but without the / between .edu and 2020. Because the form is still set to post to the http address, that breaks commenting.
I’ve gotten around this manually by editing the address that the form is set to post to.
Sorry about that. It should work for everyone now.
A suggestion regarding your “Moneyball” states for the presidential election. In addition to the states for which the meta-margin is closest to zero, you could also include the states whose meta-margin puts them close to the tipping point of 270 electoral votes. As we saw in 2016 with Michigan, Pennsylvania, and Wisconsin, a strong campaign in the states near the tipping point is important to defending against “October surprises”, systematic polling error, and attempts to bias the outcome by such methods as voter suppression.
Given the authoritarian actions by President Trump that you and others have documented and his concerted effort in 2016 to delegitimize an election that he won, it seems critical not only to run up the score in the electoral college in your “moneyball” states if possible, but also to run up the margin in the individual states that decide the election so they cannot be contested if the result is not in President Trump’s favor. As a resident of Michigan, I’d hate to see my state (or any other, for that matter) end up in a situation like Florida in 2000.
On a personal note, I’ve been a fan of your site since the 2012 election cycle and as a mathematician very much appreciate your transparent and rigorous methodology.
Keep up the great work!
Thank you for the feedback. Actually, that *is* how we’re supposed to be calculating Moneyball states!
Basically, we move each question (Presidential outcome, Senate control) to a toss-up by shifting all margins uniformly. Then we calculate the impact of moving a small number of votes. The resulting voter power shows where get-out-the-vote yields the most delta-probability per delta-vote.
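The shift-to-toss-up idea can be sketched in a toy model. Everything below is a simplification of mine, not the data-backend code: the function names, the median-state decision rule, and the normal error model are all assumptions made for illustration.

```python
import math

def win_probability(margins, sigma=3.0):
    """Toy model: the question is decided by the median state's margin,
    subject to a normal systematic polling error of `sigma` points."""
    median = sorted(margins)[len(margins) // 2]
    return 0.5 * (1 + math.erf(median / (sigma * math.sqrt(2))))

def voter_power(margins, votes_per_point):
    """Shift all margins uniformly to a toss-up, then measure the
    change in win probability per vote moved in each state."""
    median = sorted(margins)[len(margins) // 2]
    tossup = [m - median for m in margins]   # uniform shift: P(win) = 0.5
    powers = []
    for i, votes in enumerate(votes_per_point):
        bumped = list(tossup)
        bumped[i] += 0.1                     # nudge one state by 0.1 point
        delta_p = win_probability(bumped) - win_probability(tossup)
        powers.append(delta_p / (0.1 * votes))
    return powers
```

In this toy, only the tipping-point state registers nonzero power; the real calculation works over the full electoral-vote distribution, so all near-tipping states pick up weight.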
These states are listed in the banner. One way to tell if it’s working is that the top “Moneyball” states in each category should generally have small margins all in the same direction. For example, right now Nevada, Georgia, and Arizona all show small leads for Biden – and they are the top 3 Moneyball states for the Presidency. In fact, I believe their margins should be not that far from the overall Meta-Margin, because that’s the amount needed to create a toss-up.
The same is true for Senate states. Right now, Democrats lead narrowly in Montana, Maine, and Iowa. Also, these states have fairly small populations, which again makes voters powerful thanks to the malapportionment of the Senate.
Very soon we will unveil a similar calculation for state legislatures. One result isn’t surprising – Texas is a powerful state in which to put resources. The other, Kansas, is somewhat surprising, but falls out of the math.
I apologize, we need to write all this up in a clear manner. Also, we will add HTML snippets to make it easier to inspect the data.
Thank you for your helpful input.
Thank you for being transparent about the adjustment to your methodology that you made since 2016. That should help.
Is there any evidence that state-level polling is of higher quality now than four years ago? I seem to remember some discussion that that was part of the problem back then.
The state-level polling error in 2016 was about 3 points in key states.
In 2018, House generic polling was within a fraction of a point of the outcome.
In 2019, the Kentucky and Louisiana governor’s races were within a few points of the final outcome, correctly identifying key states for optimum citizen effort, which is the point of what we’re doing here.
Thanks for the explanation. So your “moneyball” states are those that are closest to the tipping point not (necessarily) by the measure of the meta-margin itself, but rather by the measure of how many votes that meta-margin represents?
Would this methodology have correctly identified Michigan, Wisconsin, and Pennsylvania as the key states in the last election, or were the errors in the state-level polling not sufficiently correlated to see that?
Regarding your answer to Marc, I was speaking with a friend (who voted for Trump in 2016 and will again this year — yes, I am a liberal who has no problem respecting Trump voters or discussing politics with them) and he said while he has made up his mind who he is voting for, he wouldn’t tell a pollster that. I doubt that the size of that demographic has changed much — or it may have even increased — since 2016, so I certainly wouldn’t be surprised to see the same kind of state-level polling error crop up again in 2020 now that President Trump is on the ballot again even though it wasn’t apparent in the midterm or 2019 polling.
Since there isn’t really any way to identify and correct for this ahead of time, I think that the best we can do is to be aware of it and acknowledge the possibility. As someone who has demonstrated that you will not only own your mistakes, but analyze them as well (and eat a bug), what do you think is the appropriate way to deal with a potential repeat of this data quality problem?
I think many laymen don’t understand the distinction between a data collection failure and a flaw in the underlying methodology and, after something as significant as the 2016 election, they write off polling — and poll aggregation — as unreliable because of their emotional reaction (pro or con) to how 2016 played out. What do you think we can do to educate them on how to consume analysis such as PEC provides to make the best use of it while being prepared for the possible pitfalls?
@David I found a good article at the NYT comparing polling this year with polling in 2016. They listed two reasons to hope that polling will be more accurate now, and two areas for improvement.
Better than in 2016: 1) fewer undecided or minor party voters, and 2) pollsters are weighting their population samples to account for level of education.
Areas for improvement: 1) there’s no consensus on how to do on-line polling well, but there are still plenty of on-line polls, and 2) reliance on recalled 2016 vote to weight samples.
Also, the NYT will be releasing results from their first poll for 2020 Wednesday (national) and Thursday (swing states). In the meantime, they’ve published their methodology.
If I remember rightly, the error in the calculation for the 2016 election was the assumption that the “polling to election” error would decline from about 3 or 4 percent to zero on Election Day, when instead it probably should have been fixed at 3 or 4 percent all the way up to Election Day.
Has this error been fixed?
Paul writes: I believe the error was not accounting for an appropriate quantity in each state of white voters without college degrees. Weighting had been done previously by gender, race, and likely-voter status, but not by education, and this has been adjusted in most polling since 2016. Note that the 2018 polling was almost uniformly within one percent of the outcome. Best wishes.
Thank you so much for providing this data!
I would love to see this presented in a “days to election” comparison with 2016 – and with the 2020 methodology changes retroactively applied to the 2016 data, to make the comparison more apples to apples.
Sounds great. Probably not going to do that. Horserace is not an editorial emphasis in 2020. Sorry!
Dr. Wang, thank you for your amazing work!
This is a little out of left field, but supposedly some GOP insiders are saying that Trump might drop out of the race if his polls remain poor. Would Pence run in his place, and if so, how would he do against Biden?
As far as I can tell, the margin for AR is currently exactly the same as the margin of the single poll done for it so far. I would expect there to be a prior of AR being extremely Republican, and therefore a single poll would not be enough to mark it as being so close.
Currently not set up to do priors. Could implement it if this develops into a larger problem.
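If priors were ever implemented, one standard approach is a normal-normal conjugate update that precision-weights a state prior against the poll snapshot. This is a sketch with made-up numbers, not anything in the PEC code; the function name and the AR figures are hypothetical.

```python
def posterior_margin(poll_margin, poll_sigma, prior_margin, prior_sigma):
    """Normal-normal conjugate update: precision-weighted average of a
    state prior (e.g. past election results) and the poll snapshot.
    Margins in percentage points; negative = Republican lead."""
    w_poll = 1.0 / poll_sigma ** 2
    w_prior = 1.0 / prior_sigma ** 2
    mean = (w_poll * poll_margin + w_prior * prior_margin) / (w_poll + w_prior)
    sigma = (w_poll + w_prior) ** -0.5
    return mean, sigma

# A single close AR poll (R+2, sigma 5) against a strongly Republican
# prior (R+25, sigma 8) lands well to the Republican side of the poll
# alone, with a tighter posterior sigma than either input.
```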
While I LOVE the current results, I wonder about the 5-point move in four months. If you had drawn your confidence bars back on day 1 (looks to be about March 1), would today’s meta-margin of 7.2 have been within your 1-sigma or even 2-sigma range for July 1st? The obvious reason for the question is that if it can move up 5 points in four months, what’s to say it can’t move down 5 points in four months – or is that what that little bit of yellow into the Trump-wins area is supposed to capture? (And I recognize that a 5-point drop would still leave the meta-margin in the Biden-wins area, but within that +/- 3% range that seems to be the limit of polling accuracy.)
Why make the 2016 fix only in the home stretch? If we are saying that systematic errors can make us overconfident in the home stretch, then anything earlier is not really a reflection of the true meta-lead. If this is the case, then the “sharp time resolution” is just noise. This move is ad hoc, no?
Uncertainties at the current moment are dominated by drift that occurs between now and Election Day. So it doesn’t matter for now.
The formula for combining one-sigma uncertainties A and B that are independent of one another is sqrt(A*A+B*B). Currently the long-term drift, A, is around 6 points. Final, irreducible uncertainty is about B=1.5 points. B doesn’t contribute much until October or so. See the MATLAB script.
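That combination rule can be written directly. This is just a sketch of the stated formula; the actual computation is in the repo’s MATLAB script, and the function name here is mine.

```python
import math

def total_sigma(drift, irreducible=1.5):
    """Combine independent one-sigma uncertainties in quadrature:
    long-term drift A and irreducible final uncertainty B,
    both in percentage points."""
    return math.sqrt(drift ** 2 + irreducible ** 2)

# In July, drift dominates: total_sigma(6.0) is about 6.18,
# barely above the drift alone. Near Election Day, as drift
# shrinks, the floor takes over: total_sigma(0.5) is about 1.58.
```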
I’d like to suggest an idea that might provide an informative graph:
Starting from the data at any given point, project the results going forward as a set of lines, one for each historically comparable situation.
This has been done for climate data here, for example:
An addendum here would be to also include the final actual vote somehow at the end of each polling trajectory.
So, the major error that made this site so egregiously wrong in 2016 has NOT BEEN FIXED! That’s the assumption that errors for state polls in states with very similar demographics are INDEPENDENT when clearly they are NOT. Very disappointing, as it means I need not check back here. This would take just a few lines of code to fix, as it really only involves 3-6 states.
Your understanding is not correct. The systematic error assumption has been increased, which is equivalent to your concern. Also, you are missing the main goal of the calculation this year, which is to help identify efficient ways for voters to exercise their power.
Anyway, you are free to point out exactly what is wrong with the code. If you have examined it, that would be better than most.
Try to be nicer though please! Rudeness is usually grounds for moderation here.