# Princeton Election Consortium

## Accounting for poll biases

#### November 3rd, 2008, 11:57pm by Sam Wang

Here’s your tool for handling many biases: last-minute swings, the Bradley effect, the cell phone effect – and others as well.

The general idea here is to add a fixed percentage to all polls across the board, then redo the Meta-Analysis. The result looks like this:

It works like this. You add up all the biases you think may be present in polling data. Then you look that point up on the horizontal axis and read off the results. For example, if you think there’s a Bradley effect that hurts Obama by 2%, but you also think that landline surveys understate Obama’s support by 1%, then the total bias is -2+1 = Obama -1% (i.e. McCain +1%). In that case the Median EV Estimator is Obama 338 EV, McCain 200 EV.

Here’s a table of key values:

| Net bias | Median EV Estimator | 95% CI on Obama EV |
|---|---|---|
| McCain +5% | Obama 278 EV, McCain 260 EV | (251, 313) |
| McCain +4% | Obama 296 EV, McCain 242 EV | (268, 338) |
| McCain +3% | Obama 311 EV, McCain 227 EV | (277, 341) |
| McCain +2% | Obama 324 EV, McCain 214 EV | (290, 353) |
| McCain +1% | Obama 338 EV, McCain 200 EV | (305, 367) |
| No bias | Obama 352 EV, McCain 186 EV | (313, 378) |
| Obama +1% | Obama 364 EV, McCain 174 EV | (335, 388) |
| Obama +2% | Obama 378 EV, McCain 160 EV | (350, 396) |
| Obama +3% | Obama 381 EV, McCain 157 EV | (362, 406) |
| Obama +4% | Obama 391 EV, McCain 147 EV | (372, 406) |
| Obama +5% | Obama 396 EV, McCain 142 EV | (377, 409) |
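For readers who would rather script the lookup than read it off the graph, here is a minimal Python sketch. The EV numbers are copied from the table of key values; the names `MEDIAN_OBAMA_EV` and `lookup` are made up for illustration, and the sketch only handles whole-percent biases within the tabulated range.

```python
# Median EV Estimator (Obama EV) by net bias, copied from the table of key values.
# Key: net bias in percentage points; positive = shift toward Obama,
# negative = shift toward McCain.
MEDIAN_OBAMA_EV = {
    -5: 278, -4: 296, -3: 311, -2: 324, -1: 338,
    0: 352,
    1: 364, 2: 378, 3: 381, 4: 391, 5: 396,
}
TOTAL_EV = 538


def lookup(*biases):
    """Add up individual bias guesses (whole points) and read off the table.

    `lookup` is a hypothetical helper, not part of the Meta-Analysis code.
    """
    net = sum(biases)
    if net not in MEDIAN_OBAMA_EV:
        raise ValueError(f"net bias {net:+d}% is outside the tabulated range")
    obama = MEDIAN_OBAMA_EV[net]
    return net, obama, TOTAL_EV - obama


# Bradley effect hurts Obama by 2%; landline-only surveys understate him by 1%.
net, obama, mccain = lookup(-2, +1)
print(f"net bias {net:+d}%: Obama {obama} EV, McCain {mccain} EV")
# net bias -1%: Obama 338 EV, McCain 200 EV
```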

At 0% the median and mode happen to be the same. At the moment, my personal estimate of the bias is Obama +1% because of the cell phone effect. It only shifts the median EV estimate by 12 EV, to Obama 364, McCain 174. But because of all the ties (IN, MO, ND), this 1% shift moves the mode all the way to Obama 378, McCain 160.

Now I want to call your attention to a data resource, which answers a number of your questions.

Some of you have asked for results that are already posted. Many key files are linked from the Geek’s Directory. The most useful ones are readable by Excel or any text editor. Here is how to read them:

stateprobs.csv: Each line corresponds to one state, and contains 5 items from left to right:
– % win probability based on polls alone
– the median margin in %
– % win probability if margins move toward Obama by 2%
– % win probability if margins move toward McCain by 2%
– State postal abbreviation
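If you want to load stateprobs.csv in a script, something like the following Python sketch should work. It assumes the five items are comma-separated with no header row, which the .csv extension suggests but which you should check against the actual file. The `StateProb` field names are my own labels for the five items listed above, and the sample rows are made-up placeholders, not real output.

```python
import csv
import io
from typing import NamedTuple


class StateProb(NamedTuple):
    """One line of stateprobs.csv (field names are illustrative labels)."""
    win_prob: float          # % win probability based on polls alone
    median_margin: float     # median margin in %
    win_prob_obama2: float   # % win probability if margins move 2% toward Obama
    win_prob_mccain2: float  # % win probability if margins move 2% toward McCain
    state: str               # state postal abbreviation


def parse_stateprobs(text):
    """Parse the file contents, assuming comma-separated values and no header."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if not rec:
            continue  # skip blank lines
        *nums, state = [field.strip() for field in rec]
        rows.append(StateProb(*map(float, nums), state))
    return rows


# Made-up placeholder rows, only to show the shape of the data.
sample = "100,12.0,100,100,XX\n50,0.0,80,20,YY\n"
for row in parse_stateprobs(sample):
    print(row.state, row.win_prob, row.median_margin)
```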

EV_estimates and EV_estimate_history – Today’s results and historical results. In both files, each line contains:
– Date code (1 is 1-Jan…365 is 31-Dec)
– Median EV Estimators for Obama and McCain
– Mode EV for Obama and McCain
– Safe Obama and McCain EV (safe means probability>95%)
– Toss-up EV
– 68% confidence band for Obama EV
– 95% confidence band for Obama EV
– Number of polls used in Meta-Analysis
– Popular Meta-Margin (%)
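Here is a rough Python sketch for reading one line of these files. It rests on assumptions worth checking against the real files: that each "Obama and McCain" item and each confidence band expands to two numbers (14 values per line in total), and that values are separated by commas or whitespace. The field names are my own labels. One caveat on the date code: 365 = 31-Dec holds for a non-leap year, but 2008 is a leap year, so verify the conversion against a date you know.

```python
import re
from datetime import date, timedelta

# Assumed layout: 14 numbers per line, in the order of the list above.
FIELDS = [
    "date_code",
    "median_obama", "median_mccain",
    "mode_obama", "mode_mccain",
    "safe_obama", "safe_mccain",
    "tossup",
    "ci68_low", "ci68_high",
    "ci95_low", "ci95_high",
    "n_polls",
    "meta_margin",
]


def parse_ev_line(line):
    """Split one line on commas/whitespace and label the values."""
    values = [float(v) for v in re.split(r"[,\s]+", line.strip()) if v]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def date_from_code(code, year=2008):
    """Convert a day-of-year code (1 = 1-Jan) to a calendar date."""
    return date(year, 1, 1) + timedelta(days=int(code) - 1)


print(date_from_code(308))  # 2008-11-03
```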

polls.median.txt – the summarized poll averages, day by day, starting from the most recent day. It’s composed of sets of 51 lines. Each line corresponds to one state, and contains, from left to right:
– The number of polls used to calculate that state’s median
– Median date of oldest poll used
– Median margin in %
– Estimated SEM of margin in %
– Date that the median was calculated

This last file does not contain postal abbreviations. Within each set of 51 lines, the order is (10 per row):
AL,AK,AZ,AR,CA,CO,CT,DC,DE,FL,
GA,HI,ID,IL,IN,IA,KS,KY,LA,ME,
MD,MA,MI,MN,MS,MO,MT,NE,NV,NH,
NJ,NM,NY,NC,ND,OH,OK,OR,PA,RI,
SC,SD,TN,TX,UT,VT,VA,WA,WV,WI,
WY
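Since the file carries no state labels, here is a minimal Python sketch that cuts it into 51-line sets and attaches the abbreviations in the order given above. It assumes the five items on each line are numeric and whitespace-separated, which I have not confirmed here, so adjust the split if the file uses a different delimiter. `split_snapshots` is a hypothetical helper name.

```python
# State order within each 51-line set, as listed above.
STATES = (
    "AL,AK,AZ,AR,CA,CO,CT,DC,DE,FL,"
    "GA,HI,ID,IL,IN,IA,KS,KY,LA,ME,"
    "MD,MA,MI,MN,MS,MO,MT,NE,NV,NH,"
    "NJ,NM,NY,NC,ND,OH,OK,OR,PA,RI,"
    "SC,SD,TN,TX,UT,VT,VA,WA,WV,WI,"
    "WY"
).split(",")
assert len(STATES) == 51


def split_snapshots(lines):
    """Group lines into per-day sets of 51 and key each row by state.

    Returns a list of dicts (most recent day first, per the file's ordering),
    each mapping postal abbreviation -> list of the numbers on that line.
    Assumes whitespace-separated numeric fields.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    if len(rows) % len(STATES) != 0:
        raise ValueError("line count is not a multiple of 51")
    snapshots = []
    for start in range(0, len(rows), len(STATES)):
        block = rows[start:start + len(STATES)]
        snapshots.append(
            {state: [float(x) for x in row] for state, row in zip(STATES, block)}
        )
    return snapshots
```

With the file read via `open("polls.median.txt").readlines()`, `split_snapshots(lines)[0]["OH"]` would then be the most recent line for Ohio.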

Okay, that’s the midnight information dump…

