Princeton Election Consortium

A first draft of electoral history. Since 2004

For Fellow Geeks 2014

If you want to delve into the Meta-Analysis further, many files you will need are here. It’s best if you know a little about MATLAB and Python programming. The scripts and data files mentioned here can be found in the Geek’s Directory. All of the files in that directory are linked to the live versions currently running the calculations.

We start with data from Pollster.com, where you can find comprehensive polling information on present and past US political races, as well as running commentary. Good information with partisan commentary of several flavors can be found at electoral-vote.com, FiveThirtyEight, and RealClearPolitics.com. Pollster.com has kindly provided us with an API feed which contains all of the polling information publicly accessible via their website. Every day at midnight, 8:00am, noon, 5:00pm, and 8:00pm, the Unix script nightly.sh runs the process described here.
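The five daily runs could be driven by a crontab entry along these lines. This is only a sketch of the schedule described above; the paths are hypothetical, and the actual mechanism that launches nightly.sh isn't specified here:

```shell
# Run nightly.sh at midnight, 8:00am, noon, 5:00pm, and 8:00pm, server local time.
# Paths are illustrative only.
0 0,8,12,17,20 * * * /path/to/nightly.sh >> /path/to/nightly.log 2>&1
```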

First, a Python script, update_polls.py, reads races.csv to see which races we’re tracking. Then, for each state, it calculates the median margin and its standard error (SEM) for the Democratic (or Independent) vs. Republican candidate. The rule for deciding which polls to include for each state is as follows: to reduce bias, we first discard all but the most recent poll from each polling organization. We then define a lookback window of N weeks, where N decreases as the election draws nearer. If at least three polls were taken within this window, we use all of them; if not, we use the three most recent polls available for the state. No polls are excluded on other grounds; partisan polls count too. The results are added to the top of a running data file, 2014.Senate.polls.median.txt (each line’s fields: number of polls, median date of oldest poll used, median, SEM, datenum, and state index [see state_numbers.txt]; 18 lines are added per day). Once the state-by-state summary statistics are written, the MATLAB code is invoked via Senate_runner.m.
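The selection rule above can be sketched in a few lines of Python. This is not update_polls.py itself: the poll record layout is a hypothetical schema, and the SEM estimator shown (median absolute deviation scaled to a normal sigma, divided by sqrt(n)) is one common robust choice that the real script may not match exactly.

```python
from datetime import date, timedelta
from statistics import median

def select_polls(polls, window_weeks, today):
    """Apply the selection rule: newest poll per pollster, then the
    lookback window, falling back to the three most recent polls.

    `polls` is a list of dicts with keys 'pollster', 'end_date'
    (datetime.date), and 'margin' (Dem minus Rep, percentage points).
    This schema is an assumption, not the actual API format.
    """
    # Keep only the most recent poll from each polling organization.
    latest = {}
    for p in sorted(polls, key=lambda p: p['end_date']):
        latest[p['pollster']] = p
    recent_first = sorted(latest.values(),
                          key=lambda p: p['end_date'], reverse=True)

    # Use every poll inside the N-week lookback window if there are
    # at least three of them...
    cutoff = today - timedelta(weeks=window_weeks)
    in_window = [p for p in recent_first if p['end_date'] >= cutoff]
    if len(in_window) >= 3:
        return in_window
    # ...otherwise fall back to the three most recent polls.
    return recent_first[:3]

def median_and_sem(margins):
    """Median margin and a robust SEM estimate (MAD-based; illustrative)."""
    n = len(margins)
    med = median(margins)
    mad = median(abs(m - med) for m in margins)
    sem = 1.4826 * mad / n ** 0.5
    return med, sem
```

The MAD-based estimator keeps a single outlier poll from blowing up the error bar, which matters when only three polls are available.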

The median and SEM are converted to win probabilities assuming a normal distribution and exported to the Excel-readable stateprobs.csv (fields: current probability, November probability, median margin, probability assuming a 2% swing toward the Democratic candidate, probability assuming a 2% swing toward the Republican candidate, and the two-letter state abbreviation).
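Under the normality assumption, the conversion from a median margin and SEM to a win probability is a one-liner with the error function. A minimal sketch (the function names and the treatment of the ±2% columns are illustrative, not taken from the actual MATLAB code):

```python
from math import erf, sqrt

def win_probability(median_margin, sem):
    """P(true margin > 0), assuming margin ~ Normal(median_margin, sem)."""
    return 0.5 * (1 + erf(median_margin / (sem * sqrt(2))))

def shifted_probability(median_margin, sem, shift):
    """Probability after a uniform swing, e.g. shift=+2.0 for the
    'Democratic +2%' column of stateprobs.csv (a guess at its meaning)."""
    return win_probability(median_margin + shift, sem)
```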

The core Meta-Analysis happens in Senate_estimator.m, which in turn calls kernelSenate_median.m. That last file is the core of the entire calculation – dig it. These scripts run several times a day, generating MATLAB data structures and exporting output to the .csv files in the directory /code/output/. The fields in Senate_estimates.csv are: median Senate seats for Democrats and Republicans; mode; safe seats; toss-up seats; 1-sigma confidence interval for the Democrats; 95 percent confidence interval; number of polls used; and the Senate Popular Meta-Margin.
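The heart of a meta-analysis of this kind is combining independent per-state win probabilities into an exact probability distribution over seat counts, which can be done by convolving one race at a time. The function below is a sketch of that idea, not a transcription of kernelSenate_median.m:

```python
def seat_distribution(win_probs):
    """Exact distribution of total seats won across independent races.

    win_probs: per-race probabilities that the Democratic candidate wins.
    Returns dist where dist[k] = P(exactly k of these races are won).
    """
    dist = [1.0]  # before any races: certainly zero seats
    for p in win_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # lose this race
            new[k + 1] += prob * p     # win this race
        dist = new
    return dist
```

From such a distribution one can read off the median, mode, and confidence bands reported in Senate_estimates.csv; roughly speaking, the Meta-Margin is the uniform swing, applied to every race's margin before recomputing, that would make control of the chamber a toss-up.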

After the meta-analysis is completed, the controlling Unix script nightly.sh finishes by preparing the results for the website. current_senate.py generates the banner at the top of the site with the current conditions and Election Day prediction, and jerseyvotes.py makes the table under The Power of Your Vote. The files that these Python scripts generate are automatically included in the WordPress theme.

We’re also tracking polling responses for the generic Congressional ballot question and for Obama’s approval rating. nightly.sh manages this too: it first fetches the polling data from the Huffington Post (here and here, respectively), then runs convert_huffpost_csv.py to parse the data files. The graphics are made by Obama_House_runner.m.
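Parsing a downloaded chart CSV into a margin series is straightforward with the standard library. This sketch is not convert_huffpost_csv.py; the column names are assumptions, and the real Huffington Post export may label its fields differently.

```python
import csv

def daily_margins(csv_path, dem_col='Democrat', rep_col='Republican'):
    """Read a polling CSV and return (end_date, Dem-minus-Rep) pairs.

    Column names are hypothetical; adjust them to match the
    actual export's header row.
    """
    rows = []
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            try:
                margin = float(row[dem_col]) - float(row[rep_col])
            except (KeyError, ValueError):
                continue  # skip rows with missing or non-numeric fields
            rows.append((row.get('end_date', ''), margin))
    return rows
```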