The Party ID Problem

October 5th, 2004, 12:00pm by Sam Wang

As previously mentioned, I have been looking at the party-ID numbers in the Gallup data, and I have found evidence that party ID is not fixed over time. Gallup’s internal numbers include the fraction of voters who call themselves Republican, Democratic, or independent. The GOP fraction averages 39% but fluctuates from poll to poll. This fluctuation has been the source of much discussion and is said to be too large. As it turns out, the amount of fluctuation expected from sampling alone can be predicted from binomial statistics, assuming the true fraction of Republicans (for instance) is fixed over time. The expected standard deviation is sqrt(r*(1-r)/N), where r is the fraction, 0.39, and N is the number of people per poll, about 1000. These numbers predict a standard deviation of 1.5%. In Gallup’s data, the actual standard deviation is 2.9%, almost twice this. This suggests that party ID, as Gallup measures it, shifts over time. That in turn supports Gallup’s defense that weighting by party ID distorts the result.
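
For readers who want to check the arithmetic, here is a minimal sketch of the binomial calculation. The inputs (0.39, 1000, and 2.9%) are the figures quoted above; the function and variable names are mine, chosen only for illustration.

```python
import math


def binomial_sd(fraction, sample_size):
    """Standard deviation of a sample proportion under pure binomial sampling."""
    return math.sqrt(fraction * (1.0 - fraction) / sample_size)


r = 0.39             # average GOP fraction quoted in the post
n = 1000             # approximate number of people per poll, as quoted
observed_sd = 0.029  # actual standard deviation quoted for Gallup's data

expected_sd = binomial_sd(r, n)
ratio = observed_sd / expected_sd

print(f"expected SD from sampling alone: {expected_sd:.1%}")
print(f"observed SD / expected SD: {ratio:.2f}")
```

A ratio near 1 would be consistent with a fixed party-ID fraction plus sampling noise; a ratio well above 1 points to something beyond sampling noise.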

However, using unweighted data has its own problem, namely that the sample may be consistently biased in one direction or the other. In 2000, Rasmussen did not weight and predicted a margin that was 9 points more favorable to Bush than the final outcome. The same kind of consistent bias is the accusation currently being made against Gallup.

But the cure may be as bad as the disease, as exemplified by Rasmussen’s new approach. Rasmussen now weights, and his presidential tracking poll now fluctuates very little. Because party ID and preferred candidate (Bush/Kerry) are strongly correlated, this means that his weighting procedure will always work to reduce the margin of the leading candidate. This may explain why his poll is so stable: statistically, too stable to be right. In recent 3-day tracking data (analyzing every third day, so that the samples do not overlap), the standard deviation is 0.7%, whereas random fluctuation alone predicts an SD of 1.6%.

The real problem with weighting is as follows: the horserace result depends on assumptions about party ID. If party ID covaries with sentiment, then real changes will be filtered out, and it will be very hard to learn from weighted data who is ahead, a basic fact we want from polls. We can see an example of this today: a recent poll from Zogby shows little change from the previous poll.
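
To make the filtering effect concrete, here is a toy sketch. Every number in it is hypothetical and comes from no actual poll. It assumes the extreme case in which a real shift in sentiment shows up entirely as people changing their stated party ID, while support for Bush within each party stays the same. Weighting each sample back to fixed party-ID targets then removes the shift completely.

```python
# All figures below are hypothetical, chosen only to illustrate the argument.
BUSH_SUPPORT = {"Rep": 0.90, "Dem": 0.10, "Ind": 0.50}  # assumed within-party support

FIXED_TARGETS = {"Rep": 0.35, "Dem": 0.35, "Ind": 0.30}  # assumed weighting targets


def unweighted_estimate(sample_mix, support):
    """Bush share using the party mix actually observed in the sample."""
    return sum(sample_mix[p] * support[p] for p in sample_mix)


def weighted_estimate(sample_mix, support, targets):
    """Bush share after reweighting each party group to its fixed target."""
    weights = {p: targets[p] / sample_mix[p] for p in sample_mix}
    return sum(sample_mix[p] * weights[p] * support[p] for p in sample_mix)


week1_mix = {"Rep": 0.35, "Dem": 0.35, "Ind": 0.30}
week2_mix = {"Rep": 0.39, "Dem": 0.33, "Ind": 0.28}  # a real shift toward the GOP

unweighted_change = (unweighted_estimate(week2_mix, BUSH_SUPPORT)
                     - unweighted_estimate(week1_mix, BUSH_SUPPORT))
weighted_change = (weighted_estimate(week2_mix, BUSH_SUPPORT, FIXED_TARGETS)
                   - weighted_estimate(week1_mix, BUSH_SUPPORT, FIXED_TARGETS))

print(f"change in Bush share, unweighted: {unweighted_change:+.3f}")
print(f"change in Bush share, weighted:   {weighted_change:+.3f}")
```

In real data some of the change would also appear as shifts in within-party support, so weighting would damp the movement rather than erase it; the sketch only shows the direction of the effect.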

Therefore I currently think that both weighting by party ID (Rasmussen now) and not weighting at all (Gallup now, Rasmussen in 2000) have serious problems. A better way to weight would be to use a question or questions with fixed answers, such as “Who did you vote for in the last election, Bush, Gore or Nader?” Time magazine asks such a question, but does not weight by it. Of course, the unreliability of memory is a problem. Zogby Interactive does a sensible version of this: party ID is asked at a different time than candidate preference, which might de-link the variables. If anyone knows of other organizations that go beyond simple party-ID weighting, please let me know.

