Where to next? Election polling and predictions
1/3/17 / Beth Mulligan
The accuracy of election polling is still being hotly debated, and one point worth some pondering was made by Allan Lichtman in an NPR interview the day after the election. What he said was this:
“Polls are not predictions.”
To some extent this is a semantic argument about how you define prediction, but his point, as I see it, is that polls are not a model defining what factors will drive people to choose one party, or candidate, over another. Essentially, polls are not theory-driven – they are not a model of “why,” and they do not specify, a priori, what factors will matter. So, polling estimates rise and fall with every news story and sound bite, but a prediction model would have to say something up front like “we think this type of news will affect behavior in the voting booth in this way.” Lichtman’s model, for example, identifies 13 variables that he predicts will affect whether the party in power continues to hold the White House, including whether there were significant policy changes in the current term, whether there was a big foreign policy triumph, whether the President’s party lost seats during the preceding mid-term election, and so on.
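To make the contrast concrete, here's a minimal sketch of what a keys-style model might look like in code. The key names below are loosely paraphrased from Lichtman's published list, and the true/false answers are invented for illustration; his actual judgments are more nuanced than a boolean checklist.

```python
# Toy keys-style prediction model (an illustration, not Lichtman's actual
# implementation). Each key is a true/false statement about conditions,
# where True means the condition favors the party holding the White House.
keys = {
    "midterm_seat_gains": False,       # party gained House seats in the midterms
    "no_nomination_contest": True,     # no serious contest for the nomination
    "incumbent_running": False,        # the sitting president is on the ballot
    "no_third_party": True,            # no significant third-party challenge
    "strong_short_term_economy": True,
    "strong_long_term_economy": True,
    "major_policy_change": True,       # significant policy change this term
    "no_social_unrest": True,
    "no_scandal": True,
    "no_foreign_failure": True,
    "foreign_policy_triumph": False,   # big foreign/military success
    "charismatic_incumbent": False,
    "uncharismatic_challenger": False,
}

false_keys = sum(1 for favors_incumbent in keys.values() if not favors_incumbent)

# Lichtman's published decision rule: six or more false keys predict that
# the party in power loses the White House.
prediction = "party in power loses" if false_keys >= 6 else "party in power wins"
print(f"{false_keys} keys false -> {prediction}")
```

Note the essential feature: the model commits up front to which factors matter, so a news cycle only moves the prediction if it actually flips one of the keys.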
Polls, in contrast, are something like a meta prediction model. Kate made this point as we were discussing the failure of election polls: polls are essentially a sample of people trying to tell pollsters what they predict they will do on election day, and people are surprisingly bad at predicting their own behavior. In other words, each unit (i.e., survey respondent) has its own, likely flawed, prediction model, and survey respondents are feeding the results of those models up to an aggregator (i.e., the poll). In this sense, a poll as prediction is sort of like relying on the “wisdom of the crowd” – but if you’ve ever seen what happens when someone uses the “ask the audience” lifeline on Who Wants to Be a Millionaire, you know that is not a foolproof strategy.
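A quick simulation makes that point concrete. In the toy model below (the simulate_poll function and every parameter value are invented for illustration), each respondent's flawed self-prediction adds noise to their report. Noise that is independent across respondents mostly averages out of the poll; a small bias shared by everyone, such as a social desirability effect, does not, no matter how many people you ask.

```python
import random

random.seed(42)

def simulate_poll(n_respondents, true_support, individual_noise, shared_bias):
    """Each respondent reports a noisy self-prediction of voting for
    candidate A. Independent noise tends to average out across a large
    sample; a bias shared by all respondents does not."""
    reports = []
    for _ in range(n_respondents):
        # This respondent's probability of reporting support for A:
        # the true rate, perturbed by their own flawed self-model
        # plus an error common to everyone.
        p_report = true_support + random.gauss(0, individual_noise) + shared_bias
        reports.append(1 if random.random() < p_report else 0)
    return sum(reports) / len(reports)

true_support = 0.48
print("independent noise only:  ", simulate_poll(10_000, true_support, 0.10, 0.00))
print("plus 3-point shared bias:", simulate_poll(10_000, true_support, 0.10, 0.03))
# The first estimate lands near 48%; the second sits stubbornly near 51%,
# and taking a bigger sample would not fix it.
```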
Whether a model or a poll is better in any given situation depends on several factors. A model requires deep expertise in the topic area, and, depending on that knowledge and the available data sources, it will capture only some portion of the variance in the predicted variable; a model that omits an important predictor will not predict well. Polls, meanwhile, are a complex effort to ask the right people the right questions in order to accurately estimate knowledge, beliefs, attitudes, or behaviors. Polls have a variety of error sources, including sampling error, nonresponse bias, and measurement error, and each of those sources affects the accuracy of the estimates coming out of the poll.
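It's worth noting that the familiar “margin of error” reported alongside a poll quantifies only the first of those sources. A back-of-the-envelope sketch, assuming a simple random sample:

```python
import math

def sampling_margin_of_error(p_hat, n, z=1.96):
    """95% margin of error for a proportion from a simple random sample.
    This reflects sampling error only; nonresponse bias and measurement
    error never show up in this number."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# A candidate polling at 50% among 1,000 respondents:
print(f"+/- {sampling_margin_of_error(0.50, 1000):.1%}")  # roughly +/- 3.1 points
```

Who declined to respond, and who answered untruthfully, can push an estimate well outside that plus-or-minus without anything in the published numbers flagging it.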
The election polling outcomes are a reminder of the importance of hearing from a representative sample of the population, and of designing questions with an understanding of psychology. For example, it is important to understand what people can or can’t tell you in response to a direct question (e.g., when are people unlikely to have conscious access to their attitudes and motivations; when are knowledge or memory likely to be insufficient), and what people will or won’t tell you in response to a direct question (e.g., when is social desirability likely to affect whether people will tell you the truth).
This election year may have been unusual in the number of psychological factors at play in reporting voting intentions. There was a lot of reluctant support on both sides, which suggests conflicts between voters’ values and their candidate’s values, and for some, likely conflicts between conscious and unconscious leanings. Going forward, one interesting route would be for pollsters to capture various psychological factors that might affect accuracy of reporting and incorporate those into their models of election outcomes.
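What might that look like? Here is one purely hypothetical sketch (the field names and scores below are invented): weight each stated intention by a self-reported certainty score, so that reluctant or conflicted supporters count for less than enthusiastic ones.

```python
# Hypothetical: each respondent reports an intention (1 = candidate A)
# and a 0-to-1 certainty score capturing how settled they feel.
respondents = [
    {"intends_a": 1, "certainty": 0.95},  # enthusiastic supporter
    {"intends_a": 1, "certainty": 0.40},  # reluctant supporter
    {"intends_a": 0, "certainty": 0.90},
    {"intends_a": 0, "certainty": 0.55},
]

raw = sum(r["intends_a"] for r in respondents) / len(respondents)

weighted = (sum(r["intends_a"] * r["certainty"] for r in respondents)
            / sum(r["certainty"] for r in respondents))

print(f"raw estimate:       {raw:.0%}")       # 50%
print(f"certainty-weighted: {weighted:.0%}")  # about 48%
```

Whether certainty, or any other psychological measure, actually improves accuracy is an empirical question that would need validation against real election outcomes.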
Hopefully in the future we’ll also see more reporting on prediction models in addition to polls. Already there’s been a rash of data mining in an attempt to explain this year’s election results. Some of those results might provide interesting ideas for prediction models of the future. (I feel obliged to note: data mining is not prediction. Bloomberg View explains.)
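Here's one way to see why, as a toy simulation (the random “indicators” below stand in for mined patterns like sports results or hemline trends): sift through enough candidate explanations and one of them will fit past elections perfectly by sheer chance, yet it predicts future elections no better than a coin flip.

```python
import random

random.seed(1)

past = [random.randint(0, 1) for _ in range(10)]    # ten past outcomes
future = [random.randint(0, 1) for _ in range(10)]  # ten future outcomes

# "Mine" 5,000 random indicators and keep the one that best matches the past.
best_score, best_indicator = -1, None
for _ in range(5000):
    indicator = [random.randint(0, 1) for _ in range(20)]
    score = sum(i == o for i, o in zip(indicator[:10], past))
    if score > best_score:
        best_score, best_indicator = score, indicator

hits = sum(i == o for i, o in zip(best_indicator[10:], future))
print(f"fit to past elections:  {best_score}/10")  # almost certainly 10/10
print(f"predicting future ones: {hits}/10")        # hovers around 5/10, i.e., chance
```

Explaining after the fact is cheap; committing to a prediction up front is the real test.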
Elections are great for all of us in the research field because they provide feedback on accuracy that can help us improve our theories and methods in all types of surveys and models. (We don’t do much traditional election polling at Corona, but a poll is essentially just a mini-survey – and a lot of the election “polls” are, in fact, surveys. Confused? We’ll try to come back to this in a future blog.) We optimistically look forward to seeing how the industry adapts.