I’ll give you the simple answer. Surveys!
No, I don’t mean looking at whatever survey happens to catch your eye or tickles your (or your favorite network or blog’s) ideological fancy. I mean using a system powered by old-fashioned surveys to make very, very good explanations and predictions. Someone has been doing exactly that for several years now, and anyone interested in surveys would do well to understand how. I’m talking, of course, about Nate Silver at fivethirtyeight.com.
Interestingly, Silver does not actually conduct a single survey himself. Instead, he has built a database containing thousands of surveys and applied some simple, clear rules to analyze them. Based on these rules and the statistical models he has built, he is able to provide the best unbiased view of the race. All this from survey data. How does he do it? Let’s take a look at some (and by no means all) of his rules.
Rule 1 - Aggregation
This is the most important of his rules. Anyone in the survey business (and many outside) understands ideas like margin of error (simply put, the notion that data from a survey are imprecise). There are many things a survey company can do to make the numbers more precise, but it is almost impossible to eliminate the fuzziness completely. So when looking at a survey, especially one in a tight race like the one we have now, it is important not to get carried away by small differences. But when biased parties are involved, it is not hard for them to pick a favorable survey (where the numbers are favorable more by chance than anything else) and turn that into news. Rather than rely on single surveys, Nate Silver aggregates the surveys that are released. As the election nears, more and more surveys are launched, providing a bigger and bigger sample size for his analysis. So while in any single survey a difference of 2 percentage points may be meaningless, it becomes very meaningful when looking at aggregated results (it’s almost census-like).
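To see why aggregation helps, it’s enough to look at how the margin of error shrinks with sample size. This is a minimal sketch using the standard formula for the margin of error of a proportion; the five polls and their sample sizes are hypothetical numbers chosen for illustration, not from Silver’s database.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical numbers: five polls of ~800 likely voters each,
# all showing the race near 50/50.
polls = [(0.49, 800), (0.51, 820), (0.50, 790), (0.48, 810), (0.52, 805)]

# Sample-size-weighted average of the poll results.
total_n = sum(n for _, n in polls)
pooled_p = sum(p * n for p, n in polls) / total_n

moe_single = margin_of_error(0.50, 800)          # one poll alone: ~3.5 points
moe_pooled = margin_of_error(pooled_p, total_n)  # pooled: ~1.5 points

print(f"Single poll MOE: +/-{moe_single:.1%}")
print(f"Pooled MOE:      +/-{moe_pooled:.1%}")
```

A single 800-person poll can’t distinguish a 2-point gap from noise, but pooling five such polls cuts the margin of error by more than half, which is why the same 2-point gap becomes meaningful in the aggregate.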
Rule 2 – Eliminating House Effects
When so many surveys are released, it is inevitable that some will show results favorable for one candidate and others for his opponent. But because of the way sampling is done (for example, registered voters versus likely voters being selected to participate), some surveys are likely to show a consistent lean in the direction of one candidate. Rasmussen leans toward Romney, for example, while Pew leans toward Obama. Nate Silver looks at surveys released by an organization over time (across multiple election cycles), compares them to actual outcomes, and makes a house-effect adjustment. This is like the park-factor adjustment that sabermetricians make in baseball to show that, for example, 30 home runs hit at Coors Field are worth less than 30 hit in other parks. He also excludes surveys conducted by organizations that don’t release methodological details, ones with explicit partisan agendas, and those put out by the campaigns themselves. All this helps reduce the bias in his numbers.
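The idea behind a house-effect adjustment can be sketched very simply: estimate each pollster’s average deviation from the all-poll average and subtract it out. The pollster names and margins below are entirely fictional, and this is a toy version of the idea, not Silver’s actual calibration against election outcomes.

```python
from collections import defaultdict

# Hypothetical margins (candidate A minus candidate B, in points)
# reported by three fictional pollsters over the same window.
readings = [
    ("PollsterX", 4.0), ("PollsterX", 5.0), ("PollsterX", 4.5),
    ("PollsterY", 0.0), ("PollsterY", 1.0), ("PollsterY", 0.5),
    ("PollsterZ", 2.0), ("PollsterZ", 2.5), ("PollsterZ", 2.2),
]

overall = sum(m for _, m in readings) / len(readings)

# A pollster's house effect: its average deviation from the overall average.
by_house = defaultdict(list)
for house, m in readings:
    by_house[house].append(m)
house_effect = {h: sum(ms) / len(ms) - overall for h, ms in by_house.items()}

# Adjusted readings subtract each house's estimated lean.
adjusted = [(h, m - house_effect[h]) for h, m in readings]
```

After the adjustment, PollsterX and PollsterY no longer look like they disagree about the race; their consistent leans have been netted out, and what remains is closer to a common signal.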
Rule 3 – Accounting for Technology Impact
As with commercial work, Presidential surveys are done using different methods, phone and web being primary. But a particular problem with phone samples is the number of households that are cell phone only. Some surveys account for this (with a cell phone augment); others don’t, and this can lead to differences. For example, surveys that include cell phone augments tend to favor Obama more (because cell phone only households tend to skew younger). In my university lectures, I routinely quiz students on how many are cell phone only, and the proportion is usually similar to the number that are on Facebook. By keeping track of this information and monitoring which approach does a better job, Nate Silver is able to produce more reliable predictions.
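One common way to correct for coverage problems like this is post-stratification weighting: if a landline sample under-represents young (disproportionately cell-phone-only) respondents, each group is reweighted to its assumed population share. The sample, the population shares, and the support rates below are all made-up numbers for illustration.

```python
from collections import Counter

# Hypothetical landline-only sample under-representing young voters.
# Each respondent: (age_group, supports_candidate_A)
sample = [("young", True)] * 20 + [("young", False)] * 10 \
       + [("older", True)] * 30 + [("older", False)] * 40

# Assumed population shares (illustrative, not real census figures).
population_share = {"young": 0.45, "older": 0.55}

counts = Counter(g for g, _ in sample)
weights = {g: population_share[g] / (counts[g] / len(sample)) for g in counts}

raw = sum(s for _, s in sample) / len(sample)
weighted = sum(weights[g] * s for g, s in sample) \
         / sum(weights[g] for g, _ in sample)
```

In this toy example the young respondents both support candidate A more heavily and are under-sampled, so the weighted estimate comes out higher than the raw one, mirroring the pattern the paragraph describes for cell phone augments.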
Rule 4 – Contextualize
This is another really important rule. After a Presidential debate there is a huge frenzy for survey information on who won the debate and what its effect is on the race. This year, for the first time, Google Consumer Surveys deployed its unique methodology to adjudicate the debate winner. But what does winning a debate do to a candidate’s chances? Similarly, what does a convention bounce mean for a candidate’s chances? Unlike the myriad TV talking heads who provide their own (gut-based) interpretations and predictions (usually no better than throwing darts), Nate Silver goes to the data. If Obama got a 4-point bounce after his convention (in aggregated poll numbers), how does that compare with what previous candidates were able to get? If Romney won the first debate, how does that bounce compare with bounces from previous debate wins? Contextualizing the data like this allows us to better understand the dynamics of the race.
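The simplest version of contextualizing is asking where a new bounce sits in the distribution of past bounces. Here is a minimal sketch; the historical bounce values are invented placeholders, not actual figures from past campaigns.

```python
# Hypothetical post-convention bounces (in points) from past cycles;
# illustrative values only.
historical_bounces = [2.0, 3.5, 5.0, 6.5, 4.0, 1.5, 7.0, 3.0]

def percentile_rank(value, history):
    """Share of historical bounces strictly smaller than this one."""
    return sum(b < value for b in history) / len(history)

this_bounce = 4.0
rank = percentile_rank(this_bounce, historical_bounces)
print(f"A {this_bounce}-point bounce beats {rank:.0%} of past bounces")
```

A 4-point bounce that lands near the middle of the historical distribution is ordinary news; one in the top decile is a genuine story. That distinction is exactly what gut-based punditry tends to miss.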
Rule 5 - Account for Time
A related issue is accounting for time lag. If a candidate gets a bounce, how long does it last? Does he regress back to where he was before the debate or convention? Or does the effect create a new baseline for the candidate, indicating the real value of that event? These are issues that Nate Silver is able to tease out by looking carefully at when particular surveys were conducted and trending the results over time. For example, he is careful enough to say that a survey whose field dates spanned the first Presidential debate may underestimate Romney’s bounce.
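A standard way to make a trend estimate respect time is to down-weight older polls, for instance with an exponential decay. The sketch below assumes a five-day half-life and made-up poll margins; it is an illustration of the general technique, not Silver’s actual weighting scheme.

```python
# Hypothetical polls: (days_ago, margin in points). Illustrative numbers only.
polls = [(0, 1.0), (2, 2.0), (5, 4.0), (10, 6.0)]

HALF_LIFE = 5.0  # assumption: a poll's weight halves every 5 days

def weight(days_ago):
    return 0.5 ** (days_ago / HALF_LIFE)

num = sum(weight(d) * m for d, m in polls)
den = sum(weight(d) for d, _ in polls)
trend_estimate = num / den
```

Because the most recent polls show a narrowing margin, the decayed estimate comes in well below the simple unweighted average, which is the point: the trend line tracks where the race is now, not where it was ten days ago.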
He has also taken all that he has learned and produced a statistical model that makes predictions on the Presidential race. It includes, for each candidate, a predicted number of Electoral College votes, a share of the popular vote, and a probability of winning the election. He also uses state polls to predict the likelihood of each candidate winning each state. In my opinion, he provides the most thorough, unbiased, smart, data-driven dissection of the Presidential race. And it is all powered by good old-fashioned surveys!
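A common way such models turn state-level win probabilities into an overall win probability is Monte Carlo simulation: simulate the election many times, tally electoral votes in each run, and count how often a candidate reaches a majority. This sketch uses a toy map of five fictional states totaling 100 electoral votes; the probabilities and vote counts are invented, and this is a generic illustration of the technique rather than a description of Silver’s model internals.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical per-state (win probability, electoral votes) for candidate A.
states = {
    "A": (0.80, 20), "B": (0.60, 15), "C": (0.50, 10),
    "D": (0.40, 25), "E": (0.30, 30),
}
NEEDED = 51  # majority of the 100 hypothetical electoral votes above

def simulate(trials=20000):
    wins = 0
    for _ in range(trials):
        ev = sum(votes for p, votes in states.values() if random.random() < p)
        wins += ev >= NEEDED
    return wins / trials

p_win = simulate()
```

Note that the win probability that falls out of the simulation is not just the sum of expected electoral votes; it also reflects how the uncertainty across states combines, which is why a candidate can lead in expected votes and still have a win probability well below certainty.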
If you are interested in his work, check out his book, The Signal and the Noise.
And yes, there are several lessons we survey researchers can learn from him. I’ll leave that as homework for you.