Our White Paper Library

Check below to find lots of helpful information on a variety of research topics. If you have any questions, or are looking for something and can't find it here, drop Rajan (our Chief Research Officer) a line.
Qualitative Data in Surveys: Lessons from The Black Swan

By Rajan Sambandam

This essay is about qualitative data obtained from quantitative studies and how to analyze them. The thesis is that the framework used to analyze such data differs from the one used for qualitative data obtained directly, through methods such as IDIs and focus groups. Understanding the difference between quantitative and qualitative frameworks for data analysis (and in particular, the difference between statistical and managerial outliers) can help in deriving more value when qualitative data are collected in a regular survey. But first, let’s take a detour through a recently published popular book whose central analogy helps in understanding both our problem and the potential solution.

A Brief Tour of The Black Swan

In the informative (and entertaining) book The Black Swan, Nassim Nicholas Taleb argues that real data are either distributed normally (from “Mediocristan”) or not (from “Extremistan”). The former follow the traditional normal distribution (or bell curve). Most of the observations lie near the middle, around the average, and observations become increasingly scarce as we venture further out. It is a distribution that describes many phenomena in the natural world. In fact, basic statistics shows that with a reasonable number of observations, the averages of samples drawn from most distributions start approximating the normal.

Taleb says that the second (extreme) distribution is quite unlike the normal even though it can appear similar at first sight. Consider two examples: the weight and the assets of people. At first both appear capable of being represented by normal distributions. But in reality weight comes from a normal distribution and assets come from an extreme distribution. When we look at the weights of a few hundred people, most will be near the middle and a few will be outliers. But biology prevents the outliers from straying too far. Removing the heaviest person (a statistical outlier) does not change the mean of the distribution in any meaningful way. In other words, a normal distribution is well suited to represent these data.

Now consider the other variable, assets. It would appear at first to be well represented by a normal distribution. But what if one of the people included is Bill Gates or Warren Buffett? Removing that one outlier can have a significant impact on the mean of the distribution, although from a managerial perspective you don’t want to remove that outlier because of its obvious importance. Given that, is the mean a proper way of summarizing the distribution? No, and in practice we use the median to get around this problem. But that solution will only go so far.

Taleb suggests that when people mistake the extreme for the normal, they apply normal assumptions where they are inappropriate and therefore fail to consider the likelihood and impact of extreme events. For him, variables like assets should not be modeled as normal. The extreme distribution is very susceptible to outliers and can be simply explained by the 80/20 rule. For example, 80% of a company’s revenue comes from 20% of its customers. Then within that 20% there is another 80/20 split, and so on, to the point where one customer could have a huge influence on the entire distribution. In the case of the assets of a certain group of people, that person could be Bill Gates.

Taleb’s point is that current models of finance have mistaken the extreme for the normal distribution and have hence severely underestimated the built-in risk. Scenarios that were seen as several standard deviations away from the mean, and hence as extremely unlikely (“black swans”), have happened and caused havoc in the economy. Had the distribution been seen as extreme (as Taleb saw it before the current crisis), predictions of doom would have been loud and clear.
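The contrast between weight and assets, and the nested 80/20 split, can be made concrete with a little arithmetic. The sketch below is only an illustration: the weights and asset values are invented for the example (they are not data from the book or from any survey), and the function names are hypothetical. It compares how much the mean and the median move when the single largest value is removed, and then works out what a repeated 80/20 split implies.

```python
from statistics import mean, median


def drop_largest(values):
    """Return a sorted copy of values with the single largest value removed."""
    trimmed = sorted(values)
    trimmed.pop()  # the last element of a sorted list is the maximum
    return trimmed


def outlier_effect(values):
    """Ratio of each summary with the largest value to the summary without it."""
    trimmed = drop_largest(values)
    return {
        "mean_ratio": mean(values) / mean(trimmed),
        "median_ratio": median(values) / median(trimmed),
    }


def pareto_cascade(levels, customer_share=0.2, revenue_share=0.8):
    """Apply the 80/20 split `levels` times in a row.

    Returns (fraction of customers, fraction of revenue they account for).
    """
    return customer_share ** levels, revenue_share ** levels


# Invented weights in pounds: nine typical people and one very heavy person.
weights = [130, 145, 150, 160, 165, 170, 175, 180, 190, 350]

# Invented assets in dollars: nine typical households and one billionaire.
assets = [40_000, 60_000, 75_000, 90_000, 120_000,
          150_000, 200_000, 300_000, 500_000, 50_000_000_000]

weight_effect = outlier_effect(weights)
asset_effect = outlier_effect(assets)
customers, revenue = pareto_cascade(3)

print(f"weights: mean ratio {weight_effect['mean_ratio']:.2f}, "
      f"median ratio {weight_effect['median_ratio']:.2f}")
print(f"assets:  mean ratio {asset_effect['mean_ratio']:.0f}, "
      f"median ratio {asset_effect['median_ratio']:.2f}")
print(f"after 3 splits: {customers:.1%} of customers, {revenue:.1%} of revenue")
```

With these made-up numbers, dropping the heaviest person changes the mean weight by roughly a tenth, while dropping the billionaire changes mean assets by a factor in the tens of thousands; the median moves only modestly in both cases. Three rounds of the 80/20 split leave 0.8% of customers accounting for 51.2% of revenue.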
Back to Survey Data

What does all this have to do with quantitative and qualitative (open-ended) survey data? Scaled quantitative survey data form distributions that are at least approximately normal. On a scale of 1-10 there are really no outliers in the conventional sense. Scores of 1 (or especially 10) are not particularly remarkable or unpredictable. More importantly, with reasonable sample sizes the mean provides a very good summary of what the distribution means. Generally speaking, a company with a mean satisfaction score of 6 is going to have more dissatisfied customers than one with a mean satisfaction score of 8. Ignoring the extremes of the distribution is generally not a problem, as single observations there are unlikely to change anything. That is, statistical outliers (those that are not of use to a manager) can be ignored without consequence.
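The claim that a single extreme score cannot change much on a bounded scale follows from simple arithmetic: removing one response x from n responses with mean m moves the mean by (m - x)/(n - 1), which can never exceed (10 - 1)/(n - 1) on a 1-10 scale. The sketch below checks this on an invented set of 300 satisfaction scores; the counts and function names are assumptions made for illustration, not data from any study.

```python
def mean_from_counts(counts):
    """Mean score given a mapping of score -> number of respondents."""
    total = sum(counts.values())
    return sum(score * k for score, k in counts.items()) / total


def max_shift_from_one_response(n, low=1, high=10):
    """Upper bound on how far the mean can move when one of n responses is dropped."""
    return (high - low) / (n - 1)


# Invented distribution of 300 satisfaction scores on a 1-10 scale.
counts = {1: 5, 2: 5, 3: 10, 4: 20, 5: 40, 6: 60, 7: 70, 8: 50, 9: 25, 10: 15}
n = sum(counts.values())

full_mean = mean_from_counts(counts)

# Drop a single respondent who gave the lowest possible score.
without_one_low = dict(counts)
without_one_low[1] -= 1
shift = abs(mean_from_counts(without_one_low) - full_mean)

bound = max_shift_from_one_response(n)
print(f"mean {full_mean:.2f}, shift {shift:.4f}, bound {bound:.4f}")
```

In this example the mean is 6.5 and removing one score of 1 moves it by about 0.02, comfortably inside the bound of about 0.03. No comparable bound exists for an unbounded variable such as assets.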

