Have you had this problem? You need to measure preference for the features in your product or service, but there are so many of them that it seems like an impossible task. Using conventional approaches, you would have asked about the importance of each feature on a scale. But we all know how that story goes. With no other constraints, respondents don't have an incentive to say that anything is unimportant. You could use constraint-based methods like constant sum scales, but these cannot realistically handle more than a handful of features at a time. Over the last few years, the most popular technique to address this kind of feature prioritization problem has been Max-Diff (see white paper on Max-Diff). But using Max-Diff when there are more than a dozen attributes becomes a real chore. So what can you do when you have dozens of features that need to be efficiently culled? Let's start with a look at a standard Max-Diff approach.
If there are ten features to prioritize, the Max-Diff algorithm can be set up such that respondents see grids of 3-5 features at a time, perhaps 8-10 grids in total. In each grid a respondent would indicate the feature that is most important (or some other relevant metric) and the one that is least important. At that point the respondent is done and the analysis of the data is conducted with Hierarchical Bayesian estimation to identify not only the rank ordering of the features but also the distances between them. The really neat outcome is that this information is available for each individual respondent, allowing further cutting and filtering of the results.
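To make the mechanics concrete, here is a minimal Python sketch of the respondent-facing side of a Max-Diff task: building grids so that every feature is shown equally often, then tallying best-minus-worst counts. The function names, the greedy balancing rule, and the simulated respondent are all assumptions made for illustration. The counting score is a much simpler stand-in for the Hierarchical Bayesian estimation described above, not a replacement for it.

```python
import random
from collections import Counter

def make_grids(features, per_grid, n_grids, seed=0):
    """Build grids, always drawing the least-shown features first (assumed rule)."""
    rng = random.Random(seed)
    shown = Counter({f: 0 for f in features})
    grids = []
    for _ in range(n_grids):
        # Sort by exposure so far; break ties randomly.
        order = sorted(features, key=lambda f: (shown[f], rng.random()))
        grid = order[:per_grid]
        shown.update(grid)  # each feature in the grid gains one exposure
        grids.append(grid)
    return grids

def count_scores(grids, answers):
    """Best-minus-worst count per feature; answers is a list of (best, worst)."""
    scores = Counter({f: 0 for g in grids for f in g})
    for best, worst in answers:
        scores[best] += 1
        scores[worst] -= 1
    return scores

# Ten features, four per grid, ten grids: each feature is shown four times.
features = [f"feature_{i}" for i in range(10)]
grids = make_grids(features, per_grid=4, n_grids=10)

# Hypothetical respondent: a lower index means a more important feature.
utility = {f: -i for i, f in enumerate(features)}
answers = [(max(g, key=utility.get), min(g, key=utility.get)) for g in grids]
scores = count_scores(grids, answers)
```

In this toy setup the scores only rank features for one simulated respondent; the distances between features that the article mentions come from the model-based estimation, which this sketch does not attempt.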
But when the number of features to be tested increases, so does the number of grids. Since features have to be shown multiple times in this approach, the task quickly becomes monotonous and seemingly mindless for the respondent. When you have 30 or more features, it is hard not to sympathize with the respondent's plight, and at some point the quality of the data obtained will come into question. So what can we do about that?
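The growth is easy to quantify. As a rough sketch, if every feature must appear a fixed number of times, the number of grids scales linearly with the number of features. The values used here (four features per grid, three exposures per feature) are assumed for illustration and are not taken from the study.

```python
import math

def grids_needed(n_features, per_grid, exposures):
    """Smallest number of grids that shows every feature `exposures` times."""
    return math.ceil(n_features * exposures / per_grid)

# Assumed design: 4 features per grid, each feature shown 3 times.
for n in (10, 30, 50):
    print(n, "features ->", grids_needed(n, per_grid=4, exposures=3), "grids")
```

Under these assumptions, ten features need 8 grids, while 30 features need 23 and 50 features need 38.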
A New Approach
We use a dynamic approach (called Bracket™) that uses a tournament structure to successively eliminate the "losing" features, thus making the task more engaging and cognitively challenging. The first round is similar to a Max-Diff task in that respondents will see a series of grids with a few features in each one and indicate their preference. The losing features (in this case those that are not preferred) fall away and the winners live to compete in the next round and so on, till we narrow the features down to each respondent's final set.
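The tournament idea can be sketched in a few lines of Python. This is only an illustration of successive elimination, not the actual Bracket™ algorithm, whose details are not given here. The grid size of four, the rule that exactly one winner per grid advances, the stopping rule, and the `choose` callback standing in for a respondent are all assumptions.

```python
import random

def run_bracket(features, choose, per_grid=4, seed=0):
    """Eliminate features round by round until one grid's worth remains.

    `choose` is a hypothetical callback that receives a grid (a list of
    features) and returns the one the respondent prefers.
    """
    rng = random.Random(seed)
    remaining = list(features)
    rounds = []
    while len(remaining) > per_grid:
        rng.shuffle(remaining)
        # Split the survivors into grids; the last grid may be smaller.
        grids = [remaining[i:i + per_grid]
                 for i in range(0, len(remaining), per_grid)]
        rounds.append(grids)
        # Only the preferred feature in each grid advances.
        remaining = [choose(grid) for grid in grids]
    return remaining, rounds

# Simulated respondent (hypothetical): a lower index means more preferred.
features = [f"feature_{i}" for i in range(50)]
utility = {f: -i for i, f in enumerate(features)}
finalists, rounds = run_bracket(features, lambda g: max(g, key=utility.get))
```

With 50 features and these assumed settings, the first round has 13 grids and the second has 4, leaving a final set of four features. A real design would also have to decide how to handle leftover grids with a single feature, which this sketch simply passes through to the next round.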
There are several advantages to this approach. Most important is that it allows us to handle large numbers of features (upwards of 50) without running into respondent fatigue, as the elimination process whittles down unimportant features very rapidly. The process is also more engaging for respondents, as they don't have to keep choosing between features they have no interest in, as in a regular Max-Diff task. Since only the preferred features move through the rounds of the tournament, the choices become progressively more difficult (in a good way), encouraging a more thoughtful response than usual. Lastly, the most important features for each respondent are dynamically identified in real time, allowing follow-up questions to be asked seamlessly in the same survey.
Sounds like a winner, but how do we know that we are actually getting good quality data on the back end? We ran tests to confirm this and here is what we found.
A Bracket™ Example
The subject in this case was how movie-goers make decisions about which movie to see and where to see it. One can imagine many such factors: the stars, the director, the theater location, the show timing, etc. We imagined 18 of them and constructed a study where one cell of respondents was provided a standard Max-Diff task, while another cell was provided a Bracket™ task.
Why choose only 18 features? We could have gone with, say, 30 or 40 features. But to make the methods comparable we needed a number that both methods could handle comfortably. The higher we went, the more the test would have been unfair to Max-Diff, so we chose to go with something in the neighborhood of 20.