You know them, right? The friends you log into Facebook to check out. They always have something interesting to share and you like to see what it is. That process is one way of measuring influence in social networks and a rather good one at that, according to recent research by Michael Trusov, Anand Bodapati and Randy Bucklin. They set out to identify influencers in a social network and did so using some interesting data and analytics. Here’s the story.
Let’s start with why it is important to identify influencers. You don’t need social media to answer that. Marketers want to identify influencers as an easy way to reach a larger population. In an online social network, however, it’s a little different, as the firm hosting the network makes money from the number of impressions and clicks generated. So people who can bring more eyeballs online (i.e., influencers) are worth advertising money, and the owner of the network should nurture them. But how do you identify them?
One could use simple metrics like the number of friends, but there has to be a better analytic way than that, right? The researchers’ idea is to use something simple but effective, the number of log-ins per day, to get at influence. How does that work, you ask?
The logic is as follows. The more often a person logs in, the more likely he or she is to create or view content. So when a person logs in a lot and the behavior (i.e., log-ins) of others linked to that person goes up, that person is influential. When the behavior of others doesn’t change, that person is not influential. Think about how this works in your own case. If you keep logging into Facebook to see updates posted by one friend, then that friend is influencing you. The more frequently she updates, the more frequently you may log in to check, indicating a higher level of influence. On the other hand, when a friend posts frequently but you don’t log in to check, that friend has little influence over your behavior. Of course, the same logic can be applied to other correlated measures, such as time spent online or number of messages sent.
The logic for identifying influencers is sound; the difficulty lies in the modeling, because the data are sparse. Conventional approaches that work with retail panel data (which assume, for example, that certain effects are similar across people) don’t work here, because each person has a unique set of friends and those friends’ activities are central to the definition of influence.
One could think of evaluating every interaction between two people in terms of direction and strength, but that is not only a humongous task but also not very useful, as most such interactions never occur (the sparseness problem mentioned above). So the researchers restrict the analysis to a user’s friends (the first-level network) and show that higher-level connections can only work through the first-level network; that is, a friend of a friend (the second-level network) cannot influence you directly. For every user and their first-level network, they then search the log-in data for indications of influence in both directions. This lets them identify not just whether influence exists but also its strength (strong, moderate, weak).
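To make the restriction concrete, here is a minimal sketch on a toy friendship graph. The function names (`build_adjacency`, `first_level`, `second_level`, `candidate_pairs`) and the people are hypothetical. Only (user, friend) pairs become candidates for influence; friends of friends are identified but never given a direct link to the user.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected friendship graph stored as a dict of sets."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def first_level(adj, user):
    """The user's direct friends."""
    return set(adj.get(user, ()))


def second_level(adj, user):
    """Friends of friends, excluding the user and the user's direct friends."""
    friends = first_level(adj, user)
    fof = set()
    for f in friends:
        fof |= adj[f]
    return fof - friends - {user}


def candidate_pairs(adj, targets):
    """Pairs examined for influence: each target user with each direct friend (checked in both directions)."""
    return [(u, f) for u in targets for f in sorted(first_level(adj, u))]


edges = [("ann", "bob"), ("ann", "cat"), ("bob", "dan"), ("cat", "eve"), ("dan", "fay")]
adj = build_adjacency(edges)
print(first_level(adj, "ann"))        # bob and cat
print(second_level(adj, "ann"))       # dan and eve
print(candidate_pairs(adj, ["ann"]))  # [('ann', 'bob'), ('ann', 'cat')]
```

Even in this six-person toy graph, ann is checked against only two people rather than all five others. At the scale of 30,000 friends and 2 million friends of friends, that restriction matters a great deal.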
They use a special type of regression model suited to this kind of data, in which a user’s log-in activity is modeled (or predicted) from the user’s own characteristics (such as past log-ins) as well as friend effects (such as friends’ log-in activity). If a certain friend’s activity leads the user to log in more, the regression coefficient associated with that friend will show up as statistically significant.
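The mechanics can be sketched with ordinary least squares on simulated data. This is a deliberately simple stand-in, not the authors’ model: one user and two hypothetical friends, with the user’s daily log-ins regressed on the user’s own log-ins the day before and each friend’s log-ins the day before. Friend A is built to be influential and friend B is not; a t-statistic with an absolute value above roughly 2 is the usual rule of thumb for statistical significance.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols_with_t(X, y):
    """Least-squares coefficients and their t-statistics (homoskedastic standard errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[p][q] * xty[q] for q in range(k)) for p in range(k)]
    resid = [y[i] - sum(X[i][p] * beta[p] for p in range(k)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t_stats = [beta[p] / math.sqrt(sigma2 * inv[p][p]) for p in range(k)]
    return beta, t_stats


# Simulated data (hypothetical): 84 days of log-in counts.
rng = random.Random(1)
days = 84
friend_a = [rng.randint(0, 6) for _ in range(days)]  # influential by construction
friend_b = [rng.randint(0, 6) for _ in range(days)]  # not influential by construction
user = [2]
for t in range(1, days):
    user.append(max(0, round(0.5 + 0.2 * user[t - 1] + 0.6 * friend_a[t - 1] + rng.gauss(0, 1))))

# Columns: intercept, user's own log-ins yesterday, friend A yesterday, friend B yesterday.
X = [[1.0, user[t - 1], friend_a[t - 1], friend_b[t - 1]] for t in range(1, days)]
y = [user[t] for t in range(1, days)]
beta, t_stats = ols_with_t(X, y)
for name, b, tv in zip(["intercept", "own lag", "friend A", "friend B"], beta, t_stats):
    print(f"{name:9s} coef={b:6.2f}  t={tv:6.2f}")
```

With two friends and 83 usable days this works fine. The trouble, as the next section explains, starts when the number of friends approaches or exceeds the number of days.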
Warning – Wonky Explanation
It must have been a real bear to estimate. Why? When we normally run regression analysis, we have more observations than variables (a dataset of 300 respondents and a regression model with 10 variables is not atypical). In this case, however, there are about 80 observations per person but on average about 90 friends (i.e., variables) per person, and many people have hundreds. With more variables than observations, ordinary regression cannot pin down a separate effect for each friend. A similar problem occurs in discrete choice conjoint analysis. There it is solved using Hierarchical Bayes estimation, in which additional information for a respondent is “borrowed” from others under the assumption that certain things are common (for example, everyone saw the same attributes). That doesn’t work here because people can have entirely different sets of friends. So the researchers use a new approach in which they borrow (or pool) information across the friends of a given user. If you want a more detailed explanation, knock yourself out reading the actual article.
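Here is a rough illustration of why this is hard and how pooling helps. With 83 usable days (the first day has no “yesterday”) and 91 friends, the cross-product matrix in a least-squares fit cannot be inverted, so plain regression has no unique answer. The sketch below swaps in ridge regression, which is equivalent to giving every friend’s coefficient the same zero-centered normal prior. That is only a crude cousin of “borrowing” across friends, not the authors’ estimator. The data, the three influential friends and the penalty `lam=5.0` are all assumptions.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def ridge(X, y, lam):
    """Ridge coefficients: solve (Xc'Xc + lam*I) beta = Xc'yc on centered data.

    Centering absorbs the intercept, which is left unpenalized. Without lam > 0,
    Xc'Xc is singular whenever there are more friends than days.
    """
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    ybar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(k)] for row in X]
    yc = [v - ybar for v in y]
    xtx = [[sum(Xc[i][p] * Xc[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    for p in range(k):
        xtx[p][p] += lam
    xty = [sum(Xc[i][p] * yc[i] for i in range(n)) for p in range(k)]
    return solve(xtx, xty)


# Hypothetical data: 84 days (12 weeks) and 91 friends, close to the study's averages.
rng = random.Random(7)
DAYS, N_FRIENDS = 84, 91
INFLUENTIAL = {0, 1, 2}  # assumed: only these friends move the user's log-ins
friends = [[rng.randint(0, 6) for _ in range(DAYS)] for _ in range(N_FRIENDS)]
user = [0] * DAYS
for t in range(1, DAYS):
    pull = sum(friends[j][t - 1] for j in INFLUENTIAL)
    user[t] = max(0, round(0.5 + 0.8 * pull + rng.gauss(0, 0.5)))

# One row per day t = 1..83: yesterday's log-ins of every friend (83 rows, 91 columns).
X = [[friends[j][t - 1] for j in range(N_FRIENDS)] for t in range(1, DAYS)]
y = [user[t] for t in range(1, DAYS)]
beta = ridge(X, y, lam=5.0)

top = sorted(range(N_FRIENDS), key=lambda j: beta[j], reverse=True)[:5]
print("friends with the largest estimated effects:", top)
```

The shared prior is what makes the problem solvable: every friend’s coefficient is pulled toward the same place, so 83 days of data can still separate the few friends who matter from the many who don’t.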
Data and Results
The data set comes from an unnamed social networking site and covers 12 weeks. There are 330 target users, about 30,000 friends (the first-level network) and more than 2 million friends of friends (the second-level network). Full profile and self-reported demographic information is available for first- and second-level users. The average number of log-ins per day is 2.48.
So, what did they find?
· Female friends influence the site usage of male users more than any other gender combination
· Longer term users have more influential friends
· Same ethnicity has a higher influence
· Those with dating objectives have fewer friends who are influential
· Friends who are older than the user have less influence
When a single influence number is calculated for each user, they find, as expected, that the majority have little influence on others while a few have significant influence. The model’s ability to identify influencers is best seen in cases where two people with similar numbers of friends have total network impacts that differ by a factor of eight.
Can you use simple metrics like number of friends or number of page views, or even a simple analytical model, to identify influencers? You could, but you would be quite disappointed if you compared the results with those of this approach. In fact, you could run a revenue calculation based on the number of pages the average visitor sees in a month, the number of log-ins per day and the going rate per thousand impressions (CPM). This calculation shows not only that a hypothetical site would lose about $100 million in advertising revenue if its top influencers left, but also that the simpler methods would vastly underestimate that loss.
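The revenue arithmetic is simple enough to sketch. Assuming each page view carries one ad impression (an assumption; the article doesn’t say how many ads appear per page), lost revenue is lost log-ins times pages per log-in, divided by 1,000 and multiplied by the CPM. The function name and all inputs below are hypothetical and are not the study’s figures. The point is only that if a simpler method undercounts the log-ins an influencer drives, it undercounts the lost revenue by the same factor.

```python
def lost_ad_revenue(lost_logins_per_day, pages_per_login, cpm_dollars, days=365):
    """Advertising revenue forgone over `days` when daily log-ins drop.

    Assumes one ad impression per page view; CPM is the price per 1,000 impressions.
    """
    lost_impressions = lost_logins_per_day * pages_per_login * days
    return lost_impressions / 1000 * cpm_dollars


# Hypothetical inputs: a simple metric sees a drop of 50,000 log-ins a day,
# while a network-aware estimate sees 200,000 (the influencer's own log-ins
# plus the log-ins their friends no longer make).
naive = lost_ad_revenue(lost_logins_per_day=50_000, pages_per_login=10, cpm_dollars=2.0)
network = lost_ad_revenue(lost_logins_per_day=200_000, pages_per_login=10, cpm_dollars=2.0)
print(f"naive estimate:   ${naive:,.0f} per year")    # $365,000 per year
print(f"network estimate: ${network:,.0f} per year")  # $1,460,000 per year
```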
Can the model be applied in other social networking situations? Of course. The authors say several adaptations and improvements are possible based on need and availability of data. Their fundamental contribution remains the identification of influencers in social networking sites through an innovative “borrowing” method.
Now let’s be clear about what this research doesn’t do. When the authors talk about influence, it is only in the sense of getting people to log in more. There is no discussion of changes in other kinds of online or offline behavior. We don’t know whether these influencers get people to listen to their opinions on products or change their buying behavior outside the network. Those questions await future research.
You can read about this research in full in the August 2010 issue of the Journal of Marketing Research. The authors are from two different universities. Michael Trusov is an Assistant Professor of Marketing at The University of Maryland, while Anand Bodapati (Associate Professor) and Randy Bucklin (Peter Mullin Chaired Professor) are both from UCLA.
What do you think? Are there other ways to get at influence?
Use the comments section below and let me know your thoughts.