
 

Validating Satiscan™ Using a Split Sample Approach

By Rajan Sambandam

A common question asked of any analytical method is whether its results can be validated, that is, whether the process that produces the results is consistent enough to produce the same results in similar circumstances. The usual test is to split a given sample into two random parts and run the same analysis on each part. If the two parts yield similar results, the method is considered valid.

This "split sample" approach is taken here to demonstrate the validity of Satiscan™. Call center data from the energy industry was used for this purpose.
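The random split itself can be sketched in a few lines of Python. This is only an illustration: the ratings are made-up values, the function name `split_sample` is hypothetical, and the even split at the midpoint is an assumption (a per-observation random assignment would give halves of slightly unequal size).

```python
import random


def split_sample(rows, seed=0):
    """Shuffle a copy of the rows and return two random halves."""
    rng = random.Random(seed)
    shuffled = list(rows)          # copy, so the input is left untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def mean(values):
    return sum(values) / len(values)


# Hypothetical data: 1162 satisfaction ratings on a 1-10 scale.
data_rng = random.Random(42)
ratings = [data_rng.randint(1, 10) for _ in range(1162)]

half_a, half_b = split_sample(ratings, seed=1)

# A quick sanity check: summary statistics should be close in both halves.
print(len(half_a), len(half_b))
print(round(mean(half_a), 2), round(mean(half_b), 2))
```

The same analysis would then be run separately on `half_a` and `half_b`, and the two sets of results compared.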

Energy Example

The dataset used here had a total of 1,162 observations on 18 variables. It was split into two random halves of 576 and 586 observations. Frequencies and correlation coefficients were compared across the two halves to confirm that the split was indeed random. Next, two stages of analysis were run. First, Satiscan™ analyses were run on both halves of the data. Second, stepwise regression analyses were run on both halves. The regression analyses provide a benchmark: the degree of similarity between the two regression models can be compared with the degree of similarity between the two Satiscan™ models. The Satiscan™ models are given next in Figures 1 and 2, followed by the total effects in Table 1. Total effects tables are necessary with Satiscan™ models because some variables can have both a direct and an indirect impact on the dependent variable. Total effects are calculated as the sum of direct and indirect effects.
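To make the total effects calculation concrete, here is a minimal Python sketch. It assumes the usual path-analysis convention that an indirect effect is the product of the coefficients along the path; the article states only that total effects are the sum of direct and indirect effects, so whether Satiscan™ computes indirect effects this way is an assumption. The variable names and coefficients are hypothetical, not values from the figures or tables.

```python
# Hypothetical path coefficients, not values from the article.
# paths[source][target] is the direct effect of source on target.
paths = {
    "wait_time": {"rep_courtesy": 0.40, "overall_sat": 0.20},
    "rep_courtesy": {"overall_sat": 0.50},
}


def total_effect(paths, source, target):
    """Sum the coefficient products over every path from source to target.

    Assumes the model is acyclic (no feedback loops) and that an indirect
    effect is the product of the coefficients along its path.
    """
    total = 0.0
    for nxt, coef in paths.get(source, {}).items():
        if nxt == target:
            total += coef                                    # direct effect
        else:
            total += coef * total_effect(paths, nxt, target) # indirect effect
    return total


effect = total_effect(paths, "wait_time", "overall_sat")
print(round(effect, 2))
```

In this made-up model, `wait_time` affects `overall_sat` directly (0.20) and indirectly through `rep_courtesy` (0.40 × 0.50 = 0.20), so its total effect is the sum of the two.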

As can be seen from Figures 1 and 2, the two Satiscan™ models are quite similar but are not exact copies of each other. The first model has more direct drivers of the dependent variable (six) than the second model (three). Further, the first model has two more key drivers than the second (15 versus 13).

When we examine the total effects table, the most striking result is that the first two key drivers have almost exactly the same weights in both models. The top six variables are the same in both models, although there are some differences in their ordering.

Thus, in comparing the two models, we can say that even though there are some differences, the basic results are quite similar. Most of the dissimilar results appear among the secondary key drivers.
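One simple way to quantify this kind of comparison is to count how many of the top drivers the two models share and how many sit at the same rank. The Python sketch below is an illustration only: the function names are hypothetical and the effect values are invented, not taken from Table 1.

```python
def top_k(effects, k):
    """Return the k variable names with the largest total effects."""
    return sorted(effects, key=effects.get, reverse=True)[:k]


def compare_top_k(effects_a, effects_b, k):
    """Count shared top-k drivers and how many occupy the same rank."""
    top_a, top_b = top_k(effects_a, k), top_k(effects_b, k)
    shared = set(top_a) & set(top_b)
    same_rank = sum(1 for a, b in zip(top_a, top_b) if a == b)
    return {"shared": len(shared), "same_rank": same_rank}


# Hypothetical total effects from the two halves (not the article's data).
half_1 = {"resolution": 0.45, "courtesy": 0.30, "wait_time": 0.20, "billing": 0.10}
half_2 = {"resolution": 0.44, "courtesy": 0.31, "billing": 0.18, "wait_time": 0.15}

result = compare_top_k(half_1, half_2, k=4)
print(result)
```

In this invented example, the two models share all four top drivers, but only the first two hold the same rank, which mirrors the pattern described above: the same set of variables with some differences in ordering.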

Next, let's take a look at the regression models presented in Table 2. In comparing the regression results with those from Satiscan™, the most obvious difference is in the richness of the models. Because Satiscan™ is able to identify relationships between independent variables, its models have much more detail than the regression models.

Further, as with the Satiscan™ models, the first regression model has more key drivers (8) than the second (5). Again, the variable with the most impact is the same in both cases and has almost identical impact. There is, however, less commonality between the two sets of regression key drivers than between the two sets of Satiscan™ key drivers. This could be because of some collinearity in the data, since collinearity tends to affect regression and not Satiscan™.
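A rough way to screen for collinearity among independent variables is to look for pairs with a high absolute correlation. The Python sketch below is an assumption-laden illustration, not the procedure used in the article: the variable names and ratings are invented, the 0.8 threshold is an arbitrary choice, and pairwise correlation can miss collinearity that involves three or more variables.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def collinear_pairs(columns, threshold=0.8):
    """Return variable pairs whose absolute correlation meets the threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical ratings for three independent variables.
columns = {
    "courtesy": [7, 8, 6, 9, 5, 8],
    "knowledge": [6, 8, 6, 9, 4, 7],
    "wait_time": [3, 9, 5, 2, 7, 6],
}

flagged = collinear_pairs(columns)
print(flagged)
```

Here only the `courtesy` and `knowledge` pair is flagged. In a stepwise regression, which of two such variables enters the model can differ from one half of the sample to the other.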


 