Spearman's Rho Correlation Coefficient, also known as Spearman's Rank Coefficient, is named after Charles Spearman (1863-1945) and is represented by the Greek letter rho.
Similar to Pearson's Correlation Coefficient (r), it is a measure of statistical dependence between two variables in matched pairs. It is a non-parametric measure whose value also ranges from -1 to +1, with zero indicating no association.
Xi = the rank of the x value in the data set
Yi = the rank of the y value in the data set
See the example lower on the page to see the x and y data ranked. The calculation uses the ranked values and their means (the mean of the ranking positions, not the mean of the actual data).
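The formula these definitions belong to is not reproduced in the text here. As a hedged reconstruction from the definitions above (ranks and the mean of the ranks), the usual form is Pearson's r applied to the ranks. The symbol n (the number of matched pairs) and the bar notation for the mean rank are assumptions added for illustration:

```latex
\rho = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
            {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
% X_i, Y_i : ranks of the x and y values
% \bar{X}, \bar{Y} : means of those ranks (not of the raw data)

% Shortcut that holds only when there are no tied ranks,
% with d_i = X_i - Y_i:
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}
```

When tied values are present, the first form with average ranks is the safer choice.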
A value of +1 indicates perfectly positive monotonic correlation. All data points with greater x values than that of a given data point will have greater y values.
A value of 0 indicates no monotonic correlation.
A value of -1 indicates perfectly negative monotonic correlation. All data points with greater x values than that of a given data point will have lower y values.
In other words, a value of +1 indicates that X and Y increase together (Y is increasing, or non-decreasing, as X increases) throughout the inferred space, and a value of -1 indicates the opposite: Y is decreasing (or non-increasing) as X increases.
MONOTONIC indicates that the relationship moves in one direction only: as X increases, Y is either increasing (or non-decreasing) OR decreasing (or non-increasing) throughout the inferred space.
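As a minimal sketch of these endpoint values, the code below ranks two small made-up data sets and applies Pearson's formula to the ranks. The helper names (`ranks`, `pearson`, `spearman`) and all data values are hypothetical, chosen only for illustration.

```python
def ranks(values):
    """Return 1-based ranks; tied values get the average of their positions."""
    ordered = sorted(values)
    n = len(ordered)
    # first and last 1-based positions of v in the sorted list, averaged
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2 for v in values]


def pearson(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of the data."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: y always rises with x, then y always falls with x.
x = [1, 2, 3, 4, 5]
y_rising = [2, 3, 7, 20, 21]
y_falling = [9, 8, 4, 1, 0]

rho_rising = spearman(x, y_rising)
rho_falling = spearman(x, y_falling)
print(rho_rising, rho_falling)
```

Neither data set is linear, yet the first gives +1 and the second gives -1, because only the ordering of the values matters.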
Y = the dependent variable
X = the independent variable
The Spearman's Rho test works with continuous or discrete data of at least the ordinal level, so it also covers interval and ratio types but EXCLUDES nominal data. See Data Classification for more information on the types of data.
How is this different from Pearson's Correlation Coefficient?
The Spearman's Rho test can detect association in non-linear relationships, but it has its limitations too. Recall that Pearson's Correlation Coefficient only measures linear correlation.
The Spearman's Rho Correlation Coefficient can provide a perfect correlation value of +1 or -1 when X and Y have a monotonic relationship, whereas the Pearson's Correlation Coefficient only gives a value of +1 or -1 when there is a perfect linear relationship. It works on the ranks rather than the actual values, which is one of the reasons the Spearman's Rho Correlation Coefficient is called non-parametric.
While Spearman's Rho Correlation Coefficient can be more effective at finding a non-linear relationship than Pearson's, it doesn't find relationships that may exist in more complex associations such as parabolic, hyperbolic, or similar to the picture below.
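The contrast can be sketched with two made-up data sets: a cubic curve (non-linear but monotonic) and a symmetric parabola (not monotonic). The helper functions and data are hypothetical illustrations, not the page's own example.

```python
def ranks(values):
    """Return 1-based ranks; tied values get the average of their positions."""
    ordered = sorted(values)
    n = len(ordered)
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2 for v in values]


def pearson(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of the data."""
    return pearson(ranks(xs), ranks(ys))


# Monotonic but non-linear: y = x cubed (hypothetical data).
x_cubic = [1, 2, 3, 4, 5, 6]
y_cubic = [x ** 3 for x in x_cubic]
r_cubic = pearson(x_cubic, y_cubic)
rho_cubic = spearman(x_cubic, y_cubic)

# Not monotonic: a parabola that is symmetric about x = 0.
x_parab = [-3, -2, -1, 0, 1, 2, 3]
y_parab = [x ** 2 for x in x_parab]
r_parab = pearson(x_parab, y_parab)
rho_parab = spearman(x_parab, y_parab)

print("cubic:    r =", round(r_cubic, 3), " rho =", round(rho_cubic, 3))
print("parabola: r =", round(r_parab, 3), " rho =", round(rho_parab, 3))
```

For the cubic data, Spearman's value reaches +1 while Pearson's stays below 1. For the parabola, both coefficients are zero even though y is completely determined by x, which is the limitation described above.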
Each calculation should be understood by a GB/BB so that they can explain to the rest of the team exactly what the data is saying in a practical sense. Working through a couple of calculations from start to finish will help in understanding their differences.
Before running this calculation or Pearson's Correlation Coefficient (r), look for any outliers that are due to special cause and may be eliminated, although this may not always be possible. If the outliers are explainable and removable, removing them makes the resulting value more accurate. The Spearman's Rho Correlation Coefficient calculation is not affected as much by outliers as Pearson's Correlation Coefficient because it uses the ranks rather than the actual values.
In most cases, the Spearman's Rho Correlation Coefficient is about the same value as, or greater than, Pearson's Correlation Coefficient. If you are strictly interested in the LINEAR relationship, use the Pearson's Correlation Coefficient.
Our suggestion is to run both calculations since this is quickly done using statistical software or a calculator. Getting more information at little relative cost is always preferred.
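To illustrate the difference in outlier sensitivity, here is a small sketch using made-up data in which the last y value is replaced by an extreme reading. All names and values are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; tied values get the average of their positions."""
    ordered = sorted(values)
    n = len(ordered)
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2 for v in values]


def pearson(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of the data."""
    return pearson(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [1, 2, 3, 4, 5, 6, 7, 8]      # perfectly linear (hypothetical)
y_outlier = [1, 2, 3, 4, 5, 6, 7, 100]  # last reading is an extreme outlier

r_clean = pearson(x, y_clean)
r_outlier = pearson(x, y_outlier)
rho_outlier = spearman(x, y_outlier)

print("Pearson, clean data:   ", round(r_clean, 3))
print("Pearson, with outlier: ", round(r_outlier, 3))
print("Spearman, with outlier:", round(rho_outlier, 3))
```

In this particular case the outlier keeps its rank (it is still the largest y value), so Spearman's value stays at +1 while Pearson's drops to roughly 0.62. An outlier that also changes the rank order would move Spearman's value too, just usually by less.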
A Black Belt collected the data shaded in blue below. The additional columns to the right were then calculated in Excel. Notice the ranking of the data; those ranked values are the ones used in the final calculation.
This is a perfectly positive monotonic relationship. Notice that each time the value of x increases, the value of y increases as well. The graph is shown below.
Creating a template similar to that above can take time but allows a number of scenarios to be analyzed.
If the y-value of the 2nd data point were 0.69 instead of 0.26, its rank among the y-values would become 10 and the ranks of the points after it would change too. See below and look at the difference in the y rankings, the line chart, and the Spearman's Rho Correlation Coefficient. The yellow shaded cells changed from the previous example (as did, of course, the subsequent calculations that use those values).
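The same kind of what-if scenario can be sketched in code. Only the change from 0.26 to 0.69 is taken from the text; the data set is shortened to six invented points, so the ranks and the resulting coefficient will not match the worksheet. The function names are hypothetical, and the shortcut formula used here assumes there are no tied values.

```python
def simple_ranks(values):
    """Rank values 1..n by size; assumes no two values are equal."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_no_ties(xs, ys):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(xs)
    rank_x, rank_y = simple_ranks(xs), simple_ranks(ys)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical six-point data set, perfectly monotonic to start with.
x = [1, 2, 3, 4, 5, 6]
y_before = [0.10, 0.26, 0.31, 0.45, 0.52, 0.60]

# Scenario: the 2nd y-value is 0.69 instead of 0.26.
y_after = list(y_before)
y_after[1] = 0.69

rho_before = spearman_no_ties(x, y_before)
rho_after = spearman_no_ties(x, y_after)

print("y ranks before:", simple_ranks(y_before), " rho =", round(rho_before, 3))
print("y ranks after: ", simple_ranks(y_after), " rho =", round(rho_after, 3))
```

Changing a single value moves that point to the highest y rank and shifts every rank that used to sit above it, which is why the coefficient falls from +1 to about 0.43 in this small invented case.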