
Correlation

Correlation measures the relationship between the inputs (x) and the output (y) of a process. It is the degree or extent of the relationship between two variables.

Correlation studies are used to see if there is a predictive relationship between the input and the output of the process. Predicting within the range of the data is "interpolating" and predicting outside the range is "extrapolating".

The picture below shows very little, if any, correlation between the variables. They appear to be unrelated, at least within the range of inputs studied.

No Correlation

As expected, conclusions from correlation studies tend to be stronger when more data is collected and the widest practical range of inputs is used.

However, visualizing the data set can also show that the relationship varies within the range being sampled.

For instance, a strong positive relationship may exist from one input value to another, but beyond that range there may be no relationship at all.

A correlation value may be close to zero, but a closer review may reveal enlightening information. Sometimes too much data can hide relationships within it.

The point is to run the correlation visually and mathematically.
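The sub-range effect described above can be sketched in a few lines of Python. The data here is hypothetical (invented for illustration): the output rises steadily for x = 0 to 5 and then levels off for x = 6 to 11. The helper name `pearson_r` is also an assumption of this sketch, not something from the original study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y climbs for x = 0..5, then hovers around 10 for x = 6..11.
x = list(range(12))
y = [0, 2, 4, 6, 8, 10, 10, 9, 11, 11, 9, 10]

r_all = pearson_r(x, y)            # whole range
r_low = pearson_r(x[:6], y[:6])    # x = 0..5 only
r_high = pearson_r(x[6:], y[6:])   # x = 6..11 only

print(f"whole range: r = {r_all:.3f}")
print(f"x = 0..5:    r = {r_low:.3f}")
print(f"x = 6..11:   r = {r_high:.3f}")
```

A single "r" for the whole range blends a perfectly linear region with a region that shows no linear relationship, which is why the scatter plot and the number are best read together.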

"X" is considered the independent variable or predictor variable.

"Y" is the dependent variable or predicted variable.

Correlation studies are normally part of the ANALYZE phase of a DMAIC project.

Linear Correlation

There are several correlation coefficients in use, but the most frequently used is the Pearson Product Moment Correlation, also referred to as the Coefficient of Correlation. It measures only the linear relationship between two variables and is denoted by an "r" value.

The "r" value is used to measure the correlation and it will always range from -1.0 (anticorrelation) to +1.0. As the value approaches 0, there is less linear correlation (or dependence) between the variables.

If the value of one variable increases when the value of the other increases, they are said to be positively correlated.

If the value of the output (y) decreases when the value of the input (x) increases, they are said to be negatively correlated.

The degree of linear association between two variables is quantified by the coefficient of correlation.

The Pearson correlation does NOT assume that the data is normally distributed.

It represents a unitless, scaled version of covariance, meaning the closer the value is to +1 or -1, the closer the linear relationship is between the x and y random variables.
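For reference, the standard textbook form of the sample Pearson coefficient is written out below. The symbols (n data pairs, sample means x-bar and y-bar, sample standard deviations s_x and s_y) are notation assumed here, since the page does not define its own.

```latex
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Dividing the covariance by both standard deviations cancels the units of x and y, which is what keeps "r" between -1 and +1.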

As the value of "r" approaches zero from either side, the correlation is weaker. That is, the input, x, has a weaker linear relationship with the output, y.

This is normally shown by an x-y plot referred to as a Scatter Graph. This graph shows all the data points where the input, x, is varied systematically and the output, or effect, y, is measured.

An "r" value of +1.0 indicates a perfect and strong POSITIVE correlation.

An "r" value of -1.0 indicates a perfect and strong NEGATIVE correlation or anticorrelation.

A data set in which every output value is identical (a perfectly flat line, slope = 0) will have a correlation coefficient that is undefined because the variance of Y is zero. In other words, the output does not change for any of the input values.
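The zero-variance case can be illustrated with a small sketch. The data is hypothetical, and returning `None` for the undefined case is a choice made for this example only. Note the contrast with data that has no slope but still has scatter: there "r" is defined and equals zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's r, or None when it is undefined (zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # The denominator would be zero, so r cannot be computed.
        return None
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
flat_y = [7, 7, 7, 7, 7]        # identical outputs: variance of y is zero
scattered_y = [6, 8, 7, 8, 6]   # no overall slope, but y does vary

r_flat = pearson_r(x, flat_y)
r_scattered = pearson_r(x, scattered_y)

print("flat line:", r_flat)
print("scattered, no slope:", r_scattered)
```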

Strong Linear Correlation Examples

Shown below in the video is an example starting with a simple set of data and the progressive steps to manually calculate the LINEAR correlation coefficient, "r".

This is a study of the relationship between the number of caterpillars in a cabbage patch and the quantity of cabbages destroyed.
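The manual steps can also be followed in code. The figures below are hypothetical and are NOT the data from the video; they only show the order of the calculation (means, deviations, sums of products and squares, then "r").

```python
from math import sqrt

# Hypothetical observations, invented for illustration only.
caterpillars = [2, 4, 6, 8, 10]         # x: caterpillars counted
cabbages_destroyed = [1, 3, 4, 7, 10]   # y: cabbages destroyed

n = len(caterpillars)

# Step 1: the mean of each variable.
mean_x = sum(caterpillars) / n          # 6.0
mean_y = sum(cabbages_destroyed) / n    # 5.0

# Step 2: the deviation of every point from its mean.
dx = [x - mean_x for x in caterpillars]
dy = [y - mean_y for y in cabbages_destroyed]

# Step 3: sum of cross-products and sums of squares.
sxy = sum(a * b for a, b in zip(dx, dy))   # 44.0
sxx = sum(a * a for a in dx)               # 40.0
syy = sum(b * b for b in dy)               # 50.0

# Step 4: the correlation coefficient.
r = sxy / sqrt(sxx * syy)
print(round(r, 3))  # 0.984
```

With these made-up numbers the result is close to +1.0, a strong positive linear correlation: more caterpillars go with more cabbages destroyed.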



Non Linear Correlation

Correlation coefficients exist that are more sensitive to non-linear relationships.
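One commonly used example is Spearman's rank correlation, which applies the Pearson formula to the ranks of the data rather than to the raw values. The page does not name a specific coefficient, so treating Spearman as the example here is an assumption. The sketch below uses hypothetical data (y = x cubed) and a simple ranking helper that assumes there are no tied values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y always increases with x, but along a curve.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Because y always increases with x, the rank-based value reaches 1.0 while Pearson's "r" falls short of it. Spearman still only detects monotonic relationships, so it would not help with a curve that rises and then falls.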

The picture below indicates a strong relationship that would not be evident by simply analyzing the "r" value. The "r" value is going to be close to zero, which could wrongly suggest that the variables are unrelated.

This module doesn't get more involved with non-linear mathematical relationships but it is important to understand they exist as the picture below shows.

Strong Non-Linear Correlation
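A minimal numerical sketch of this situation, using hypothetical data where y is exactly x squared: the output is completely determined by the input, yet the linear coefficient comes out as zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a symmetric parabola, y = x squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The falling left half and the rising right half of the curve cancel each other in the sum of cross-products, so only the scatter plot reveals the relationship.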




More about Correlation

The "r" value squared is the Coefficient of Determination used in regression analysis.

What is the difference between the Coefficient of Correlation (COC) and Coefficient of Determination (COD)?

The COD ranges from 0 to 1 (0% to 100%) and is equal to the coefficient of correlation value squared. It is the proportion of the variability of the dependent variable (Y) that is accounted for, or explained by, the linear relationship with the independent variable (X).

The COC is a value from -1 to +1 that describes the linear correlation of the dependent and independent variables. A value near zero indicates no linear relationship.

The sign is necessary to see if the relationship is positive or negative, so solving for the COC by taking the square root of the COD may not give the correct correlation: the square root alone does not show whether the sign should be positive or negative.
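The loss of sign can be seen in a short sketch. The two hypothetical data sets below are mirror images of each other, so they have opposite "r" values but exactly the same COD.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: the same outputs in rising and in falling order.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 5]
y_falling = list(reversed(y_rising))

r_pos = pearson_r(x, y_rising)
r_neg = pearson_r(x, y_falling)

cod_pos = r_pos ** 2   # Coefficient of Determination
cod_neg = r_neg ** 2

print(f"rising:  r = {r_pos:+.3f}, COD = {cod_pos:.3f}")
print(f"falling: r = {r_neg:+.3f}, COD = {cod_neg:.3f}")
```

Given only the COD, both +0.775 and -0.775 are possible values of "r". The direction has to come from somewhere else, such as the sign of the regression slope or the scatter plot.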

CAUTION:

Correlation interpretations from data or graphs can be wrong if the relationship is purely coincidental.

For example, a chart or correlation value may indicate a strong relationship (linear or non-linear) but in reality there may be no relationship or dependency at all. Like most statistical results, they must be reviewed subjectively, with consideration of common sense. This is done with the Six Sigma team. The GB/BB is responsible for sharing the results in whatever way helps the team make the right decisions.

It is possible to have the same "r" value for several different graphical representations, which is another reason to review the scatter plot and "r" value together.
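A well-known illustration of this point is Anscombe's quartet (F. J. Anscombe, 1973), four small data sets that look completely different when plotted. The page does not mention it, so using it here is a choice made for this sketch. The values below are the published quartet as commonly reproduced.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet: sets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I (roughly linear)": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                                  7.24, 4.26, 10.84, 4.82, 5.68]),
    "II (curved)": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                           6.13, 3.10, 9.13, 7.26, 4.74]),
    "III (line plus one outlier)": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                                           6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV (one point sets the slope)": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                                           5.25, 12.50, 5.56, 7.91, 6.89]),
}

r_values = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in r_values.items():
    print(f"{name}: r = {r:.3f}")
```

All four sets give an "r" of roughly 0.82, even though only the first is a reasonable candidate for a straight-line model.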






