There is always an element of error associated with statistical tools, and the same applies to the assumption of normality. It is virtually impossible to collect data from an exactly normal distribution. However, many naturally occurring phenomena follow a very close approximation to a normal distribution.
The normality assumption is an important topic in statistics, since the vast majority of statistical tools were built theoretically upon this assumption. For example, the 1-sample and 2-sample t-tests and Z-tests, along with the corresponding confidence intervals, assume that the data were sampled from populations having normal distributions. Most linear modeling procedures, such as Regression and ANOVA, also assume that the residuals (errors) from the model are normally distributed. In addition, the most widely used control charts and process capability statistics are based upon theoretical assumptions about the normality of the process data.
Because this assumption underlies so many statistical tools, it is often suggested that tests be run to check its validity. When doing so, keep in mind the following points:
1. Relative importance of the normality assumption.
Most statistical tools that assume normality have additional assumptions. In the majority of cases, the other assumptions are more important to the validity of the tool than the normality assumption. For example, most statistical tools are very robust to departures from normality, but it is critical that the data are collected independently.
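Independence can be screened informally before worrying about normality. The sketch below assumes the measurements are listed in the order they were collected and computes their lag-1 autocorrelation. As a rough rule of thumb (an assumption here, based on the usual large-sample bound for uncorrelated data), values beyond about ±2/√n suggest the observations are not independent.

```python
def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of observations listed in collection order."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    # Pair each deviation with the next one in time.
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den

trend = list(range(1, 21))        # a steady drift: strongly dependent
alternating = [1, -1] * 10        # a see-saw pattern: also dependent
bound = 2 / len(trend) ** 0.5     # rough +/- bound for independent data
print(lag1_autocorrelation(trend), lag1_autocorrelation(alternating), bound)
```

Both illustrative series fall far outside the rough bound of about 0.45 for 20 points, one positively (0.85) and one negatively (-0.95).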
2. The type of non-normality.
A normal distribution is symmetric, with fixed percentages of the data within 1, 2, and 3 standard deviations of the mean (about 68%, 95%, and 99.7%, respectively). Departures from normality can mean that the data are not symmetric. However, this is not always the case: the data may be symmetric while the percentages of data within these bounds do not match those for a normal distribution. Both kinds of departure can fail a test for normality, but the second is much less serious than the first.
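To make the second case concrete, the sketch below compares the share of data within 1, 2, and 3 standard deviations for a normal distribution and for a Laplace distribution, which is also symmetric but has heavier tails. The Laplace distribution is only an illustrative choice and is not named in the text.

```python
import math

def normal_within(k):
    """P(|X - mu| < k * sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

def laplace_within(k):
    """Same probability for a Laplace distribution with scale b.
    Its standard deviation is b * sqrt(2), so k * sigma = k * b * sqrt(2)."""
    return 1 - math.exp(-k * math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4), round(laplace_within(k), 4))
# 1 0.6827 0.7569
# 2 0.9545 0.9409
# 3 0.9973 0.9856
```

Both distributions are symmetric, yet the Laplace puts more data near the center and more in the far tails. That is the kind of departure a normality test can flag even though there is no skewness.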
3. Data transformations.
In many cases where the data do not fit a normal distribution, transformations exist that will make the data “more normal”. The log transformation (or the Box-Cox power transformation) is very effective for skewed data. The arcsine square-root transformation can be used for binomial proportions.
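The sketch below illustrates the log, Box-Cox, and arcsine square-root transformations in plain Python. It chooses the Box-Cox λ by a grid search over the profile log-likelihood, which is one standard way to estimate it. It is not Minitab's implementation, and the grid from -2 to 2 in steps of 0.01 is an assumption.

```python
import math
import random

def box_cox(values, lam):
    """Box-Cox power transform; lam == 0 is the natural log. Data must be positive."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def box_cox_loglik(values, lam):
    """Profile log-likelihood of lam (up to a constant), assuming the
    transformed data are normal."""
    n = len(values)
    y = box_cox(values, lam)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return (lam - 1) * sum(math.log(v) for v in values) - n / 2 * math.log(var)

def best_lambda(values, grid=None):
    """Grid-search the lambda with the highest profile log-likelihood."""
    grid = grid or [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))

def arcsine_sqrt(p):
    """Variance-stabilizing transform for a binomial proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

random.seed(1)
skewed = [random.lognormvariate(0, 0.8) for _ in range(500)]  # right-skewed sample
print(best_lambda(skewed))
```

For lognormal-looking data like this sample, the chosen λ should land near 0, which corresponds to the log transformation.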
In all cases, non-normality affects the probability of making a wrong decision, whether it be rejecting the null hypothesis when it is true (Type I error) or failing to reject the null hypothesis when it is false (Type II error). The p-value (the probability, assuming the null hypothesis is true, of a result at least as extreme as the one observed) associated with most statistical tools tends to be underestimated when the assumption of normality is violated. In other words, the true p-value is somewhat larger than the reported p-value.
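A small simulation can show how non-normality shifts the error rates. The sketch below uses assumed settings (samples of N = 10, a one-sided lower-tail one-sample t-test at α = 0.05, and exponential data as the skewed case) and counts how often the test rejects a null hypothesis that is actually true. The critical value 1.833 is taken from a t table for 9 degrees of freedom and only matches N = 10.

```python
import math
import random

N = 10
T_CRIT = 1.833  # upper 5% point of the t distribution with N - 1 = 9 df (t table)

def lower_tail_rejection_rate(sampler, true_mean, reps=20000, seed=0):
    """Share of simulated samples in which a one-sided (lower-tail) one-sample
    t-test at alpha = 0.05 rejects H0: mu = true_mean, although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [sampler(rng) for _ in range(N)]
        m = sum(x) / N
        s = math.sqrt(sum((v - m) ** 2 for v in x) / (N - 1))
        t = (m - true_mean) / (s / math.sqrt(N))
        if t < -T_CRIT:
            rejections += 1
    return rejections / reps

normal_rate = lower_tail_rejection_rate(lambda r: r.gauss(1.0, 1.0), 1.0)
skewed_rate = lower_tail_rejection_rate(lambda r: r.expovariate(1.0), 1.0)
print(f"normal data: {normal_rate:.3f}, exponential data: {skewed_rate:.3f}")
```

With normal data the rejection rate should come out close to the nominal 0.05. With the right-skewed exponential data it is noticeably higher, which is what a too-small reported p-value looks like in practice. Other tests, tails, and distributions can behave differently, so this is one illustration rather than a general rule.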
There are many non-parametric alternatives to standard statistical tests. However, it should be noted that these also assume that the underlying distributions are symmetric. Most standard statistical tools that assume normality will also work fine if the data are symmetric. Also consider that the power of the non-parametric tests only approaches the power of the standard parametric tests when the sample size is very large. So, given the choice between the two, if the data are fairly symmetric, the standard parametric tests are the better choice.
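A common non-parametric alternative to the 1-sample or paired t-test is the Wilcoxon signed-rank test, whose null hypothesis is that the differences are symmetric about zero. The sketch below is a minimal version that uses the large-sample normal approximation without the tie and continuity corrections that statistical packages usually apply, so its p-values are only rough for small samples.

```python
import math
from statistics import NormalDist

def wilcoxon_signed_rank(diffs):
    """Wilcoxon signed-rank test of H0: differences symmetric about 0.
    Returns (W_plus, approximate two-sided p-value)."""
    d = [x for x in diffs if x != 0]          # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                               # tied |d| values share the average rank
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, p

print(wilcoxon_signed_rank([-1, 2, 3, 4, 5]))  # W+ = 14.0, p is about 0.08
```

Running this alongside the t-test on the same differences is a simple way to see whether the two approaches agree.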
For large samples (n >= 25), the effects of non-normality on the probabilities of making errors are minimized, due to the Central Limit Theorem. Sample size also affects the tests for normality themselves, which can be very erratic for small samples. Recall that normality is assumed for the population, not the sample. In practice, the data are usually a sample that is small relative to the entire population.
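The Central Limit Theorem effect can be seen by simulating the distribution of sample means. For exponential data, the skewness of the mean of n values is 2/√n, so it shrinks as n grows. The sketch below assumes exponential(1) data and 10,000 simulated samples for each sample size.

```python
import random

def skewness(values):
    """Moment-based sample skewness."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_of_means(n, reps=10000, seed=0):
    """Skewness of simulated means of n exponential(1) values."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)

for n in (1, 5, 25, 50):
    # Theory for exponential data: skewness of the mean is 2 / sqrt(n).
    print(n, round(skewness_of_means(n), 2), round(2 / n ** 0.5, 2))
```

At n = 25 the skewness of the mean is already down to about 0.4, compared with about 2 for individual values.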
In the Measure Phase, the effects of non-normality on control charts show up in the probabilities of falsely failing the tests for special causes. These effects are minimal for Xbar charts, even for very small (n < 5) subgroup sizes, provided the data are not extremely skewed. Even for extremely skewed data, the effects are minimal if the subgroup size is 5 or more.
Process capability indices themselves do not assume normality: they are simply ratios of distances. However, if one makes inferences about the defect rate based upon these indices, these inferences can be in error if the data are extremely skewed. One fortunate aspect of process capability studies is that it is standard practice to base them upon at least 100 observations, which lessens the impact of the normality assumption.
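To illustrate the distinction, the sketch below computes Cp and Cpk as plain ratios of distances and then converts them into an expected defect rate. Only that last step uses the normal distribution. The function name and the parts-per-million output are choices made for this example.

```python
from statistics import NormalDist

def capability(mean, sd, lsl, usl):
    """Return (Cp, Cpk, expected defects in ppm). Cp and Cpk are ratios of
    distances; only the ppm estimate assumes the data are normal."""
    cp = (usl - lsl) / (6 * sd)
    cpk = min(usl - mean, mean - lsl) / (3 * sd)
    dist = NormalDist(mean, sd)
    ppm = (dist.cdf(lsl) + (1 - dist.cdf(usl))) * 1e6
    return cp, cpk, ppm

print(capability(10.0, 1.0, 7.0, 13.0))  # Cp = Cpk = 1.0, about 2700 ppm
```

If the data are heavily skewed, the same Cp and Cpk can correspond to a very different actual defect rate, which is why the ppm figure is the part to treat with caution.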
Minitab’s capability study shows a histogram of the sample data, along with capability indices, which are used to estimate process performance. The histogram has a normal curve superimposed over it to make it easier to assess the normality of the data. Minitab also has a nice facility for finding the optimum power transformation for use with data in subgroups. For cases with heavily skewed data, Minitab will also calculate process capability assuming a Weibull distribution instead of a normal distribution.
In the Analyze Phase, the Six Sigma Project Manager must isolate variables that exert leverage on the CTQ (critical-to-quality characteristic). When the CTQ is continuous, these leverage variables are uncovered with statistical tools designed to detect differences in means or variances, or patterns in means or variances.
Most of these tools assume normality. For many, there are non-parametric alternatives. As mentioned earlier, one should use good sense and judgement when deciding which test is more appropriate. It is a good idea to test for normality. If the test fails, check the symmetry of the data. With a large (n >= 25) sample, the parametric tools are better choices than the non-parametric alternatives, unless the data are excessively skewed.
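The workflow above (test for normality, then check symmetry, then consider sample size) can be sketched in code. Minitab's normality tests are not reproduced here; the sketch swaps in the Jarque-Bera test because it is built from skewness and kurtosis and needs only the standard library. Its chi-square p-value is a large-sample approximation, and the `suggest_tool` helper, including its cutoff of 1.0 for "excessively skewed", is hypothetical.

```python
import math

def jarque_bera(values):
    """Jarque-Bera normality test from sample skewness and kurtosis.
    Returns (skewness, kurtosis, JB statistic, approximate p-value)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                       # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p = math.exp(-jb / 2)                     # upper tail of chi-square with 2 df
    return skew, kurt, jb, p

def suggest_tool(values, alpha=0.05, max_abs_skew=1.0):
    """Hypothetical helper mirroring the advice above; max_abs_skew is an
    assumed cutoff for 'excessively skewed', not a standard value."""
    skew, _, _, p = jarque_bera(values)
    if p >= alpha:
        return "parametric"
    if len(values) >= 25 and abs(skew) <= max_abs_skew:
        return "parametric (test failed, but data fairly symmetric)"
    return "consider a transformation or a non-parametric alternative"

print(jarque_bera([1, 2, 3, 4, 5]), suggest_tool([1, 2, 3, 4, 5]))
```

With only five points the test has almost no power, which echoes the warning about small samples; the decision helper is meant as a summary of the reasoning, not a replacement for judgement.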
For example, in addition to normality, the 1-sample and 2-sample t-tests (and Z-tests) assume that the samples are independent, that the measurement scale is continuous, and (for the 2-sample case) that the variances are equal. Of this set of assumptions, normality is the least important: these tests are very robust to it, provided the samples are large and the underlying distributions are fairly symmetric. It is interesting to note that the non-parametric alternatives to the t and Z tests also assume that the underlying distributions are symmetric.
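The equal-variance assumption enters the 2-sample t-test through the pooled variance. The sketch below computes the pooled t statistic and its degrees of freedom. It does not check independence or normality, and the p-value still has to be looked up in a t table or computed with a statistics package.

```python
import math
from statistics import fmean, variance

def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    nx, ny = len(x), len(y)
    # Pooled variance: a weighted average of the two sample variances.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (fmean(x) - fmean(y)) / se, nx + ny - 2

print(pooled_t([1, 2, 3], [4, 5, 6]))  # t is about -3.674 with 4 df
```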
Similarly, linear modeling procedures such as ANOVA and Regression also assume that the residuals (errors) are independent and identically distributed, and that the response is measured on a continuous scale. Of these assumptions, the least important is again the assumption of normality. Both ANOVA and Regression are very robust to this assumption, provided the underlying distribution of the residuals is fairly symmetric.
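Because these assumptions concern the residuals rather than the raw data, the residuals have to be computed before they can be checked. The sketch below fits a straight line by ordinary least squares and returns the residuals; the data are made up for illustration.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least-squares line; returns (intercept, slope, residuals)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, residuals

x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1, res = fit_line(x, y)
print(round(b0, 2), round(b1, 2), [round(r, 2) for r in res])
# 1.09 1.97 [0.04, -0.13, 0.2, -0.17, 0.06]
```

The residuals can then be examined with a histogram, a normal probability plot, or a skewness check, and plotted in run order to look for dependence.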
When using these tools, don’t forget to try data transformations, especially when the data are skewed. Keep in mind that the p-values you see may be underestimated if the data are not from a normal distribution. The effect on the p-values is minimized for large samples. If a standard parametric test is used and the reported p-value is marginally significant, then the actual p-value may be marginally insignificant. When using any statistical tool, one should always consider the practical significance of the result as well as the statistical significance of the result before passing final judgement.
In the Improve Phase, the Six Sigma Project Manager will often use designed experiments to make dramatic improvements in the performance of the CTQ. A designed experiment is a procedure for simultaneously altering all of the leverage variables discovered in the Analyze Phase and observing what effects these changes have on the CTQ. The Six Sigma Project Manager must determine exactly which leverage variables are critical to improving the performance of the CTQ, and establish settings for those critical variables.
In order to determine whether an effect from a leverage variable, or an interaction between 2 or more leverage variables, is statistically significant, the Six Sigma Project Manager will often utilize an ANOVA table, or a Pareto chart of effects, or a normal plot of effects. The ANOVA table displays p-values for assessing the statistical significance of model components. The Pareto chart of effects has a cutoff line to show which effects are statistically significant. The normal plot of effects is another tool for judging which effects are statistically significant.
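The effects shown in these charts are differences between the average responses at the high and low levels of each factor or interaction. The sketch below computes them for a 2^k full factorial and pairs the sorted effects with normal quantiles, which are the coordinates of a normal plot of effects. The response values are hypothetical, the run order follows Python's `itertools.product`, and the plotting positions (i - 0.5)/m are one common choice among several.

```python
import math
from itertools import combinations, product
from statistics import NormalDist

def factorial_effects(k, y):
    """Main effects and interactions for a 2^k full factorial design.
    y must follow itertools.product order of the -1/+1 levels, so the
    first factor changes slowest."""
    runs = list(product([-1, 1], repeat=k))
    names = "ABCDEFGH"[:k]
    effects = {}
    for size in range(1, k + 1):
        for combo in combinations(range(k), size):
            # An interaction's sign column is the product of its factors' columns.
            signs = [math.prod(run[f] for f in combo) for run in runs]
            label = "".join(names[f] for f in combo)
            # Effect = mean response at +1 minus mean response at -1.
            effects[label] = sum(s * v for s, v in zip(signs, y)) / (len(runs) / 2)
    return effects

def normal_plot_points(effects):
    """(name, effect, normal quantile) triples for a normal plot of effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    m = len(ranked)
    return [(name, value, NormalDist().inv_cdf((i + 0.5) / m))
            for i, (name, value) in enumerate(ranked)]

effects = factorial_effects(2, [20, 25, 30, 45])  # hypothetical responses
print(effects)  # {'A': 15.0, 'B': 10.0, 'AB': 5.0}
for point in normal_plot_points(effects):
    print(point)
```

A real screening design usually has many more effects than this 2^2 example. On the normal plot, effects that fall well away from the line through the smaller effects are the candidates for statistical significance.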
All of these methods assume that the residuals after fitting the model are from a normal distribution. These methods are fairly robust to non-normal data, but it is still wise to check a histogram of the residuals to make sure there are no extreme departures from normality and, more importantly, that the residuals are not excessively skewed. If the data are heavily skewed, use a transformation. Also bear in mind that the p-values in the ANOVA table may be slightly underestimated, and that the cutoff line in the Pareto chart may be slightly lower than it should be. In other words, some effects that appear marginally statistically significant could actually be marginally insignificant. Whenever an effect is marginally significant or marginally insignificant from a statistical point of view, always ask whether it is of practical significance before passing final judgement.
In the Control Phase, the Six Sigma Project Manager generates a control plan for each of the critical leverage variables from the Improve Phase. These are the variables that truly drive the relationship Y = f(X1, X2, …, Xn). To maintain the gains in performance for the CTQ, or Y, the X variables must be controlled. Part of the control plan is a sampling scheme for each X, coupled with a control chart to detect whether each X is staying on target.
Control charts for continuous data assume the data are from a normal distribution, although they have been shown to be very robust to departures from normality, particularly the Xbar chart. A simulation study reported by Wheeler shows that even for subgroups of size 3, the Xbar chart is robust to non-normality except for excessively skewed data. For subgroups of size 5, the Xbar chart is robust even if the underlying data are excessively skewed.
If the I chart (subgroup size = 1) is used, the effects of non-normality will be seen in an elevated rate of false alarms from the tests for special causes. These tests are designed to have low probabilities of false alarms, based upon a normal distribution. Depending upon the type of departure from normality, certain tests will exhibit higher false alarm rates. As with the tools mentioned earlier, the data may be transformed, which is particularly effective for skewed data.
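The size of the effect can be worked out exactly for one heavily skewed case. The sketch below assumes exponential data with known mean and standard deviation (both equal to 1), places the limits at the mean ± 3σ/√n, and applies only the first test for special causes (a point beyond the control limits). Because the mean of n exponential values follows a gamma distribution, the false alarm probability has a closed form.

```python
import math
from statistics import NormalDist

def mean_exceeds(x, n):
    """P(mean of n exponential(1) values > x). The mean is Gamma(shape=n,
    rate=n), whose survival function has this closed form for integer n."""
    lam = n * x
    return math.exp(-lam) * sum(lam ** k / math.factorial(k) for k in range(n))

def false_alarm_rate(n):
    """Probability that one subgroup mean plots outside 1 +/- 3/sqrt(n)
    (test 1 only) while the exponential(1) process is in control."""
    ucl = 1 + 3 / math.sqrt(n)
    lcl = 1 - 3 / math.sqrt(n)
    below = 1 - mean_exceeds(lcl, n) if lcl > 0 else 0.0
    return mean_exceeds(ucl, n) + below

normal_rate = 2 * (1 - NormalDist().cdf(3))  # about 0.0027 for normal data
for n in (1, 3, 5):
    print(n, round(false_alarm_rate(n), 4), round(normal_rate, 4))
# 1 0.0183 0.0027
# 3 0.0118 0.0027
# 5 0.0093 0.0027
```

For individual values (n = 1, the I chart) the false alarm rate is roughly seven times the normal-theory value of about 0.27%, and it falls toward that value as the subgroup size grows, which is consistent with the Xbar chart becoming more robust for larger subgroups.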
There are a number of methods for assessing normality in Minitab and most other statistical software, and often no single tool paints the entire picture. Use several of them to review both the numbers and visual representations of the data, such as a histogram and a probability plot.
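A normal probability plot can also be summarized numerically. The sketch below computes the plot's coordinates and the correlation between the ordered data and the normal quantiles; values close to 1 indicate a nearly straight plot. This is similar in spirit to correlation-based normality tests such as Ryan-Joiner, but it is not Minitab's implementation, and the Blom plotting positions used here are an assumption.

```python
from statistics import NormalDist, fmean

def probability_plot(values):
    """Return ((normal quantile, ordered value) pairs, correlation).
    Uses Blom plotting positions (i - 3/8) / (n + 1/4)."""
    xs = sorted(values)
    n = len(xs)
    qs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mq, mx = fmean(qs), fmean(xs)
    num = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    den = (sum((q - mq) ** 2 for q in qs) * sum((x - mx) ** 2 for x in xs)) ** 0.5
    return list(zip(qs, xs)), num / den

points, r = probability_plot([3, 1, 2])
print(r)                                      # essentially 1: a straight line
print(probability_plot([1, 1, 1, 1, 10])[1])  # well below 1: a skewed sample
```

The correlation only summarizes the plot; it is still worth looking at the points themselves, since the pattern of the bends (for example, a curve at one end) shows whether the problem is skewness or tail weight.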
The screenshots below may vary depending on the version of Minitab but generally the path will remain similar.
When using a statistical tool that assumes normality, test this assumption.
Use a normal probability plot, a statistical test, and a histogram. For large sample sizes, the data would have to be extremely skewed before there is cause for concern. For skewed data, try a data transformation technique.
With small samples, the risk of reaching an incorrect conclusion increases with any tool, including the probability plot, the normality test, and the histogram. Non-parametric alternatives also have little discriminatory power with small samples. If you are using a statistical tool that assumes normality and the normality test fails, remember that the reported p-value may be smaller than it should be. This is only cause for concern when the p-value is marginally significant. To reduce this risk, run the non-parametric alternative (if there is one) and check whether the results agree. As with any statistical output, consider the practical significance and make sure the result seems realistic.