
Normal Distribution 

The normal distribution is generally credited to Pierre-Simon de Laplace. Carl Friedrich Gauss is generally given credit for recognizing the normal curve of errors, and the curve is also referred to as the Gaussian distribution.
Manufacturing processes and natural occurrences frequently produce this type of distribution, a unimodal bell curve. The distribution is spread symmetrically around the central location, which occurs when values are equally likely to fall above and below the average.
A normal distribution exhibits the following:
68.3% of the population is contained within 1 standard deviation from the mean.
95.4% of the population is contained within 2 standard deviations from the mean.
99.7% of the population is contained within 3 standard deviations from the mean.
These three figures should be committed to memory if you are a Six Sigma GB/BB.
These three figures are often referred to as the Empirical Rule or the 68-95-99.7 Rule, as approximate representations of population data within 1, 2, and 3 standard deviations of the mean of a normal distribution.
Over time, after making numerous calculations of the cumulative distribution function and z-scores with these three approximations in mind, you will be able to quickly estimate the proportion of the population and the percentage of area that should be under a curve.
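As a quick check of the Empirical Rule, the three percentages can be reproduced with Python's standard library (`statistics.NormalDist`); this is an illustrative sketch, not tied to any particular statistical package:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

# Proportion of a normal population within k standard deviations of the mean:
# P(-k < Z < k) = CDF(k) - CDF(-k)
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)
    print(f"Within {k} standard deviation(s): {within:.1%}")
```

Running this prints approximately 68.3%, 95.4%, and 99.7%, matching the three figures above.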
Most Six Sigma projects will involve analyzing normal sets of data or assuming normality. Many naturally occurring events and processes with "common cause" variation exhibit a normal distribution (when a process does not, that is another clue that "special cause" variation may be present).
This distribution is frequently used to estimate the proportion of the process that will perform within the specification limits or a specification limit (NOT control limits; recall that specification limits and control limits are different).
However, when the data does not meet the assumptions of normality, it will require a transformation to provide an accurate capability analysis. We will discuss that later.
The mean defines the central location in a normal data set, and the median, mode, and mean are nearly equal. The area under the curve accounts for all of the observations or measurements.
Throughout this site the following assumptions apply unless otherwise specified:
P-value < alpha risk (set at 0.05) indicates a non-normal distribution (reject the assumption of normality), although normality assumptions may still apply in some cases. The level of confidence assumed throughout is 95%.
P-value > alpha risk (set at 0.05) indicates a normal distribution (fail to reject the assumption of normality).
The z-statistic can be derived for any variable point of interest (x) using the mean and standard deviation. The z-statistic can then be referenced to a table to estimate the proportion of the population that applies to the point of interest.
Recall that one of the two important implications of the Central Limit Theorem is that, regardless of the underlying distribution type (unimodal, bimodal, skewed, symmetric), the distribution of the sample means will take the shape of a normal distribution as the sample size increases. The greater the sample size, the more normality can be assumed.
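A minimal simulation illustrates this implication of the Central Limit Theorem. The sketch below assumes a heavily right-skewed exponential population (an illustrative choice with a known mean of 1.0) and shows that the means of repeated samples cluster symmetrically around the population mean:

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

sample_size = 30     # observations per sample
num_samples = 2000   # number of sample means collected

# Draw repeated samples from a right-skewed exponential population
# (rate = 1.0, so the population mean is 1.0) and record each sample mean.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

# Despite the skewed population, the sample means center on the
# population mean with spread sigma / sqrt(n).
print(round(statistics.mean(sample_means), 2))
```

A histogram of `sample_means` would look approximately bell-shaped even though the individual observations are strongly skewed.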
Some tables and software programs compute the z-statistic differently, but all give the correct results if interpreted correctly.
Some tables present single-tail probabilities while others present double-tail probabilities. Examine each table carefully to draw the correct conclusion.
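For a given z value, a one-tail table and a two-tail table report different areas, but the two are directly related. The sketch below (using Python's standard-library `statistics.NormalDist`; z = 1.96 is chosen for illustration) shows the conversion:

```python
from statistics import NormalDist

z_table = NormalDist()  # standard normal

z = 1.96
left_tail = z_table.cdf(z)   # one-tail (cumulative) area below z: ~0.975
upper_tail = 1 - left_tail   # area beyond z in the right tail: ~0.025
two_tail = 2 * upper_tail    # area beyond +/- z in both tails: ~0.05

print(round(left_tail, 4), round(two_tail, 4))
```

Interpreted either way, the same z = 1.96 corresponds to 95% of the population between -z and +z.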
The bell curve theoretically spreads from negative infinity to positive infinity and approaches the x-axis without ever touching it; in other words, it is asymptotic to the x-axis.
The area under the curve represents the probabilities, and the entire area equals 1.0, or 100%.
The normal distribution is described by the mean and the standard deviation. The formula for the normal distribution density function is shown below (e ≈ 2.71828):

f(x) = (1 / (σ√(2π))) × e^(−(x − μ)² / (2σ²))
Because coming up with the area under the normal curve from the formula above requires time-consuming integral calculus, most of the time it is easier to reference tables.
With pre-populated values based on a given value for "x", the probabilities can be assessed using a conversion formula (shown below) from the z-distribution, also known as the standardized normal curve:

z = (x − μ) / σ

The z-distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
A z-score is the number of standard deviations that a given value "x" is above or below the mean of the normal distribution.
A machining process has produced widgets with a mean length of 12.5 mm and a variance of 0.0625 mm² (a standard deviation of 0.25 mm).
A customer has indicated that the upper specification limit (USL) is 12.65 mm. What proportion of the widgets will be shorter than 12.65 mm?
Converting to a z-score: z = (12.65 − 12.5) / 0.25 = 0.60. The table below, which is a one-tailed table, shows that z = 0.60 corresponds to 0.7257.
72.57% of the area under the curve lies below the point x = 12.65 mm.
This means that 72.57% of the widgets will be below the USL of the customer. This result will not likely meet the Voice of the Customer.
In Excel, use the formula NORM.DIST(x, mean, standard_dev, cumulative).
Setting the cumulative argument to TRUE returns the Cumulative Distribution Function (CDF); FALSE returns the Probability Density Function (PDF).
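The same calculation can be reproduced in Python. The sketch below defines a hypothetical `norm_dist` helper (the name and argument order simply mirror the spreadsheet-style function described above) on top of the standard library's `statistics.NormalDist`, then checks the widget example:

```python
from statistics import NormalDist

def norm_dist(x, mean, standard_dev, cumulative):
    """Hypothetical helper mirroring the spreadsheet-style arguments:
    cumulative=True returns the CDF, False returns the PDF."""
    dist = NormalDist(mean, standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)

# Widget example: mean 12.5 mm, sigma = sqrt(0.0625) = 0.25 mm, USL 12.65 mm
p_below_usl = norm_dist(12.65, 12.5, 0.25, True)
print(f"{p_below_usl:.2%}")  # ~72.57% of widgets fall below the USL
```

The cumulative result agrees with the 0.7257 value read from the one-tailed z table.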
Once the data is determined to take on a normal distribution (or is assumed to be normal), the center value for the distribution of data is the mean.
For nonparametric tests, the measure of central tendency for the distribution of data is the median.
Parametric tests, such as ANOVA and the t-test, are generally more powerful than their nonparametric counterparts given the same amount of data; it is easier (fewer samples are needed) to detect a significant difference using parametric tests.
Whenever possible (without forcing or skewing data), a GB/BB should try to satisfy the assumptions of normality. The tests are generally easier to apply and work through from a statistical perspective, and most certification programs focus more on the parametric tests.
Click here to access hypothesis test flowcharts for choosing the proper test to use for various parametric and nonparametric data.
We have an entire module dedicated to hypothesis testing of data to determine whether it can be assumed to be from a normal distribution. We also cover the various tests and applications of the normal distribution in a Six Sigma project.
Click here to open the webpage regarding the assumption of normality.
When the data set is not normally distributed, the Central Limit Theorem usually applies, or a transformation of the data, such as a Box-Cox or Johnson transformation, can be used. This determination MUST be made prior to using hypothesis testing tools.
There are cases when the data distribution will naturally not adhere to a normal distribution, such as:
the time to complete a task
the wait time before a customer is served
employee salaries
In the first two cases, there is naturally a lower bound (it cannot get lower) of 0 seconds, but there is no upper bound. The data will not likely center around an average; most of the results will fall toward the left side, toward 0 seconds, and the tail will contain those fewer instances that each took a long time.
In the last case, most employees will earn within a certain range, and then there will be directors, vice-presidents, and executives that gross higher incomes.
The likely output will look similar to the histogram below, a right-skewed distribution:
There are various functions used to transform data, such as the logarithm, power, square root, and reciprocal. Two of the most common are the Box-Cox and Johnson transformations.
Use the help menu in your statistical software package to guide you in transforming data, and it is also a good idea to consult with your mentor to ensure the transformation is necessary and being done correctly.
Using the reciprocal method is straightforward: apply the equation y = 1/x, and each data point becomes its reciprocal. If the original data points (x) were 5, 8, 10, 4, 6, then the transformed data (y) become 1/5, 1/8, 1/10, 1/4, and 1/6, respectively.
The Box-Cox transformation uses a power transformation but is limited to positive data.
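A minimal sketch of both transformations, assuming the standard Box-Cox definition y = (x^λ − 1)/λ for λ ≠ 0 and y = ln(x) for λ = 0. Note that the lambda values here are supplied by hand for illustration, whereas statistical software estimates the best lambda from the data:

```python
import math

def boxcox(x, lam):
    """Box-Cox power transformation for a single positive value x.
    lam = 0 gives the natural log; any other lam gives (x**lam - 1) / lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

data = [5, 8, 10, 4, 6]

reciprocal = [1 / x for x in data]              # y = 1/x, as in the example above
log_transformed = [boxcox(x, 0) for x in data]  # lambda = 0 -> natural log
sqrt_family = [boxcox(x, 0.5) for x in data]    # lambda = 0.5 -> square-root family
```

The guard against non-positive values reflects the limitation noted above: the Box-Cox transformation is only defined for positive data.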