The Central Limit Theorem (CLT) states that, for a sufficiently large sample size, the distribution of the sample mean can be approximated by a normal distribution even though the original population may be non-normal.
The grand average (the average of the sample averages taken over many sets of samples) approaches the population mean as the number of sample sets approaches infinity.
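The grand-average statement can be checked with a small simulation. The sketch below is only an illustration: the exponential population, its mean of 2.0, the sample size of 5, and the function names are all assumptions chosen for the example, not values from this article.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical non-normal population: exponential with mean 2.0
POPULATION_MEAN = 2.0


def sample_mean(n):
    """Mean of one sample of size n drawn from the exponential population."""
    rate = 1 / POPULATION_MEAN  # expovariate takes the rate, i.e. 1 / mean
    return statistics.fmean(random.expovariate(rate) for _ in range(n))


def grand_average(num_sets, n=5):
    """Average of the averages of num_sets samples, each of size n."""
    return statistics.fmean(sample_mean(n) for _ in range(num_sets))


# The grand average should settle near POPULATION_MEAN as num_sets grows.
for num_sets in (10, 1_000, 50_000):
    print(num_sets, round(grand_average(num_sets), 3))
```

With only a handful of sample sets the grand average can land noticeably away from 2.0; with tens of thousands it should sit very close to it.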
For a given population standard deviation, as the sample size (n) increases, the Standard Error of the Mean (SE) decreases. SE is the standard deviation of the sampling distribution of the sample mean.
The Standard Error of the Mean is calculated as:
$$SE = \frac{\sigma}{\sqrt{n}}$$
where σ is the population standard deviation and n is the sample size.
Because the denominator is the square root of the sample size, the SE of the Mean decreases as n increases.
Practically speaking, as the sample grows, the risk of misestimating the population shrinks, because you are getting closer and closer to measuring the entire population. Each additional observation brings the sample that much closer to the population.
But the relationship is not linear. Plug in a few values and you will see diminishing returns: because SE depends on the square root of n, halving the SE requires four times as many samples. Using simple numbers to illustrate the concept, if the population standard deviation is 1.0 and the sample size n = 1, then the SE Mean = 1.0.
If n = 9, SE Mean = 0.333
If n = 64, SE Mean = 0.125
If n = 10,000, SE Mean = 0.01
If n = 1,000,000, SE Mean = 0.001
In other words, as the sample size approaches infinity, the SE Mean approaches zero. In practice, however, samples this large are rarely obtainable, so some sampling error will always exist.
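The values listed above can be reproduced in a few lines. This is a minimal sketch assuming the usual formula SE = σ / √n; the function name `standard_error` is made up for the illustration.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 1.0  # population standard deviation used in the example above
for n in (1, 9, 64, 10_000, 1_000_000):
    print(f"n = {n:>9,}  SE Mean = {standard_error(sigma, n):.3f}")
```

Going from n = 1 to n = 9 removes two thirds of the error, while going from n = 10,000 to n = 1,000,000 removes only a further 0.009, which is the diminishing-returns effect described above.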
Large sample sizes:
- generate a better estimate of the population parameter, since the sampling error is reduced.
- make the distribution curve of the sample mean narrower.
- decrease the average difference between the statistic and the parameter. The values of x-bar will have less variation and get closer to the population mean, μ.
As n increases, the SE Mean goes toward zero (as shown above), and the CLT says the distribution of sample means will approach a normal distribution.
Sampling distributions tend to become approximately normal when the sample size is large enough; a common rule of thumb is n > 30 when the population distribution is unknown, although heavily skewed populations may need more.
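Both effects, the narrowing spread and the drift toward a normal shape, can be observed in a simulation. The sketch below is illustrative only: the exponential population (rate 1, so mean 1 and standard deviation 1), the number of repetitions, and the helper names are assumptions for the example. Skewness is used as a rough stand-in for "how non-normal" the sampling distribution still is, since a normal distribution has skewness 0.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

SIGMA = 1.0  # an exponential population with rate 1 has standard deviation 1


def sample_means(n, num_samples=10_000):
    """Draw num_samples samples of size n and return the list of their means."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric, bell-shaped distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


for n in (1, 5, 30, 100):
    means = sample_means(n)
    observed_se = statistics.stdev(means)  # spread of the simulated sample means
    print(
        f"n={n:>3}  observed SE={observed_se:.3f}  "
        f"sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}  skewness={skewness(means):.2f}"
    )
```

The observed SE should track σ / √n closely at every n, while the skewness shrinks toward zero as n grows. For this strongly skewed population some skew is still visible at n = 30, which is why the n > 30 guideline is best treated as a rule of thumb rather than a guarantee.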