Power and Sample Size


This section shows the relationship between Power and Sample Size. The Power of a comparison test is the probability of concluding that there is a significant difference when a difference actually exists.


The Power of a test indicates whether the test is sensitive enough to detect actual (true) differences. Detecting a smaller difference at the same Power requires a larger sample size. For a given sample size, Alpha Risk, and standard deviation, the Power determines the smallest difference the comparison test can reliably detect.

Formula for determining Power

Power = 1 - Beta Risk (Type II Error)

Confidence Level = 1 - Alpha Risk (Type I Error)
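As a minimal sketch of how these two formulas fit together, the Python below computes the Power of a two-sided, one-sample Z test using the normal distribution from the standard library. The function name `z_test_power` and the input values (a shift of 0.5, a standard deviation of 1.0, and 30 samples) are illustrative assumptions, not values from this page.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Approximate Power of a two-sided, one-sample Z test.

    delta: true shift in the mean that should be detected
    sigma: known process standard deviation
    n:     sample size
    alpha: Alpha Risk (probability of a Type I error)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Size of the true shift measured in standard errors of the mean.
    shift = abs(delta) * sqrt(n) / sigma
    # Probability that the test statistic falls in either rejection tail.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Hypothetical inputs chosen only for illustration.
alpha = 0.05
power = z_test_power(delta=0.5, sigma=1.0, n=30, alpha=alpha)
beta = 1 - power                 # Power = 1 - Beta Risk
confidence_level = 1 - alpha     # Confidence Level = 1 - Alpha Risk

print(f"Power = {power:.3f}, Beta Risk = {beta:.3f}, "
      f"Confidence Level = {confidence_level:.2f}")
```

When the true shift is zero, the same function returns the Alpha Risk itself, since every rejection is then a Type I error.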

Relationship of Alpha and Beta

How to read the table shown above:

The first row of the table indicates that, with the sample size and the difference to be detected held constant, as the probability of a Type I error (Alpha Risk) increases, the Power increases and the probability of a Type II error (Beta Risk) decreases.

Power levels of 80-90% are typically considered effective, which corresponds to a Beta Risk of 10-20%.

In other words, as the Producer is willing to reject more non-defective parts to ensure that the defective parts are rejected, the probability of Consumers receiving defects is reduced. The test becomes more "powerful" in protecting the Consumers, whose risk is perhaps the most important one to guard against.
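The trade-off between Alpha Risk and Beta Risk can be illustrated numerically. The sketch below assumes a one-sided, one-sample Z test with a fixed sample size and a fixed true shift; the helper `beta_risk` and the numbers used (shift 0.5, standard deviation 1.0, 20 samples) are hypothetical and chosen only to show the direction of the relationship.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def beta_risk(alpha, delta, sigma, n):
    """Beta Risk of a one-sided, one-sample Z test (normal model assumed)."""
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = delta * sqrt(n) / sigma
    # Probability that the statistic stays below the critical value
    # even though the true mean has shifted by delta.
    return std_normal.cdf(z_crit - shift)


rows = []
for alpha in (0.01, 0.05, 0.10, 0.20):
    beta = beta_risk(alpha, delta=0.5, sigma=1.0, n=20)
    rows.append((alpha, beta, 1 - beta))
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Each printed row has a larger Alpha Risk than the one before it, a smaller Beta Risk, and a higher Power, with the sample size unchanged.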

Sample Size

Collecting data consumes time and resources; there is a tangible cost. It is important to collect enough data to detect the difference required but without creating waste by collecting excess data.

The level of Power needed should be determined by the Green Belt (GB), Black Belt (BB), or Master Black Belt (MBB), or a combination of them, along with input from the team. The required Power is normally higher as the application becomes more critical. Life-dependent, regulatory, and safety applications require higher levels of Power.

Beta should be no higher than 5% to allow for a minimum Power level of 95% in critical applications. For example, choosing a Power of 99% means that you are willing to accept a Beta Risk of 1%: a 1% chance of deciding that no parts are defective when there are in fact defective parts, in which case the consumer suffers.

Determining the level of Power is the starting point in determining the number of samples to collect. Collecting this number of samples adds confidence to the test results and to the inferences made about the population.

There is always an argument for achieving near-perfect Power (99.99999...%) because someone may feel their test or application is of premium importance. However, there is a price for perfection: it would require an impractical amount of data and resources. The more samples collected and analyzed, the greater the Power to detect smaller differences.
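To make the cost of higher Power and smaller differences concrete, the following sketch uses the common normal-approximation formula for a two-sided, one-sample test, n = ((z_alpha/2 + z_beta) * sigma / delta)^2, rounded up. The function name `sample_size_one_sample` and the grid of shifts and Power levels are assumptions made for illustration; statistical software that uses the t distribution will typically report slightly larger values.

```python
from math import ceil
from statistics import NormalDist


def sample_size_one_sample(delta, sigma, alpha=0.05, power=0.90):
    """Approximate sample size for a two-sided, one-sample test of the mean."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical grid: shifts expressed in units of one standard deviation.
for delta in (1.0, 0.5, 0.25):
    for power in (0.80, 0.90, 0.99):
        n = sample_size_one_sample(delta, sigma=1.0, power=power)
        print(f"delta={delta:<4}  power={power:.2f}  n={n}")
```

Halving the difference to be detected roughly quadruples the required sample size, and moving from 90% to 99% Power adds a further substantial increase, which is why near-perfect Power is rarely practical.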


Comparison Tests:

  • One sample Z test
  • One sample t test
  • Two sample t test
  • One sample proportion test
  • Two sample proportion test
  • One way ANOVA
  • Two way ANOVA
  • Factorial/Fractional Design of Experiments (DOE)  

    Type I Error

    Type I Error = Alpha Risk = Significance Level = Producer's Risk = False Positive.

    This is when the decision is made that there is a difference when in truth there is not. In other words, parts have been judged defective (and possibly scrapped) when they were not defective. The Producer suffers by losing stock and needing to make up the lost inventory.

    Type II Error

    Type II Error = Beta Risk = Consumer's Risk = False Negative

    This is when the decision is made that there is not a difference when the truth is there is a difference. In other words, parts have been determined not defective and sent to the customer (or downstream operation) and they were defective. The Consumer suffered by receiving defects.
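The two error types can also be seen by simulation. The sketch below repeatedly draws samples from a normal process and applies a two-sided Z test with known standard deviation: when the true shift is zero, every rejection is a Type I error, and when the true shift is nonzero, every failure to reject is a Type II error. The function `rejection_rate`, the seed, and all numeric settings are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_shift, n=20, sigma=1.0, alpha=0.05,
                   trials=5000, seed=1):
    """Fraction of simulated samples in which 'no difference' is rejected."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_shift, sigma) for _ in range(n)]
        z = fmean(sample) * sqrt(n) / sigma   # test against a mean of zero
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# No real difference: rejections are false positives (Type I errors).
type_1_rate = rejection_rate(true_shift=0.0)

# Real difference of 0.5: failures to reject are false negatives (Type II).
type_2_rate = 1 - rejection_rate(true_shift=0.5)

print(f"Estimated Alpha Risk: {type_1_rate:.3f}")
print(f"Estimated Beta Risk:  {type_2_rate:.3f}")
```

The estimated Alpha Risk should land close to the chosen significance level of 5%, while the estimated Beta Risk depends on the shift, the standard deviation, and the sample size.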

    Sample Size calculation

    Other Example

    Determine the sample size needed to detect a mean shift of 0.049 on a process with a standard deviation of 0.03924. Use an alpha of 5% and a beta of 10%. The mean of one set of 40 samples from a normally distributed set of data was 0.430, and the mean of another set of 40 samples from a normally distributed set of data was 0.381, a difference of 0.049.

    Power and Sample Size using software


    From the top picture, notice that 15 samples were needed to detect a difference of 0.049 at a Power of 90%. From the bottom picture, notice that the true Power is >99% since the sample size was actually 40.
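The software result can be roughly cross-checked by hand. The sketch below is not the software's method: it assumes a two-sided, two-sample comparison with equal standard deviations and uses the normal approximation n per group = 2 * ((z_alpha/2 + z_beta) * sigma / delta)^2. The variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()

delta = 0.049      # mean shift to detect (0.430 - 0.381)
sigma = 0.03924    # standard deviation, assumed equal for both sets
alpha = 0.05       # Alpha Risk, two-sided test assumed
beta = 0.10        # Beta Risk, so the target Power is 90%

z_alpha = std_normal.inv_cdf(1 - alpha / 2)
z_beta = std_normal.inv_cdf(1 - beta)

# Approximate samples needed in each of the two sets.
n_per_group = ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Approximate Power actually achieved with 40 samples per set
# (the negligible opposite rejection tail is ignored).
n_actual = 40
std_error = sigma * sqrt(2 / n_actual)
power_at_40 = 1 - std_normal.cdf(z_alpha - delta / std_error)

print(f"Approximate samples per set: {n_per_group}")
print(f"Approximate Power at n = {n_actual}: {power_at_40:.4f}")
```

Under these assumptions the approximation gives 14 samples per set, one fewer than the 15 reported by the software, presumably because the software works with the t distribution rather than the normal distribution. The approximate Power at 40 samples per set is well above 99%, in line with the bottom picture.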

    Keep in mind that this example focuses on statistically detecting a shift in the mean. It does not indicate anything about the variation between the two sets of data. If this were a before-and-after analysis, it is possible that the variation increased afterward even though the mean shifted favorably. The emphasis of Six Sigma is on variation reduction, and the F-test is used in this case (normal data) to determine whether there is a statistically significant change in the variation.

