Calculating Sample Size

The number of samples required is dependent on a few factors:

  1. The difference you want to detect. Do you want to see any difference, or do you want to detect a certain amount of difference, and in which direction?
  2. The Confidence Level (1 - α, where α is the alpha-risk). Typically 95% (a 5% alpha-risk).
  3. The desired level of Power (1 - β, where β is the beta-risk) - not always necessary
  4. The level of variability (variance) or standard deviation 



Calculating Sample Size - Variables Data

To determine the sample size, n, necessary when estimating μ, start by solving for n in the z formula for sample means:

z = (x-bar - μ) / (σ / √n)

The value of (x-bar - μ) is referred to as the error of estimation. 'E' represents the error of estimation in the formula below, which is the z formula solved for n:

n = (z^2 * σ^2) / E^2

The z-score (z) is selected to correspond to the chosen level of confidence.

The most commonly assumed confidence level is 95%, which is an alpha-risk of 0.05. In most cases the question is also two-tailed, so the corresponding z value is 1.96 (α/2 = 0.025 in each tail).
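The z value for a chosen confidence level can be checked with a short Python sketch like the one below. It uses the standard library's statistics.NormalDist; the function name z_for_confidence is illustrative only and does not come from this article.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float, two_tailed: bool = True) -> float:
    """Return the standard normal z-score for a given confidence level."""
    alpha = 1.0 - confidence
    # Two-tailed: split alpha across both tails. One-tailed: all of it in one tail.
    tail = alpha / 2.0 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail)


print(round(z_for_confidence(0.95), 2))  # two-tailed, 95% confidence
print(round(z_for_confidence(0.90), 3))  # two-tailed, 90% confidence
```

Table values such as 1.96 and 1.645 are rounded, so results computed from the unrounded z can differ slightly from hand calculations.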

Practice Question:

What sample size, n, is needed to specify a 95% confidence interval of +/- 1.5 units from the mean?

Assuming a normal distribution, the set of data of widget lengths has a historical variance of 13.4. 

In this case it is two-tailed and z = 1.96.

The variance is 13.4 mm^2 (which is the standard deviation squared).

Substituting:

n = (1.96^2 * 13.4) / 1.5^2

n = 3.8416 * 13.4 / 2.25 = 51.477 / 2.25 = 22.88

n = 23 samples (always round up to the next whole sample so the margin of error is not exceeded)

23 randomly selected samples are needed to attain a 95% confidence level and produce an error within 1.5 units for a standard deviation of 3.661 mm (sq rt of 13.4 mm^2).
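The calculation above can be reproduced with a minimal Python sketch. The function name sample_size_mean and its parameter names are assumptions made for illustration.

```python
import math


def sample_size_mean(z: float, variance: float, margin: float) -> int:
    """Sample size for estimating a mean: n = z^2 * sigma^2 / E^2, rounded up."""
    n = (z ** 2) * variance / (margin ** 2)
    # Round first so floating-point noise cannot push an exact integer up by one.
    return math.ceil(round(n, 6))


# Widget example: z = 1.96, variance = 13.4, E = 1.5
print(sample_size_mean(z=1.96, variance=13.4, margin=1.5))
```

With the practice question's inputs this returns the same 23 samples found by hand.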



Calculating Sample Size - Poisson Data

Use the sample size formula shown above, substituting the Poisson average in place of the variance (σ^2). For Poisson data the variance equals the mean, so n = (z^2 * Poisson average) / E^2.

Poisson average = n*p-hat = mean (from the attribute C chart of the data)
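A hedged sketch of the Poisson case in Python follows. It assumes the Poisson average stands in for the variance (the variance of a Poisson distribution equals its mean). The function name and the example inputs (an average of 4 defects per unit and a margin of 0.5) are hypothetical and not taken from this article.

```python
import math


def sample_size_poisson(z: float, poisson_mean: float, margin: float) -> int:
    """Sample size for count data: n = z^2 * mean / E^2, rounded up."""
    # For a Poisson distribution the variance equals the mean.
    n = (z ** 2) * poisson_mean / (margin ** 2)
    # Round first so floating-point noise cannot push an exact integer up by one.
    return math.ceil(round(n, 6))


# Hypothetical inputs: average of 4 defects per unit, E = 0.5, 95% confidence
print(sample_size_poisson(z=1.96, poisson_mean=4.0, margin=0.5))
```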


Calculating Sample Size - Binomial Data

The following formula is used to determine the sample size required to estimate a population proportion when the population size is unknown (or very large):

n = z^2 * p-bar * (1 - p-bar) / E^2

It applies to responses such as YES/NO, PASS/FAIL, or HOT/COLD.

The sample proportion (p-bar) is expressed as a decimal from 0 to 1.0 and represents the share of the population selecting a given choice. Because p-bar is multiplied by (1 - p-bar), the worst-case scenario (the one that requires the most samples) is a p-bar of 50%, or 0.5, which is then multiplied by 0.5 (since 1 - 0.5 = 0.5).

0.5 * 0.5 = 0.25. Any other combination, such as 75%/25% (0.75 * 0.25 = 0.1875) or 90%/10% (0.9 * 0.1 = 0.09), is less than 0.25. Therefore the maximum value for p-bar * (1 - p-bar) is 0.25.
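The worst-case claim can be checked numerically. The short Python sketch below evaluates p-bar * (1 - p-bar) on a grid of proportions and reports where it peaks; the variable names are illustrative.

```python
# p-bar * (1 - p-bar) for proportions from 0.00 to 1.00 in steps of 0.01
products = {i / 100: (i / 100) * (1 - i / 100) for i in range(101)}

# Proportion with the largest product, i.e. the most conservative choice
worst_p = max(products, key=products.get)
print(worst_p, products[worst_p])
```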

A higher numerator value increases the value of n, which is the conservative choice: the resulting sample size is large enough for the selected level of confidence and margin of error whatever the true proportion turns out to be.

The z-score (z) is selected to correspond to the chosen level of confidence. The higher the level of confidence, the higher the z-score and the more samples are needed.

The denominator is the square of the error of estimation (E), a.k.a. margin of error. If you are looking for a maximum of 2% error, the value of 0.02 is used for E.

The lower the selected value for E, the higher the number of samples needed. Because E is squared, halving E quadruples n.

Calculating Sample Size for Proportion with Unknown Population

Practice Question:

Sample size calculation for a discrete (binomial) proportion estimation with an unknown population. 

We are choosing z = 1.645 for a confidence level (1 - α) of 90%,

p-bar = 0.5

E = 2% = 0.02

Therefore, solving for the sample size 

n = 1.645^2 * 0.5 * (1 - 0.5) / 0.02^2

n = 0.6765062 / 0.0004 = 1691.27 (always round up)

n = 1692. A minimum of 1,692 samples are needed. 

If you wanted a confidence level of 95%, the z-score is 1.96 and n becomes 2,401 samples.
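Both results can be reproduced with a minimal Python sketch. The function name sample_size_proportion is an assumption for illustration, and the z values are the rounded table values used in this article.

```python
import math


def sample_size_proportion(z: float, p_bar: float, margin: float) -> int:
    """Sample size for a proportion: n = z^2 * p(1 - p) / E^2, rounded up."""
    n = (z ** 2) * p_bar * (1 - p_bar) / (margin ** 2)
    # Round first so floating-point noise cannot push an exact integer up by one.
    return math.ceil(round(n, 6))


print(sample_size_proportion(z=1.645, p_bar=0.5, margin=0.02))  # 90% confidence
print(sample_size_proportion(z=1.96, p_bar=0.5, margin=0.02))   # 95% confidence
```

The rounding step matters for the 95% case, where the exact answer is a whole number (2,401) and a plain ceiling on a floating-point result could otherwise report one sample too many.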


The following formula includes a 'small finite population correction factor', which means fewer samples are needed than for an unknown or effectively infinite population.

This formula is used to determine the sample size required to estimate the population proportion with a known small finite population, N:

n = [z^2 * p-bar * (1 - p-bar) / E^2] / [1 + (z^2 * p-bar * (1 - p-bar) / (E^2 * N))]

Practice Question:

In this case, you choose a 95% confidence level and a 3.5% margin of error (ME), and the population is finite (and considered small) and known to be 10,000.

z = 1.96

N = finite population size which = 10,000

E = Margin of Error = 0.035

p-bar = 0.5

Solving:

n = [1.96^2 * 0.5 * (1 - 0.5) / 0.035^2] / [1 + (1.96^2 * 0.5 * (1 - 0.5) / (0.035^2 * 10000))]

Numerator = 0.9604 / 0.001225 = 784

Denominator = 1 + (0.9604 / 12.25) = 1.0784

n = 784 / 1.0784 = 727.003

n = 728 samples (always round up)

In summary, without the small finite population correction factor, the sample size needed would have been 784 samples. Knowing the population is finite and "small", the number of samples needed is 728.

In other words, you need to sample 7.28% of the total population to get the ME and 95% level of confidence desired.

And if N were to increase, then the number of samples, n, would increase too (approaching the uncorrected 784) if you want to maintain the same E and confidence level.
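The effect of the population size can be sketched in Python as follows. The function name sample_size_proportion_finite and the list of population sizes are assumptions chosen for illustration; only N = 10,000 comes from the practice question.

```python
import math


def sample_size_proportion_finite(z: float, p_bar: float, margin: float,
                                  population: int) -> int:
    """Sample size for a proportion with the finite population correction."""
    # Uncorrected sample size (unknown or very large population)
    n0 = (z ** 2) * p_bar * (1 - p_bar) / (margin ** 2)
    # The correction shrinks n0 according to the known population size N.
    n = n0 / (1 + n0 / population)
    # Round first so floating-point noise cannot push an exact integer up by one.
    return math.ceil(round(n, 6))


for population in (1_000, 10_000, 100_000, 1_000_000):
    n = sample_size_proportion_finite(1.96, 0.5, 0.035, population)
    print(population, n)
```

As N grows, the corrected sample size climbs toward the uncorrected value and never exceeds it.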


