The number of samples required is dependent on a few factors:
To determine the sample size, n, needed to estimate μ, start by solving for n in the z formula for sample means.
The value of (x-bar - μ) is referred to as the error of estimation and is represented by "E" in the sample size formula.
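Written out, the rearrangement is a short derivation. The block below is a sketch that uses σ for the population standard deviation (a symbol assumed here for illustration); it agrees with the substitution made in the practice question:

```latex
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
\quad\Rightarrow\quad
E = \bar{x} - \mu = \frac{z\,\sigma}{\sqrt{n}}
\quad\Rightarrow\quad
n = \frac{z^{2}\sigma^{2}}{E^{2}}
```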
The z-score (z) is selected to correspond to the chosen level of confidence.
The most commonly assumed confidence level is 95%, which corresponds to an alpha-risk of 0.05. In most cases the question is two-tailed, in which case the corresponding z value is 1.96.
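As a rough illustration of where the z value comes from, the sketch below looks up the two-tailed critical value with Python's standard library. The function name `two_tailed_z` is hypothetical and chosen only for this example.

```python
from statistics import NormalDist


def two_tailed_z(confidence: float) -> float:
    """Return the two-tailed critical z value for a confidence level in (0, 1)."""
    alpha = 1.0 - confidence  # alpha-risk, e.g. 0.05 for 95% confidence
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile
    # of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {two_tailed_z(level):.3f}")
```

For a 95% confidence level this gives approximately 1.96, and for 90% approximately 1.645.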
Practice Question:
What sample size, n, is needed to specify a 95% confidence interval of +/- 1.5 units from the mean?
Assuming a normal distribution, the set of data of widget lengths has a historical variance of 13.4.
In this case the interval is two-tailed, so z = 1.96.
The variance is 13.4 mm² (the standard deviation squared).
Substituting:
n = (1.96^{2} * 13.4) / 1.5^{2}
n = 3.8416 * 13.4 / 2.25 = 51.477 / 2.25 = 22.88
n = 23 samples (always round up to the next whole sample to ensure enough power)
Twenty-three randomly selected samples are needed to attain a 95% confidence level and produce an error within 1.5 units, given a standard deviation of 3.661 mm (the square root of 13.4 mm²).
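The same arithmetic can be sketched in a few lines of Python. The function name `sample_size_for_mean` and its parameter names are assumptions made for this example, not part of any library.

```python
import math


def sample_size_for_mean(z: float, variance: float, error: float) -> int:
    """Sample size n = z^2 * variance / E^2, rounded up to a whole sample."""
    raw = (z ** 2) * variance / (error ** 2)
    # round() strips floating-point noise so that an exact whole number
    # is not pushed up to the next integer by accident.
    return math.ceil(round(raw, 9))


# Values from the practice question: z = 1.96, variance = 13.4, E = 1.5
n = sample_size_for_mean(z=1.96, variance=13.4, error=1.5)
print(n)
```

With the practice question's values the raw result is about 22.88, which rounds up to 23.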
Use the sample size formula shown above, substituting the Poisson average in place of the standard deviation.
Poisson average = n*p-hat = mean (from the attribute C chart of the data)
The following formula is used to determine the sample size required to estimate the population proportion when the population size is unknown.
It applies to binary responses such as YES/NO, PASS/FAIL, or HOT/COLD.
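In symbols, a form of the formula that is consistent with the worked example for proportions (z for the z-score, p-bar for the sample proportion, E for the error of estimation) would be:

```latex
n = \frac{z^{2}\,\bar{p}\,(1 - \bar{p})}{E^{2}}
```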
The sample proportion (p-bar) is expressed as a decimal from 0 to 1.0. It represents the proportion of the population selecting a given choice. Because p-bar is multiplied by (1 - p-bar), the worst-case scenario (the one that yields the largest number of samples to gather) is a p-bar of 50%, or 0.5, which is then multiplied by 0.5 (since 1 - 0.5 = 0.5).
0.5 * 0.5 = 0.25. Any other combination, such as 75%/25% (0.75 * 0.25 = 0.1875) or 90%/10% (0.9 * 0.1 = 0.09), is less than 0.25. Therefore the maximum value of p-bar(1 - p-bar) is 0.25.
A higher numerator increases the value of n, which is the conservative choice: it gives a sample size that ensures an accurate result for the selected level of confidence and margin of error.
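A quick numerical check of the worst-case claim, as a sketch: the helper `proportion_spread` is a hypothetical name, and the 1% grid of candidate proportions is an arbitrary choice for illustration.

```python
def proportion_spread(p_bar: float) -> float:
    """Return p_bar * (1 - p_bar), the proportion term in the numerator."""
    return p_bar * (1.0 - p_bar)


# Scan proportions from 0% to 100% in 1% steps and keep the largest product.
candidates = [i / 100 for i in range(0, 101)]
worst_case = max(candidates, key=proportion_spread)
print(worst_case, proportion_spread(worst_case))
```

The scan lands on a p-bar of 0.5 with a product of 0.25, matching the reasoning above.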
The z-score (z) is selected to correspond to the chosen level of confidence. The higher the level of confidence, the higher the z-score and the more samples are needed.
The denominator contains the error of estimation (E), a.k.a. the margin of error, squared. If you are looking for a maximum of 2% error, the value of 0.02 is used for E in the formula.
The lower the selected value of E, the higher the number of samples needed.
Example of sample size calculation for a discrete (binomial) proportion estimation with an unknown population.
We are choosing z = 1.645 for a confidence level of 90% (α = 0.10),
p-bar = 0.5
E = 2% = 0.02
Therefore, solving for the sample size, n =
n = 1.645^{2} * 0.5 * (1 - 0.5) / 0.02^{2}
n = 0.6765062 / 0.0004 = 1691.27 (always round up)
n = 1,692. A minimum of 1,692 samples is needed.
If you wanted a confidence level of 95%, the z-score is 1.96 and n becomes 2,401 samples.
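Both results can be reproduced with a short Python sketch. The function name `sample_size_for_proportion` is an assumption for this example only.

```python
import math


def sample_size_for_proportion(z: float, p_bar: float, error: float) -> int:
    """Sample size n = z^2 * p_bar * (1 - p_bar) / E^2, rounded up."""
    raw = (z ** 2) * p_bar * (1.0 - p_bar) / (error ** 2)
    # round() strips floating-point noise so that an exact whole number
    # (such as 2401) is not pushed up to the next integer by accident.
    return math.ceil(round(raw, 9))


print(sample_size_for_proportion(z=1.645, p_bar=0.5, error=0.02))  # 90% confidence
print(sample_size_for_proportion(z=1.96, p_bar=0.5, error=0.02))   # 95% confidence
```

The rounding guard matters for the 95% case, where the exact answer is a whole number and naive floating-point arithmetic could land a hair above it.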
The following formula is used to determine the sample size required to estimate the population proportion with a known finite population, N.
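Written in symbols, a form that matches the substitution in the example for a finite population takes the unknown-population result, here called n_0 (a label assumed for this sketch), and shrinks it according to the population size N:

```latex
n_{0} = \frac{z^{2}\,\bar{p}\,(1 - \bar{p})}{E^{2}},
\qquad
n = \frac{n_{0}}{1 + \dfrac{n_{0}}{N}}
```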
Example:
In this case, you choose a 95% confidence level and a 3.5% margin of error (ME), and the population is finite and known to be 10,000.
z = 1.96
N = 10,000
E = 0.035
p-bar = 0.5
n = [1.96^{2} * 0.5 * (1 - 0.5) / 0.035^{2}] / [1 + (1.96^{2} * 0.5 * (1 - 0.5) / (0.035^{2} * 10000))]
n = 784 / 1.0784 = 727.003
n = 728 samples
In other words, you need to sample 7.28% of the total population to achieve the desired margin of error at the 95% level of confidence.
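The finite-population calculation can be sketched the same way. The function name `sample_size_finite` and the intermediate name `n0` are assumptions for this example; the adjustment follows the form used in the substitution above.

```python
import math


def sample_size_finite(z: float, p_bar: float, error: float, population: int) -> int:
    """Proportion sample size adjusted for a known finite population N."""
    # Sample size as if the population size were unknown.
    n0 = (z ** 2) * p_bar * (1.0 - p_bar) / (error ** 2)
    # Shrink it according to the population size.
    raw = n0 / (1.0 + n0 / population)
    # round() strips floating-point noise before rounding up.
    return math.ceil(round(raw, 9))


n = sample_size_finite(z=1.96, p_bar=0.5, error=0.035, population=10_000)
print(n, f"{n / 10_000:.2%}")
```

For the example's inputs this yields 728 samples, or 7.28% of the population.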