
Objective:
This section explains the meaning of the Confidence Interval (CI) in statistical analysis. Calculations are shown for the mean, the standard deviation, proportions, and process capability.
This site assumes that, for means and standard deviations, the sample data come from a normal distribution. For proportions, the normal distribution approximates the binomial when n x p-hat is greater than or equal to 5.
The most common confidence levels are 90%, 95%, and 99%, but the choice depends on the voice of the customer, your company, the project, and other factors.
Sample statistics such as the mean, standard deviation and proportion (xbar, s, pbar) are only estimates of the population parameters.
Confidence Intervals are used to quantify that uncertainty by providing a lower limit and an upper limit that define a range of values expected to contain the true population parameter with a specified level of confidence.
Selecting a 99% CI suggests that approximately 99 out of 100 CI's built from repeated samples will contain the population parameter. A 0.99 confidence interval is read as a 99% probability that the interval contains the population parameter and a 1% risk that the population parameter is not contained within the interval.
Confidence Level = CI = 1 - alpha
(1 - alpha) is called the probability content or level of confidence.
Alpha-risk is known as the significance level: the probability of making an incorrect decision, in other words, of being wrong.
A specified value of the CI signifies the probability that the interval contains the population parameter, and there is an alpha-risk (1 - CI) that the population parameter is not contained within the interval.
CI's are applied in statistical tests for means, standard deviations, proportions, capability indices, regression analysis, and upper/lower control limits on control charts.
In regression analysis, the CI is based on a provided value of X for a given level of confidence. This CI is likely to contain the true best fit line.
There are three factors that impact the confidence interval:
Sample Size
CI's are used when you are unable to capture and analyze an entire population (census) and must use a sample (statistics) to infer statements about the population. The larger your sample size, the more confident you can be that the results represent the population. Though the relationship is not linear, the larger the sample size, the narrower the confidence interval (the more confident you can be that the true population parameter falls within a tighter range).
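The non-linear relationship between sample size and interval width can be illustrated with a short sketch. It assumes a known population standard deviation and a Z-based interval; the values (sigma = 10, 95% confidence) and the function name margin_of_error are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Half-width of a Z-based CI for the mean: z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical Z-value
    return z * sigma / math.sqrt(n)

# Hypothetical population standard deviation, for illustration only
sigma = 10
for n in (25, 100, 400):
    print(n, round(margin_of_error(sigma, n), 2))
```

Quadrupling the sample size only halves the margin of error, because the width shrinks with the square root of n rather than with n itself.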
Population Size
The size of the population is a factor when working with a relatively small and known group of data (such as the number of pieces of candy in a bag versus the number of fish in the ocean).
The CI calculations assume you have a true random sample of the population. If the sample is not random, the calculated confidence intervals cannot be relied upon, because the measures of central tendency and dispersion are no longer reliable.
Sampling plans are an important step in ensuring that the data collected are meaningful and representative of the population.
Percentage
The accuracy of the CI also depends on the percentage of your sample that picks a particular answer. If 99.9% of the parts sampled PASSED and the 0.1% FAILED, the chances of error are very low regardless of sample size.
However, if the percentages are 51% and 49%, the chances of error are much greater. It is easier to be sure of extreme answers than of those near the middle, so the interval width does not change linearly with the percentage.
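The effect of the percentage can be seen in the standard error of a proportion, sqrt(p(1 - p)/n), which is largest when p is near 50%. The following is a minimal sketch; the sample size of 1000 and the function name proportion_std_error are hypothetical.

```python
import math

def proportion_std_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

n = 1000  # hypothetical sample size, for illustration only
for p in (0.999, 0.90, 0.51):
    print(p, round(proportion_std_error(p, n), 4))
```

For the same sample size, the 51%/49% split gives a much larger standard error, and therefore a wider interval, than the 99.9%/0.1% split.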
The CI for the mean represents the sample mean ± confidence factor * a measure of variability.
If the population standard deviation IS known, reference the Z-distribution table.
If the population standard deviation IS NOT known, reference the t-distribution table.
As the sample size increases, the t-distribution behaves more like the Z-distribution and the critical t-value approaches the corresponding Z-value. For a given level of confidence, the t-distribution becomes a flatter "bell" curve, with the critical t-value increasing, as the sample size decreases.
EXAMPLE:
A sample of 22 patients who came into the emergency room showed that they waited on average 45 minutes with a standard deviation of 5.8 minutes. Estimate the 99% confidence interval for the average wait time of the patients. Assume the population is normally distributed and the population standard deviation is not known.
Given:
n = 22
sample mean = 45 minutes
sample standard deviation = 5.8 minutes
Degrees of freedom (dF) = n - 1 = 21
Alpha-risk = 1 - CI = 1 - 0.99 = 0.01
The critical t-value from the table using two tails is 2.831
(Remember to use alpha-risk/2 when using the t-table)
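The calculation for this example can be written out as follows. This is a sketch using the standard t-interval formula with the values given above:

```latex
\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
= 45 \pm 2.831 \times \frac{5.8}{\sqrt{22}}
= 45 \pm 3.50
\quad\Rightarrow\quad (41.50,\ 48.50)
```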
This is telling us that the point estimate of the average wait time is 45 minutes with a margin of error of ± 3.5 minutes. There is 99% certainty that the interval {41.5 minutes to 48.5 minutes} contains the true process mean. There is a 1% chance that this decision is wrong.
What would happen to the width of the CI if you chose a confidence level of only 90%?
Think about what is occurring. You are willing to accept a much lower level of confidence that the interval will contain the true population mean (the actual waiting time of all patients coming into the emergency room), so you can tighten the range of values.
If you want to nearly guarantee that your interval contains the true population mean, then you would want to include every value in the interval, so the interval spreads as the level of confidence desired increases.
The only value that changes is the critical t-value, which is now 1.721, and the CI is now (42.87, 47.13).
As CI increases, the interval spreads. As sample size increases, the interval narrows (more representative of the entire population).
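The two intervals from the emergency room example can be compared side by side. The sketch below assumes the critical t-values are read from a t-table, as in the example, rather than computed; the function name t_interval is illustrative.

```python
import math

def t_interval(mean, s, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

# Critical t-values are two-tailed table look-ups for dF = 21 (from the example)
ci_99 = t_interval(mean=45, s=5.8, n=22, t_crit=2.831)
ci_90 = t_interval(mean=45, s=5.8, n=22, t_crit=1.721)

print("99% CI:", [round(x, 2) for x in ci_99])
print("90% CI:", [round(x, 2) for x in ci_90])
```

Only the critical value differs between the two calls, which is why the 90% interval is narrower than the 99% interval.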
You can use Excel to find the CI for a population mean only. The population standard deviation must be known, because Excel references the Z-table in its calculation.
You must determine the sample mean (xbar) separately for the result to make sense. This example uses a 95% confidence level.
Suppose you have the following data:
Alpha risk = Level of Significance = 1 - Confidence Level = 0.05
Population Standard Deviation = 6.48
Sample Size = 27
Sample Mean (xbar) = 50
The data would be entered as shown and the result is 2.44 as shown in cell A1.
Knowing that you determined your sample mean (xbar) to be 50, add 2.44 to get the upper limit of the interval and subtract 2.44 to get the lower limit of the interval, and that becomes the CI for the population mean.
The interval is 50 ± 2.44, or 47.56 to 52.44
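The same margin can be reproduced outside Excel. The sketch below assumes Python's standard library and mirrors the Z-based calculation described above; the function name z_margin is illustrative and is not an Excel or library function.

```python
import math
from statistics import NormalDist

def z_margin(alpha, sigma, n):
    """Margin of error for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical Z-value
    return z * sigma / math.sqrt(n)

margin = z_margin(alpha=0.05, sigma=6.48, n=27)
xbar = 50  # sample mean, determined separately
lower, upper = xbar - margin, xbar + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))
```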
The chi-squared distribution is not symmetrical, and its shape varies according to the degrees of freedom, dF.
The degrees of freedom equal n - 1, dF = n - 1.
This technique lacks robustness, in that it is very important that the population is known to be normally distributed when using it to estimate the population variance or standard deviation.
EXAMPLE:
Twenty-five assembly line workers throughout the Southwest United States were found to have a standard deviation in their total compensation of $2.43. The average total compensation of an assembly line worker published by the Bureau of Labor Statistics was $38.73 for a similar worker in the Southwest.
Estimate the population standard deviation using a 95% confidence level. Assume the population is known to be normally distributed.
Sample standard deviation = $2.43
The average wage of $38.73 is not needed for the CI calculation.
n = 25
dF = n - 1 = 24
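The interval for this example can be sketched as follows, using the standard chi-squared interval for a standard deviation: sqrt((n - 1)s^2 / chi-squared upper) to sqrt((n - 1)s^2 / chi-squared lower). The two chi-squared critical values are assumed to be read from a standard chi-squared table for dF = 24, and the function name sigma_interval is illustrative.

```python
import math

def sigma_interval(s, n, chi2_upper, chi2_lower):
    """CI for the population standard deviation from chi-squared table values."""
    ss = (n - 1) * s ** 2
    return math.sqrt(ss / chi2_upper), math.sqrt(ss / chi2_lower)

# Table look-ups for dF = 24 at 95% confidence (assumed from a standard table):
# upper-tail area 0.025 -> 39.364, upper-tail area 0.975 -> 12.401
lower, upper = sigma_interval(s=2.43, n=25, chi2_upper=39.364, chi2_lower=12.401)

print(round(lower, 2), round(upper, 2))
```

Under these assumptions the population standard deviation is estimated to lie between roughly $1.90 and $3.38. The interval is not centered on $2.43 because the chi-squared distribution is not symmetrical.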
Many business decisions involve population proportions such as estimating market share and proportions of goods that are acceptable or defective.
EXAMPLE:
A survey was conducted on 300 emerging, domestic, small capital companies and found that 153 had an Emergency Action Plan that detailed reaction plans to maintain operations and customer service in the event of major illness or outbreak such as the swine flu.
Calculate the 92% confidence interval to estimate the proportion of emerging, domestic, small capital companies that have an adequate Emergency Action Plan.
n = 300
p-hat = 153/300 = 0.51 = 51%
The critical Z(0.04) value = 1.75
The CI states that, with 92% confidence, the proportion of all similar companies with the plan will be between 46% and 56%.
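The arithmetic behind this result can be sketched with the normal-approximation interval p-hat ± Z * sqrt(p-hat(1 - p-hat)/n). The sketch assumes Python's standard library for the critical Z-value; the function name proportion_interval is illustrative.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Normal-approximation CI for a population proportion."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.75 for 92% confidence
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_interval(successes=153, n=300, confidence=0.92)

print(round(lower, 2), round(upper, 2))
```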
To determine the CI for process capability use the formula provided below where:
USL = customer upper specification limit
LSL = customer lower specification limit
Pp is a process index that numerically describes the long-term capability.
Cp is the short-term indicator. Cp should always be analyzed with Cpk, just as Pp should always be analyzed with Ppk.
Both Cp and Pp are a function of the process standard deviation, not a nominal (target) value that may be historical or provided by the customer.
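A commonly used form of the capability interval is written out below. This is a sketch of the standard chi-squared-based interval for Cp, assuming normally distributed data and a sample of size n with sample standard deviation s; the same form applies to Pp when the long-term standard deviation is used. The subscript notation, where the first subscript is the lower-tail (cumulative) area, is a convention chosen for this sketch.

```latex
\hat{C}_p = \frac{USL - LSL}{6s}

\hat{C}_p \sqrt{\frac{\chi^2_{\alpha/2,\,n-1}}{n-1}}
\;\le\; C_p \;\le\;
\hat{C}_p \sqrt{\frac{\chi^2_{1-\alpha/2,\,n-1}}{n-1}}
```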
Often, statistics are not expressed in terms of one number but rather as a range or an interval with a given level of confidence.