Confidence Interval

Objective:

This section will explain the meaning of confidence intervals in statistical analysis. The calculations will be shown for the:

  • Mean
  • Standard Deviation
  • Proportion

    This site assumes that, for means and standard deviations, the sample data come from a normal distribution. For proportions, the normal distribution approximates the binomial when n x p-hat is greater than or equal to 5.

    The most common confidence level selections are 90%, 95%, and 99%, but the choice depends on the voice of the customer, your company, the project, and other factors.





    Sample statistics such as the mean, standard deviation and proportion (x-bar, s, p-bar) are only estimates of the population parameters.

    Confidence intervals quantify this uncertainty by providing a lower limit and an upper limit: a range of values that is expected to contain the true population parameter with a specified level of confidence.

    Selecting a 99% confidence level means that, if the sampling were repeated many times, approximately 99 out of 100 of the resulting confidence intervals would contain the population parameter. Stated another way, there is a 1.0% risk that an interval built this way does not contain the population parameter.
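    The "99 out of 100 intervals" idea can be checked with a small simulation. The sketch below is only an illustration: it assumes a normal population with a known standard deviation (so a z-interval applies), and the population values, the number of trials, and the function name coverage are hypothetical choices, not part of the original example.

```python
import random
from statistics import NormalDist, fmean

def coverage(conf_level=0.99, mu=45.0, sigma=5.8, n=22, trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical z-value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and build its interval around x-bar.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

print(coverage())  # expected to land close to 0.99
```

    Each simulated interval either contains the true mean or it does not. The confidence level describes how often the procedure succeeds over many repeated samples.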

    Confidence Interval = CI = 1 - alpha

    (1 - alpha) is called the probability content or level of confidence.

    Alpha-risk is known as the significance level: the probability of making an incorrect decision, in other words, of being wrong.

    A specified confidence level signifies the probability that the interval contains the population parameter, and there is an alpha-risk (1-CI) that the population parameter is not contained within the interval.

    Confidence Intervals are used when you are unable to capture and analyze an entire population and must use the sample (statistics) to infer statements about the population.

    Confidence Intervals are applied in statistical tests for means, standard deviations, proportions, capability indices, regression analysis, and upper/lower control limits on control charts.

    In regression analysis, the confidence interval is based on a provided value of X for a given level of confidence. This confidence interval is likely to contain the true best fit line.

    Confidence Interval for the Mean

    The Confidence Interval for the mean represents the sample mean +/- confidence factor * a measure of variability.

    If the population standard deviation IS known, reference the Z-distribution table.

    If the population standard deviation IS NOT known, reference the t-distribution table.

    As the sample size increases, the t-distribution behaves more like the z-distribution and the critical t-value approaches the corresponding z-value. For a given level of confidence, the t-distribution becomes a flatter "bell" curve, and the critical t-value increases, as the sample size decreases.
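    The convergence of the t-value toward the z-value can be seen by comparing a few table entries. This is a minimal sketch: the t-values are two-tailed 95% critical values copied from a standard t-table, and the chosen degrees of freedom are arbitrary.

```python
from statistics import NormalDist

# Two-tailed 95% critical t-values from a standard t-table (df -> t).
T_CRIT_95 = {5: 2.571, 10: 2.228, 21: 2.080, 30: 2.042, 120: 1.980}

# Matching two-tailed 95% critical z-value (about 1.960).
z_crit = NormalDist().inv_cdf(0.975)

for df, t_crit in T_CRIT_95.items():
    # The gap between t and z shrinks as df (sample size) grows.
    print(f"df={df:>3}  t={t_crit:.3f}  t - z = {t_crit - z_crit:.3f}")
```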

    EXAMPLE:

    A sampling of 22 patients that came into the emergency room showed that they waited on average 45 minutes with a standard deviation of 5.8 minutes. Estimate the 99% confidence interval for the average wait time of the patients. Assume population is normally distributed and the population standard deviation is not known.

    Given:

    n = 22
    sample mean = 45 minutes
    sample standard deviation = 5.8 minutes
    Degrees of freedom (df) = n-1 = 21
    Alpha-risk = 1-CI = 1-0.99 = 0.01
    The critical t-value from the table using two tailed is 2.831
    (If your t-table lists one-tailed areas, remember to look up alpha-risk/2 = 0.005)

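    Example calculation of CI for the Mean

    CI = sample mean +/- t x (s / sqrt(n))
    CI = 45 +/- 2.831 x (5.8 / sqrt(22))
    CI = 45 +/- 3.50
    CI = 41.5 minutes to 48.5 minutes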
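    The emergency room example can be reproduced in a few lines. This is a sketch rather than a definitive implementation: the function name t_interval is hypothetical, and the critical t-value is passed in from the table because Python's standard library has no t-distribution.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Values from the example: n = 22, x-bar = 45, s = 5.8, t(df = 21) = 2.831
lower, upper = t_interval(xbar=45, s=5.8, n=22, t_crit=2.831)
print(f"99% CI: {lower:.2f} to {upper:.2f} minutes")
```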

    What does this tell us?

    This is telling us that the point estimate of the average wait time is 45 minutes with a margin of error of +/- 3.5 minutes. There is 99% confidence that the interval {41.5 minutes to 48.5 minutes} contains the true process mean. There is a 1% risk that it does not.

    What would happen to the width of the interval if you selected to use only a level 90% confidence interval?

    Think about what is occurring. You are willing to accept a much lower level of confidence that the confidence interval will contain the true population mean (the actual waiting time of all patients coming into the emergency room), so you can tighten the range of values.

    If you want to almost guarantee that your interval contains the true population mean, then you would want to include nearly every possible value in the interval, so the confidence interval spreads as the desired level of confidence increases.

    The only value that changes is the critical t-value. It is now 1.721, and the confidence interval is now (42.87, 47.13).

    As confidence level increases, the interval spreads. As sample size increases, the interval narrows (more representative of the entire population).
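    Both effects can be tabulated using the wait-time figures. The sketch below assumes the sample standard deviation stays at 5.8 for every sample size, which is a simplification made only for illustration; the critical values are two-tailed t-table entries for df = n - 1.

```python
from math import sqrt

def margin(s, n, t_crit):
    """Half-width of the interval: t_crit * s / sqrt(n)."""
    return t_crit * s / sqrt(n)

# Effect of confidence level (n = 22, df = 21).
for level, t_crit in [(0.90, 1.721), (0.95, 2.080), (0.99, 2.831)]:
    print(f"{level:.0%} level: 45 +/- {margin(5.8, 22, t_crit):.2f}")

# Effect of sample size at the 95% level (s held at 5.8 as an assumption).
for n, t_crit in [(6, 2.571), (11, 2.228), (22, 2.080), (121, 1.980)]:
    print(f"n={n:>3}: 45 +/- {margin(5.8, n, t_crit):.2f}")
```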



    Using Excel to calculate Confidence Interval

    You can use Excel, for example with its CONFIDENCE function, to find the confidence interval for a population mean only. The population standard deviation must be known, because Excel uses the Z-distribution in its calculation.

    Excel returns only the margin of error, so you must also determine the sample mean (x-bar) for the result to make sense. This example uses a 95% confidence level.

    Suppose you have the following data:

    Alpha = Level of Significance = 1 - Confidence Level = 0.05
    Population Standard Deviation = 6.48
    Sample Size = 27
    Sample Mean (x-bar) = 50

    The data would be entered as shown, and the result is 2.44, as shown in cell A1.

    Confidence Interval example using Z-distribution

    Because the sample mean (x-bar) is 50, add 2.44 to get the upper limit of the interval and subtract 2.44 to get the lower limit. The result is the confidence interval for the population mean.

    The interval is 50 ± 2.44, or 47.56 to 52.44
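    The same margin of error can be cross-checked outside Excel. This is a minimal sketch, assuming the population standard deviation is known; the function name z_margin is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_margin(alpha, sigma, n):
    """Margin of error z(alpha/2) * sigma / sqrt(n) for a known population sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)

# Values from the example: alpha = 0.05, sigma = 6.48, n = 27, x-bar = 50
m = z_margin(alpha=0.05, sigma=6.48, n=27)
print(f"margin = {m:.2f}, interval = {50 - m:.2f} to {50 + m:.2f}")
```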



    Confidence Interval for the Standard Deviation

    The confidence interval for the standard deviation is based on the chi-squared distribution, which is not symmetrical and whose shape varies according to the degrees of freedom, df.

    The degrees of freedom equals n-1, df = n-1.

    This technique lacks robustness, in that it is very important that the population is known to be normally distributed when using it to estimate the population variance or standard deviation.

    EXAMPLE:

    Twenty-five assembly line workers throughout the Southwest were found to have a standard deviation in their total compensation of $2.43. The average total compensation published by the Bureau of Labor Statistics for a similar worker in the Southwest was $38.73. Calculate the confidence interval for the population standard deviation using a 95% confidence level. Assume the population is known to be normally distributed.

    Sample standard deviation = $2.43
    The average wage of $38.73 is not needed for the CI calculation.
    n = 25
    DF = n-1 = 24

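    Confidence Interval calculation for standard deviation

    sqrt((n-1) x s^2 / chi-squared(alpha/2)) <= sigma <= sqrt((n-1) x s^2 / chi-squared(1-alpha/2))
    With df = 24, chi-squared(0.025) = 39.364 and chi-squared(0.975) = 12.401:
    sqrt(24 x 2.43^2 / 39.364) <= sigma <= sqrt(24 x 2.43^2 / 12.401)
    $1.90 <= sigma <= $3.38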
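    The compensation example can be worked through with the standard chi-squared interval for sigma. This is a sketch under stated assumptions: the function name sigma_interval is hypothetical, and the two chi-squared values are read from a table for df = 24 (upper-tail areas 0.025 and 0.975) rather than computed.

```python
from math import sqrt

def sigma_interval(s, n, chi2_upper, chi2_lower):
    """CI for sigma: sqrt((n-1)s^2 / chi2_upper) to sqrt((n-1)s^2 / chi2_lower)."""
    ss = (n - 1) * s ** 2
    return sqrt(ss / chi2_upper), sqrt(ss / chi2_lower)

# Chi-squared table values for df = 24 at a 95% confidence level:
# upper-tail area 0.025 -> 39.364, upper-tail area 0.975 -> 12.401
lower, upper = sigma_interval(s=2.43, n=25, chi2_upper=39.364, chi2_lower=12.401)
print(f"95% CI for sigma: ${lower:.2f} to ${upper:.2f}")
```

    The interval is not centered on the sample standard deviation of $2.43 because the chi-squared distribution is not symmetrical.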



    Confidence Interval for Proportions

    Many business decisions involve population proportions such as estimating market share and proportions of goods that are acceptable or defective.

    EXAMPLE:

    A survey was conducted on 300 emerging, domestic, small capital companies and found that 153 had an Emergency Action Plan that detailed reaction plans to maintain operations and customer service in the event of major illness or outbreak such as the swine flu.

    Calculate the 92% confidence interval to estimate the proportion of emerging domestic, small capital companies that have an adequate Emergency Action Plan.

    n = 300
    p-hat = 153/300 = 51% = 0.51
    The critical Z(0.04) value = 1.75 (alpha-risk/2 = 0.08/2 = 0.04)



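    Confidence Interval calculation for Proportion

    CI = p-hat +/- Z x sqrt(p-hat x (1 - p-hat) / n)
    CI = 0.51 +/- 1.75 x sqrt(0.51 x 0.49 / 300)
    CI = 0.51 +/- 0.05
    CI = 0.46 to 0.56

    The confidence interval states that, with 92% confidence, the proportion of all similar companies with the plan will be between 46% and 56%.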
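    The survey example can be checked with the normal-approximation interval for a proportion. This is a minimal sketch: the function name proportion_interval is hypothetical, and the extra check on n x (1 - p-hat) is a common companion to the n x p-hat rule that is added here as an assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, conf_level):
    """Normal-approximation CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # Rule of thumb from the text (n * p_hat >= 5); the second check is an added assumption.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation may be poor for this sample")
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Values from the example: 153 of 300 companies, 92% confidence level
lower, upper = proportion_interval(153, 300, 0.92)
print(f"92% CI: {lower:.3f} to {upper:.3f}")
```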


    Confidence Interval for Capability

    To determine the confidence interval for process capability use the formula provided below where:

    USL = customer upper specification limit
    LSL = customer lower specification limit

    Pp is a process index that numerically describes the long term capability.

    Cp is the short term indicator. Cp should always be analyzed with Cpk, just as Pp should always be analyzed with Ppk.

    Both Cp and Pp are a function of the process standard deviation, not a nominal (target) value that may be historical or provided by the customer.

    Confidence Interval calculation for process capability
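    Because Cp and Pp depend on the data only through the standard deviation, one commonly used interval follows directly from the chi-squared interval for sigma. The sketch below illustrates that approach and may not match the exact formula the page originally displayed. The specification limits, standard deviation, sample size, and the function name cp_interval are all hypothetical; the chi-squared values are table entries for df = 24 at a 95% level.

```python
from math import sqrt

def cp_interval(usl, lsl, s, n, chi2_lower, chi2_upper):
    """Return (cp_hat, lower, upper) where cp_hat = (USL - LSL) / (6 s)
    and the limits are cp_hat * sqrt(chi2 / (n - 1))."""
    cp_hat = (usl - lsl) / (6 * s)
    df = n - 1
    return cp_hat, cp_hat * sqrt(chi2_lower / df), cp_hat * sqrt(chi2_upper / df)

# Hypothetical process: USL = 10.6, LSL = 9.4, s = 0.15, n = 25
# Chi-squared table values for df = 24: 12.401 (lower) and 39.364 (upper)
cp_hat, lower, upper = cp_interval(10.6, 9.4, 0.15, 25, 12.401, 39.364)
print(f"Cp estimate = {cp_hat:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

    The same calculation would apply to Pp if s is the long term standard deviation.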








    Confidence Interval Roadmap

    Calculating Confidence Intervals

    CI Chart






