
This module covers hypothesis testing of proportions involving one factor with one, two, or more samples. These tests assume a Binomial Distribution.
Assumptions:
The following two tests are covered below; the chi-square test is covered in another module.
1 Proportion Test Examples
Determine if there is a difference in the percentage of voters that turn out for an election compared to a target (or claim)
Determine if a percentage success rate claim is actually as claimed
Determine if the proportion of vegetarians has grown compared to the proportion from two years ago. This could also be evaluated separately for females and males.
2 Proportion Test Examples
Determine if there is a difference in the percentage of voters that turn out for two elections
Determine if there is a difference in the defect rates between two operators (or two machines, or two shifts)
Determine if there is a difference in the percentage of delivery performance between two suppliers
Compare proportion of females and males that are vegetarians
Chi-Square Test Example
Determine if there is a difference in percentage of delivery performance among six suppliers
The first step in any hypothesis test is to follow the visual tools of hypothesis testing to help break down the problem to simpler steps.
A Six Sigma Black Belt gathers data that shows 27,798 out of 112,561 registered voters voted in the last election. One party believes that if there is a lower turnout they will have a better chance to win the next election.
The goal for that party is for fewer than 25% of registered voters to vote in the next election. Test the hypothesis that at least 25% of registered voters turned out for the last election.
1) Practical Problem:
Is the proportion from the population different from the target value? The target value may be an actual target, a historical value, or some other given value.
Is the percentage of voters from the last election <25%?
2) Statistical Problem:
Ho: P = Target (or given) value
HA: P ≠ Target value (two-tailed), P < Target value (one-tailed), or P > Target value (one-tailed)
Assume the α-risk is 0.05. Reject the null hypothesis if the p-value < α-risk. Calculate the p-value by applying the Binomial Distribution. A registered voter did one of two things: Voted or Did not Vote (binomial).
Is the population proportion of voters <25%?
Ho: P = 25%
HA: P < 25% (one-tailed test)
3) Statistical Solution:
Use Minitab to determine the exact p-value. When entering the data in Minitab for this example, the "events" are sometimes referred to as the "successes", although that may not be the correct term to apply to each problem. The term "trials" is the denominator of the percentage (or the population of registered voters in this case).
The p-value is 0.009. Since the p-value < α-risk of 0.05, reject the null hypothesis.
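As a cross-check on the Minitab figure, the exact one-tailed p-value can be sketched in Python with only the standard library. This is a minimal sketch, not Minitab's implementation: the helper name `binomial_lower_tail` is hypothetical, and it sums the binomial probabilities of 0 through 27,798 voters in log space so the large factorials do not overflow.

```python
import math

def binomial_lower_tail(events, trials, p0):
    """P(X <= events) for X ~ Binomial(trials, p0), summed in log space."""
    log_p, log_q = math.log(p0), math.log(1.0 - p0)
    log_n_fact = math.lgamma(trials + 1)
    total = 0.0
    for k in range(events + 1):
        # log of C(trials, k) * p0^k * (1 - p0)^(trials - k)
        log_pmf = (log_n_fact - math.lgamma(k + 1)
                   - math.lgamma(trials - k + 1)
                   + k * log_p + (trials - k) * log_q)
        total += math.exp(log_pmf)
    return total

# Voter example: Ho: P = 0.25, HA: P < 0.25 (one-tailed)
p_value = binomial_lower_tail(27798, 112561, 0.25)
print(f"exact p-value = {p_value:.3f}")
```

The lower tail is used because the alternative hypothesis is P < 25%; a two-tailed test would need both tails.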
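Recall that the p-value represents the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value therefore means there is a low risk of being wrong if the null hypothesis is rejected.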
4) Practical Solution:
Infer HA: There is sufficient evidence that fewer than 25% of the registered voters will vote.
This is the conclusion that the party desires; however, 27,901 voters is the critical value where the p-value = α-risk of 0.05. That represents a difference of only 103 people, beyond which the statistical conclusion reverses.
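The critical count can be approximated without Minitab. The sketch below assumes the normal approximation to the binomial (an assumption made here for simplicity, not the exact method used above) and finds the vote count at which a one-tailed test at α = 0.05 sits on the boundary.

```python
import math
from statistics import NormalDist

trials, p0, alpha = 112561, 0.25, 0.05

# Mean and standard deviation of the vote count under Ho: P = 0.25
mean = trials * p0
std_dev = math.sqrt(trials * p0 * (1 - p0))

# Lower-tail critical z for a one-tailed test (about -1.645 at alpha = 0.05)
z_crit = NormalDist().inv_cdf(alpha)

critical_count = mean + z_crit * std_dev
print(f"critical count is about {critical_count:.0f} voters")
print(f"margin above the 27,798 observed: {critical_count - 27798:.0f}")
```

Under these assumptions the boundary lands at roughly 27,901 voters, about 103 more than were observed.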
Reminder: before reaching any statistical conclusion, verify that the assumptions are satisfied.
The proportion (or percentage) of voters in the above example is 24.696%. Let's assume approximately the same percentage but from a smaller population of registered voters. Say that 409 out of 1,656 registered voters voted (which is 24.698%) and run the same test.
The result is a p-value of 0.401.
The result is the opposite of the one above. The p-value is > 0.05, so fail to reject the null hypothesis, HO.
There is insufficient evidence that <25% of registered voters will vote. Practically speaking, the party should be concerned if there is a similar turnout for the next election and the population is this small (1,656 registered people).
In this case, it is the sample size that is impacting the hypothesis test. The sample is not large enough to support the conclusion that was derived earlier.
There is strength in numbers: the larger the sample, the more representative it is of the population and the more confidence there is in the result.
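The effect of sample size can be seen by computing the z-score for both samples. The sketch below uses the normal approximation (so its p-values differ slightly from the exact Minitab figures quoted above), and `lower_tail_z_test` is a hypothetical helper name. Nearly identical sample proportions give very different z-scores because the standard error shrinks with the square root of the number of trials.

```python
import math
from statistics import NormalDist

def lower_tail_z_test(events, trials, p0):
    """Return (z, p-value) for HA: P < p0 using the normal approximation."""
    p_hat = events / trials
    std_err = math.sqrt(p0 * (1 - p0) / trials)
    z = (p_hat - p0) / std_err
    return z, NormalDist().cdf(z)

# Same turnout percentage, very different numbers of registered voters
results = {
    trials: lower_tail_z_test(events, trials, 0.25)
    for events, trials in [(27798, 112561), (409, 1656)]
}

for trials, (z, p_value) in results.items():
    print(f"n = {trials:>7,}: z = {z:6.3f}, approx p-value = {p_value:.3f}")
```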
A large supplier of components is advertising that at least 98% of their customers are pleased with the quality of their components. With >500,000 customers and growing skepticism, a research firm decides to conduct its own analysis. The research firm is careful to use simple random sampling, which allows the use of this statistical test.
They sampled 500 customers and 485 of them agreed that they are pleased.
Assuming a 0.025 level of significance and these figures from the sample, what conclusions can be drawn?
State the Practical Problem:
Is the claim that >98% of their customers are pleased an accurate statement?
Write the Statistical Problem:
Null Hypothesis, HO: P = 0.98
Alternative Hypothesis, HA: P > 0.98
Keep in mind the Null and Alternative must be mutually exclusive, which means they cannot overlap.
This is a one-tailed test since the words "at least" were used instead of saying just 98% (which would leave the test open both ways, becoming two-tailed). The words "at least" also mean "greater than or equal to".
Alpha risk = Level of Significance = 0.025 (or 2.5%)
485 of the 500 customers sampled were found to be pleased, equaling 97%. Based on the sample proportion alone, the initial conclusion may be that the supplier's claim is not true since 97% is <98%.
However, the difference may be too small to draw a statistical conclusion about the supplier's claim without running the test.
For alpha risk of 0.025, we shall reject the Null Hypothesis if p lies more than 1.96 standard deviations above P = 0.98.
Find the Statistical Solution:
P represents the hypothesized population proportion
p represents the sample proportion that are pleased.
P = 0.98
1 - P = 0.02
p = 0.97
Calculate the standard deviation and z-score test statistic to lead to a p-value.
Standard deviation of the sample proportion = sqrt(P * (1 - P) / n) = sqrt(0.98 * 0.02 / 500) = 0.00626
z = (p - P) / st. dev = (0.97 - 0.98) / 0.00626 = -1.597 or -1.60
Since this is a one-tailed test, the p-value represents the probability that the z-score is greater than -1.60.
P(z > -1.60) = 0.9452 using a standard normal table.
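The arithmetic above can be reproduced with a short Python sketch using the standard library's `statistics.NormalDist` in place of a printed z-table. The variable names mirror the symbols P, p, and n used in the text; because the sketch does not round z to -1.60 before the lookup, the p-value differs from the table value in the fourth decimal place.

```python
import math
from statistics import NormalDist

P = 0.98          # hypothesized population proportion
n = 500           # customers sampled
p = 485 / n       # sample proportion, 0.97

std_dev = math.sqrt(P * (1 - P) / n)
z = (p - P) / std_dev

# Upper tail, because HA: P > 0.98
p_value = 1 - NormalDist().cdf(z)

print(f"st. dev = {std_dev:.5f}, z = {z:.3f}, p-value = {p_value:.4f}")
```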
State the Practical Conclusion:
The approximate p-value using the normal distribution table is 0.9452 and the exact p-value is 0.953. The two are close, and both are greater than the α-risk of 0.025.
Therefore, fail to reject the Null Hypothesis, Ho. There is insufficient evidence to support the claim that >98% of the customers are pleased.
In Minitab, using the Normal Approximation option (under Method) shows the z-value and the approximate p-value. The exact p-value is 0.953, which leads to the same practical conclusion as the approximate p-value of 0.945.
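For comparison with the normal approximation, the exact binomial p-value can be sketched by summing the probability of 485 or more pleased customers out of 500 under Ho: P = 0.98. The helper name `binomial_upper_tail` is hypothetical, and this is an illustration of the exact method rather than Minitab's own routine.

```python
import math

def binomial_upper_tail(events, trials, p0):
    """P(X >= events) for X ~ Binomial(trials, p0)."""
    return sum(
        math.comb(trials, k) * p0**k * (1 - p0)**(trials - k)
        for k in range(events, trials + 1)
    )

# Supplier example: Ho: P = 0.98, HA: P > 0.98
exact_p = binomial_upper_tail(485, 500, 0.98)
print(f"exact p-value = {exact_p:.3f}")
```

A direct sum is practical here because only 16 terms are involved and 500 trials keeps every term within floating-point range.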
State the Practical Problem:
Is there a difference between the two population proportions, using the data gathered from two independent samples?
Write the Statistical Problem:
Null Hypothesis, Ho: P1 = P2 (sometimes written as P1 - P2 = 0, which is the same)
Alternative Hypothesis, HA: P1 ≠ P2
The level of significance, α-risk, chosen is 0.05.
Find the Statistical Solution:
In this case we will use Minitab for simplicity.
For your information, the formula to calculate the z-score test statistic when testing for no difference is shown below. The standard error represents the denominator of the formula.
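A common form of that test statistic, assumed here because the original formula image is not reproduced, divides the difference in sample proportions by a standard error built from the pooled proportion. The sketch below implements that pooled form; the function name `two_proportion_z_test` and the counts passed to it are hypothetical and are not the data from the Minitab example.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z-test of Ho: P1 = P2 against HA: P1 != P2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error: the denominator of the z formula
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed
    return z, p_value

# Hypothetical counts, for illustration only: events and trials per sample
z, p_value = two_proportion_z_test(60, 400, 40, 400)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

Pooling is appropriate only when the null hypothesis is that the two proportions are equal, since it estimates a single shared proportion.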
In this case, the data is provided in summarized form. However, you may have it in separate columns within Minitab or all in one column.
The Number of Events is sometimes referred to as "successes" but that isn't always the most appropriate word depending on your situation.
The Trials value is the total number of samples for each proportion being studied.
Select Options to bring up a dialog box to fill in the remaining selections and click OK a couple times to run the test. The following results appear:
The p-value < α-risk of 0.05. Therefore, reject the HO (null hypothesis) and infer the HA.
State the Practical Conclusion:
There is a difference between the proportion of vegetarians less than 30 years of age and the proportion greater than 30 years of age.
That result is somewhat obvious from a quick glance at the numbers and the fact that the sample sizes are large. This result alone isn't very meaningful. You may want to continue running scenarios to understand how large a difference between the proportions can be statistically supported.
Let's test a new claim for further review on this topic:
Let's say the Alternative Hypothesis was that the proportion of vegetarians under the age of 30 is greater than the proportion of vegetarians over the age of 30.
HA: P1 > P2
The selection is shown below in Minitab:
The final result is a p-value = 0.000, which leads to the same practical conclusion as above. That is not a complete surprise either, based on a quick review of the proportions themselves.
Let's try yet another claim.
There is a claim that the proportion of women under the age of 30 that are vegetarians is more than 6% higher than the proportion of women over the age of 30. Using the same samples and risk, the Minitab entries are shown below:
The p-value when testing whether the difference is >6% is 0.470, so in this case there is not enough evidence to reject the Null Hypothesis, Ho.
Therefore, there is not enough evidence to support a claim that the proportion of women under the age of 30 that are vegetarians is more than 6% higher than the proportion of women over the age of 30 that are vegetarians.
So, what is the critical value, given these same samples and risk?
See the findings below. The difference of 4.07% is the critical value where the p-value = the level of significance, or alpha risk (α-risk).
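Testing against a non-zero difference, and finding the critical difference, can be sketched as follows. This assumes a normal approximation with an unpooled standard error, which may not match the Minitab settings used above. The function names and the counts are hypothetical, so the output will not reproduce the 0.470 or 4.07% figures.

```python
import math
from statistics import NormalDist

def difference_test(x1, n1, x2, n2, d0):
    """Upper-tailed z-test of Ho: P1 - P2 = d0 against HA: P1 - P2 > d0."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error, since Ho does not assume P1 = P2
    std_err = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - d0) / std_err
    return z, 1 - NormalDist().cdf(z), std_err

def critical_difference(x1, n1, x2, n2, alpha):
    """Hypothesized difference d0 at which the p-value equals alpha."""
    _, _, std_err = difference_test(x1, n1, x2, n2, 0.0)
    return (x1 / n1 - x2 / n2) - NormalDist().inv_cdf(1 - alpha) * std_err

counts = (60, 400, 40, 400)   # hypothetical events and trials per sample
z, p_value, _ = difference_test(*counts, d0=0.02)
d_crit = critical_difference(*counts, alpha=0.05)
print(f"p-value for a difference > 2%: {p_value:.3f}")
print(f"critical difference at alpha = 0.05: {d_crit:.4f}")
```

Any claimed difference smaller than the critical difference can be supported at the chosen α-risk; any larger claim cannot.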
An approximation for the Confidence Interval for a population proportion is shown below. Many business decisions involve population proportions such as estimating market share and proportions of goods that are acceptable or defective.
EXAMPLE:
A survey was conducted on 300 emerging, domestic, small capital companies and found that 153 had an Emergency Action Plan that detailed reaction plans to maintain operations and customer service in the event of major illness or outbreak such as the swine flu.
Calculate the 92% confidence interval to estimate the proportion of emerging domestic, small capital companies that have an adequate Emergency Action Plan.
n = 300
phat = 153/300 = 51% = 0.51
The critical Z(0.04) value = 1.75
The Confidence Interval states that with 92% confidence, the proportion of all similar companies with the plan will be between 46% and 56%.
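The interval in this example can be reproduced with the usual normal-approximation formula, phat ± Z * sqrt(phat * (1 - phat) / n), which is assumed here to be the equation the text refers to. A minimal sketch using the example's figures:

```python
import math
from statistics import NormalDist

n = 300
p_hat = 153 / n              # 0.51
confidence = 0.92

# Two-sided critical value: 4% in each tail, about 1.75
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin
print(f"z = {z:.2f}, interval = {lower:.1%} to {upper:.1%}")
```

The critical value comes from splitting the remaining 8% equally between the two tails, which is why Z(0.04) is used rather than Z(0.08).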
The above equation is most effective for an approximation when