
The chi-square statistic is a measure of the difference between actual (observed) counts and expected counts. Under the null hypothesis it approximately follows the chi-square distribution.
The chi-square distribution is most often used for hypothesis tests (such as comparing proportions when there are more than two samples) and for determining confidence intervals (such as the confidence interval for the standard deviation).
Unlike the normal distribution, the chi-square distribution is not symmetric. Separate tables exist for the upper and lower tails of the distribution.
The chi-square test can be used to examine the hypothesis of independence between two attribute variables (that is, whether the variables are related) and to determine whether observed data fit a certain probability distribution.
It is ideally used when comparing more than two samples; otherwise use the 2-Proportions Test (with two samples) or the 1-Proportion Test (with one sample).
See Hypothesis Test Flow Chart as a reference.
The sum of the expected frequencies is always equal to the sum of the observed frequencies. The chi-squared statistic can be used for a goodness-of-fit test and for a test of independence.
The goodness-of-fit (GOF) test compares the frequency of occurrence from an observed sample to the expected frequency from the hypothesized distribution.
As in all hypothesis tests, craft a statement (without numbers) and use simple terms for the team's understanding and then create the numerical or statistical version of the problem statement.
The Degrees of Freedom = (# of Rows - 1) * (# of Columns - 1)
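Two methods can be applied to test the hypotheses. The decision is to reject the null (and infer the alternative hypothesis) if the calculated chi-square value is greater than the chi-square critical value, or if the p-value is less than the alpha risk.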
Using Excel, the p-value is determined by:
P-value = CHISQ.TEST(Observed Range, Expected Range)
Excel asks for the "Actual" range which is the "Observed" range.
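For readers working outside Excel, a comparable p-value can be approximated in plain Python. This is a minimal sketch, not Excel's implementation: the function names `chi_square_sf` and `chi_square_p_value` and the sample counts are hypothetical, and the degrees of freedom are passed in explicitly. The `df=4` in the usage lines assumes the common goodness-of-fit convention of (number of categories - 1), which is an assumption rather than something stated here.

```python
import math


def chi_square_sf(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square distribution.

    Uses the series expansion of the regularized lower incomplete gamma
    function, which is adequate for the moderate values seen in these tests.
    """
    if statistic <= 0:
        return 1.0
    a = df / 2.0
    x = statistic / 2.0
    term = 1.0 / a          # first term of the series
    total = term
    n = 1
    while term > 1e-16 * total and n < 10_000:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_p_value(observed, expected, df):
    """P-value from observed and expected counts (df supplied by the caller)."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square_sf(statistic, df)


# Hypothetical counts for five categories, for illustration only.
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]
p_value = chi_square_p_value(observed, expected, df=4)  # df is an assumption
print(f"p-value = {p_value:.4f}")
```

With these made-up counts the p-value is well above 0.05, so the decision would be to fail to reject the null hypothesis.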
See the table below of the Observed values, Expected values, and each chi-square value calculated using the formula above. The sum of all of them is the total chi-square statistic.
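The per-cell calculation can be sketched in Python as follows. Each cell contributes (Observed - Expected)^2 / Expected, and the contributions are summed. The counts and the function name `chi_square_contributions` are hypothetical and are not the values from the example on this page.

```python
def chi_square_contributions(observed, expected):
    """Return the chi-square contribution of each cell: (O - E)^2 / E."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Hypothetical counts, for illustration only.
observed = [18, 22, 20, 25, 15]
expected = [20.0, 20.0, 20.0, 20.0, 20.0]

contributions = chi_square_contributions(observed, expected)

print(f"{'Observed':>10}{'Expected':>10}{'Chi-square':>12}")
for o, e, c in zip(observed, expected, contributions):
    print(f"{o:>10}{e:>10.1f}{c:>12.3f}")

# The total chi-square statistic is the sum of the per-cell values.
total = sum(contributions)
print(f"Total chi-square statistic = {total:.3f}")
```

Laying the values out this way makes it easy to see which cells drive the total.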
If the Level of Confidence is 95% (alpha risk = 0.05), then the p-value calculated above is greater than 0.05.
The decision is to fail to reject the null hypothesis and to infer the null hypothesis. There is insufficient evidence that results are not due to random chance.
Tests the hypothesis of independence between two attribute variables. The test does not require an assumption of normality.
dF = (# of Rows - 1) * (# of Columns - 1)
As in all hypothesis tests, craft a statement (without numbers) and use simple terms for the team's understanding and then create the numerical or statistical version of the problem statement.
Translate the statistical results into the practical result.
Create the table of Observed values and create the table of Expected values.
Creating a table helps visualize the values and ensures that each condition is calculated correctly and that the sum of those values equals the calculated chi-square value.
This also helps to understand how different observed values can affect the results of the test. In the example above, a small change in one or a couple of the data points (a higher separation between the observed and expected value) can create a p-value less than 0.05.
Calculate Expected values for each condition (fe).
fe = (row total * column total) / grand total.
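As a rough illustration, the expected values and degrees of freedom for a contingency table can be computed as below. The 2 x 3 table of counts and the function name `expected_counts` are hypothetical.

```python
def expected_counts(observed):
    """Expected value per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts: 2 rows x 3 columns, for illustration only.
observed = [
    [30, 20, 10],
    [20, 30, 40],
]

expected = expected_counts(observed)

# Degrees of freedom = (# of rows - 1) * (# of columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Chi-square statistic: sum of (O - E)^2 / E over every cell.
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print("Expected:", expected)
print("Degrees of freedom:", df)
print(f"Chi-square statistic: {statistic:.3f}")
```

The expected values sum to the same grand total as the observed values, which is a quick check that the table was built correctly.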
The chi-square calculated value is compared to the chi-square critical value depending on the Confidence Level desired (usually 95%, which is an alpha risk of 5% or 0.05).
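The comparison can be sketched in Python as below. The lookup holds standard upper-tail critical values for alpha = 0.05 at a few small degrees of freedom; the names `CRITICAL_VALUES_05` and `reject_null` and the statistic values in the usage lines are hypothetical.

```python
# Upper-tail chi-square critical values for alpha = 0.05, keyed by degrees
# of freedom. Only a few small values are listed in this sketch.
CRITICAL_VALUES_05 = {
    1: 3.841,
    2: 5.991,
    3: 7.815,
    4: 9.488,
    5: 11.070,
    6: 12.592,
}


def reject_null(statistic, df, critical_values=CRITICAL_VALUES_05):
    """Return True when the calculated value exceeds the critical value."""
    if df not in critical_values:
        raise ValueError(f"No critical value listed for df={df}")
    return statistic > critical_values[df]


# Hypothetical calculated values, for illustration only.
print(reject_null(16.67, df=2))  # calculated value above 5.991
print(reject_null(2.9, df=4))    # calculated value below 9.488
```

For other confidence levels or larger degrees of freedom, the critical value would come from a chi-square table or statistical software.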
Notice that the example begins with the table to help visually explain the problem and makes it easier to follow the problem solving process.