# Chi-square distribution

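The chi-square statistic measures the difference between actual (observed) counts and expected counts. The chi-square distribution is the reference distribution used to judge whether that difference is larger than random chance would explain.

It is most often used for hypothesis tests (such as comparing proportions when there are more than two samples) and for determining confidence intervals (such as the confidence interval for the standard deviation). Common uses include: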

1. Chi-square test for independence in a "Row x Column" contingency table.
2. Chi-square test to determine if the standard deviation of a population is equal to a specified value.

Unlike the normal distribution, the chi-square distribution is not symmetric. Separate tables exist for the upper and lower tails of the distribution.
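As background for the standard deviation test and confidence interval mentioned above, the usual textbook forms are sketched here. They are not taken from this page and assume a normally distributed population, a sample of size $n$, a sample standard deviation $s$, and a hypothesized value $\sigma_0$:

$$
\chi^2 = \frac{(n-1)\,s^2}{\sigma_0^2}, \qquad \text{df} = n - 1
$$

Because the distribution is not symmetric, the confidence interval for $\sigma$ needs a separate critical value from each tail:

$$
\sqrt{\frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\;n-1}}} \;\le\; \sigma \;\le\; \sqrt{\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\;n-1}}}
$$

Here $\chi^2_{p,\;n-1}$ denotes the value with upper-tail area $p$, so the larger critical value produces the lower limit.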

This statistical test can be used to examine the hypothesis of independence between two attribute variables (to determine whether they are related) or to determine whether data fit a certain probability distribution.

## Assumptions

• Chi-square is the underlying distribution for these tests
• Attribute data (X data and Y data are attribute)
• Observations must be independent
• Works best when the expected count in each cell is at least 5

Ideally used when comparing more than two samples; otherwise use the 2-Proportions Test (with two samples) or the 1-Proportion Test (with one sample).

See Hypothesis Test Flow Chart as a reference.

### Formula
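The formula itself does not appear in the text here. The standard form of the chi-square statistic, written with $f_o$ for an observed frequency and $f_e$ for the matching expected frequency (the same $f_e$ notation used for expected values later on this page), is:

$$
\chi^2 = \sum_{i} \frac{\left(f_{o,i} - f_{e,i}\right)^2}{f_{e,i}}
$$

The sum runs over every category or table cell $i$, so each cell contributes its own squared difference scaled by its expected count.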

The sum of the expected frequencies is always equal to the sum of the observed frequencies. The chi-squared statistic can be used to:

1. Test if a distribution is a good fit for population (Goodness-of-Fit)
2. Test association of two attribute variables (Test for Independence)

### Possible Applications

• Test to see if a particular region of the country is an important factor in the number of wins for a baseball team
• Determine if the number of injuries among a few facilities is different
• Determine if a coin or a set of dice is biased or fair

### Goodness of Fit (GOF) Hypothesis Test

The GOF test compares the frequency of occurrence from an observed sample to the expected frequency from the hypothesized distribution.

As in all hypothesis tests, craft a statement (without numbers) and use simple terms for the team's understanding and then create the numerical or statistical version of the problem statement.

• State the practical problem
• State the statistical problem
• Develop null and alternative hypotheses
• Create table of observed and expected frequencies
• Calculate the test statistic or p-value
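The last two steps can be sketched in a few lines of Python. The die-roll counts are hypothetical, made up only for illustration, and the degrees of freedom are taken as the number of categories minus one, the usual convention for a single row of categories.

```python
# Hypothetical data: 120 rolls of a six-sided die (illustrative counts only).
observed = [25, 17, 15, 23, 24, 16]

# Null hypothesis: the die is fair, so every face has the same expected count.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Each category contributes (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# Single row of k categories: degrees of freedom assumed to be k - 1.
degrees_of_freedom = len(observed) - 1

for face, (o, e, c) in enumerate(zip(observed, expected, contributions), start=1):
    print(f"face {face}: observed={o:3d} expected={e:5.1f} contribution={c:.3f}")
print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}")
```

Printing the per-category contributions makes it easy to see which categories drive the statistic.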

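For a goodness-of-fit test on a single row of categories, the Degrees of Freedom = (# of Categories - 1), reduced by one more for each distribution parameter estimated from the sample. The formula (# of Rows - 1) * (# of Columns - 1) applies to a "Row x Column" contingency table.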

Either of two methods can be applied to test the hypotheses. Reject the null hypothesis (and infer the alternative hypothesis) if:

1. The calculated chi-square test statistic is GREATER THAN the chi-square critical value

OR

2. The p-value is LESS THAN the alpha-risk. For a Confidence Level of 95% the alpha-risk = 5% or 0.05.
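Both decision rules can be checked against each other in code. The sketch below uses only the Python standard library. The helper names `chi_square_sf` and `chi_square_critical` are made up for this example, and the test statistic and degrees of freedom are hypothetical inputs. The upper-tail probability is built from a standard recurrence for the incomplete gamma function, and the critical value is found by bisection rather than from a table.

```python
import math

def chi_square_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)               # Q(1, y) = exp(-y)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        tail += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(tail, 1.0)

def chi_square_critical(alpha: float, df: int) -> float:
    """Value whose upper-tail area equals alpha, located by bisection."""
    low, high = 0.0, 1.0
    while chi_square_sf(high, df) > alpha:
        high *= 2.0
    for _ in range(100):
        mid = (low + high) / 2.0
        if chi_square_sf(mid, df) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

# Hypothetical inputs: a calculated statistic, its degrees of freedom, and alpha.
chi_square, df, alpha = 5.0, 5, 0.05

critical = chi_square_critical(alpha, df)
p_value = chi_square_sf(chi_square, df)

reject_by_critical = chi_square > critical   # method 1
reject_by_p_value = p_value < alpha          # method 2

print(f"critical value = {critical:.4f}, p-value = {p_value:.4f}")
print("reject null" if reject_by_p_value else "fail to reject null")
```

The two methods always lead to the same decision, because the p-value falls below alpha exactly when the statistic exceeds the critical value.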

## Chi square in Excel

Using Excel to determine the p-value is done by:

P-value = CHISQ.TEST(Observed Range, Expected Range)

Excel asks for the "Actual" range which is the "Observed" range.

See the table below of the Observed values, Expected values, and each chi-square value calculated using the formula above. The sum of all of them is the total chi-square statistic. At a 95% Level of Confidence (alpha-risk = 0.05), the p-value calculated above is greater than 0.05.

The decision is to fail to reject the null hypothesis. There is insufficient evidence that the results are not due to random chance.

### Test for Independence

Tests the hypothesis of independence between two attribute variables. The test does not require an assumption of normality.

dF = (# of Rows - 1) * (# of Columns - 1)

As in all hypothesis tests, craft a statement (without numbers) and use simple terms for the team's understanding and then create the numerical or statistical version of the problem statement.

• State the practical problem - Is the Y variable independent of the X variable?
• State the statistical problem
• Develop null and alternative hypotheses
  • Ho: Y is INDEPENDENT of X (no difference)
  • Ha: Y is DEPENDENT on X and at least one combination is different
• Create table of observed and expected frequencies
• Calculate the test statistic or p-value

Translate the statistical results into the practical result.

### Observed and Expected Values of Attribute Data

Create the table of Observed values and create the table of Expected values.

Creating a table helps visualize the values and ensures each condition is calculated correctly, so that the sum of the individual values equals the calculated chi-square value.

This also helps to understand how different observed values can affect the results of the test. In the example above, a small change in one or a couple of the data points (a higher separation between the observed and expected value) can create a p-value less than 0.05.

Calculate Expected values for each condition (fe).

fe = (row total * column total) / grand total.
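A minimal sketch of this calculation in Python follows. The 2 x 3 table of counts is hypothetical and serves only to show the mechanics; it also computes the chi-square statistic and the degrees of freedom for the table.

```python
# Hypothetical 2 x 3 contingency table of observed counts (illustrative only).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# fe = (row total * column total) / grand total, for every cell.
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

# Sum the (observed - expected)^2 / expected contribution of every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

# Degrees of freedom = (# of rows - 1) * (# of columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

for expected_row in expected:
    print([round(e, 1) for e in expected_row])
print(f"chi-square = {chi_square:.4f}, df = {degrees_of_freedom}")
```

A quick sanity check on any expected table is that its cells add up to the same grand total as the observed table.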

The chi-square calculated value is compared to the chi-square critical value for the desired Confidence Level (usually 95%, which is an alpha-risk of 5% or 0.05).

### Example Test for Independence

Notice that the example begins with the table to help visually explain the problem and to make the problem-solving process easier to follow.

Chi-Square Table

### Chi-square "Independence" Test on TI-84 Calculator
