The F-Test is used in the hypothesis testing of variances (not means). The F statistic also appears in ANOVA.

The F-Test assumes a normal distribution, as does Bartlett's Test. The samples should be normally distributed within each set of experiments (or trials).

Levene's Test is similar, but is used when the data are not from a normal distribution (or normality cannot be assumed). This test can be used for any continuous distribution to compare variances.

The value of F represents the ratio of two variances, and comparing the F-test value to the F-critical value is used to make a decision on the null hypothesis.
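As a minimal sketch of the ratio described above, the snippet below computes an F-observed value from two samples. The data are invented for illustration, and placing the larger variance in the numerator is a common convention assumed here rather than something stated on this page.

```python
from statistics import variance

# Hypothetical measurements, invented for illustration only.
before = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7]
after = [10.1, 10.0, 10.3, 9.9, 10.2, 10.1, 9.8, 10.0]


def f_ratio(sample_a, sample_b):
    """Return (F, dF numerator, dF denominator).

    Assumption: the larger sample variance goes in the numerator,
    so the returned F is always >= 1.
    """
    var_a = variance(sample_a)  # sample variance (n - 1 in the divisor)
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


f_observed, df_num, df_den = f_ratio(before, after)
print(f"F-observed = {f_observed:.3f} with dF = ({df_num}, {df_den})")
```

The F-observed value is then compared with the F-critical value for the same degrees of freedom and the chosen alpha risk.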

It is used to compare:

  • 1 Sample Variance to a Target
  • 2 Sample Variances
  • >2 Samples (the F statistic is used within ANOVA)

In ANOVA, the value of F is the ratio of treatment variance to the error variance.
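The ratio of treatment variance to error variance can be sketched for a one-way layout as follows. The function name and the tiny data set are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, dF treatment, dF error) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Treatment (between-group) sum of squares.
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error (within-group) sum of squares.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_treatment = len(groups) - 1
    df_error = len(all_values) - len(groups)

    # Each variance (mean square) is a sum of squares divided by its dF.
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return ms_treatment / ms_error, df_treatment, df_error


# Hypothetical data: three groups with means 2, 3 and 4.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value, df_t, df_e = one_way_anova_f(groups)
print(f"F = {f_value:.2f} with dF = ({df_t}, {df_e})")
```

A large F means the variation between group means is large relative to the variation inside the groups. An F below 1 means the opposite.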

2-Sample Variance Comparison

Remember that it is not acceptable to make a decision by simply looking at the data in numerical format to determine if there is a statistical difference (whether testing for a difference in means, medians, or variances). Nor should a statistical decision be concluded based on a graph or visual model of the data, such as a box plot.

These tools can provide a very good idea of the final result, but a Six Sigma project manager must base conclusions on statistical results and provide the team members the practical results in terms they can best understand.

Create a visual representation of the test, start with the practical study, and work your way through the statistical study.

The two sets of data must be statistically independent.

There are a couple of methods to get a statistical conclusion:

1) Compare the F-observed value from the two samples to the F-critical value.

2) Use the p-value. Reject the null and infer the alternative if the p-value < alpha risk.

For the first option, the test is significant if the F-observed (calculated) value is greater than the F-critical value. The F-critical values can also be found in tables that list the most common values for alpha risk and degrees of freedom (dF).
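The two decision rules can be written down as a short sketch. The function names are hypothetical. The sample inputs (F = 0.27 against F-critical = 2.81, and a p-value of 0.202 at alpha = 0.05) are taken from the examples later on this page.

```python
def decide_by_critical_value(f_observed, f_critical):
    """Option 1: significant only if F-observed exceeds F-critical."""
    if f_observed > f_critical:
        return "reject the null"
    return "fail to reject the null"


def decide_by_p_value(p_value, alpha=0.05):
    """Option 2: significant only if the p-value is below the alpha risk."""
    if p_value < alpha:
        return "reject the null"
    return "fail to reject the null"


print(decide_by_critical_value(0.27, 2.81))
print(decide_by_p_value(0.202, alpha=0.05))
```

Both rules lead to the same decision when the F-critical value and the p-value come from the same F distribution and alpha risk.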

There is an example below of how to use the F-table.


The "F-observed" value is also referred to as the "F-calculated" value. 

Shown below is a set of BEFORE and AFTER data on a Moving Range chart. The data are normally distributed both before and after. The visual indicators suggest that the variation has likely not changed, but this must be statistically verified.

Example of visual indicators

Another visual indicator for comparing variances is the overlap in the charts below for the BEFORE and AFTER data. Usually, if each dot falls within the other's confidence interval (at the chosen alpha value), there is likely not a statistical difference.

When comparing two samples (such as above), the choice is between Levene's Test and the F-test. The F-test is more dependent on the assumption of normality and is a more accurate test when the data is actually normal.

F-test results are used because both samples (subgroups) are normal.

Notice from above that the p-value of 0.202 is not less than 0.05 (an alpha risk corresponding to a 95% confidence level), so we fail to reject the null hypothesis. There is no statistical evidence of a difference in the variation between BEFORE and AFTER.

We cannot conclude with 95% confidence that the variance changed from BEFORE to AFTER. The same test can be used to compare variance between two machines, two operators, two plants, etc. (assuming the data is normal).

Keep in mind this is only testing the variances. This does not indicate whether there was a statistical change in the mean from BEFORE to AFTER. Use the paired t-test to test a change in means of the group BEFORE and AFTER.

When comparing >2 samples, Bartlett's Test is more dependent on the assumption of normality and is a more accurate test when the data is actually normal.

Using the F-table

Assume the alpha risk chosen is 0.05. The dF for the numerator is 15 and the denominator is 10. Therefore, the F-critical value = 2.85.
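When a printed table is not at hand, the same lookup can be approximated numerically. The sketch below uses only the Python standard library: it evaluates the F cumulative distribution through the regularized incomplete beta function (a standard continued-fraction expansion) and then searches for the critical value by bisection. All function names are hypothetical, and this is one possible approach rather than how the table on this page was produced.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=1e-12):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry relation where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f, df_num, df_den):
    """P(F <= f) for an F distribution with (df_num, df_den) dF."""
    if f <= 0.0:
        return 0.0
    x = df_num * f / (df_num * f + df_den)
    return reg_inc_beta(df_num / 2.0, df_den / 2.0, x)


def f_critical(alpha, df_num, df_den):
    """Upper-tail critical value found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df_num, df_den) < 1.0 - alpha:
        hi *= 2.0  # widen the bracket until it contains the answer
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df_num, df_den) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


f_crit = f_critical(0.05, 15, 10)
print(f"F-critical (alpha=0.05, dF 15 and 10) is about {f_crit:.3f}")
```

The result should land close to the tabulated value of 2.85. Small differences in the last digit come from table rounding.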

ANOVA example showing F values

The example below illustrates some uses and practical meaning of the F values within an ANOVA test. 

The results come from a mock study in which four appraisers were timed while making an inspection decision on 13 widgets.

The goal is to determine whether there is a significant difference in the mean times of two or more appraisers.

All other criteria are equal.

Since APPRAISER is the only factor (TIME is the measured response), this is a One-Factor or One-Way ANOVA. The factor has four levels that are controlled in the experiment, one for each appraiser.

The first step is to create the test. In general, if the p-value is lower than the alpha-risk then the alternate hypothesis is inferred (reject the null).

Hypothesis Test:

Null Hypothesis: Population means of the different appraisers are equal.
Alternate Hypothesis: At least one of the means is not the same.

There are a total of 51 Degrees of Freedom computed from (13*4) - 1.
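The degree-of-freedom arithmetic for this study can be laid out as a short sketch. The variable names are hypothetical. The split into 3 (numerator) and 48 (denominator) matches the values used with the F-table further down this page.

```python
n_appraisers = 4   # levels of the factor
n_widgets = 13     # timed inspections per appraiser

n_observations = n_appraisers * n_widgets     # 13 * 4 observations
df_total = n_observations - 1                 # total dF
df_between = n_appraisers - 1                 # numerator dF (factor)
df_within = n_observations - n_appraisers     # denominator dF (error)

print(f"total = {df_total}, between = {df_between}, within = {df_within}")
```

The between and within degrees of freedom always add up to the total, which is a quick check on the bookkeeping.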

Using a One-Way test with an alpha-risk of 0.05, the p-value is well above 0.05 at 0.847 (see results table below).

The low F-statistic and the heavily overlapping confidence intervals are also consistent with there being no difference among any pairs or combinations of the appraisers.

It is concluded that there is not a statistical difference between any of the appraisers.

What if?

If the p-value were <0.05, then at least one group of data would be different from at least one other group. The test doesn't identify which one. It only states that at least one of the four is different from the others.

The low F-statistic of 0.27 says the variation within the appraisers is greater than the variation between them. The F-critical value is 2.81. 

You can use the F-table above to get a close estimate of the F-critical value. One drawback of tables is that you may not get a precise number, since not every combination is shown. However, the table can provide a fairly good estimate, which is often enough to make the decision conclusive.

The numerator has 3 degrees of freedom and the denominator has 48 degrees of freedom. The table below shows that the F-critical value is going to be between 2.76 and 2.84. In this case, both values are much higher than the F-calculated value of 0.27, so the conclusion is the same.

As a Six Sigma project manager, you may find it worthwhile (depending on cost and time) to re-run the trial with a larger sample size and additional appraiser training to reduce the variation within each appraiser.

The variation is fairly consistent among the appraisers, so it appears there is a systemic issue causing similar amounts of variation within each appraiser.

It is possible that one or a few of the widgets are creating the similar spread in the timing for each appraiser. You may examine the timing performance of each widget and run an ANOVA among the 13 widgets and see if one or more stands out. 

Epsilon-squared is the % of variation related to the Factor, which is the Appraiser. This is 4.84 / 291.69 = 0.01659 = 1.7%. This is a low value so it is possible that other Factors exist that are creating the variation. 
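The numbers in this example can be cross-checked with a few lines of arithmetic. This sketch assumes that 4.84 is the sum of squares for the Appraiser factor and 291.69 is the total sum of squares, as the ratio above suggests. The results table itself is not reproduced here, so treat those labels as an assumption.

```python
ss_factor = 4.84     # assumed: sum of squares for the Appraiser factor
ss_total = 291.69    # assumed: total sum of squares
df_factor = 3        # numerator dF
df_error = 48        # denominator dF

# Share of the total variation related to the factor.
epsilon_squared = ss_factor / ss_total

# Error sum of squares is what remains after removing the factor.
ss_error = ss_total - ss_factor

# F is the factor mean square divided by the error mean square.
f_statistic = (ss_factor / df_factor) / (ss_error / df_error)

print(f"Epsilon-squared = {epsilon_squared:.4f}")
print(f"F-statistic     = {f_statistic:.2f}")
```

Under these assumptions the computed F agrees with the 0.27 reported above, which ties the epsilon-squared figure and the F-statistic back to the same sums of squares.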
