William Sealy Gosset is credited with first publishing the test statistic. He wrote under the pseudonym "Student", which is why the distribution became known as Student's t-distribution.
The t-test is generally used when:
The t-distribution bell curve gets flatter as the degrees of freedom (dF) decrease. Looking at it from the other perspective, as the dF increase, the number of samples (n) must be increasing, so the sample becomes more representative of the population and the sample statistics approach the population parameters.
As these values come closer to one another, the "z" calculation and "t" calculation get closer and closer to the same value. The table below explains each test in more detail.
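The convergence described above can be sketched with the Python standard library. The helper name `t_pdf` is hypothetical (it is not from any package); it evaluates the standard Student's t density so that the peak height can be compared with the standard normal curve at several dF values.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma is used instead of gamma to avoid overflow for large df
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

normal_peak = NormalDist().pdf(0)  # height of the standard normal curve at 0

# Illustrative dF values only; the peak rises toward the normal curve as dF grows
for df in (1, 5, 30, 1000):
    print(f"dF={df:>4}: t peak={t_pdf(0, df):.4f}, normal peak={normal_peak:.4f}")
```

A lower peak means a flatter curve with heavier tails, which is why small samples need larger critical values than the z-test.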
T-tests are very commonly used in Six Sigma as a hypothesis test for determining if:
The word "different" could mean greater than, less than, or different from a target value by a certain amount. Most statistical software lets you configure the test parameters to look for each of these types of differences.
For example, instead of just testing whether one group mean is different from another, you can test whether one group mean is greater than the other by a certain amount. You get more information by adding more specific criteria to your test.
Before running a test, try to visualize your data to get a better understanding of the expected result. Tools such as a Box Plot can provide a wealth of information.
Also, if the confidence interval for the difference contains zero, then insufficient evidence exists to suggest there is a statistical difference, and you fail to reject the null hypothesis.
This is a very common testing route that Green Belts and Black Belts will encounter in a Six Sigma project.
This test compares a sample to a known population mean, historical mean, or targeted mean. The population standard deviation is unknown and the data must satisfy normality assumptions.
n = sample size
Degrees of freedom (dF) = n-1
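As a minimal sketch, the one-sample statistic can be computed as t = (sample mean - target mean) / (s / sqrt(n)), which is the standard textbook form. The function name `one_sample_t` and the measurements are hypothetical and serve only as illustration.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, target_mean):
    """Return (t, dF) for a one-sample t-test against target_mean."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    t = (mean(sample) - target_mean) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements, not taken from the article
lengths = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
t_stat, df = one_sample_t(lengths, target_mean=10.0)
print(f"t = {t_stat:.3f}, dF = {df}")
```

The resulting t value is then compared with the critical value from a t-table at the chosen alpha-risk and dF.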
Most statistical software allows a variety of options to be examined, such as how large a sample must be to detect a certain size difference given a desired level of Power (= 1 - Beta Risk). You can also select various levels of Significance, or Alpha Risk.
For a given difference of interest, the number of samples required increases if you want to reduce Beta Risk (which seems logical). However, gathering more samples has a cost, and it is the job of the GB/BB to balance the desire for more Power and a higher Confidence Level against that cost and the resources it ties up.
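The trade-off between Power and sample size can be sketched with a common normal-approximation formula, n = ((z_alpha/2 + z_beta) * sigma / delta)^2. This is an assumption made for illustration: statistical packages typically use the t-distribution and may return slightly larger values. The function name `approx_sample_size` and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

def approx_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the Alpha Risk
    z_beta = z.inv_cdf(power)           # Power = 1 - Beta Risk
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical case: detect a shift of half a standard deviation
for power in (0.80, 0.90, 0.95):
    n = approx_sample_size(delta=0.5, sigma=1.0, power=power)
    print(f"Power={power:.2f}: n is about {n}")
```

Raising the Power (lowering Beta Risk) for the same difference increases the required sample size, which is the cost the GB/BB has to weigh.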
The following example shows the step-by-step analysis of a One Sample t-test. The example uses a sample size of 51, so the z-test would usually be used, but the result will be very similar.
With a sample this large (>30), the sample standard deviation is treated as the population standard deviation, and the degrees of freedom (dF) are not applicable to the z-test. The points to take away are the steps applied and the interpretation of the final results.
This test is used when comparing the means of:
1) Two random, independent samples are drawn, n1 and n2
2) Each population exhibits a normal distribution
3) Equal standard deviations are assumed for each population
The degrees of freedom (dF) = n1 + n2 - 2
The overall length of a sample of parts running on two different machines is being evaluated. The hypothesis test is to determine if there is a difference between the overall lengths of the parts made on the two machines, using a 95% level of confidence.
Machine A (n1):
Sample Size: 22 parts
Mean: 28.4 mm
Sample standard deviation: 3.4 mm
Machine B (n2):
Sample Size: 20 parts
Mean: 27.6 mm
Sample standard deviation: 2.2 mm
dF = n1 + n2 - 2 = 22+20-2 = 40
Alpha-risk = 1 - CL = 1 - 0.95 = 0.05
Establish the hypothesis test:
Null Hypothesis (HO): MeanA = MeanB
Alternative Hypothesis (HA): MeanA does not equal MeanB
This is a two-tailed example since the direction (shorter or longer) is not relevant. All that is relevant is whether there is a statistical difference or not.
Now determine the range of the t-statistic within which the null hypothesis is not rejected. Any value outside this range results in rejecting the null and inferring the alternative hypothesis. Using the t-table below, notice that:
-t(0.975) to t(0.975) with 40 dF equals a range of -2.021 to 2.021.
If the calculated t-value from our example falls within this range, then fail to reject (accept) the null hypothesis.
NOTE: The table below is a one-tailed table, so use the 0.025 column at the row for 40 dF and include both the positive and negative value.
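The calculation for this example can be sketched as follows, using the standard pooled-variance formula for two samples with assumed equal variances. The function name `pooled_t` is hypothetical; the summary statistics and the critical value 2.021 are the ones given above.

```python
import math

def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Return (t, dF) for a two-sample t-test assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df

# Summary statistics of the two samples from the example
t_stat, df = pooled_t(22, 28.4, 3.4, 20, 27.6, 2.2)

T_CRITICAL = 2.021  # t-table value for 40 dF, 0.025 column (two-tailed, alpha 0.05)
reject_null = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}, dF = {df}, reject null: {reject_null}")
```

Under these assumptions the calculated t-value is about 0.895, which falls inside the range of -2.021 to 2.021, so the data do not show a statistical difference between the two machines.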
The display above is a common output of running a Two Sample t-test.
In this example, both samples exhibit normal behavior, the variances were assumed to be equal, and the dF = 20 + 25 - 2 = 43. The hypothesized difference is 0.
Equal variances were assumed, so the pooled standard deviation is used. The estimate for the difference is the difference from the OLD to the NEW mean. With an alpha-risk of 5% (or CL of 95%), the difference will be between 1.36 and 8.58.
The answer isn't as straightforward as one might hope. For the sake of keeping it simple and understanding there may be exceptions, generally you can assume equal variances unless:
Some statisticians suggest taking the more careful and conservative approach of assuming unequal variances all the time. This method covers the worst-case scenario in which the variances are truly unequal, and it forgoes only a minute amount of statistical power. In other words, sacrifice a little power to protect against the worst case.
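The unequal-variance approach is commonly known as Welch's t-test (a name the text above does not use). A minimal sketch follows, with the degrees of freedom taken from the Welch-Satterthwaite approximation. The function name `welch_t` is hypothetical, and the inputs reuse the summary statistics from the two-machine example (22 parts, 28.4 mm, 3.4 mm versus 20 parts, 27.6 mm, 2.2 mm) purely for comparison.

```python
import math

def welch_t(n1, mean1, s1, n2, mean2, s2):
    """Return (t, dF) for a two-sample t-test without assuming equal variances."""
    v1 = s1**2 / n1  # variance of the first sample mean
    v2 = s2**2 / n2  # variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not a whole number
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t_stat, df = welch_t(22, 28.4, 3.4, 20, 27.6, 2.2)
print(f"t = {t_stat:.3f}, dF = {df:.1f}")
```

With these inputs the dF drop from 40 (pooled) to roughly 36, which slightly widens the acceptance range. That is the small loss of power mentioned above.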
Use this test when analyzing the samples of a BEFORE and AFTER situation, where the number of samples must be the same. It is also referred to as a "pre-post" test and consists of two measurements taken on the same subjects, such as machines, people, or processes.
This option is selected to test the hypothesis of no difference between two variables. The data may consist of two measurements taken on the same machine (or subject) or one before and after measurement taken on a matched pair of subjects.
For example, if the Six Sigma team has implemented improvements from the IMPROVE phase they are expecting a favorable change to the outputs (Y). If the improvements had no effect the average difference between the measurements is equal to 0 and the null hypothesis is inferred.
If the team did a good job making improvements to address the critical inputs (X's) to the problem (Y's) that were causing the variation (and/or shifting the mean in an unfavorable direction), then there should be a statistical difference and the alternative hypothesis should be inferred.
dF = n - 1
The "Sd" is the standard deviation of the differences across all of the pairs. The data are recorded in pairs, and each pair of data has a difference, d.
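A minimal sketch of the paired calculation, using the standard form t = (mean of d) / (Sd / sqrt(n)) with dF = n - 1. The function name `paired_t` and the before/after values are hypothetical and serve only as illustration.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, dF) for a paired t-test on matched before/after data."""
    if len(before) != len(after):
        raise ValueError("paired data must have the same number of samples")
    diffs = [a - b for a, b in zip(after, before)]  # one difference d per pair
    n = len(diffs)
    s_d = stdev(diffs)  # Sd: standard deviation of the differences
    return mean(diffs) / (s_d / math.sqrt(n)), n - 1

# Hypothetical outputs from five machines before and after an improvement
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, dF = {df}")
```

Because each pair is reduced to a single difference, the paired test is effectively a one-sample t-test of the differences against zero.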
Another application may be to measure the weight or cholesterol levels of a group of people that are given a certain diet over a period of time.
The before data of each person (weight or cholesterol levels) are recorded to serve as the basis for the null hypothesis.
With the other variables controlled and maintained consistent for all people for the duration of the study, then the after measurements are taken.
The null hypothesis states that there is no significant difference in the weights (or cholesterol levels) of the patients.
HO: Meanbefore = Meanafter
Again, this test assumes the data are normally distributed and continuous.
Practice Certification Question: Find the degrees of freedom when running a paired t-test with 15 pairs of samples.
The answer is dF = n - 1 = 15 - 1 = 14
If the sample size is 13 (n = 13) and the alpha-risk is chosen to be 0.05, what is the critical t-value for a two-tailed paired t-test?
The dF = 12, and since the test is two-tailed, look at the 0.025 column in the table below (the alpha-risk divided by two).
tcritical = 2.179
In the table below, the row for 12 dF and the column for 0.025 give 2.179.
The T.TEST function in Excel returns the probability associated with the Student's t-test.
T.TEST(array1, array2, tails, type)
For the paired test (type 1), each array (or data set) must have the same number of data points.
The "tails" argument is the number of distribution tails to use:
1 = one-tailed distribution
2 = two-tailed distribution
The "type" argument is the type of t-test:
1 = paired
2 = two-sample equal variance
3 = two-sample unequal variance
The Z test uses a set of data and tests a point of interest. An example is shown below using Excel. This function returns the one-tailed probability.
The sigma value is optional. Use the population standard deviation if it is known. If not, the test defaults to the sample standard deviation.
Running the Z test at the mean of the data set returns a value of 0.5, or 50%.
Recall the sample sizes are generally >30 (the snapshot below uses fewer only to illustrate the data and formula within Excel within a reasonable amount of space) and there is a known population standard deviation.
The data below uses a point of interest for the hypothesized population mean of 105.
This corresponds to a Z test value of 0.272, indicating a 27.2% probability that a sample mean would be greater than the average of the actual data set if the underlying population mean were 105, assuming the data set meets normality assumptions.
The Z test value (as shown in the example below) represents the probability that the sample mean would be greater than the observed average of the data set when the underlying population mean is μ0.
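The Excel calculation can be approximated in Python as a sketch. It follows the formula documented for Excel's Z.TEST, 1 - NORM.S.DIST((mean - x) / (sigma / sqrt(n))), and falls back to the sample standard deviation when sigma is omitted. The function name `z_test` and the data are hypothetical, so the output will not match the 0.272 quoted above.

```python
import math
from statistics import NormalDist, mean, stdev

def z_test(data, mu0, sigma=None):
    """One-tailed probability modeled on the documented Excel Z.TEST formula."""
    if sigma is None:
        sigma = stdev(data)  # default to the sample standard deviation
    z = (mean(data) - mu0) / (sigma / math.sqrt(len(data)))
    return 1 - NormalDist().cdf(z)

# Hypothetical data set, kept short only to save space
data = [98, 102, 101, 105, 99, 103, 100, 104]

print(z_test(data, mean(data)))    # testing at the sample mean gives 0.5
print(z_test(data, 105))           # a point of interest above the sample mean
print(z_test(data, 105, sigma=3))  # with a known population standard deviation
```

A point of interest above the sample mean gives a value greater than 0.5, and one below the sample mean gives a value less than 0.5.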
The Z-test value IS NOT the same as a z-score.
The z-score involves the Voice of the Customer and the USL, LSL specification limits.
Six Sigma projects should have a baseline z-score after the Measure phase is completed and before moving into Analyze. The final Z-score is also calculated after the Improve phase and the Control phase is about instituting actions to maintain the gains.
Other metrics such as RTY, NY, DPMO, PPM, and RPN can be used in Six Sigma projects as the Big "Y", and usually they can be converted to a z-score (except RPN, which is used within the FMEA for risk analysis of failure modes).
A Green Belt wants to evaluate the output of a process before and after a set of changes were made to increase the productivity. The data acquired meets the assumption of normality. Which hypothesis test is best suited to determine if the changes actually improved the productivity?
A) Paired-t test
B) Two-sample t test