Data Collection


A documented procedure for standardized, efficient collection of process data, which will be used to describe the Voice of the Process (VOP).


Ensure data collection is complete, realistic, and practical. Data collection can be costly and is often resisted by those involved. Strive to minimize the cost and impact on those involved while obtaining as much accurate data as possible in a reasonable amount of time.

STEP 1: When setting up the collection system, it is important to collect data only once and to minimize the burden on the operators, the team, and the GB/BB. Make sure to capture all the relevant families of variation (FOV) that should be analyzed.

A sample Families of Variation diagram is shown below.

Families of Variation

The total variation found in a set of data can be broken down into the variation from the process plus the variation from the measurement system, which is quantified in the MSA. Strictly speaking, it is the variances that add: total variance = process variance + measurement system variance.


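If the measurement system's share of the total variance (%Contribution in the MSA) was found to be X, then the process share is 1 - X. This subtraction does not work with %Study Variation, which is based on standard deviations, so its components do not add up to 100%.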
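A minimal sketch of this arithmetic, assuming the total and measurement-system standard deviations are already known (the numbers are hypothetical, and `variance_shares` is an illustrative helper, not part of any MSA package). Because independent variances add, the variance-based shares sum to 100%, while the standard-deviation-based shares generally exceed it.

```python
import math

def variance_shares(sigma_total: float, sigma_ms: float) -> dict:
    """Split total variation into measurement-system and process parts.

    Assumes the two sources are independent, so variances add:
    sigma_total**2 = sigma_process**2 + sigma_ms**2.
    """
    if sigma_ms > sigma_total:
        raise ValueError("measurement sigma cannot exceed total sigma")
    sigma_process = math.sqrt(sigma_total**2 - sigma_ms**2)
    return {
        "sigma_process": sigma_process,
        # Variance-based shares (%Contribution): these always sum to 100%.
        "pct_contribution_ms": 100 * sigma_ms**2 / sigma_total**2,
        "pct_contribution_process": 100 * sigma_process**2 / sigma_total**2,
        # Standard-deviation-based shares (%Study Var): these generally do NOT sum to 100%.
        "pct_study_var_ms": 100 * sigma_ms / sigma_total,
        "pct_study_var_process": 100 * sigma_process / sigma_total,
    }

shares = variance_shares(sigma_total=0.50, sigma_ms=0.15)  # hypothetical values
for name, value in shares.items():
    print(f"{name}: {value:.2f}")
```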

An FOV diagram starts drilling into the sources of Process Variation. These sources include:

  • Part-Part
  • Shift-Shift
  • Operator-Operator
  • Machine-Machine
  • Form-Form
  • Tool-Tool, etc.

Multi-Vari charts, available in most statistical software, are helpful for testing and eliminating FOVs. The goal is to determine which FOV(s) contribute the most variation to the process.
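Multi-Vari charts are graphical, but the underlying idea can be sketched numerically: group the readings by each family and compare how far the group averages spread apart. This is a simplification (it looks at one family at a time and ignores nesting), and the field names (`shift`, `operator`, `machine`) and readings are hypothetical. A family whose group means differ a lot is a candidate for further study; one whose means barely move can likely be eliminated.

```python
from collections import defaultdict
from statistics import mean

def spread_of_group_means(records, family):
    """Range (max - min) of the average response across the levels of one family."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[family]].append(rec["y"])
    means = [mean(values) for values in groups.values()]
    return max(means) - min(means)

# Hypothetical readings, each tagged with the family levels it belongs to.
records = [
    {"shift": "A", "operator": "Ann", "machine": "M1", "y": 10.1},
    {"shift": "A", "operator": "Bob", "machine": "M2", "y": 10.9},
    {"shift": "B", "operator": "Ann", "machine": "M1", "y": 10.2},
    {"shift": "B", "operator": "Bob", "machine": "M2", "y": 11.0},
    {"shift": "A", "operator": "Ann", "machine": "M2", "y": 10.8},
    {"shift": "B", "operator": "Bob", "machine": "M1", "y": 10.3},
]

# Rank families from largest to smallest spread of group means.
ranking = sorted(
    ((spread_of_group_means(records, f), f) for f in ("shift", "operator", "machine")),
    reverse=True,
)
for spread, family in ranking:
    print(f"{family}: spread of means = {spread:.2f}")
```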

Another possibility is shown below. Each one of the families will contribute to the overall process performance. The combination (not necessarily the sum) of all their variances represents the overall process variance. These are all "short" term sources that make up the "long" term variation.

Another example of an FOV tree

Complete a simple but comprehensive form that details the data and the collection plan. This form may later be attached to the Control Plan when handing off the project to the Process Owner.

In addition to the FOV tree, other inputs for completing the Data Collection Plan are:

  1. SIPOC
  2. CTQ Linkage
  3. High Level Process Map

Items to include:

  • What is the question the team is trying to answer?
  • Metrics being measured
  • Sampling strategy
  • Sample size (may require a Power and Sample Size analysis)
  • Where observations are collected
  • How the observations are measured (devices and units)
  • How the observations are recorded
  • Recording frequency
  • Data Classification
  • Pictures, screenshots, macros, other attachments, reference documents, and other helpful information.
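One way to keep the plan consistent from project to project is to hold these items in a simple structured record. The sketch below is hypothetical: the class and field names just mirror the list above and are not taken from any standard template. The `missing_items` helper flags required entries left blank before the plan is approved.

```python
from dataclasses import dataclass, field, fields

@dataclass
class DataCollectionPlan:
    """Hypothetical record mirroring the Data Collection Plan items."""
    question: str               # What is the question the team is trying to answer?
    metrics: list[str]          # Metrics being measured
    sampling_strategy: str
    sample_size: int
    location: str               # Where observations are collected
    measurement_method: str     # Devices and units
    recording_method: str       # How the observations are recorded
    recording_frequency: str
    data_classification: str    # e.g. "attribute" or "variable"
    attachments: list[str] = field(default_factory=list)  # optional

    def missing_items(self) -> list[str]:
        """Names of required fields that were left empty."""
        return [
            f.name for f in fields(self)
            if f.name != "attachments" and getattr(self, f.name) in ("", [], None)
        ]

plan = DataCollectionPlan(
    question="Is diameter variation driven by machine?",
    metrics=["diameter (mm)"],
    sampling_strategy="rational subgroups of 5",
    sample_size=50,
    location="",                 # left blank on purpose
    measurement_method="caliper, mm",
    recording_method="check sheet",
    recording_frequency="hourly",
    data_classification="variable",
)
print(plan.missing_items())
```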

Review the types of data and the sample sizes (number of observations) needed to create control charts and perform hypothesis tests later in the MEASURE and ANALYZE phases. Meeting or exceeding the minimum (without being too costly) leads to better analysis and stronger decision making.
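As a rough illustration of a Power and Sample Size calculation, the sketch below estimates how many observations are needed to detect a shift `delta` in a process mean with a two-sided test. It assumes a normal approximation with a known standard deviation, and the numbers are hypothetical; statistical packages that use the t-distribution will usually return a slightly larger n.

```python
import math
from statistics import NormalDist

def sample_size_for_mean_shift(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided one-sample test.

    n = ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical: process sigma 0.02 mm, want to detect a 0.01 mm shift.
n = sample_size_for_mean_shift(sigma=0.02, delta=0.01)
print(n)
```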

Data collected through automated methods are often the most accurate and least biased. However simple automated collection may seem, it is important to clearly identify the source, system, menus, files, folders, etc., where the information resides. Any macros or adjustments should be written out as clear step-by-step instructions.

If these systems are not already in place, they can be time-consuming and expensive to install. However, improving a data collection system can be a very successful part of the project improvement in itself.

Manual methods require more instructions and training. Minimize the number of people involved to reduce the risk of introducing variation. The more instruction and detail provided to the data collectors and appraisers (those taking the measurements), the less variation the measurement system contributes. This error will be quantified and examined in the Gage R&R.

Manual collection is usually inexpensive to put in place and can be tailored to exactly what is needed. However, it can be costly in other ways: it is labor intensive, prone to recording errors, and requires time to troubleshoot suspicious data.

Video and audio recordings are excellent ways to capture data; they can be replayed and remove uncertainty about what actually occurred.

Attribute Data Collection

Attribute gauges are fixed gauges that provide limited information, but they can be cheaper and quicker devices for obtaining a decision against the Voice of the Customer. They are used to make decisions such as:

  • Pass / Fail
  • Go / No-go
  • In / Out
  • Hot / Cold
  • Good / Bad

...but they do not tell how good or how bad a part is relative to the specification limits. Each decision is given the same weight, yet some GO decisions may actually be better than others. Telling them apart requires more discrimination (or resolution) in the measurement system, and that comes with variable data.
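A small sketch of the difference, using hypothetical specification limits and readings: both parts below receive the same GO decision, but the variable reading shows that one sits comfortably inside the limits while the other is barely inside. The helper names are illustrative only.

```python
def go_no_go(x, lsl, usl):
    """Attribute view: only says whether the part falls inside the limits."""
    return "GO" if lsl <= x <= usl else "NO-GO"

def margin_to_nearest_limit(x, lsl, usl):
    """Variable view: distance from the reading to the closer spec limit
    (negative when the part is out of specification)."""
    return min(x - lsl, usl - x)

LSL, USL = 9.90, 10.10  # hypothetical specification limits
for reading in (10.00, 10.09):
    print(reading, go_no_go(reading, LSL, USL),
          round(margin_to_nearest_limit(reading, LSL, USL), 3))
```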

Types of attribute measurement devices are:

  • Plug Gauges
  • Gage Blocks
  • Flush Pin
  • Bench Gauges
  • Sight Gauges
  • Positional Gauges
  • Thread Gauges
  • Limit Length Gauges

Variable Data Collection

These measurement devices provide more information than attribute gauges and should, at a minimum, be used to measure critical characteristics. They provide a measured dimension.

Types of variable measurement devices are:

  • Hardness Tester (Mohs, File, Sonodur, Rockwell, Vickers, etc)
  • Calipers
  • Micrometers
  • Tensile Tester
  • Ruler
  • Bore Gauges
  • Indicator Gauges
  • Height Gauges
  • Amp Meter
  • Ohm Meter
  • Air gauging

Rational Subgrouping

Why is rational subgrouping important?

Rational subgroups are small samples within the population obtained at similar settings (inputs or conditions) over a short period of time. In other words, instead of taking one data point at a short-term setting, take 4-5 points to form a subgroup at that same setting, then move on to the next setting. This helps estimate the natural, common cause variation within the process.
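A minimal sketch of why subgrouping helps: the within-subgroup (short-term, common cause) standard deviation can be estimated from the average subgroup range divided by the control-chart constant d2, and compared with the standard deviation of all readings pooled. The readings are hypothetical. When the overall estimate is much larger, the shifts between settings add variation beyond the common cause noise inside each subgroup.

```python
from statistics import mean, stdev

# Standard d2 constants for estimating sigma from the average range
# (keys are subgroup sizes; other sizes would need their own constants).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def sigma_within(subgroups):
    """Short-term (common cause) sigma estimated as R-bar / d2."""
    n = len(subgroups[0])
    if any(len(s) != n for s in subgroups):
        raise ValueError("subgroups must all be the same size")
    r_bar = mean(max(s) - min(s) for s in subgroups)
    return r_bar / D2[n]

def sigma_overall(subgroups):
    """Long-term sigma: sample standard deviation of all readings pooled."""
    return stdev([x for s in subgroups for x in s])

subgroups = [  # hypothetical readings: 3 settings, 4 readings each
    [10.1, 10.3, 10.2, 10.2],
    [10.8, 10.9, 10.7, 10.8],
    [10.4, 10.5, 10.3, 10.4],
]
print(f"within: {sigma_within(subgroups):.3f}, overall: {sigma_overall(subgroups):.3f}")
```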

Individual data (I-MR charts) are acceptable for assessing control; however, they usually require more data points (over a longer period of time) to ensure that all the true process variation is captured. The subgroup size = 1 when using I-MR charts.
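For reference, a sketch of the usual I-MR control limit calculation. The constants 2.66 (3/d2 for a moving range of 2) and 3.267 (D4 for n = 2) are the standard ones; the readings in the usage line are hypothetical.

```python
from statistics import mean

def imr_limits(values):
    """Individuals and Moving Range chart limits (moving range of span 2)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "I_center": x_bar,
        "I_UCL": x_bar + 2.66 * mr_bar,
        "I_LCL": x_bar - 2.66 * mr_bar,
        "MR_center": mr_bar,
        "MR_UCL": 3.267 * mr_bar,
        "MR_LCL": 0.0,  # D3 = 0 for n = 2
    }

lim = imr_limits([5, 7, 6, 8, 7])  # hypothetical readings
print(lim)
```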

Sometimes subgrouping can be purposely controlled, and other times you may have to recognize it within the data. Often a Six Sigma Project Manager will be given data with no idea how it was collected.

The data table below shows 50 samples collected and measured in 10 subgroups of 5 measurements each. The appraiser took five measurements at a particular moment (same tool, same operator, same machine, same short-term time frame) and recorded the diameter of each part. This extra data (yes, it takes more work and time) allows a much stronger capability analysis than collecting just 10 data points, one reading from each of 10 parts, where the subgroup size = 1.

The x-bar value for each subgroup is the average of its readings, and the range of each subgroup is simply the difference between its maximum and minimum values.

Rational Subgroups

From here you can use the data to manually create x-bar control charts and calculate control limits. The average of the averages is 27.9083.
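The table itself is not reproduced here, so the sketch below uses hypothetical diameters (it will not reproduce 27.9083). It computes each subgroup's x-bar and range, the average of the averages, and X-bar and R chart limits using the standard constants for subgroups of 5 (A2 = 0.577, D3 = 0, D4 = 2.114).

```python
from statistics import mean

# Standard control chart constants for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """X-bar and R chart center lines and control limits."""
    if any(len(s) != 5 for s in subgroups):
        raise ValueError("the constants above assume subgroups of exactly 5")
    xbars = [mean(s) for s in subgroups]           # average of each subgroup
    ranges = [max(s) - min(s) for s in subgroups]  # max - min within each subgroup
    grand_mean = mean(xbars)                       # average of the averages
    r_bar = mean(ranges)
    return {
        "xbar_center": grand_mean,
        "xbar_UCL": grand_mean + A2 * r_bar,
        "xbar_LCL": grand_mean - A2 * r_bar,
        "R_center": r_bar,
        "R_UCL": D4 * r_bar,
        "R_LCL": D3 * r_bar,
    }

subgroups = [  # hypothetical diameters, NOT the table from this page
    [27.90, 27.91, 27.92, 27.90, 27.91],
    [27.91, 27.90, 27.91, 27.92, 27.90],
    [27.89, 27.91, 27.90, 27.91, 27.92],
]
limits = xbar_r_limits(subgroups)
print(limits)
```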

If there is a known standard value for the diameter, the gauge bias can also be determined by subtracting the standard value from 27.9083. Assuming the standard (reference) value is 27.9050:

Gauge Bias = 27.9083 - 27.9050 = 0.0033 (this value is used when assessing gauge bias during a Stability review as part of MSA)
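The same calculation as a small sketch (the helper name is illustrative). Passing the grand average as a single value reproduces the 0.0033 above; because the subgroups are equal in size, passing all 50 individual readings would give the same average and therefore the same bias.

```python
from statistics import mean

def gauge_bias(readings, reference):
    """Observed average minus the known standard (reference) value."""
    return mean(readings) - reference

bias = gauge_bias([27.9083], 27.9050)
print(round(bias, 4))
```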

Another Example: Here we discover subgroups within the data (a good thing).

A Black Belt (BB) is given data (different from the data used above) by the team and begins to assess control. Without understanding the data or how it was collected, the BB generates the following Individuals chart of the Miles Per Gallon (MPG) of a vehicle from 23 observations.

The visual representation makes it clearer that there are likely subgroups within the data. The control chart appears to be out of control with a lot of special cause variation but there is likely a good explanation. 

The BB talks to the team and learns that the MPG readings were gathered on terrain with different slopes. The higher MPG readings were achieved on downhill slopes and the lower readings on uphill slopes. The data are more appropriately shown below.

Subgroups found within Individuals data

This shows that each subgroup is in control. There were short-term shifts in the inputs or conditions. Assessing normality or capability on the entire group of data is not meaningful, since the inputs were purposely changed to gather data under different conditions; this is not a "naturally" occurring process. The combined data will give the appearance of "special cause" variation when in fact there is none.

Break the data down into its subgroups and analyze each subgroup for normality and capability.
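A minimal sketch of this breakdown, using hypothetical MPG readings already tagged by terrain (7 + 8 + 8 = 23 observations, but not the team's actual data). Each subgroup gets its own Individuals chart limits; pooling the readings into one chart flags many points that are really explained by the terrain. A formal normality test per subgroup would need a statistics package (for example `scipy.stats.shapiro`) and is not shown.

```python
from statistics import mean, stdev

def individuals_limits(values):
    """Individuals chart (LCL, center, UCL) using 2.66 x average moving range."""
    mr_bar = mean(abs(b - a) for a, b in zip(values, values[1:]))
    center = mean(values)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical MPG readings, already split by terrain (subgroup).
mpg_by_terrain = {
    "uphill":   [18.2, 18.9, 18.5, 19.1, 18.4, 18.7, 18.8],
    "flat":     [24.6, 25.1, 24.8, 25.3, 24.9, 25.0, 24.7, 25.2],
    "downhill": [31.0, 31.6, 30.8, 31.4, 31.2, 31.5, 30.9, 31.3],
}

for terrain, values in mpg_by_terrain.items():
    lcl, center, ucl = individuals_limits(values)
    out = [v for v in values if not lcl <= v <= ucl]
    print(f"{terrain}: mean={center:.2f}, sd={stdev(values):.2f}, "
          f"limits=({lcl:.2f}, {ucl:.2f}), out-of-control points={out}")

# For contrast: one chart over all 23 readings ignores the subgroups.
all_values = [v for values in mpg_by_terrain.values() for v in values]
lcl, _, ucl = individuals_limits(all_values)
combined_out = sum(not lcl <= v <= ucl for v in all_values)
print("combined out-of-control points:", combined_out)
```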

The next important step for someone looking at this data could be to examine the incline and decline measurements for each subgroup and determine the correlation between MPG and the angle of incline or decline.

The point is to look for subgroups within the data, which can provide a plausible explanation for what initially appears to be special cause variation.


Measurements are the basis for everything in quality systems for two primary reasons:

  1. To make an assessment or decision
  2. To measure process improvement or lack thereof

Without measurement there cannot be objective proof or statistical evidence of process control, shift, or improvement. It is critical to get the most informative data that practicality and time permit.

This data is required to understand the measurement system variation (done via MSA) and the process variation.

Once the MSA is concluded (and hopefully passed), the remaining variation is due to the process. The Six Sigma team should focus on reducing and controlling the process variation. However, fixing the measurement system itself can sometimes be a Six Sigma project if the potential is significant and the current system is so poor that it prohibits process capability analysis.

Gauges require care and regular calibration to ensure their tolerances are maintained and that the amount of variation they contribute to the total variation stays constant. If the gauge variation changes, it could distort decisions about the process, attributing to the process changes that actually stem from the measurement devices.

Some gauges require special storage or have their own standard operating procedures. A calibration should be done any time damage is suspected or a unique event will use a device more heavily than usual. For example, before a physical inventory event, all weigh scales should be calibrated. All personnel should be trained on the same procedure and understand how the scales work.

Spend the time up front to remove measurement sources of variation, so the data are as meaningful as possible and reflect variation from the process rather than the measurement system. Try to capture the data using rational subgrouping across the entire spectrum of sources of process variability.
