**Description:**

Proper data classification is the first step to ensure the correct statistical tools are used to analyze baseline and final performance.

**Objective:**

Six Sigma projects can start out with the wrong baseline sigma scores or control charts as a result of improper data classification. The goal is not only to select the correct data type but also to collect data that provides the most information at the least expense.

DISCRETE DISTRIBUTIONS:

CONTINUOUS DISTRIBUTIONS:

- Uniform Distribution
- Normal Distribution
- Exponential Distribution
- t Distribution
- Chi-square Distribution
- F Distribution

**Continuous Data**

Continuous data theoretically has an infinite number of possible values, limited only by the resolution of the measurement system. There is no minimum gap between measurements; the data can be expressed on an infinitely divisible scale.

Even if the measurements range from 0 to 1, there may be an infinite number of possible values within that range (0.000000000000... to 0.999999999999...).

A continuous random variable can take any of the infinite number of values over a given interval. These variables generally represent things that are measured, NOT counted.

Examples are:

- Temperature
- Height
- Money
- Weight
- Pressure
- Force
- Lumens
- Hardness
- Length
- Decibels
- Time
- Ohms
- Watts
- Amperage
- Voltage
- Torque
- Tension
- Distance
- Volume
- Area
- Tensile Strength

**Discrete Data:**

Data types that have a finite number of possible values and are based on counts. The data can be sorted into distinct, countable, and completely separate categories. A count cannot be meaningfully divided further on an infinite scale.

**Example:** How many people can comfortably fit into an airplane? It doesn't make sense to say 129.7632213 people. It is either 129 or 130; in this case you would round down to 129.

Attribute and discrete do not mean exactly the same thing when describing data: attribute data has only two outcomes, while discrete data can have more than two.

Examples are:

- rating 1-10 (whole numbers with 1 being LOWEST - 10 being HIGHEST)
- ratings provided on a FMEA for Severity, Occurrence, and Detection
- color designation
- gender
- race
- # of defects on an order form or in a batch of parts
- political party affiliation
- types of defects on an order form
- number of late deliveries
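
The airplane example above can be checked with one line of arithmetic. This is a minimal sketch that uses the 129.7632213 figure from the text; the variable names are illustrative only.

```python
import math

# Computed capacity from the airplane example in the text.
computed_capacity = 129.7632213

# People are counted in whole units, so round down rather than to the nearest integer.
seats = math.floor(computed_capacity)

print(seats)
```

Rounding down rather than to the nearest whole number matters here because the count is a capacity limit.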

**Attribute Data:**

Used to represent the presence or absence of a certain characteristic. A binomial measurement has two possible outcomes. This is the lowest level of data type because it provides the least information.

An attribute measurement systems analysis (MSA) compares how often each appraiser repeats his or her own answer when analyzing the same unit or part, how often that answer matches a known or master answer (when one exists), and how often the appraiser's response reproduces the other appraisers' responses.

- go/no-go
- pass/fail
- on/off
- correct/incorrect
- full/empty
- hot/cold
- small/big
- paper/plastic
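
The three comparisons in an attribute MSA described above can be sketched as simple percent agreement. The study data, appraiser names, and function names below are hypothetical, and this is only an illustration of the idea; a formal study would typically also report statistics such as kappa.

```python
# Hypothetical pass/fail study: 2 appraisers, 5 parts, 2 trials each.
master = ["pass", "fail", "pass", "pass", "fail"]
trials = {
    "A": [["pass", "fail", "pass", "pass", "fail"],
          ["pass", "fail", "pass", "fail", "fail"]],
    "B": [["pass", "fail", "pass", "pass", "pass"],
          ["pass", "fail", "pass", "pass", "pass"]],
}
n_parts = len(master)

def within_appraiser(name):
    """Share of parts where the appraiser gave the same answer on every trial."""
    runs = trials[name]
    same = sum(len({run[i] for run in runs}) == 1 for i in range(n_parts))
    return same / n_parts

def versus_master(name):
    """Share of parts where every trial by the appraiser matched the master answer."""
    runs = trials[name]
    hits = sum(all(run[i] == master[i] for run in runs) for i in range(n_parts))
    return hits / n_parts

def between_appraisers():
    """Share of parts where all appraisers agreed with each other on all trials."""
    same = sum(
        len({run[i] for runs in trials.values() for run in runs}) == 1
        for i in range(n_parts)
    )
    return same / n_parts

for name in trials:
    print(name, within_appraiser(name), versus_master(name))
print("between appraisers:", between_appraisers())
```

In this made-up data, appraiser B is perfectly repeatable but consistently wrong on one part, which is why repeatability and agreement with the master answer are reported separately.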

The mean is a measure of central tendency; the variance and standard deviation are measures of dispersion.

The mean is the expected value over a long run of occurrences. The standard deviation is simply the square root of the variance. The mean is the most commonly used measure of central tendency. It is the measure used when analyzing a normal distribution of data.
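
The relationship between the variance and the standard deviation can be verified numerically. The sketch below uses Python's standard `statistics` module on a hypothetical sample of measurements; the data values are invented for illustration.

```python
import statistics

# Hypothetical sample of shaft diameters in millimetres.
diameters = [10.0, 10.2, 9.8, 10.1, 9.9]

mean = statistics.mean(diameters)          # central tendency
variance = statistics.variance(diameters)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(diameters)      # square root of the sample variance

print(mean, variance, std_dev)
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` are the population versions.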

**Locational Data:**

Data that answers the question “where” and is typically displayed on concentration charts, such as the concentration of home foreclosures by region in the United States.

COMPARING DATA CLASSIFICATION TYPES

- Continuous data is more precise than discrete data.
- Continuous data is more informative than discrete data.
- Continuous data can remove estimation and rounding of measurements.
- Continuous data is often more time consuming to obtain.

**NOTE:**
Convert to continuous data when possible to obtain a higher level of information and detail. A few examples are shown below:

**Example One:**

Instead of recording a shipment as late or on time, it is better to know how late or how early the shipment arrived relative to the due date. It may be acceptable if shipments are within +/- 2 days, but the best score is given when shipments arrive on the due date or +1 day.
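
As a rough illustration of the information lost, the sketch below records the same shipments both ways. The arrival offsets are hypothetical, and the function name `on_time` is an assumption made for this example.

```python
# Hypothetical arrival offsets in days relative to the due date
# (negative = early, positive = late).
offsets = [-3, -1, 0, 1, 2, 4]

def on_time(offset_days, tolerance=2):
    """Attribute view: True if the shipment is within +/- tolerance days."""
    return abs(offset_days) <= tolerance

# Attribute (YES/NO) record of the same shipments.
attribute_view = ["YES" if on_time(d) else "NO" for d in offsets]

# Shipments earning the best score: on the due date or +1 day.
best_score = [d for d in offsets if d in (0, 1)]

print(attribute_view)
print(best_score)
```

The YES/NO record cannot distinguish a shipment that is one day early from one that is two days late, while the offsets in days can.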

Recording YES/NO for on-time delivery will not provide the level of detail needed to make the best decisions.

**Example Two:**

Instead of recording just the dollars or pieces scrapped, it is more valuable to know the scrap per unit or scrap per sales.

If Plant A had molding scrap cost of $63,000/month and Plant B scraps $48,000/month, which performed better? With a denominator such as sales dollars a better conclusion can be made.

If Plant A had $1,000,000 in sales in the same month, and Plant B had $50,000 in sales in the month, it is obvious that Plant B scrapped a much higher percentage of its product.
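
The comparison can be worked through directly with the figures given above. This is a minimal sketch; the dictionary layout and variable names are illustrative only.

```python
# Monthly scrap cost and sales in dollars, taken from the example above.
plants = {
    "Plant A": {"scrap": 63_000, "sales": 1_000_000},
    "Plant B": {"scrap": 48_000, "sales": 50_000},
}

# Scrap as a percentage of sales puts both plants on the same scale.
scrap_pct = {name: 100 * p["scrap"] / p["sales"] for name, p in plants.items()}

for name, pct in scrap_pct.items():
    print(f"{name}: {pct:.1f}% of sales scrapped")
```

On these figures Plant A scraps 6.3% of sales and Plant B scraps 96%, even though Plant B's raw scrap dollars are lower.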

__1. Nominal Data:__

The **lowest level** of data classification. A numerical label that represents a qualitative description. These numbers are labels assigned to represent a category or classification. This is also referred to as categorical data, usually with more than two categories. It is a form of discrete data and should be analyzed with nonparametric tests. The number assignment does not reflect that one category is better or worse than another.

- Political Party Affiliation
  - 1 = Independent
  - 2 = Democratic
  - 3 = Republican
- Gender
  - 1 = Male
  - 2 = Female
- Geographical Location
  - 1 = Midwest
  - 2 = South
  - 3 = Northeast
  - 4 = East coast
- Marital Status
  - 1 = Single
  - 2 = Married
  - 3 = Divorced

The __mode__ is the measure of central tendency for nominal data.
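
For nominal codes such as the political party labels listed above, the mode is found by counting how often each code occurs. The survey responses below are hypothetical; the sketch also computes the arithmetic mean of the codes to show that the result does not correspond to any category.

```python
from collections import Counter

# Codes from the political party example: labels only, with no rank implied.
labels = {1: "Independent", 2: "Democratic", 3: "Republican"}
responses = [2, 3, 3, 1, 2, 3, 1, 3]  # hypothetical survey responses

counts = Counter(responses)
mode_code, mode_count = counts.most_common(1)[0]

# The mean of the codes can be computed but has no interpretation.
mean_of_codes = sum(responses) / len(responses)

print(labels[mode_code], mode_count)
print(mean_of_codes)
```

Renumbering the categories would change the mean of the codes but not the mode, which is why only the mode is meaningful here.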

Other types of variables that often result in nominal data are religion, zip code numbers, birth dates, telephone numbers, federal tax ID numbers, ethnicity, and social security numbers. There are limited statistical techniques to analyze this type of data, but the chi-square statistic is the most common.

The average or variance of the data is meaningless, and quantitative descriptions are not appropriate. There is also no priority or rank based on these numbers.

__2. Ordinal Data:__

The next higher level of data classification after nominal data. Numerical data where a number is assigned to represent a qualitative description, similar to nominal data. These are measures by rank order only.

Unlike nominal data, these numbers can be arranged to represent worst to best or vice versa. Ordinal data is a form of discrete data and should be analyzed with nonparametric tests.

- Ratings provided on a FMEA for Severity, Occurrence, and Detection. For DETECTION (all whole numbers from 1 - 10 represent levels of detection capability that are provided by team, customer, standards, or law):
  - 1 = detectable every time
  - 5 = detectable about 50% of the time
  - 10 = not detectable at all
- Classifying households as low-income, middle-income, and high-income
- Master Black Belt, Black Belt, Green Belt, Yellow Belt, etc.
- Lower Class, Middle Class, Upper Class

Nominal and ordinal data come from imprecise measurements and are referred to as nonmetric data, sometimes called qualitative data.

The __median and mode__ are the measures of central tendency.

Ordinal data is sorted into categories and the categories can be put in a logical order, but the intervals between categories are not defined.
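
Because the intervals between ordinal categories are not defined, a median that averages two ratings may not be a valid rating. One way to handle this, sketched below with hypothetical FMEA detection ratings, is `statistics.median_low`, which always returns a value present in the data.

```python
import statistics

# Hypothetical FMEA detection ratings (1 = detectable every time,
# 10 = not detectable at all).
detection = [2, 2, 3, 5, 7, 8]

median_rating = statistics.median_low(detection)  # an actual rating from the data
averaged_median = statistics.median(detection)    # averages the two middle ratings
mode_rating = statistics.mode(detection)          # most frequent rating

print(median_rating, averaged_median, mode_rating)
```

Here the averaged median falls between two ratings, while `median_low` and the mode both return ratings that appear on the scale.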

Ordinal data is also found when ranking sports teams, ranking the best cities to live in, ranking the most popular beaches, and in survey questionnaires.

__3. Interval Data:__

The next higher level of data classification. Numerical data where the data can be arranged in an order and the differences between the values are meaningful, but there is not necessarily a true zero point. These are measures using equal intervals.

Interval data can be both continuous and discrete. Zero degrees Fahrenheit does not mean it is the lowest point on the scale; it is just another point on the scale.
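
One consequence of the arbitrary zero point is that differences on the Fahrenheit scale are meaningful while ratios are not. The sketch below illustrates this with two hypothetical readings by converting them to Kelvin, a scale with an absolute zero; the function name is an assumption made for this example.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to Kelvin, a scale with an absolute zero."""
    return (deg_f - 32) * 5 / 9 + 273.15

low_f, high_f = 40.0, 80.0  # hypothetical temperature readings

difference_f = high_f - low_f   # 40 degrees: a meaningful statement
naive_ratio = high_f / low_f    # 2.0, but "twice as hot" is not meaningful
kelvin_ratio = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)

print(difference_f, naive_ratio, round(kelvin_ratio, 2))
```

The ratio computed on the Fahrenheit readings is 2.0, yet on the absolute scale the higher temperature is only about 8% greater.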

The lowest appropriate level for the __mean__ is interval data.

Parametric AND nonparametric statistical techniques can be used to analyze interval data.

Examples include temperature readings, percentage change in performance of a machine, and dollar change in the price of oil per gallon.

__4. Ratio Data:__

Similar to interval data EXCEPT it has a defined absolute zero point and is the **highest level** of data measurement. Ratio data can be both continuous and discrete.

Ratio level data has the highest level of usage and can be analyzed in more ways than the other three types of data.

Interval data and ratio data are considered metric data, also called quantitative data.

Examples include time, income, volume, weight, voltage, height, pieces/hour, force, defects per million opportunities, resistance, watts, per capita income, items sold, years of education, and lumens.
