Histograms

Components of a Histogram

  • Each vertical bar represents an interval of data or a category of data
  • The x-axis shows the measurements
  • The y-axis shows the frequency
  • All bars are adjacent and do not overlap, since each one represents a distinct interval (group) of measurements and the frequency of that interval.


When the data are normally distributed, the histogram takes on a "bell" shape. Fitting a smooth probability density function to the data with kernel smoothing techniques produces a curve that generalizes the histogram and looks like a bell.
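As an illustration of kernel smoothing, here is a minimal sketch of a Gaussian kernel density estimate using only the Python standard library. The function name `gaussian_kde`, the simulated sample, and the bandwidth rule (a common normal-reference rule of thumb) are assumptions made for this example, not part of the original material.

```python
import math
import random
import statistics

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density as an average of Gaussian bumps."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each data point contributes one bell-shaped bump centered on itself.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

# Hypothetical, simulated measurements from a normal process (mean 100, sd 10).
random.seed(1)
sample = [random.gauss(100, 10) for _ in range(500)]

# Assumed bandwidth choice: a normal-reference rule of thumb.
bandwidth = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)
density = gaussian_kde(sample, bandwidth)

# Crude text plot of the smoothed curve; it should rise and fall like a bell.
for x in range(70, 131, 10):
    print(f"{x:4d} {'*' * round(density(x) * 1000)}")
```

A smaller bandwidth follows the individual bars more closely, while a larger one gives a smoother but flatter curve.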

In general, more data measured at a finer resolution produces more bars, since more intervals are available to plot. As more measurements are added, the bars fill in more of the area under the probability density function.

To assess the data there should be at least 5 bars or intervals and at least 30 data points.
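The binning behind a histogram, along with the rule of thumb above, can be sketched as follows. This is only an illustration using the Python standard library; the helper names `histogram_counts` and `enough_for_histogram`, the equal-width binning, and the simulated sample are assumptions for the example.

```python
import random

def histogram_counts(data, num_bins):
    """Split the data range into equal-width bins and count the values in each."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    if width == 0:
        raise ValueError("all values are identical; no intervals to plot")
    counts = [0] * num_bins
    for x in data:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

def enough_for_histogram(data, num_bins, min_bins=5, min_points=30):
    """Check the rule of thumb: at least 5 intervals and at least 30 data points."""
    return num_bins >= min_bins and len(data) >= min_points

# Hypothetical, simulated measurements.
random.seed(0)
sample = [random.gauss(50, 5) for _ in range(200)]

edges, counts = histogram_counts(sample, 8)
print("Meets rule of thumb:", enough_for_histogram(sample, 8))
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f} | {'#' * count}")
```

Each printed row is one bar: the interval on the left and its frequency drawn as a run of `#` characters.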

Several common histogram shapes are explained below. A histogram is also a useful visual tool for depicting the skewness and kurtosis of a distribution.



Left-Skewed Distribution (Negatively Skewed):

These histograms have their peak, the most common values, on the right side of the distribution, with a tail that extends much farther out to the left. These distributions are common where there is an upper specification limit or where it is not possible to exceed an upper value, also known as a boundary limit. This may occur if a customer has requested that the process run toward the upper specification limit as opposed to targeting the mean.

The preferred measure of central location is the median, since it is less affected by the long tail than the mean.

Mean < Median < Mode



Right-Skewed Distribution (Positively Skewed):

The distribution of the data reaches far out to the right side. This may be caused by a process having a lower boundary. Cost or time plots commonly exhibit this behavior.

The measure of central location is the median.

Mode < Median < Mean

If the most common value is 10, the middle value is 15, and the average of the data set is 20, then the distribution is right skewed.

Mode = 10
Median = 15
Mean = 20
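The ordering rule can be checked on a small data set. The sketch below uses Python's `statistics` module; the function name `skew_direction` and the `cycle_times` values are hypothetical, chosen so that the mode, median, and mean match the example above.

```python
import statistics

def skew_direction(data):
    """Classify skew from the ordering of the mode, median, and mean."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    if mode < median < mean:
        return "right skewed"
    if mean < median < mode:
        return "left skewed"
    return "no clear skew from this rule"

# Hypothetical data: mode = 10, median = 15, mean = 20.
cycle_times = [10, 10, 10, 12, 15, 15, 18, 25, 30, 55]

print(statistics.mode(cycle_times), statistics.median(cycle_times), statistics.mean(cycle_times))
print(skew_direction(cycle_times))
```

The ordering is a quick heuristic. With small samples or several equally common values the mode can be unstable, so it is worth confirming the shape on the histogram itself.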



Bi-modal Distribution:

These histograms show two peaks (or more, in which case the distribution is polymodal), which suggests more than one behavior occurring in the process and more than one point of central location. This can happen when data from different populations are analyzed as one set, such as plotting the heights of females and males as one distribution.
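To see how pooling two populations produces this shape, here is a small simulation using the Python standard library. The group means, standard deviations, sample sizes, and bin width are all hypothetical values chosen for illustration.

```python
import random
import statistics
from collections import Counter

random.seed(2)
# Hypothetical heights (cm) from two different populations.
group_a = [random.gauss(162, 5) for _ in range(300)]
group_b = [random.gauss(178, 5) for _ in range(300)]
pooled = group_a + group_b  # analyzed as if it were one distribution

# Count the pooled values in 4 cm intervals, keyed by each interval's left edge.
bin_width = 4
bins = Counter(int(h // bin_width) * bin_width for h in pooled)
for left in sorted(bins):
    print(f"{left:3d}-{left + bin_width:3d} | {'#' * (bins[left] // 3)}")

# Separating the groups again gives one central location per population.
for name, values in (("group A", group_a), ("group B", group_b), ("pooled", pooled)):
    print(f"{name}: mean = {statistics.mean(values):.1f}, sd = {statistics.stdev(values):.1f}")
```

The pooled data has a larger spread than either group alone, and its mean falls between the two group centers, where relatively few measurements actually lie.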



Uniform Distribution:

The distribution is flat, without much of a bell shape, and shows no apparent central location. This may occur when all values between a lower specification limit (LSL) and upper specification limit (USL) are considered equally acceptable. In other words, values very close to the limits are as good as a value in the middle.




Normal Distribution:

Points are symmetrically distributed around a central value or location.

The mean is used to describe the central location of the distribution. The median, mode, and mean are all close to the same value and the Coefficient of Skewness is close to zero.



Coefficient of Skewness

Karl Pearson is credited with developing the formula below to measure the Coefficient of Skewness. The formula compares the difference between the mean and the median with the standard deviation of the same distribution.

Sk = 3 × (Mean - Median) / Standard Deviation

If:

Sk > 0 then skewed right distribution
Sk = 0 then symmetric distribution (such as a normal distribution)
Sk < 0 then skewed left distribution
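A short sketch of the calculation follows. It assumes the formula meant here is Pearson's median-based coefficient, Sk = 3 × (mean - median) / standard deviation, and it uses the sample standard deviation. The function name `pearson_median_skewness` and the data are hypothetical.

```python
import statistics

def pearson_median_skewness(data):
    """Pearson's median-based skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    stdev = statistics.stdev(data)  # assumption: sample (n - 1) standard deviation
    return 3 * (mean - median) / stdev

# Hypothetical data with mean = 20 and median = 15, so Sk should be positive.
cycle_times = [10, 10, 10, 12, 15, 15, 18, 25, 30, 55]
sk = pearson_median_skewness(cycle_times)
print(f"Sk = {sk:.2f}")

if sk > 0:
    print("skewed right")
elif sk < 0:
    print("skewed left")
else:
    print("symmetric")
```

With real data Sk is rarely exactly zero, so in practice a value close to zero is read as approximately symmetric.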



