Pareto Diagram
Description:
A Pareto Diagram is used to set priorities and focus on the vital few instead of targeting every category. The graph is a column graph with the categories along the x-axis, normally sorted from the most instances to the fewest, which together account for 100% of the observations. The left vertical axis is the number of instances in each category. The right vertical axis is a percentage scale showing each category's share of all the instances.
Objective:
The Pareto principle states that, in many cases, roughly 20% of the categories account for 80% of the opportunity. This chart is used frequently and is one of the most important tools to become familiar with. It appears in several phases and can be applied in any of them.
Below is an example of a study done to find the dominant colors of SUVs on the road. A sample of the first 34 SUVs observed was recorded in the table on the left side.

It is important to become fluent and quick at creating a Pareto diagram. Learning to sort, filter, and use formulas in a data processing package is usually easier than attempting that work in the statistical software. Once you are comfortable with these techniques, it is just as easy to analyze thousands of data points as it is a few.
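As a minimal sketch of that sort-and-count work, assuming the raw data is one color label per observed SUV, Python's standard `collections.Counter` can tally and sort the categories in one step. The function name `pareto_counts` and the sample list are illustrative, not taken from the source.

```python
from collections import Counter


def pareto_counts(observations):
    """Tally category labels and return (category, count) pairs,
    sorted from most to least frequent."""
    # most_common() with no argument returns every category, highest
    # count first; ties keep the order in which they were first seen.
    return Counter(observations).most_common()


# Illustrative input: one label per SUV observed (top three colors only).
observations = ["GREEN"] * 7 + ["BLACK"] * 10 + ["WHITE"] * 8
print(pareto_counts(observations))
# [('BLACK', 10), ('WHITE', 8), ('GREEN', 7)]
```

The same call works unchanged whether the list holds a few labels or many thousands.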
Sometimes it is as simple as pasting the data into the statistical software program and letting it take care of the rest.
Garbage IN - Garbage OUT (GIGO)
Large amounts of data often contain errors that need to be cleaned up for consistency, such as spelling errors, misplaced decimal points, and mistyped long numbers. These errors affect sort and filter functions, which depend on exact matches and consistent formats.
For example, suppose 1 of the 10 BLACK observations was mistyped as BLCK. The software will analyze it as a separate category, with no way of knowing it was meant to be BLACK. So BLCK will have a count of 1, and BLACK will have a count of 9.
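A minimal cleanup sketch, assuming the known typos have been collected in a lookup table beforehand (the `CORRECTIONS` dictionary and the `clean_label` function are hypothetical names), shows how standardizing labels before counting merges BLCK back into BLACK:

```python
from collections import Counter

# Hypothetical correction table: known typos mapped to the intended label.
CORRECTIONS = {"BLCK": "BLACK"}


def clean_label(label):
    """Trim spaces, force upper case, then apply any known correction."""
    normalized = label.strip().upper()
    return CORRECTIONS.get(normalized, normalized)


raw = ["BLACK"] * 9 + ["BLCK"]
print(Counter(raw))                          # Counter({'BLACK': 9, 'BLCK': 1})
print(Counter(clean_label(x) for x in raw))  # Counter({'BLACK': 10})
```

A lookup table only fixes typos that have already been found, so it is still worth scanning the list of distinct categories for near-duplicates before building the chart.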
Statistical software does not always require the data to be sorted by number of incidents, but it is sorted here for clarity, to show how the table relates to the column graph.

When there are many categories, there may not be enough room to show all of them and their values along the x-axis. If you evaluated 100 different SUV colors, for example, it would take a very large graph to read every color and value along the x-axis.
To simplify the chart and keep the focus on the vital few, which is the objective in the first place, the values of the trivial many are summed into a single category called "OTHERS", placed at the right end of the column graph. These observations cannot be eliminated from the data set, because they still count toward each color's percentage and the cumulative total for the entire data set.
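A minimal sketch of the OTHERS step, assuming the counts are already sorted and that the analyst chooses how many categories to keep (`lump_others` and its `keep` parameter are hypothetical names):

```python
def lump_others(sorted_counts, keep):
    """Keep the first `keep` (category, count) pairs and sum the rest
    into one OTHERS entry placed last, so no observations are dropped.

    `sorted_counts` is assumed to be sorted from highest to lowest count.
    """
    head = list(sorted_counts[:keep])
    tail_total = sum(count for _, count in sorted_counts[keep:])
    if tail_total > 0:
        head.append(("OTHERS", tail_total))
    return head


# BLACK, WHITE and GREEN counts are from the SUV example. The nine other
# colors use placeholder names, one SUV each, chosen so the totals match
# the 34 SUVs and 12 colors stated in the text.
counts = [("BLACK", 10), ("WHITE", 8), ("GREEN", 7)]
counts += [(f"COLOR_{i}", 1) for i in range(4, 13)]
print(lump_others(counts, keep=3))
# [('BLACK', 10), ('WHITE', 8), ('GREEN', 7), ('OTHERS', 9)]
```

Because OTHERS is the sum of the trivial many, it can be larger than the last category kept, as it is here; it still goes at the right end of the chart.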
Shown below are two Pareto diagrams presenting the same information in two formats. The one with the OTHERS category gives better visibility and readability of the vital few.
What a Pareto diagram adds to a typical column chart is a cumulative line and numerical data on each category's contribution to the data set, both individually and as a running total.
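A minimal sketch of those two add-ons, the per-category percentage and the running (cumulative) total, assuming the counts are already in chart order. `pareto_table` is a hypothetical helper, and the OTHERS value of 9 is the 34 minus 25 observations outside the top three colors.

```python
def pareto_table(ordered_counts):
    """For (category, count) pairs in chart order (largest first, OTHERS
    last), return rows of:
    (category, count, percent, cumulative count, cumulative percent)."""
    total = sum(count for _, count in ordered_counts)
    rows = []
    running = 0
    for category, count in ordered_counts:
        running += count  # height of the cumulative line at this column
        rows.append((category, count, 100 * count / total,
                     running, 100 * running / total))
    return rows


counts = [("BLACK", 10), ("WHITE", 8), ("GREEN", 7), ("OTHERS", 9)]
for category, n, pct, cum_n, cum_pct in pareto_table(counts):
    print(f"{category:<8}{n:>4}{pct:>8.1f}%{cum_n:>5}{cum_pct:>8.1f}%")
```

The cumulative count in the last row is 34 and the cumulative percentage is 100%, which is why a cumulative line drawn against the count axis tops out at the total number of observations.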
Examine the BLACK category. There are 10 findings out of the 34 total SUVs analyzed. Therefore, BLACK SUVs made up 29.4% of the total.
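Written as a formula, a category's percentage is its count divided by the total count:

```latex
\text{BLACK share} = \frac{10}{34} \times 100\% \approx 29.4\%
```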
The 80/20 Principle
Adding up the top three categories (BLACK 10, WHITE 8, and GREEN 7) gives 25, so 25 of the 34 total observations fall in the top three categories. This means 73.5% of the observations were in 25% of the categories (there are 12 categories, or colors, in total).
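The same 80/20 comparison in formula form, using the counts above:

```latex
\frac{10 + 8 + 7}{34} = \frac{25}{34} \approx 73.5\% \text{ of observations},
\qquad \frac{3}{12} = 25\% \text{ of categories}
```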
The Pareto principle suggests 80%/20%; the example above, at 73.5%/25%, is close to the 80/20 principle.
Many statistical software programs will provide a final Pareto analysis that looks similar to the one below.
Often there is a second y-axis on the right side of the graph that shows the cumulative percentage contribution of the categories.
The red line adds the categories one after another until it reaches 100%; notice that, measured against the left axis, the red line tops out at 34, the total number of observations.
Working through these calculations on your own a couple of times will solidify your understanding of how to read and interpret these diagrams.
The outcome of the tool is that the team targets its improvements at the most important categories: those on the left side of the chart with the most instances.
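One way to turn "the categories on the left side" into a concrete list is to keep adding categories, largest first, until the cumulative share reaches a chosen cutoff. A minimal sketch, where the `vital_few` function and its 80% default cutoff are assumptions rather than part of the source method:

```python
def vital_few(ordered_counts, threshold=80.0):
    """Return the leading categories whose cumulative share first reaches
    `threshold` percent. The 80% default mirrors the 80/20 rule of thumb
    and is an assumption, not a fixed standard."""
    total = sum(count for _, count in ordered_counts)
    selected = []
    running = 0
    for category, count in ordered_counts:
        selected.append(category)
        running += count
        if 100 * running / total >= threshold:
            break
    return selected


# SUV counts: top three from the text, plus nine placeholder colors with
# one SUV each so the totals match 34 SUVs and 12 colors.
counts = [("BLACK", 10), ("WHITE", 8), ("GREEN", 7)]
counts += [(f"COLOR_{i}", 1) for i in range(4, 13)]
print(vital_few(counts))
# ['BLACK', 'WHITE', 'GREEN', 'COLOR_4', 'COLOR_5', 'COLOR_6']
```

Here the top three colors reach only 73.5%, so a strict 80% cutoff pulls in three single-SUV colors; the cutoff is a judgment call, and the team still decides whether a category with one observation belongs in the vital few.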
In this example, the team may conclude that BLACK, WHITE, and GREEN SUVs are in higher demand than other colors.
Moreover, as will also be covered in the MEASURE phase, the data and the method of categorization must be accurate, or the diagram could provide misleading information. These diagrams are sometimes used to identify new projects and to help write contracts with refined scopes.
Many software packages can quickly generate Pareto diagrams that provide all of this information. Understanding the diagram and its limitations, and interpreting the information, is the job of the team's GB/BB (Green Belt/Black Belt).