MTBF

Facility Maintenance Metrics

Mean Time Between Failures (MTBF)

Mean Time Between Failures (MTBF) is a metric used in Total Productive Maintenance (TPM) programs that represents the average operating time between failures. Its counterpart, Mean Time To Repair (MTTR), is discussed later.

FORMULA:

MTBF = (Total uptime) / (number of failures)

MTBF = 1 / Failure Rate

What exactly is a "failure"?

A complete stoppage is the obvious answer. Some also count a slowdown or reduced performance relative to an ideal level as a "failure", even though the machine does not actually stop. Whatever definition of failure is chosen, it should be applied uniformly to all pieces of equipment.

What exactly is "uptime"?

Obviously, when a machine is not running it is down, not up. But a machine running at a fraction of its intended performance should probably not be counted as "uptime" either. Whatever decision is made, ensure it is applied consistently across all pieces of equipment.

Your company needs to decide on the definition for "failures" and "uptime" and apply it uniformly across all plants, divisions, etc.

NOTE:

Some items are not repaired but replaced, such as light bulbs, switches, and torn belts. In such cases, the term "Mean Time To Failure (MTTF)" is used.

There is also the debate over planned downtime. Robust TPM programs schedule downtime for maintenance, and predictive tools may trigger planned replacements or repairs in an effort to reduce unplanned downtime and variability in uptime performance.

Ideally, the higher the MTBF the better. However, it is likely to plateau at a certain point due to planned downtime and intended maintenance.

But don't give up there. The challenge then becomes how to reduce the planned outages and get better life out of the components involved so that these planned intervals can be extended.

IMPORTANT:

All else being equal, the higher the MTBF, the higher the OEE. This leads to higher profitability and improved customer satisfaction. Be careful: pushing MTBF too high may lead to more problems. Let the data drive the MTBF; do NOT treat MTBF as a lever to adjust and expect the data to follow. That will lead to problems.

Mean Time To Repair (MTTR)

The Mean Time To Repair is the average time to make a repair after a failure. As with any metric, it is important to clarify what exactly constitutes a failure and what counts as downtime vs. uptime.

"Uptime" at a significantly compromised rate of production due to poor maintenance is usually not acceptable. For instance, a machine may be running at a very slow rate because of maintenance issues. This behavior is not accounted for in MTTR or MTBF, but it is obviously a problem (OEE will catch it). Allowing it to continue can make the MTBF look better than the full picture warrants.

Mean Time To Repair = (Total downtime) / (number of failures)

The MTTR puts an emphasis on predictive and preventive maintenance. Some ideas to reduce the MTTR are better preparation, a spare parts program, and the use of predictive maintenance tools.

Not all repairs are equal. What constitutes an acceptable repair?

This should be defined in the definition of a failure as well. The machine should not only be "up", but it should be up to a certain level of sustained performance before the time can be counted as "uptime".

The GB/BB (Green Belt/Black Belt) should help develop a Standard Operating Procedure or Work Instruction that clearly defines the variables and metrics, ideally letting a team member be the author. This is the type of deliverable expected from the Six Sigma Project Manager in the CONTROL phase.

IMPORTANT:

The lower the value for MTTR, the better. This should result in a higher OEE, which correlates to higher profitability and improved customer satisfaction.

The goal of a predictive maintenance program is to increase OEE. This can be done by increasing the MTBF and reducing the MTTR.

Goal: Higher MTBF and Lower MTTR

MTBF and MTTR Example

MTBF Example

Given: Over a period of time the following information is available:

Total Production Time (PT): 1,240 minutes

Total Downtime (DT): 1.5 hours (watch the units of measure)

Number of Failures (F): 25

Determine the MTBF:

The first step is to determine the Uptime (UT), which equals PT - DT with both expressed in the same units (1.5 hours = 90 minutes).

Uptime (UT) = 1,240 minutes - 90 minutes = 1,150 minutes

MTBF = UT / F = 1,150 / 25 = 46 minutes

There is another way to express MTBF that gives the same result.

MTBF = 1 / Failure Rate

where

Failure Rate = the # of failures divided by the total uptime = F / UT

The Failure Rate = 25 / 1,150 minutes = 0.02174 Failures / Minute

The inverse of the Failure Rate = MTBF = 46 minutes
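The MTBF example above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name mtbf_minutes and the variable names are illustrative assumptions rather than part of any standard tool, and the inputs are the figures given in the example.

```python
def mtbf_minutes(production_min, downtime_min, failures):
    """Return MTBF in minutes: uptime divided by the number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    uptime_min = production_min - downtime_min
    return uptime_min / failures


# Figures from the example; downtime is converted from hours to minutes first.
production_min = 1240
downtime_min = 1.5 * 60
failures = 25

mtbf = mtbf_minutes(production_min, downtime_min, failures)

# Second method: MTBF as the inverse of the failure rate.
failure_rate = failures / (production_min - downtime_min)

print(f"MTBF = {mtbf:.1f} minutes")
print(f"Failure rate = {failure_rate:.5f} failures/minute")
print(f"1 / failure rate = {1 / failure_rate:.1f} minutes")
```

Converting the downtime to minutes before subtracting is the step most easily missed when the inputs arrive in mixed units.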

MTTR Example

Using the same information from above, determine the MTTR:

MTTR = Total Downtime / # of Failures

MTTR = 90 minutes / 25 failures

MTTR = 3.6 minutes

A machine is down on average 3.6 minutes to be repaired.
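The same calculation can be sketched in Python. The helper name mttr_minutes is a hypothetical one chosen for illustration; the inputs are the downtime and failure count from the example.

```python
def mttr_minutes(downtime_min, failures):
    """Return MTTR in minutes: total downtime divided by the number of failures."""
    if failures <= 0:
        raise ValueError("MTTR is undefined without at least one failure")
    return downtime_min / failures


downtime_min = 1.5 * 60  # 1.5 hours expressed in minutes
failures = 25

mttr = mttr_minutes(downtime_min, failures)
print(f"MTTR = {mttr:.1f} minutes")
```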

NOTES:

As a GB/BB, you should examine the data in its entirety. The mean may not be the most representative measure of central tendency.

Examine every time interval between failures for MTBF; each interval is one data point. For MTTR, analyze how long each repair took; each time to repair is one data point.

If the data set is normal, then apply the mean.
If the data set is not normal, then the median or mode may be more appropriate.
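To illustrate why the choice matters, the sketch below compares the mean and the median of a small set of time-between-failure intervals. The interval values are made up for illustration and are not taken from the example above. The comparison is only a quick screen, not a normality test; a formal test such as Shapiro-Wilk in a statistics package would be a more rigorous check.

```python
import statistics

# Hypothetical minutes between consecutive failures; one interval is unusually long.
intervals_min = [38, 41, 44, 45, 47, 49, 52, 240]

mean_interval = statistics.mean(intervals_min)
median_interval = statistics.median(intervals_min)

print(f"Mean interval   = {mean_interval} minutes")
print(f"Median interval = {median_interval} minutes")

# A large gap between the two suggests a skewed data set, in which case the
# mean may overstate (or understate) typical performance.
```

Here a single long interval pulls the mean well above the median, so the median is closer to what the machine typically delivers.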

Just as important is to look for outliers.

When studying the data you may find outliers, such as an unusually long or short period between failures, or repair times that were extremely quick or unusually long. The team can brainstorm the causes using the 5-WHY method.

Was the repair done differently?
Was the repair done by a different person or group of people?
Was a different part(s) used?

This can shed light on best practices or components that merit a closer look through a Design of Experiments (DOE) to find the optimal combination or best procedure.
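One common way to screen for the outliers discussed above is the 1.5 x IQR rule. This rule is swapped in here as an illustration and is not prescribed by the text; the repair times are hypothetical, and the function name flag_outliers is an assumption.

```python
import statistics


def flag_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical repair times in minutes; one repair took far longer than the rest.
repair_times_min = [3.1, 3.4, 3.5, 3.6, 3.7, 3.8, 4.0, 12.5]

outliers = flag_outliers(repair_times_min)
print("Repairs to investigate:", outliers)
```

Flagged points are candidates for the questions above, not values to delete automatically.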

It may be worth spending a little more money up front to use quality parts or perform a longer PM to save more time in the long run. Perhaps a minor increase in the MTTR yields a significant increase in MTBF. The team will have to determine if this is acceptable.

Remember, the goal of Six Sigma is not just to shift the mean to a more favorable outcome but to make the performance more reliable and predictable; in other words, with minimal variation (consistency)!

IATF 16949: 2016

TPM has an increasing role in this international automotive standard, where it is addressed in Section 8.5.1.5. The intention is to strengthen the requirement for equipment maintenance and overall proactive management.

This standard involves tracking TPM, and metrics such as OEE, MTBF, and MTTR are usually applied. The results of these metrics are inputs to the Management Review section, 9.3.

Given that most of the automotive supply chain runs in a JIT environment, it is critical to have reliable and predictable machine performance to provide a consistent and reliable flow of parts.

How do MTBF and MTTR relate to OEE?

Recall that OEE is made up of the product of:

Performance * Availability * Quality

Availability is the amount of time the machine is available to run as scheduled.

Availability is the amount of time the machine is available to run divided by the total possible available time. This metric does not include any measure of how well the machine performs while it is running.

AVAILABILITY = Operating Time / Planned Production Time

A 30-minute scheduled interval to replace a belt is much better than a 40-minute unscheduled interval to replace a torn belt that could tear and rip apart an oil line or result in other unintended consequences.

Assuming the belt replacement has been studied and the proper interval for useful life has been predicted (in other words, not over-changing and spending too much money and time on excess belt replacements), a scheduled event is obviously more predictable and favorable than hoping and not knowing when the next failure will take place.

A scheduled event such as a PM, break, safety meeting, or Gemba walk is NOT in the denominator (Planned Production Time) and does not penalize the AVAILABILITY metric.

Again, the team should also try to minimize these "planned" events to give the machine(s) more time to be utilized. But this affects Utilization, which is different from the metric of AVAILABILITY (go to the OEE page to learn more).

As related to the metrics above:

AVAILABILITY = MTBF / (MTBF + MTTR), calculated over the Planned Production Time

An unscheduled belt change would be included in the Planned Production Time figure; however, a scheduled period of downtime (again, scheduled downtime should be minimal and strategically determined) would not be.
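As a quick check, the sketch below applies this relationship to the figures from the earlier MTBF and MTTR examples and compares the result with Operating Time / Planned Production Time. It assumes the 1,240 minutes of production time in that example is the Planned Production Time, which the example does not state explicitly. The function name availability is illustrative.

```python
def availability(mtbf, mttr):
    """Return availability as a fraction: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


# Values from the earlier examples, all in minutes.
mtbf_min = 46.0
mttr_min = 3.6
uptime_min = 1150
planned_production_min = 1240  # assumed equal to the example's production time

from_mtbf = availability(mtbf_min, mttr_min)
from_times = uptime_min / planned_production_min

print(f"Availability from MTBF and MTTR: {from_mtbf:.1%}")
print(f"Availability from uptime / planned time: {from_times:.1%}")
```

Both routes give the same figure because MTBF and MTTR are the uptime and downtime divided by the same number of failures.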
