Mean time between failures

Mean time between failures (MTBF) is the mean (average) time between failures of a system, and is often attributed to the "useful life" of the device, i.e. not including "infant mortality" or "end of life". Calculations of MTBF assume that a system is "renewed", i.e. fixed, after each failure and then returned to service immediately. The average time between failing and being returned to service is termed mean down time (MDT) or mean time to repair (MTTR).

Overview



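 * $$\text{Mean time between failures} = \text{MTBF} = \frac{\Sigma{(\text{downtime} - \text{uptime})}}{\text{number of failures}}. \!$$

Here each "uptime" is the instant at which the system comes up, and the following "downtime" is the instant at which it next goes down, so each difference is one period of operation.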
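
The formula above can be illustrated with a minimal sketch. It assumes the failure history is available as pairs of instants (when the system came up, when it next went down); the function name `mtbf_from_intervals` and the sample log are hypothetical and serve only as an illustration.

```python
def mtbf_from_intervals(intervals):
    """Return the MTBF for a list of (up_time, down_time) pairs.

    Each pair holds the instant the system came up and the later
    instant at which it went down again, in the same unit (hours here).
    Every pair therefore ends in exactly one failure.
    """
    if not intervals:
        raise ValueError("at least one failure is needed")
    # Sum of (downtime - uptime) over all operating periods.
    total_operating = sum(down - up for up, down in intervals)
    return total_operating / len(intervals)


# Hypothetical log: the system comes up at hours 0, 110 and 250
# and fails at hours 100, 240 and 400.
log = [(0, 100), (110, 240), (250, 400)]
print(mtbf_from_intervals(log))  # (100 + 130 + 150) / 3
```

The repair periods (hours 100 to 110 and 240 to 250 in this log) do not enter the sum; only the operating periods between coming up and going down are counted.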

Formal definition of MTBF
The MTBF is the sum of the MTTF (mean time to failure) and MTTR (mean time to repair). For a constant failure rate $$\! \lambda$$, the MTTF is simply the reciprocal of the failure rate,


 * $$\text{MTTF} = \frac{1}{\lambda}. \!$$

The MTTF is often denoted by the symbol $$\! \theta$$, or


 * $$\text{MTTF} = \theta. \!$$

Since failure rate and MTTF are simply reciprocals, both notations are found in the literature, depending on which notation is most convenient for the application.

The MTTF can be defined as the expected value of the time to failure, whose probability density function is the failure density function f(t):


 * $$\text{MTTF} = \int_{0}^{\infty} tf(t)\, dt \!$$

with


 * $$\int_{0}^{\infty} f(t)\, dt=1. \!$$

The MTTR can be similarly derived from the repair rate.
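
As a rough numerical check of the integral definition, the sketch below evaluates the integral of t·f(t) with the trapezoidal rule. It assumes an exponential failure density f(t) = λe^(−λt), for which the result should agree with 1/λ. The failure rate value, the truncation point of the integral and the helper names are assumptions made for illustration.

```python
import math


def exponential_pdf(t, lam):
    """Failure density f(t) of an exponential distribution with rate lam."""
    return lam * math.exp(-lam * t)


def mttf_numeric(pdf, upper, steps=200_000):
    """Trapezoidal estimate of the integral of t * pdf(t) from 0 to upper."""
    h = upper / steps
    # End points carry half weight; the integrand is zero at t = 0.
    total = 0.5 * upper * pdf(upper)
    for i in range(1, steps):
        t = i * h
        total += t * pdf(t)
    return total * h


lam = 0.002  # hypothetical failure rate, in failures per hour
# Truncate the infinite upper limit at 50 mean lifetimes, where the
# remaining tail of the integral is negligible.
estimate = mttf_numeric(lambda t: exponential_pdf(t, lam), upper=50 / lam)
print(estimate, 1 / lam)  # the two values should agree closely
```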

A common misconception about the MTBF is that it specifies the time (on average) when the probability of failure equals the probability of not having a failure. This is only true for certain symmetric distributions. In many cases, such as the (non-symmetric) exponential distribution, it is not. In particular, for an exponential failure distribution, the probability that an item will have failed by a time equal to the MTBF is approximately 0.63. For typical distributions with some variance, the MTBF is only a top-level aggregate statistic and is not suitable for predicting a specific time to failure; the uncertainty arises from the variability of the time-to-failure distribution.
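
The 0.63 figure can be reproduced with a short sketch, assuming an exponential failure distribution with constant rate λ = 1/MTBF. The MTBF value of 1000 hours is hypothetical; the result does not depend on it.

```python
import math

mtbf = 1000.0     # hypothetical MTBF in hours
lam = 1.0 / mtbf  # constant failure rate of the exponential model


def prob_failed_by(t):
    """Probability that an item has failed by time t (exponential model)."""
    return 1.0 - math.exp(-lam * t)


p_at_mtbf = prob_failed_by(mtbf)  # equals 1 - e^(-1)
median_life = math.log(2) / lam   # time at which the failure probability is 0.5

print(round(p_at_mtbf, 3))    # 0.632
print(round(median_life, 1))  # 693.1
```

Under this model the point of equal failure and survival probability is the median life, about 69% of the MTBF, rather than the MTBF itself.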

On commercial product descriptions, the "MTTF lifetime" is the amount of time the product should last, assuming that it is used properly.

Variations of MTBF
There are many variations of MTBF, such as mean time between system aborts (MTBSA) or mean time between critical failures (MTBCF). Such nomenclature is used when it is desirable to differentiate among types of failures, such as critical and non-critical failures. For example, in an automobile, the failure of the FM radio does not prevent the primary operation of the vehicle. Mean time to failure (MTTF) is sometimes used instead of MTBF in cases where a system is replaced after a failure, since MTBF denotes the time between failures in a system which is repaired.

Problems with MTBF
As of 1995, the use of MTBF in the aeronautical industry (and others) has been called into question due to the inaccuracy of its application to real systems and the nature of the culture which it engenders. Many component MTBFs are given in databases, and often these values are very inaccurate.

This has led to the negative exponential distribution being used much more than it should have been. Some estimates say that only 40% of components have failure rates described by this distribution. MTBF has also been corrupted into the notion of an "acceptable" level of failures, which removes the desire to get to the root cause of a problem and take measures to eliminate it. The British Royal Air Force is looking at other methods to describe reliability, such as maintenance-free operating period (MFOP).

MTBF and life expectancy
MTBF is not to be confused with life expectancy. MTBF is an indication of reliability. A device (e.g. a hard drive) with an MTBF of 100,000 hours is more reliable than one with an MTBF of 50,000 hours. However, this does not mean that the 100,000-hour drive will last twice as long as the 50,000-hour drive. How long a drive will last depends entirely on its life expectancy. A drive with an MTBF of 100,000 hours can have a life expectancy of 2 years while a drive with an MTBF of 50,000 hours can have a life expectancy of 5 years, yet the drive that is expected to break down after 2 years is still considered more reliable than the 5-year one.

Using the 100,000-hour drive as an example and putting MTBF together with life expectancy, the drive system should on average fail once every 100,000 hours, provided the drive is replaced every 2 years. Another way to look at this: if 100,000 units of this drive are all in use at the same time and any failed drive is put back in working order immediately after the failure, then 1 unit is expected to fail every hour.
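
The fleet arithmetic can be sketched as follows, assuming a constant failure rate of 1/MTBF per unit and immediate restoration of failed units. The function name `expected_failures` and the figure of 8,760 hours per year are assumptions made for illustration.

```python
def expected_failures(units, mtbf_hours, hours):
    """Expected number of failures in a fleet over a period.

    Assumes a constant failure rate of 1 / mtbf_hours per unit and
    that every failed unit is put back in service immediately.
    """
    return units * hours / mtbf_hours


# 100,000 drives with an MTBF of 100,000 hours, observed for one hour.
print(expected_failures(100_000, 100_000, 1))  # 1.0

# A single drive over a 2-year service life (2 * 8760 hours).
print(expected_failures(1, 100_000, 2 * 8760))
```

The second figure is well below one, which is consistent with a drive that is usually retired at the end of its life expectancy without having failed.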

Mean time between failures versus mean operating time
Some confusion may arise when "operating time" is used as the parameter in the MTBF calculation.



The graph could give the impression that


 * $$\text{MTBF} = \frac{\Sigma{(\text{downtime} - \text{uptime})}}{\text{number of failures}} = \frac{\Sigma{(\text{operating time})}}{\text{number of failures}}.$$

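Example: five items are observed for one week (168 hours). Their operating hours are 168, 72, 168, 168 and 120, and their failure counts are 0, 2, 0, 0 and 2 respectively.

Assuming each item is fully running during its operating hours: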


 * $$\text{Mean operating time} =\frac{168+72+168+168+120}{5} =\frac{696}{5} = 139.2\text{ hours}.$$

Correct MTBF calculation:


 * $$\text{MTBF} =\frac{72+120}{2+2} =\frac{192}{4} = 48\text{ hours}.$$

Wrong MTBF calculation:


 * $$\text{MTBF} =\frac{168+72+168+168+120}{0+2+0+0+2} =\frac{696}{4} = 174\text{ hours}.$$


 * (The mean time between failures can never exceed 168 hours within a one-week observation period.)

In this case, there is no actual MTBF for all 5 items, as failure data for the remaining 3 items are not yet available. In other words, there is only an actual MTBF of 48 hours, obtained from the 2 items that failed.
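
The three calculations in this example can be reproduced with a short sketch. The operating hours and failure counts are taken from the equations above; the variable names are chosen for illustration only.

```python
# Per-item operating hours and failure counts, in the order used in
# the equations of the example.
operating_hours = [168, 72, 168, 168, 120]
failures = [0, 2, 0, 0, 2]

# Mean operating time: total hours divided by the number of items.
mean_operating_time = sum(operating_hours) / len(operating_hours)

# MTBF restricted to the items that actually failed.
failed = [(h, f) for h, f in zip(operating_hours, failures) if f > 0]
mtbf_failed_items = sum(h for h, _ in failed) / sum(f for _, f in failed)

# The calculation the example calls wrong: hours of all items,
# including those with no failures, divided by the failure count.
naive_mtbf = sum(operating_hours) / sum(failures)

print(mean_operating_time)  # 139.2
print(mtbf_failed_items)    # 48.0
print(naive_mtbf)           # 174.0
```

The last value exceeds the 168-hour observation window, which is the sign that hours from items without failures have been mixed into the numerator.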

Some industries use specific formulas to calculate an "estimated MTBF" from all collected operating hours. These may include an estimate for the zero-failure condition of items that have not yet had any failures.