Failure rate

Failure rate is the frequency with which an engineered system or component fails, expressed for example in failures per hour. It is often denoted by the Greek letter λ (lambda) and is important in reliability theory. In practice, the mean time between failures (MTBF), which is the reciprocal of the failure rate when that rate is constant, is more commonly quoted for high-quality components or systems.

Failure rate is usually time dependent, and an intuitive corollary is that both the failure rate and the MTBF change over the expected life cycle of a system. For example, the failure rate of an automobile in its fifth year of service may be many times greater than in its first year of service: one simply does not expect to replace an exhaust pipe, overhaul the brakes, or have major power plant or transmission problems in a new vehicle. In the special case where the likelihood of failure remains constant with respect to time (for example, in a product such as a brick or a protected steel beam), the failure rate is simply the inverse of the mean time between failures (MTBF), expressed for example in hours per failure.

MTBF is an important specification parameter in safety-critical engineering design, such as naval architecture, aerospace engineering, and automotive design; in short, in any task where failure of a key part or of the whole system must be minimized, particularly where lives might be lost if such factors are not taken into account. These factors underlie many safety and maintenance practices in engineering and industry, as well as government regulations, such as how often certain inspections and overhauls are required on an aircraft. A similar measure used in the transport industries, especially railways and trucking, is 'Mean Distance Between Failure', a variation that attempts to relate actual loaded distances to similar reliability needs and practices. Failure rates and their projections are important factors in insurance, business, and regulation practices, and are fundamental to the design of safe systems throughout a national or international economy.

Failure rate in the discrete sense
Stated in words, as it would be measured in an experiment, the failure rate can be defined as


 * The total number of failures within an item population, divided by the total time expended by that population, during a particular measurement interval under stated conditions. (MacDiarmid, et al.)

Here the failure rate $$\lambda (t)$$ can be thought of as the probability, per unit time, that a failure occurs in a specified interval, given no failure before time $$t$$. It can be defined with the aid of the reliability function or survival function $$R(t)$$, the probability of no failure before time $$t$$, as:


 * $$\lambda(t) = \frac{R(t_1)-R(t_2)}{(t_2-t_1) \cdot R(t_1)} = \frac{R(t)-R(t+\Delta t)}{\Delta t \cdot R(t)} \!$$

where $$t_1$$ (or $$t$$) and $$t_2$$ are respectively the beginning and ending of a specified interval of time spanning $$\Delta t$$. Note that this is a conditional probability, hence the $$R(t)$$ in the denominator.
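As a rough illustration, the interval failure rate above can be evaluated numerically for any survival function. The sketch below assumes, purely for illustration, an exponential survival function with a rate of 0.001 per hour; the function names and the numerical values are hypothetical and are not taken from the cited sources.

```python
import math

def interval_failure_rate(R, t1, t2):
    """Failure rate over [t1, t2]: (R(t1) - R(t2)) / ((t2 - t1) * R(t1))."""
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    r1 = R(t1)
    if r1 <= 0:
        raise ValueError("R(t1) must be positive: some items must survive to t1")
    # Dividing by R(t1) conditions on survival up to the start of the interval.
    return (r1 - R(t2)) / ((t2 - t1) * r1)

# Assumed example survival function (hypothetical): exponential, 0.001 per hour.
def example_survival(t):
    return math.exp(-0.001 * t)

rate = interval_failure_rate(example_survival, 100.0, 200.0)
print(rate)  # slightly below 0.001 because the interval has finite width
```

For this assumed survival function the result does not depend on where the interval starts, only on its width, which is the discrete counterpart of a constant failure rate.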

Failure rate in the continuous sense
If the failure rate is calculated over smaller and smaller intervals of time $$\Delta t$$, the interval width tends to zero. In the limit this gives the hazard function, which is the instantaneous failure rate at any point in time:


 * $$h(t)=\lim_{\Delta t \to 0} \frac{R(t)-R(t+\Delta t)}{\Delta t \cdot R(t)}.$$

Continuous failure rate depends on a failure distribution, $$\scriptstyle F(t)$$, which is a cumulative distribution function that describes the probability of failure prior to time t,


 * $$P(\mathbf{t}\le t)=F(t)=1-R(t),\quad t\ge 0. \!$$

The failure distribution function is the integral of the failure density function, f(x),


 * $$F(t)=\int_{0}^{t} f(x)\, dx. \!$$

The hazard function can be defined now as


 * $$h(t)=\frac{f(t)}{R(t)}. \!$$
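The step from the limit definition to this ratio can be written out explicitly. Assuming that $$R(t)$$ is differentiable, and using $$R(t)=1-F(t)$$ together with $$f(t)=dF(t)/dt$$ (which follows from the integral relation above), the limit is the negative derivative of $$R(t)$$ divided by $$R(t)$$:

 * $$h(t)=\lim_{\Delta t \to 0} \frac{R(t)-R(t+\Delta t)}{\Delta t \cdot R(t)} = -\frac{1}{R(t)}\frac{dR(t)}{dt} = \frac{1}{R(t)}\frac{dF(t)}{dt} = \frac{f(t)}{R(t)}. \!$$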

There are many failure distributions (see List of important probability distributions). A common failure distribution is the exponential failure distribution,


 * $$F(t)=\int_{0}^{t} \lambda e^{-\lambda x}\, dx = 1 - e^{-\lambda t}, \!$$

which is based on the exponential density function. For an exponential failure distribution the hazard rate is a constant with respect to time. For other distributions, such as a Weibull distribution or a log-normal distribution, the hazard function is not constant with respect to time.
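The contrast between a constant and a time-varying hazard can be checked numerically from $$h(t)=f(t)/R(t)$$. The following is a minimal sketch; the parameter values (a rate of 0.001 per hour, and a Weibull shape of 2 with a scale of 1000 hours) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math

def exponential_hazard(t, lam):
    """h(t) = f(t) / R(t) for the exponential distribution."""
    f = lam * math.exp(-lam * t)   # failure density
    R = math.exp(-lam * t)         # survival function, 1 - F(t)
    return f / R

def weibull_hazard(t, shape, scale):
    """h(t) = f(t) / R(t) for a two-parameter Weibull distribution."""
    z = t / scale
    f = (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))
    R = math.exp(-(z ** shape))
    return f / R

# Assumed illustrative parameters, not taken from any data source.
for t in (10.0, 100.0, 1000.0):
    print(t, exponential_hazard(t, 0.001), weibull_hazard(t, 2.0, 1000.0))
```

With these assumed parameters the exponential hazard stays at 0.001 per hour for every t, while the Weibull hazard grows in proportion to t, as expected for a shape parameter of 2.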

Failure rate data
Failure rate data can be obtained in several ways. The most common means are:
 * Historical data about the device or system under consideration. Many organizations maintain internal databases of failure information on the devices or systems that they produce, which can be used to calculate failure rates for those devices or systems. For new devices or systems, the historical data for similar devices or systems can serve as a useful estimate.
 * Government and commercial failure rate data. Handbooks of failure rate data for various components are available from government and commercial sources. MIL-HDBK-217, Reliability Prediction of Electronic Equipment, is a military handbook that provides failure rate data for many military electronic components. Several commercially available failure rate data sources focus on commercial components, including some non-electronic components.
 * Testing. The most accurate way to obtain data is to test samples of the actual devices or systems in order to generate failure data. This is often prohibitively expensive or impractical, so the previous data sources are often used instead.

Units
Failure rates can be expressed using any measure of time, but hours is the most common unit in practice. Other units, such as miles, revolutions, etc., can also be used in place of "time" units.

Failure rates are often expressed in engineering notation as failures per million hours ($$10^{-6}$$ failures per hour), especially for individual components, since their failure rates are often very low.

The Failures In Time (FIT) rate of a device is the number of failures that can be expected in one billion ($$10^9$$) hours of operation. This term is used particularly by the semiconductor industry.
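Converting between these units is simple scaling, sketched below. The helper names and the 100 FIT example value are hypothetical; the MTBF helper additionally assumes a constant failure rate, since only then is MTBF the reciprocal of the rate.

```python
import math

def fit_to_per_hour(fit):
    """FIT (failures per 10^9 hours) -> failures per hour."""
    return fit / 1e9

def per_hour_to_fit(rate_per_hour):
    """Failures per hour -> FIT (failures per 10^9 hours)."""
    return rate_per_hour * 1e9

def per_hour_to_per_million_hours(rate_per_hour):
    """Failures per hour -> failures per million (10^6) hours."""
    return rate_per_hour * 1e6

def mtbf_hours(rate_per_hour):
    """MTBF in hours; valid only under the constant-failure-rate assumption."""
    if rate_per_hour <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / rate_per_hour

# Hypothetical device rated at 100 FIT.
rate = fit_to_per_hour(100.0)
print(rate)                                 # failures per hour
print(per_hour_to_per_million_hours(rate))  # failures per million hours
print(mtbf_hours(rate))                     # hours, assuming a constant rate
```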

Additivity
Under certain engineering assumptions, the failure rate for a complex system is simply the sum of the individual failure rates of its components, as long as the units are consistent, e.g. failures per million hours. This permits testing of individual components or subsystems, whose failure rates are then added to obtain the total system failure rate.
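A minimal sketch of this bookkeeping follows. The text does not list the assumptions; the ones adopted here are the usual ones for a series system: any single component failure fails the system, components fail independently, and each has a constant failure rate. The component names and rates are hypothetical.

```python
import math

def series_system_failure_rate(component_rates):
    """Sum component failure rates; all rates must share the same units."""
    return math.fsum(component_rates)

def system_reliability(component_rates_per_hour, t_hours):
    """R(t) of a series system as the product of component reliabilities."""
    return math.prod(math.exp(-lam * t_hours) for lam in component_rates_per_hour)

# Hypothetical component failure rates, in failures per million hours.
rates_per_million_hours = {"power_supply": 12.0, "controller": 3.5, "sensor": 0.8}

total = series_system_failure_rate(rates_per_million_hours.values())
print(total)  # system failure rate, failures per million hours

# Cross-check: multiplying the component reliabilities gives exp(-total * t).
per_hour = [r * 1e-6 for r in rates_per_million_hours.values()]
print(system_reliability(per_hour, 10000.0), math.exp(-total * 1e-6 * 10000.0))
```

The cross-check shows why the rates add under these assumptions: the product of exponential reliabilities is itself an exponential whose rate is the sum of the component rates.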

Example
Suppose the failure rate of a certain component is to be estimated by test. Ten identical components are each tested until they either fail or reach 1000 hours, at which time the test is terminated for that component. (The level of statistical confidence is not considered in this example.) The results are as follows:

With 6 failures recorded over a total of 7502 component-hours of testing, the estimated failure rate is


 * $$\frac{6\text{ failures}}{7502\text{ hours}} = 0.0007998\, \frac{\text{failures}}{\text{hour}} = 799.8 \times 10^{-6}\, \frac{\text{failures}}{\text{hour}}, $$

or 799.8 failures for every million hours of operation.
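The arithmetic above can be reproduced with a short sketch. Only the totals stated in the example (6 failures, 7502 hours) are used; the function names are hypothetical, and the records helper assumes each test record is a pair of hours on test and a flag saying whether the unit failed.

```python
def totals_from_records(records):
    """Sum failures and accumulated hours over a list of (hours_on_test, failed) pairs.

    Units that reached the test limit without failing still contribute
    their hours to the total, but not to the failure count.
    """
    failures = sum(1 for _, failed in records if failed)
    total_hours = sum(hours for hours, _ in records)
    return failures, total_hours

def estimate_failure_rate(failures, total_hours):
    """Point estimate: total failures divided by total time on test."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return failures / total_hours

# Totals taken from the example: 6 failures in 7502 hours.
rate = estimate_failure_rate(6, 7502.0)
print(f"{rate:.7f} failures/hour")                   # 0.0007998 failures/hour
print(f"{rate * 1e6:.1f} failures per million hours")  # 799.8 failures per million hours
```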

Print

 * Blanchard, Benjamin S. (1992), Logistics Engineering and Management, Fourth Ed., pp. 26-32, Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
 * Ebeling, Charles E. (1997), An Introduction to Reliability and Maintainability Engineering, pp. 23-32, McGraw-Hill Companies, Inc., Boston.
 * Federal Standard 1037C.
 * Kapur, K.C., and Lamberson, L.R. (1977), Reliability in Engineering Design, pp. 8-30, John Wiley & Sons, New York.
 * Knowles, D.I. (1995), Should We Move Away From "Acceptable Failure Rate", Communications in Reliability Maintainability and Supportability, Vol. 2, No. 1, p. 23, International RMS Committee, USA.
 * MacDiarmid, Preston; Morris, Seymour; et al. (no date), Reliability Toolkit: Commercial Practices Edition, pp. 35-39, Reliability Analysis Center and Rome Laboratory, Rome, New York.
 * Turner, T., Hockley, C., and Burdaky, R. (1997), The Customer Needs A Maintenance-Free Operating Period, 1997 Avionics Conference and Exhibition, No. 97-0819, p. 2.2, ERA Technology Ltd., Leatherhead, Surrey, UK.

Online

 * Mondro, Mitchell J. (June 2002), "Approximation of Mean Time Between Failure When a System has Periodic Maintenance", IEEE Transactions on Reliability, Vol. 51, No. 2 (available from MITRE Corp.).
 * Reliability Prediction of Electronic Equipment, MIL-HDBK-217F(2) (DOD download site).
 * Bathtub curve issues by ASQC.