General Solutions for MTTF and Steady-State
Availability of NMR Systems
Moslem Amiri, V´aclav Pˇrenosil
Abstract—Voting redundancy is a well-known technique to
improve the fault tolerance of digital systems. Calculation of
reliability and availability is a necessary part of every fault
tolerant system design. Conventional methods, including block
diagram method, are unsuccessful when there is dependent
failure, repair, or standby operation in the system. Use of Markov
models also almost breaks down when system is composed
of many elements or is repairable. However, the alternatives
to reliability and availability, i.e. mean time to failure and
steady state availability, respectively, are easily measurable using
Markov graphs. In this paper, novel general form solutions for
calculation of mean time to failure, and steady-state availability
of N-modular redundancy systems are presented.
Keywords—NMR systems, Mean time to failure, Steady state
availability, Markov graphs, General solutions.
I. INTRODUCTION
VOTING redundancy is a well-known technique to improve
the fault tolerance of digital systems. Although
parallel and standby systems are also used for this purpose,
the complexity involved with designing coupler in parallel
and switch in standby systems makes the efﬁcient implementation
of them difﬁcult. Applied to digital systems, voting
redundancy takes advantage of the digital nature of elements’
outputs to remove some of these problems.
Implementation of voting redundancy concept, called Nmodular
redundancy (NMR), is shown in Fig.1. This system
consists of 2n+1 parallel digital circuits, all having equivalent
logic (generally, identical digital circuits) with the same input
applied to all elements. The outputs of the 2n + 1 circuits
are compared by the voter and the majority is given as the
system output. The basic NMR system is called triple modular
redundancy (TMR) with only three parallel digital elements
(n = 1). TMR is the most common implementation of majority
voting systems due to its lower cost. However, as the cost of
digital circuits are reduced, NMR systems with higher number
of replicated digital components gain popularity to increase the
fault tolerance of systems further.
When designing a fault tolerant system, several features
need to be evaluated and a trade-off among them is required.
These features include cost, weight, volume, reliability, and
availability. Reliability is the probability of no failure in a
given operating period, while availability is the probability that
the system is up at any point in time. Availability is generally
measured to ﬁnd the effect of repair on a system. Calculation
Manuscript received March 31, 2014. This work was supported by Technology
Agency of the Czech Republic under contract No. TA01011383/2011.
M. Amiri and V. Pˇrenosil are with the Faculty of Informatics,
Masaryk University, Brno, Czech Republic (e-mail: amiri@mail.muni.cz;
prenosil@ﬁ.muni.cz).
Digital circuit 1
Digital circuit 2
Digital circuit 2n+1
Voter
(0, 1) (0, 1)
Fig. 1. Majority voting.
of reliability and availability is a necessary part of any fault
tolerant design process.
Ideally, we would prefer to ﬁnd general solutions for
reliability and availability of NMR systems, but mathematically
this is not achievable. Two alternatives, however, are
commonly used in fault-tolerant systems design; mean time
to failure and steady state availability. In this paper, we will
seek general solutions for these features of repairable and nonrepairable
NMR systems. Throughout this article, we assume
that the parallel digital components are identical and have the
same constant failure rate, λ, and that the voter does not fail.
II. RELIABILITY AND MEAN TIME TO FAILURE (MTTF)
NMR systems were ﬁrst introduced and discussed in the
1960s [1] [2]. For a system consisting of 2n+1 parallel digital
circuits and a perfect voter, the reliability without repair is
R =
2n+1
i=n+1
B(i : 2n + 1) =
2n+1
i=n+1
2n + 1
i
pi
(1 − p)2n+1−i
(1)
where p is the success probability (reliability) of any digital
circuit. Eq. 1 is in fact the reliability expression for an ”n + 1
out of 2n + 1” system. If the system is repairable, Markov
models are generally used to compute various features of the
system, including reliability. To compute reliability of a system
with n states, formulation of Markov models always leads to
n ﬁrst-order differential equations. Repair rates couple these
equations, creating an nth order differential equation. Even
if Laplace transform theory is applied, the solution involves
ﬁnding the roots of an nth-order polynomial. For n ≥ 5, it is
proven that there is no closed-form solutions [3].
Although reliability is obtained only if the inverse Laplace
transform is done, it is still possible to extract partial information
from the transformed equations in order to dispose
of inverse transform operations. This easily obtained feature
Zero failures
0s
t
1s 2s nstn  )12( tn )2( tn  )12(
tt
tn  )2(
t
1nstn  )1(
tn  )12(1 tn  )2(1  tn  ))12((1  tn  ))1((1  1
One failure Two failures n failures n + 1 or more
failures
Fig. 2. A Markov reliability model for a 2n+1-level majority voting system
with repair.
is MTTF. MTTF can be used as an alternative to reliability
when several systems should be compared.
The Markov reliability diagram for an NMR system, composed
of 2n + 1 digital components, is given in Fig. 2. In this
Figure, state sn+1 is an absorbing state; this state represents
n + 1 or more failures. Also in moving from one state to
the next, from left to right, the coefﬁcients of λ express the
number of possible ways to experience a single failure. The
repair rate, µ, is always the same, meaning a single repairman
is present for repair.
The relationship between MTTF and reliability is as fol-
lows:
MTTF =
∞
0
R(t)dt = lim
t→∞
t
0
R(τ)dτ (2)
Knowing
L
t
0
R(τ)dτ =
R (s)
s
(3)
and
L lim
t→∞
f(t) = lim
s→0
sF(s) (4)
then MTTF can be calculated as:
MTTF = lim
s→0
sL
t
0
R(τ)dτ = lim
s→0
s
R (s)
s
= lim
s→0
R (s)
(5)
For the Markov model of Fig. 2, the following time-domain
differential equations are obtained:
dPs0
(t)
dt
= −(2n + 1)λPs0
(t) + µPs1
(t)
dPs1 (t)
dt
= (2n + 1)λPs0
(t) − (2nλ + µ)Ps1
(t) + µPs2
(t)
dPs2
(t)
dt
= (2n)λPs1 (t) − ((2n − 1)λ + µ)Ps2 (t) + µPs3 (t)
· · ·
dPsn
(t)
dt
= (n + 2)λPs(n−1)
(t) − ((n + 1)λ + µ)Psn
(t)
dPs(n+1)
(t)
dt
= (n + 1)λPsn (t) + Ps(n+1)
(t)
(6)
If the system is initially good, then
Psk
(t = 0) =
1, if k = 0.
0, otherwise.
(7)
After taking the Laplace transforms of the Eqs. 6, and incorporating
the initial conditions (Eq. 7), and letting s approach
0, the transformed equations will be as follows:
−(2n + 1)λPs0 (s) + µPs1 (s) = −1
(2n + 1)λPs0 (s) − (2nλ + µ)Ps1 (s) + µPs2 (s) = 0
(2n)λPs1 (s) − ((2n − 1)λ + µ)Ps2 (s) + µPs3 (s) = 0
· · ·
(n + 2)λPs(n−1)
(s) − ((n + 1)λ + µ)Psn (s) = 0
(n + 1)λPs(n)
(s) + Ps(n+1)
(s) = 0
(8)
Solving Eqs. 8 for the state probabilities in Laplace domain
will result in the following general equations:
Psk
(s) =
µPs(k+1)
(s) + 1
(2n + 1 − k)λ
if 0 ≤ k ≤ n − 1 (9)
and
Psn
(s) =
1
(n + 1)λ
(10)
Manipulating Eqs. 9 and 10, for 0 ≤ k ≤ n − 1:
Psk
(s) =
µn−k
+ λ
n−k−1
i=0 µn−k−1−i
λi i
j=0(n + 1 + j)
λn−k+1 n
i=k(2n + 1 − i)
(11)
Therefore, MTTF with non-zero repair will be equal to
MTTFrepairable =
1
(n + 1)λ
+
n−1
k=0
µn−k
+ λ
n−k−1
i=0 µn−k−1−i
λi i
j=0(n + 1 + j)
λn−k+1 n
i=k(2n + 1 − i)
(12)
If there is no repair in the system, then from Eqs. 9 and 10
we can obtain the MTTF as:
MTTFnon-repairable =
n
k=0
1
(2n + 1 − k)λ
(13)
For example, for a TMR system (n = 1):
MTTFrepairable =
1
2λ
+
µ + 2λ
6λ2
=
5λ + µ
6λ2
MTTFnon-repairable =
1
3λ
+
1
2λ
=
5
6λ
(14)
III. AVAILABILITY AND STEADY STATE AVAILABILITY
(SSA)
If a system can tolerate brief failures and continue its
operation after it is repaired, availability is a useful measure
of its performance. Similar to the reliability computation of
NMR systems, there is no general solution for the availability
computation. However, as MTTF (with general solution) could
be used as an alternative to reliability, the SSA could be used
as the alternative to availability. In this section, we will seek
a general solution for SSA.
A(t) is the probability that the system is up at any point in
time, i.e.
A(t) =
Up time (t)
Up time (t) + Down time (t)
(15)
An important difference between reliability R(t) and availability
A(t) is their steady state behavior; as time approaches
)(tAtyAvailabili
SSA
tTime
1
Fig. 3. Availability function.
Zero failures
0s
t
1s 2s nstn  )12( tn )2( tn  )12(
tt
tn  )2(
t
1nstn  )1(
tn  )12(1 tn  )2(1  tn  ))12((1  tn  ))1((1 
One failure Two failures n failures n + 1 or more
failures
t
t 1
Fig. 4. A Markov availability model for a 2n + 1-level majority voting
system with repair.
inﬁnity, R(t) approaches zero while A(t) approaches some
ﬁxed value. This ﬁnal value of the availability function is SSA.
Fig. 3 shows this behavior for a system which is initially good.
A Markov availability model for an NMR system is given in
Fig. 4. In this model, after complete failure, the system could
be restored to operational status.
A set of time-domain differential equations can be written
for Markov availability model of Fig. 4. We will use the
Laplace transform theorems, assuming that the system is
initially good, to simplify the solution. SSA can be calculated
through A(s) when s → 0. Laplace transformed equations for
availability measurement will be as follows:
−(2n + 1)λPs0 (s) + µPs1 (s) = −1
(2n + 1)λPs0 (s) − (2nλ + µ)Ps1 (s) + µPs2 (s) = 0
(2n)λPs1 (s) − ((2n − 1)λ + µ)Ps2 (s) + µPs3 (s) = 0
· · ·
(n + 2)λPs(n−1)
(s) − ((n + 1)λ + µ)Psn (s) + µPs(n+1)
(s) = 0
(n + 1)λPsn (s) − µPs(n+1)
(s) = 0
(16)
This is a system of n + 2 linear equations. If this system
is represented in matrix multiplication form as APs(s) = b
where A is an n+2 by n+2 matrix with a nonzero determinant,
and the vector Ps(s) = (Ps0
(s), Ps1
(s), . . . , Ps(n+1)
(s))T
is
the column vector of the variables, then applying Cramer’s
rule will give:
Psk
(s) =
det(Ak)
det(A)
0 ≤ k ≤ n + 1 (17)
where Ak is the matrix formed by replacing the kth column
of A by the column vector b.
If Eq. 17 is applied to Laplace transformed form of Eq. 15,
we obtain:
A(s) =
det A0 + · · · + det An
det A0 + · · · + det A(n+1)
(18)
Hence the calculation of det(A) need not be done. The matrix
Ak(ij), whose rows and columns are 0 ≤ i ≤ n + 1 and
0 ≤ j ≤ n + 1 respectively, has the following general form
(obtained from Eqs. 16):
Ak(ij) =



−1, if i = 0, j = k.
−(2n + 1)λ, if i = 0, j(= k) = 0.
−µ, if i = n + 1, j(= k) = n + 1.
µ, if i = j − 1, 1 ≤ j(= k).
−((2n + 1 − j)λ + µ), if i = j, 1 ≤ j(= k) ≤ n.
(2n + 1 − j)λ, if i = j + 1, j(= k) ≤ n.
0, otherwise.
(19)
Once this matrix is formed, the SSA is computed as:
Asteady state =
n
k=0 det Ak(ij)
n+1
k=0 det Ak(ij)
0 ≤ i and j ≤ n + 1 (20)
For example, for a TMR system, we will have:
det A0(ij) =
−1 µ 0
0 −2λ − µ µ
0 2λ −µ
= −µ2
det A1(ij) =
−3λ −1 0
3λ 0 µ
0 0 −µ
= −3λµ
det A2(ij) =
−3λ µ −1
3λ −2λ − µ 0
0 2λ 0
= −6λ2
(21)
Therefore, SSA of TMR system will be as follows:
Asteady state =
µ2
+ 3λµ
µ2 + 3λµ + 6λ2
(22)
If a system is not repairable, its SSA will be zero, as Eq. 22
proves this for a TMR system when µ = 0.
IV. CONCLUSION
A simple calculation of reliability or availability is done by
means of Markov models and Laplace transforms. However,
if there are a large number of components in the system, or
system is repairable, inverse transform requires the nontrivial
partial-fraction-expansion calculation. Although reliability and
availability are obtained only if the inverse transform is done,
it is still possible to extract the partial information MTTF and
SSA from the transformed equations. MTTF and SSA can be
used as alternatives to reliability and availability when several
systems should be compared. The general solutions for MTTF
and SSA of NMR systems were presented in this article.
Although the computation of MTTF for an NMR system
is simpler than reliability, comparing MTTFs of two systems
should be done with some considerations. The relation between
MTTF and reliability is as follows
MTTF =
∞
0
R(t)dt (23)
For an NMR system containing 2n + 1 circuits, as n is
increased, the area under the curve of reliability function is
increased too but only within the region of primary interest,
0 < λt < 0.69 [1]. Outside this region, the area under
reliability curve is decreased. Since in Eq. 23 the integral is
taken from 0 to ∞, the MTTF comparisons are valid when
a system is signiﬁcantly superior than another. MTTF is also
used when the reliability functions compared have the same
shape. The MTTF will decide which system is superior in such
cases.
REFERENCES
[1] J. K. Knox-Seith, “A redundancy technique for improving the reliability
of digital systems,” Stanford Electronics Laboratories, Tech. Rep.
[2] W. H. Pierce, “Improving reliability of digital systems by redundancy and
adaptation,” diploma thesis, Stanford University.
[3] S. Iyanaga and Y. Kuwada, Encyclopedic Dictionary of Mathematics.
Cambridge, MA: Mit Press, 1980.