DESIGN OF A SIMPLE RELIABLE VOTER FOR MODULAR REDUNDANCY IMPLEMENTATIONS

Moslem Amiri, Václav Přenosil
Faculty of Informatics, Masaryk University
Brno, Czech Republic, amiri@mail.muni.cz, prenosil@fi.muni.cz

Abstract: This article deals with the modeling and design of a new fault-tolerant voter circuit. Majority voted redundancy is increasingly implemented in fault-tolerant design today. A voter is used in these implementations to determine a possibly correct result through the majority vote. The reliability of the voter circuit should be much higher than that of the other circuit elements; otherwise it will wipe out the gains of the redundancy scheme. Since almost all the circuit elements are fabricated with the same technology, the voter circuit itself needs to be fault-tolerant. In this paper, we present a novel fault-tolerant voter circuit design with a simple structure, so that it can easily be used for N-modular redundancy implementations as well as for systems with more than a single-bit output.

Keywords: reliable voter model; triple modular redundancy; TMR; N-modular redundancy; NMR; fault-tolerant voter circuit design

INTRODUCTION

A system that continues to function correctly in the presence of faults is called a fault-tolerant system. Reasons to build fault-tolerant systems include harsh environments, novice users, high repair costs, and large systems that must always be kept running. Adding redundant components or functions is the most common approach to achieving fault tolerance. When designing a fault-tolerant system, several features, such as cost, weight, volume, and reliability, need to be evaluated and traded off against one another. Reliability is the probability of no failure in a given operating period. Calculating reliability is a necessary part of any fault-tolerant design process. Several techniques are available for introducing redundancy and hence improving system reliability.
Underlying all these techniques is the provision of parallel paths that allow the system to continue operating even when one or more paths fail. The system is called a “parallel system” when all parallel components are powered up, and a “standby system” when only the online component is powered up and the rest are powered down. In practice, any parallel system needs additional circuitry, called a coupler or switch, to implement redundancy. Couplers reconfigure the parallel components of the system after a failure is detected. Since the coupler is in series with the parallel components, its reliability significantly affects the reliability of the whole system. To improve the reliability of digital systems, a technique known as “voting redundancy” is used: a voter is put in series with the parallel digital components. The voter receives parallel bits from an odd number of digital components and votes for the majority. If more than half of the digital components work properly, the voter decides correctly. This redundancy technique, called N-modular redundancy (NMR), alleviates the problems associated with couplers or switches in parallel or standby systems.

1. TRIPLE MODULAR REDUNDANCY AND CLASSICAL VOTER CIRCUIT

The basic modular redundancy circuit is triple modular redundancy (TMR), shown in Fig. 1. TMR consists of three parallel digital components (modules), all of which have equivalent logic and the same truth table. The same input is fed to the three modules, and a voter gives the majority as the system output. One use of TMR is the protection of combinational and sequential logic in reprogrammable logic devices, called Functional Triple Modular Redundancy (FTMR) [1]. If any two of the three modules in the TMR system work, and the voter does not fail, the system output will be correct. The reliability of the system therefore equals that of a two-out-of-three system.
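The two-out-of-three behaviour described above can be checked numerically. The following Python sketch enumerates every working/failed combination of the modules and sums the probability of the combinations in which a majority works. It assumes independent module failures, a common module reliability, and a perfect voter; the function names `majority` and `majority_reliability` are illustrative and are not taken from the original design.

```python
from itertools import product


def majority(bits):
    """Return 1 if more than half of the bits are 1, else 0."""
    if len(bits) % 2 == 0:
        raise ValueError("majority voting needs an odd number of inputs")
    return int(sum(bits) > len(bits) // 2)


def majority_reliability(r_module, n_modules=3):
    """Probability that a majority of the modules work.

    Assumptions of this sketch: modules fail independently, all share the
    same reliability r_module, and the voter itself never fails.
    """
    total = 0.0
    # Each state is a tuple of flags, 1 = module works, 0 = module failed.
    for state in product((0, 1), repeat=n_modules):
        if majority(state):
            probability = 1.0
            for works in state:
                probability *= r_module if works else (1.0 - r_module)
            total += probability
    return total
```

For example, `majority_reliability(0.9)` evaluates three modules that each work with probability 0.9 and gives approximately 0.972, which is higher than the reliability of a single module.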
The reliability of a TMR system, $R_{TMR}$, expressed in terms of the reliability of a single module, $R_M$, is therefore:

$$R_{TMR} = P(A \cdot B + A \cdot C + B \cdot C) = B(3{:}3) + B(2{:}3) = 3R_M^2 - 2R_M^3 \qquad (1)$$

where $A$, $B$, and $C$ inside $P(\cdot)$ denote the events that the respective modules work, and $B(k{:}3)$ is the binomial (Bernoulli) probability that exactly $k$ of the 3 modules work. The assumption that the system fails when a majority of modules fail is pessimistic; there are instances where a majority of the modules fail but the network has not failed [2]. If we assume a constant failure rate $\lambda$ for each module and a perfect voter for the TMR system, then each module has the reliability $R_M = e^{-\lambda t}$, and the reliability of the network is:

$$R_{TMR} = 3e^{-2\lambda t} - 2e^{-3\lambda t} \qquad (2)$$

In the development of (2), the voter circuit is assumed to be ideal, with reliability $R_V = 1$. In practice, however, all the circuit elements, including the modules and the voter, are fabricated with the same technology and therefore have almost the same failure rate. Since the voter is placed in series with the modules, its reliability is a limiting factor on the reliability of the network:

$$R_{TMR} = R_V \left(3e^{-2\lambda t} - 2e^{-3\lambda t}\right) = e^{-\lambda_V t} \left(3e^{-2\lambda t} - 2e^{-3\lambda t}\right) \qquad (3)$$

where $\lambda_V$ is the failure rate of the voter. The conventional TMR voter circuit is shown in Fig. 2. The design of this circuit makes no provision for faults in the voter itself. We need a voter circuit that can itself tolerate faults and hence improve the reliability of the whole system.

Fig. 1. Triple modular redundancy.
Fig. 2. Circuit realization of a classical TMR voter.

2. DESIGN OF A FAULT-TOLERANT VOTER CIRCUIT

Failure of the voter circuit results in failure of the TMR system, while failure of one module can be tolerated. Therefore, a fault-tolerant voter will improve the reliability of the system significantly. There are several fault-tolerant voter circuit designs in the literature (e.g., [3,4]), which generally suffer from the following problems:
• Complexity: including components such as priority encoders or multiplexers in the voter circuit reduces its reliability and makes the voting process time-consuming.
• Inextensibility: a good design should make it easy to extend the voter circuit for use with more modules.
• Dependency on one component: redundancy is introduced into the voter precisely because a series-connected voter lowers the reliability of the TMR system, yet the existing voter designs reproduce this problem inside the voter circuit. These designs need one final component, such as a multiplexer, in series with the rest of the components to produce the final output of the TMR, and the failure of that component results in the failure of the network.

In this paper we introduce a novel fault-tolerant voter circuit design that overcomes the above-mentioned problems. This voter circuit implementation uses wired logic and is shown in Fig. 3(a). To expose the source of the defects that may occur in this circuit and make it unreliable, the transistor-level structure of the open-drain gates used in the wired logic is shown in Fig. 3(b).

Fig. 3. Novel fault-tolerant voter circuit: (a) circuit realization; (b) open-drain CMOS NAND gate.

Almost all single-fault types occurring in a gate involve only one transistor of the gate, e.g., open p-MOS drain, grounded n-MOS gate, broken p-MOS gate bridged, etc. If a gate is always supposed to have equal input values, then the series-connected transistors can be used in a simple reliable design, while the parallel-connected transistors, which directly connect the output to their sources, make the design unreliable. Therefore, the open-drain CMOS NAND gate shown in Fig. 3(b), which contains only two series-connected n-channel transistors, can be used in the design of a simple fault-tolerant voter. As shown in Fig. 3(a), we use open-drain CMOS NAND gates to build a fault-tolerant voter circuit.
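The wired-logic structure of Fig. 3(a) can be explored with a small logic-level model. The Python sketch below is an illustration under stated assumptions rather than the authors' circuit description: each module output is assumed to pass through an inverter, each pair of inverted signals drives the two series n-channel transistors of one open-drain NAND gate, and the three gate outputs share one pulled-up line. The function names and the `"open"` (never conducts) and `"short"` (always conducts) fault labels are hypothetical simplifications introduced here.

```python
from itertools import combinations, product


def wired_voter(inputs, faults=None):
    """Logic-level model of a three-input wired open-drain voter.

    inputs: three module outputs, each 0 or 1.
    faults: optional dict mapping (gate, position) to "open" (the transistor
            never conducts) or "short" (the transistor always conducts).
    """
    if len(inputs) != 3:
        raise ValueError("this sketch models the TMR case only")
    faults = faults or {}
    inverted = [1 - x for x in inputs]  # assumed: one inverter per input
    pulled_low = False
    # One open-drain NAND gate per pair of inputs: (0, 1), (0, 2), (1, 2).
    for gate, pair in enumerate(combinations(range(3), 2)):
        conducts = []
        for position, source in enumerate(pair):
            fault = faults.get((gate, position))
            if fault == "open":
                conducts.append(False)
            elif fault == "short":
                conducts.append(True)
            else:
                conducts.append(inverted[source] == 1)
        # Series transistors: the gate sinks current only if both conduct.
        if all(conducts):
            pulled_low = True
    # The pull-up keeps the shared line high unless some gate pulls it low.
    return 0 if pulled_low else 1


def single_faults():
    """Yield every single-transistor fault of the six series transistors."""
    for gate, position, kind in product(range(3), range(2), ("open", "short")):
        yield {(gate, position): kind}
```

With no faults, `wired_voter` returns the majority of its three inputs for all eight input combinations. Iterating over `single_faults()` with unanimous inputs, `(0, 0, 0)` or `(1, 1, 1)`, gives a quick check of how one stuck transistor affects the output when every gate sees equal input values; the sketch makes no claim about faults combined with a disagreeing module.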
The voter circuit of Fig. 3(a) votes for the majority when it is fault-free: if the three inputs are low, all n-channel transistors are on and hence the wired output is low; if the three inputs are high, the n-channel transistors are off and the wired output is high. The circuit also works properly when one inverting gate and/or the two transistors connected to that inverting gate fail: when the input values are low, at least one fault-free open-drain gate remains to pull the wired output low, and when the input values are high, the second transistor (which is fault-free) blocks any connection to ground. The structure of Fig. 3(a) can therefore always tolerate the failure of one component. It will fail if any two series-connected transistors stop functioning (the failure of an inverting gate can be regarded as the failure of the two transistors connected to it).

Assuming a constant failure rate $\lambda$ for every transistor, and using the first two terms of the Maclaurin series expansion of $e^z$ about $z = 0$ (which gives the simplification $e^{-\lambda t} \approx 1 - \lambda t$), the reliability of our voter circuit design in the high-reliability region will be 1. The reliability of the classical voter circuit shown in Fig. 2 is lower. Pessimistically, a NAND gate, which is composed of two parallel-connected p-channel and two series-connected n-channel transistors, will malfunction if any of its p-channel or n-channel transistors fails. Since all NAND gates must function properly for the voter circuit to be reliable, the reliability of this voter circuit after applying the Maclaurin series simplification is:

$$R_{V(classic)} = (1 - 4\lambda t)^4 \qquad (4)$$

where the failure rates of both n- and p-channel transistors are assumed to be $\lambda$. A comparison of the classical voter reliability ($R_{V(classic)}$) and our novel voter circuit reliability ($R_{V(novel)}$) is illustrated in Fig. 4.

Fig. 4. Reliability comparison of classic and novel voters.

3. FAULT-TOLERANT VOTER CIRCUIT FOR NMR

Our novel voter circuit design can easily be extended to NMR systems where N is an odd integer ($N = 2n + 1$) greater than or equal to 5. To this end, the number of open-drain NAND gates needed is the number of combinations of $n + 1$ out of $2n + 1$, with each gate having $n + 1$ inputs. Such a structure can tolerate faults in $n$ out of the $n + 1$ series-connected transistors of an open-drain NAND gate.

CONCLUSION

The technique presented in this paper provides a reliable voter circuit design for TMR and NMR systems using wired logic. The design is very simple, requiring only twelve transistors and one pull-up resistor for a TMR voter. In addition, by using wired logic, the dependence of the network on one gate in the voter to produce the final output is removed. One drawback of this method is that the rise time (transition from 0 to 1) is longer than the fall time (transition from 1 to 0). For the highest possible speed, the pull-up resistor should be as small as possible; the minimum resistance is determined by the maximum sink current of an open-drain output. However, this longer rise time is usually shorter than the clock period of the synchronous system in which the modular redundancy system is used. Another issue is the reliability of the pull-up resistor. Since the reliability of this resistor is usually higher than that of the other types of components in the system, this is not a matter of concern. Our novel voter circuit does, however, have higher power consumption.

LITERATURE

[1] S. HABINC. Functional Triple Modular Redundancy (FTMR), Design and Assessment Report. Gaisler Research, FPGA-003-01, ver. 0.2, pp. 1-55, December 2002.
[2] D.P. SIEWIOREK. Reliability Modeling of Compensating Module Failures in Majority Voted Redundancy. Computers, IEEE Transactions on, vol. C-24, no. 5, pp. 525-533, May 1975.
[3] R.V. KSHIRSAGAR, R.M. PATRIKAR. Design of a novel fault-tolerant voter circuit for TMR implementation to improve reliability in digital circuits.
Microelectronics Reliability, vol. 49, iss. 12, pp. 1573-1577, December 2009.
[4] T. BAN, L.A. DE BARROS NAVINER. A simple fault-tolerant digital voter circuit in TMR nanoarchitectures. NEWCAS Conference (NEWCAS), 2010 8th IEEE International, pp. 269-272, June 2010.

Acknowledgement

The work presented in this paper has been supported under the research project SPECTRUM, No. TA01011383, by the Technology Agency of the Czech Republic.